Documentary Linguistics and Language Graveyards: an evening with Paul Newman

On 18th October 2013 David Nathan, as Director of the Endangered Languages Archive (ELAR), wrote the following response to Paul Newman’s seminar on “The Law of Unintended Consequences: How the Endangered Languages Movement Undermines Field Linguistics as a Scientific Enterprise” that was published on the earlier incarnation of EL Blog. We republish David’s post here (with updates to links etc., and an adapted re-presentation of user-generated comments). The original blog post was archived by the Internet Archive and retrieved from there on 19th April 2015.

Listen to the Paul Newman seminar on SoundCloud
Watch the Paul Newman seminar on YouTube

On Tuesday 15th October a large audience at SOAS was treated to a thoughtful and entertaining lecture by well-known linguist and Distinguished Professor Emeritus Paul Newman, from Indiana University (see a list of his publications, including ‘The endangered languages issue as a hopeless cause’). Paul is known best for his work on African languages and has also been active in recent years in discussion of the goals of linguistics, language documentation and fieldwork. Paul’s talk was provocatively titled: ‘The Law of Unintended Consequences: How the Endangered Languages Movement Undermines Field Linguistics as a Scientific Enterprise’, not least because he was addressing many people associated with the (Hans Rausing) Endangered Languages Project which is, as he described it himself, ‘one of the most prestigious centres’ for advocating for endangered languages and in particular for a documentary linguistics approach. Documentary linguistics came in for Professor Newman’s special attention; he described it as studying the wrong languages, for the wrong reasons, using the wrong methods.

One of Professor Newman’s charges was that too many people are documenting ‘large numbers of random languages’, driven by ‘bleeding heart’ ideology or the invocations of the usual suspects (Nikolaus Himmelmann, Peter Austin et al). His two suggested antidotes to this situation, however, did not sound that convincing: (a) linguists should go and research languages that they find ‘interesting’ (how much better than ‘random’ is that?), or, in a 180 degree turn, (b) linguists should be ‘sent out’ to study designated languages of strategic scientific interest, in the style of Soviet-era fieldwork expeditions (it’s also ‘interesting’ to note that in earlier, more colonial days of SOAS that is exactly what was done here).

Another of Professor Newman’s challenges was his assertion that documenters collect too many texts – thereby wasting time that they could have spent analysing. In practice it’s hard to see how that could be so, since collecting but not analysing (or transcribing, glossing etc.) texts does not consume that much time, precisely for the reason of there being no analysis. Our colleague Anthony Jukes (who was not at the lecture but read the abstract online) thought the challenge amounted to a ‘zero-sum fallacy’ – i.e. a proposition that if more people do more documentation then there must be fewer people doing ‘real’ linguistics; a proposition that could be proven by showing that until language documentation came along everyone was doing ‘correct’ linguistics.

For ELAR, the chief excitement was when Professor Newman called language archives ‘linguistic graveyards’. There was some validity in his characterising swathes of unannotated recordings as not completely useful – however Peter Austin countered during after-lecture questions that language speakers and communities themselves could be fully appreciative of the careful preservation of recorded performances in their languages, regardless of the absence of linguistic analysis (see also Gary Holton’s recent paper ‘Language archives: They’re not just for linguists any more’). In addition, Lutz Marten described how he found useful data in Malin Petzell’s deposit in ELAR on a phenomenon that the depositor Petzell had not herself been investigating but which incidentally was represented in her archived materials. This was precisely because the data was discoverable online; Lutz also pointed out that the true linguistic graveyards are researchers’ collections of tapes stashed away in their attics (perhaps recorded by the very same researchers praised for thoroughly analysing interesting topics …). Unable to resist the funereal analogy, I challenged being labeled, in Professor Newman’s terms, as a ‘linguistic undertaker’, and noted that archives such as ELAR, DoBeS, the Alaska Native Language Archive, and so many others, have thousands of users, who make thousands of downloads, and that we can thus demonstrate that we are thoroughly ‘living archives’.

Professor Newman’s lecture was given in a spirit of lively, thoughtful and affectionate provocation, and the large and enthusiastic audience was entertained as much as we were challenged to think about what we do.

6 Responses to Documentary Linguistics and Language Graveyards: an evening with Paul Newman

Response from David Nathan on 31 October 2013 at 11:11 pm:

There are further reactions to the video of Paul Newman’s lecture over at Reddit.

Response from abbodomo on 28 October, 2013 at 6:48 am:

‘Documentary linguistics’ = ‘linguistic necrology’:
I have three comments on the Newman Language Documentation talk at SOAS in October 2013 from an African perspective. First, note that the organizers have now quietly changed from labeling him as the ‘foremost’ scholar on Hausa to labeling him as a ‘well-known’ linguist after some of us protested. This is important. There is no denying the fact that Paul is a great linguist, but he is not the foremost scholar on the African language, Hausa. From now onwards, in this 21st century, Africans should never sit quietly while Columbus- and Mungo Park-like imperialist westerners appropriate African culture and science to themselves as the foremost scholars on Africa. No matter how much you and me as Africans try we will never be described as the ‘foremost’ on any western cultural structure. Can you imagine an African (or Asian) being described as ‘foremost’ scholar on English, French, German, or whatever European language in a serious way? So ALWAYS ask questions when these cultural imperialists try to appropriate scientific eminence in the study of African languages and cultures to themselves. We are no longer in the 17th or 18th or 19th or 20th centuries; we are in the 21st century! We should never allow cultural appropriation and linguistic imperialism to reign!

Second point, I actually agree with Paul on many aspects of his talk. I don’t think the term and whole idea of ‘documentary linguistics’ is very intellectual and I think it will run out-of-date very fast and die out very quickly. Descriptive and theoretical/explanatory linguistics are still far more intellectual and superior to ‘documentary linguistics’. Not that it is not a worthwhile endeavour, but the main point is that, (i) it fails in its stated goals – you do NOT save a language by documenting it; (ii) its intellectual methodological discovery procedure is not well-founded – you do not do science by just being a hunter-gatherer of data only without analysis. So Newman and Lehmann are right in calling these archives ‘linguistic graveyards’. Actually, the African linguist, John Mugane of Harvard University was the first, in my opinion, to refer to what is being done by these western documentary linguists as people doing ‘linguistic necrology’:-)!

Third point, I have what I consider to be a better solution than what Paul proposed as an alternative to ‘documentary linguistics’. Even though I do work on language documentation of African and Asian languages, I never use the term ‘documentary linguistics’, I prefer to use the term ‘language revitalization’, not in the sense of Hinton’s approach to reviving native American language, which is best described as ‘language revivalization’ but in the sense of carrying out a linguistic description and analytic research programme that ultimately involves the increased use of the target language. You save a language by encouraging its use, you do not save it by just documenting it. And in my works on West African languages and Southwestern Chinese languages I have developed a methodology for language description and REVITALIZATION which I call ‘laboratory-to-field experimentation’.

Response from Dafydd Gibbon on 20 October 2013 at 7:24 am:

The issue is a serious one: What are endangered language data collected for, exactly? For linguists? For anthropologists, musicologists, biologists, pharmacists? For archivists? For language communities? For language and speech technology development? For random, maybe as yet unknown purposes? The goals have multiplied since I started in linguistics in the early 1960s, and many more different paradigms and interests are involved. Descriptive linguistics alone is no longer the gold standard.

Incidentally, a random element in the search for data is a Good Thing! This is clear from the many chance events in the history of scientific discovery: one main point in empirical research, whether about language or about asteroids, is that we do not know exactly what we are going to find.

And it’s no accident, as we know from computer science, that the best sorting, searching and learning algorithms, while being theoretically well-grounded, start with a random seed.

So my impression – albeit only on the basis of blog hearsay and not personal interaction! – is that Newman is serious, but in respect of endangered languages nowhere near ‘the truth, the whole truth, and nothing but the truth’.

Response from Peter Austin on 19 October 2013 at 3:25 pm:

Newman specifically mentioned Christian Lehmann as the originator of the term ‘data cemeteries’ in linguistics (see p86 of Christian Lehmann. 2001. Language documentation. A program. In Walter Bisang (ed.) Aspects of Typology and Universals, 83-97. Berlin: Akademie Verlag, Studia Typologica, 1 [available online here]): ‘There are, in linguistics just as in other sciences, data cemeteries, large amounts of data assembled by people who thought they served a purpose in themselves and not used by other people’ (see also Christian Lehmann. 2004. Data in Linguistics. The Linguistic Review 21(3/4), 275-310 [available here]). Lehmann goes on, however, to distinguish clearly between ‘linguistic data collection’ (bad) and ‘linguistic documentation’ (good), stating: ‘The immediate lesson from this is that linguistic documentation does not reduce to linguistic data collection’. One of the main points of Lehmann’s definition of language (or linguistic) documentation (and also of Himmelmann’s and of mine, in our own ways) is that documentary corpora should have the potential of being multi-functional and be designed for use by a diverse range of audiences, not just the ‘scientific linguists’ that Newman so lauded in his talk.

4 thoughts on “Documentary Linguistics and Language Graveyards: an evening with Paul Newman”

  1. Adams Bodomo comments above that “[t]he organizers have now quietly changed from labeling [Paul Newman] as the ‘foremost’ scholar on Hausa to labeling him as a ‘well-known’ linguist after some of us protested. This is important. There is no denying the fact that Paul is a great linguist, but he is not the foremost scholar on the African language, Hausa.” This is a fallacious argumentum ad hominem that does not respond to Newman’s criticism. Furthermore, the “imperialist” argumentation is rather counterproductive in this case: many scholars have already demonstrated the post-colonial ideology lying behind language documentation and revitalization.

  2. There’s been quite a few times I unsuccessfully requested permission for various languages/projects on ELAR, and based on this experience I’ve definitely found some truth to the graveyard idea 🙁 Quite a few researchers deposit the data, set the default permission as “must ask for access”, and then either change their email or never bother checking it again, so the data just sits there. ELAR can’t do much about it, because the linguist is the one who has to give the permission, so if they go off radar, then data -> graveyard. It’s possible that I’ve just gotten unlucky in this respect, and that there are fewer projects of this type than I think. I hope so at any rate.

    Metadata is also another problematic area, in more areas than just the commonly discussed technical ones. One documentary linguist I know of uses four different metadata languages (somewhat randomly) in his data labelling, and sees no problem in this. I see a huge issue. Without having a clear audience in mind, and without actually doing the work to translate the rest (which is admittedly a lot of work), you can’t really solve this. The community? Will only know the target language and one of the metalanguages. Typologists? Will know two metalanguages. Local specialists? Probably will know at least three metadata languages, but there are so few of them that it’s not really worth having your target audience as only a handful of people, and we’re back to a variant of something you critiqued back in 2013, an audience of a handful of specialists. Very very few people will speak all of the languages necessary to truly make use of it. The solution he proposed is just use Google translate (or some other automatic translator), but I can already tell you that will NOT work very well.

    I know of some other projects where all the written data is in one language, but all the translations of the recordings are in another. Again, the community won’t understand the written work, and the researchers probably will not understand the translations, so very few people, again, will be able to use it properly.

    So, while I wouldn’t write off the whole idea, I do think there are some design-related issues that need to be discussed early on, when students begin to learn what language documentation is. Most of the discussion I’ve seen revolves around technical issues, such as formatting, xml structures, file types, etc. I haven’t heard much discussion, for instance, about what language the metadata is in, nor have I seen a solution for the ‘disappearing linguist causes data cemetery’ issues. If the permission is set to “ask”, and the linguist disappears, it unfortunately really doesn’t matter how well the data is annotated, if no one ever gets access to it.

  3. For a discussion of the history of archiving in language documentation from the 19th century to today see Henke, Ryan & Andrea L. Berez-Kroeker. 2016. A Brief History of Archiving in Language Documentation, with an Annotated Bibliography. Language Documentation & Conservation 10, 411-457. The authors note: “in the current period, conversations have arisen toward participatory models for archiving, which break traditional boundaries to expand the audiences and uses for archives while involving speaker communities directly in the archival process”. This paper also includes a detailed bibliography.
