Language documentation’s grey literature

Originally published 27th August 2012, republished with updates to links etc., and an adapted re-presentation of user-generated comments. The original blog post was archived by the Internet Archive and retrieved from there on 12th February 2017.

Researchers and communities working within the framework of language documentation (as enunciated by Himmelmann 1998, 2012, Woodbury 2003, 2011, and others) often distinguish two types of outcomes from their work:

  1. published analyses of language structures and/or use and/or papers on the theory and practice of language documentation, with the ‘gold standard’ being publication in refereed journals or with publishers who have a high international reputation. Steps on the way to these publications may include MA/PhD dissertations and conference papers (increasingly made available via online institutional repositories)
  2. a documentary corpus, usually understood as “large multimedia corpora of spoken endangered languages” (as the workshop on Potentials of Language Documentation: Methods, Analyses, and Utilization described it). There has increasingly been an expectation, or indeed a requirement in the case of work funded by the Volkswagen Foundation’s DoBeS programme or ELDP’s grants, to deposit the corpus materials with an established archive, such as ELAR or TLA.

There is, however, often another type of product, one I have been seeing increasingly, that falls outside these two and belongs to the world of grey (or gray) literature. These are typically products that are community-oriented and locally produced in limited numbers, perhaps in an orthography that is not yet fully established, or distributed specifically as “in progress” outcomes for the speakers and others with whom the researcher has worked. These may lack bibliographic features, even to the point that identification of the author, publication date or publishing body may not be easy. Similarly, they may be laid out and formatted non-professionally, and produced cheaply (e.g. by local photocopying) with less than sturdy binding. There may be only 50 or 100 copies in existence.

Such grey literature may be a very important outcome of documentation work, representing the only products that local communities can afford and readily access (especially when the ‘gold standard’ products are in a language they do not understand and sell for prices well beyond their reach, or the archived corpora are not accessible due to internet access limitations or lack of technical knowledge or software tools). However, these items often remain unknown and inaccessible, except to the linguist and those who obtained a copy from the limited print run (and occasionally us at SOAS when a student, colleague or grantee sends us a spare copy).

I believe there is a place for this grey literature within the collections that archives like ELAR and TLA preserve and distribute, and that archive protocol (access and use) systems provide a simple way to recognise the temporary or restricted distribution nature of such materials. It is possible to use, for example, the ELAR S ‘subscriber’ category where potential users must request access and explain why they want to use the materials, thereby giving the depositor (or their delegate) a chance to explain any limitations, such as the materials’ ‘in progress’ nature or community-required restrictions. That is, researchers who are making this kind of grey literature should be depositing an electronic version with their archive and assigning it the S protocol level. This way we can ensure preservation of important documentation outcomes that are often overlooked by depositors and the funders who support (and evaluate) them.


Himmelmann, Nikolaus P. 1998. Documentary and descriptive linguistics. Linguistics 36: 161-195.

Himmelmann, Nikolaus P. 2012. Linguistic Data Types and the Interface between Language Documentation and Description. Language Documentation & Conservation 6: 187-207.

Woodbury, Tony. 2003. Defining documentary linguistics. In Peter K. Austin (ed.), Language Documentation and Description, Vol. 1, 35-51. London: SOAS.

Woodbury, Anthony C. 2011. Language documentation. In Peter K. Austin and Julia Sallabank (eds.), The Cambridge handbook of Endangered Languages, 159-186. Cambridge: Cambridge University Press.

2 Responses to Language documentation’s grey literature

Response from Claire Bowern on 27 August 2012

Yes, definitely. My Pama-Nyungan project has a huge amount of variably-shaded data, from publications by major companies to archives who have said “you are allowed a copy of this but you can’t tell anyone you have it.” AIATSIS has been very helpful with getting access to limited-run publications from language centres (and I now have contacts with several language centres who will sell and post me things) but these contacts have taken many years to acquire.

Response from Peter Austin on 28 August 2012

Thanks for the comment, Claire. Interesting that an archive would allow you to copy materials in its collection and then instruct you that “you can’t tell anyone you have it”. ELAR would never do that!
My Sasak corpus includes another kind of “grey literature” in the form of BA theses produced by third year students in the English Department at the University of Mataram (Unram). Each year several students write theses about their own local dialect of Sasak (recent titles include “Denominal verbs: a case study in Sasak Ngeno-ngene dialect spoken in RT 06 Cepak Lauq Aikmel”, “Sasak intensifiers: a study on Meno-mene dialects in Sakra”, and “Language choice of Sasak children: a case study of Sukaraja Timur Ampenan”). I now have a collection of 23 of these (some as Word files, and others as jpeg photos of hard copies that I took during fieldwork last month). The theses contain original data collected by the students in their home villages, most of which we have no other sources for. The staff at Unram discuss thesis plans with their students but they are heavily overburdened and have to sign off on the final versions of scores of BA theses each year, without really having an opportunity to read and correct them thoroughly. The English in the theses is sometimes less than perfect, and the Sasak examples occasionally mistranscribed, but I have found them to be very valuable sources of information about varieties of Sasak that I have not been able to work on directly myself.

Postscript: Several colleagues and commentators have mentioned that they thought the title of this post was an allusion to the erotic novel Fifty Shades of Grey by E. L. James. I was blissfully unaware of the novel’s existence when I wrote the post (though aware, of course, of the much older term ‘grey literature’). Thanks for bringing it to my attention, but, no, I don’t think I’ll add it to my must-read list.