Language documentation (also: documentary linguistics) is a subfield of linguistics which aims to describe the grammar and use of human languages and to provide a comprehensive record of the linguistic practices characteristic of a given speech community. [1] [2] [3] It seeks to create as thorough a record as possible of the speech community, both for posterity and for language revitalization. This record can be public or private depending on the needs of the community and the purpose of the documentation. In practice, language documentation can range from solo linguistic anthropological fieldwork to the creation of vast online archives that contain dozens of different languages, such as FirstVoices or OLAC. [4]
Language documentation provides a firmer foundation for linguistic analysis in that it creates a corpus of materials in the language. The materials in question can range from vocabulary lists and grammar rules to children's books and translated works. These materials can then support claims about the structure of the language and its usage. [5] Building such a corpus should be seen as a basic taxonomic task for linguistics: identifying the range of languages and their characteristics.
Typical steps involve recording, maintaining metadata, transcribing (often using the International Phonetic Alphabet and/or a "practical orthography" devised for that language), annotating and analyzing, translating into a language of wider communication, archiving and disseminating. [6] Creating good records in the course of doing language description is critical. The materials can be archived, but not all archives are equally adept at handling language materials preserved in varying technological formats, and not all are equally accessible to potential users. [7]
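As a rough illustration of the workflow above, the following Python sketch tracks which steps have been completed for a single recording session. It is a minimal sketch under stated assumptions, not the schema of any real archive: the `Session` class, its field names, the stage labels, and the sample values are all hypothetical and chosen only to mirror the steps listed in the text.

```python
from dataclasses import dataclass, field

# Workflow stages, in the order the text lists them.
# The labels themselves are hypothetical shorthand.
STAGES = (
    "recording",
    "metadata",
    "transcription",
    "annotation_and_analysis",
    "translation",
    "archiving",
    "dissemination",
)


@dataclass
class Session:
    """One recording session (all field names are illustrative assumptions)."""
    session_id: str
    language: str
    speakers: list[str]
    media_file: str
    completed: set[str] = field(default_factory=set)

    def mark_done(self, stage: str) -> None:
        """Record that a workflow stage has been finished for this session."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.completed.add(stage)

    def pending(self) -> list[str]:
        """Return the stages still to be done, in workflow order."""
        return [s for s in STAGES if s not in self.completed]


# Placeholder values, not real data.
session = Session("S001", "Example language", ["Speaker A"], "S001.wav")
session.mark_done("recording")
session.mark_done("metadata")
print(session.pending())
```

Keeping the stage list in one ordered tuple means the outstanding work is always reported in workflow order, which makes it easy to see at a glance how far a session is from being archive-ready.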
Language documentation complements language description, which aims to describe a language's abstract system of structures and rules in the form of a grammar or dictionary. By practising good documentation, in the form of recordings with transcripts and then collections of texts and a dictionary, a linguist both works more effectively and can provide materials for use by speakers of the language. New technologies permit better recordings with better descriptions, which can be housed in digital archives such as AILLA, Pangloss, or PARADISEC. These resources can then be made available to the speakers. The first example of a grammar with a media corpus is Thieberger's grammar of South Efate (2006). [8]
Language documentation has also given rise to new specialized publications, such as the free, online, peer-reviewed journal Language Documentation & Conservation and the SOAS working papers Language Documentation & Description.
The digitization of archives is a critical component of language documentation and revitalization projects. [9] Descriptive records of local languages that could be put to use in language revitalization projects are overlooked because of obsolete formats, incomplete hard-copy records, or systemic inaccessibility. Local archives in particular, which may hold vital records of the area's indigenous languages, are chronically underfunded and understaffed. [10] Historic records relating to language that have been collected by non-linguists such as missionaries can be overlooked if the collection is not digitized. [11] Physical archives are naturally more vulnerable to damage and information loss. [9]
Language documentation can be beneficial to individuals who would like to teach or learn an endangered language. [12] If a language has limited documentation, this also limits how it can be used in a language revitalization context. Teaching with documentation and linguists' field notes can provide more context for those teaching the language and can add information they were not aware of. [12] Documentation can be useful for understanding culture and heritage, as well as for learning the language. Important components of language teaching include listening, reading, speaking, writing, and cultural components. Documentation provides resources for developing these skills. [12] For example, the Kaurna language was revitalized through written resources. [13] These written documents served as the only resource and were used to re-introduce the language; one way of doing so was through teaching, which included the making of a teaching guide for the Kaurna language. [13] Language documentation and teaching are thus related: if there are no fluent speakers of a language, documentation can be used as a teaching resource.
In the study of language, description or descriptive linguistics is the work of objectively analyzing and describing how language is actually used by a speech community.
An endangered language or moribund language is a language that is at risk of disappearing as its speakers die out or shift to speaking other languages. Language loss occurs when the language has no more native speakers and becomes a "dead language". If no one can speak the language at all, it becomes an "extinct language". A dead language may still be studied through recordings or writings, but it is still dead or extinct unless there are fluent speakers. Although languages have always become extinct throughout human history, they are currently dying at an accelerated rate because of globalization, mass migration, cultural replacement, imperialism, neocolonialism and linguicide.
Language revitalization, also referred to as language revival or reversing language shift, is an attempt to halt or reverse the decline of a language or to revive an extinct one. Those involved can include linguists, cultural or community groups, or governments. Some argue for a distinction between language revival and language revitalization. There has been only one successful instance of a complete language revival, that of the Hebrew language, which created a new generation of native speakers without any pre-existing native speakers as a model.
The Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) is a digital archive of records of some of the many small cultures and languages of the world. It digitises reel-to-reel field tapes, has a mass data store and uses international standards for metadata description. PARADISEC is part of the worldwide community of language archives. PARADISEC's main motivation is to ensure that unique recordings of small languages are preserved for the future, and that researchers consider the future accessibility of their materials for other researchers, community members, or anyone who has an interest in such materials.
Lyle Richard Campbell is an American scholar and linguist known for his studies of indigenous American languages, especially those of Central America, and of historical linguistics in general. Campbell is professor emeritus of linguistics at the University of Hawaiʻi at Mānoa.
LACITO is a multidisciplinary research organisation, principally devoted to the study of cultures and languages of oral tradition.
The Pangloss Collection is a digital library whose objective is to store and facilitate access to audio recordings in endangered languages of the world. Developed by the LACITO centre of CNRS in Paris, the collection provides free online access to documents of connected, spontaneous speech, in otherwise little-documented languages of all continents.
Internet linguistics is a domain of linguistics advocated by the English linguist David Crystal. It studies new language styles and forms that have arisen under the influence of the Internet and of other new media, such as Short Message Service (SMS) text messaging. Since the beginning of human–computer interaction (HCI) leading to computer-mediated communication (CMC) and Internet-mediated communication (IMC), experts such as Gretchen McCulloch have acknowledged that linguistics has a contributing role in it, in terms of web interface and usability. Studying the emerging language on the Internet can help improve conceptual organization, translation and web usability. Such study aims to benefit both linguists and web users.
Khamyang is a critically endangered Tai language of India, spoken by the Khamyang people. Approximately fifty people speak the language; all reside in the village of Powaimukh, located seven miles downstream of Margherita in the Tinsukia district. It is closely related to the other Tai languages in the Assam region: Aiton, Khamti, Phake, and Turung.
Linguistics is the scientific study of language. The modern-day scientific study of linguistics takes all aspects of language into account — i.e., the cognitive, the social, the cultural, the psychological, the environmental, the biological, the literary, the grammatical, the paleographical, and the structural.
Kaipuleohone is a digital ethnographic archive that houses audio and visual files, photographs, and hundreds of textual materials such as notes, dictionaries, and transcriptions relating to small and endangered languages. The archive is stored in the ScholarSpace repository of the University of Hawai‘i at Mānoa and maintained by the Department of Linguistics of the University's College of Languages, Linguistics and Literature. Kaipuleohone was established by Nick Thieberger in 2008. It is a member of the Digital Endangered Languages and Musics Archiving Network (DELAMAN). The term kaipuleohone means 'gourd of sweet words' and symbolizes the impression of an accumulation of language material.
Nicholas Thieberger is an Australian linguist and an Associate Professor in the School of Languages and Linguistics at the University of Melbourne. He helped to establish the PARADISEC archive in 2003 and currently serves as its Director. Thieberger was the Editor of Language Documentation & Conservation (2011-2021), an academic journal which focuses on language documentation and conservation. He was elected a Fellow of the Australian Academy of the Humanities in 2021.
Margaret Florey is an Australian linguist whose work focuses on the revitalization and maintenance of Indigenous Australian languages. She has documented changes in contemporary speech, such as the expression "Yeah, no", which is becoming more prevalent in Australia.
Shobhana Chelliah is an Indian-American linguist who specializes in Sino-Tibetan languages. She is Distinguished Professor of Linguistics and Associate Dean of Research and Advancement at the College of Information, University of North Texas. Her research focuses on the documentation of the Tibeto-Burman languages of Northeast India.
The Institute on Collaborative Language Research or CoLang is a biennial training institute in language documentation for any person interested in community-based, collaborative language work. CoLang has been described as part of a modern collaborative model in community-based methodologies of language revitalization and documentation.
The Canadian Indigenous Languages and Literacy Development Institute (CILLDI), an intensive annual "summer school for Indigenous language activists, speakers, linguists, and teachers" hosted at the University of Alberta, Edmonton, is a "multicultural, cross-linguistic, interdisciplinary, inter-regional, inter-generational" initiative. CILLDI was established in 1999 with one Cree language course offered by Cree speaker Donna Paskemin. By 2016 over 600 CILLDI students representing nearly 30 Canadian Indigenous languages had participated in the program and it had become the "most national of similar language revitalization programs in Canada aimed at the promotion of First Peoples languages." CILLDI, a joint venture between the University of Alberta and the University of Saskatchewan, responds to "different sociolinguistic situations in language communities under threat" and includes three faculties at the University of Alberta in Edmonton: Arts, Education, and Native Studies. CILLDI provides practical training to students which is "directly implemented back in the community." Initiatives like CILLDI were formed against the backdrop of a projection of a catastrophic and rapid decline of languages in the twenty-first century.
Within the linguistic study of endangered languages, sociolinguists distinguish between different speaker types based on the type of competence they have acquired in the endangered language. Often when a community is gradually shifting away from an endangered language to a majority language, not all speakers acquire full linguistic competence; instead, speakers have varying degrees and types of competence depending on their exposure to the minority language in their upbringing. The relevance of speaker types in cases of language shift was first noted by Nancy Dorian, who coined the term semi-speaker to refer to those speakers of Sutherland Gaelic who were predominantly English-speaking and whose Gaelic competence was limited and showed considerable influence from English. Later studies added further speaker types such as rememberers and passive speakers. In the context of language revitalization, new speakers who have learned the endangered language as a second language are sometimes distinguished.
The Endangered Languages Project (ELP) is a worldwide collaboration between indigenous language organizations, linguists, institutions of higher education, and key industry partners to strengthen endangered languages. The foundation of the project is a website, which launched in June 2012.
Andrea Berez-Kroeker is a documentary linguist and professor in the Department of Linguistics at the University of Hawaiʻi at Mānoa. She is the director of the Kaipuleohone archive of endangered languages. She is an expert on the practices of reproducibility and management of data in the field of linguistics.