Developer(s) | Brian MacWhinney, Leonid Spektor |
---|---|
Stable release | 28-Mar-2021 11:00 / March 28, 2021 [1] |
Operating system | Microsoft Windows, Linux, macOS |
Type | Qualitative data analysis |
License | GPLv2 |
Website | dali |
CLAN (Computerized Language ANalysis) is a cross-platform program designed by Brian MacWhinney and written by Leonid Spektor for creating and analyzing transcripts in the Child Language Data Exchange System (CHILDES) database. CLAN is open-source software and can be downloaded free of charge.
From 1984 until 2000, CLAN was used exclusively for the analysis of child language data. However, beginning with the funding of the TalkBank system by the National Science Foundation (NSF) in 2000, the scope of CLAN has broadened. CLAN is now being used to create and analyze a wide variety of corpora in the context of these databanks: CHILDES for child language, [2] AphasiaBank for aphasia, [3] PhonBank for phonology, [4] FluencyBank for fluency disorders, [5] HomeBank for daylong recordings in the home, [6] and SLABank for second language acquisition. [7] The TalkBank website [8] also provides data for seven other spoken language banks dealing with CA (Conversation Analysis), RHD (right hemisphere damage), TBI (traumatic brain injury), LangBank (classical languages), ClassBank (classroom interactions), SamtaleBank (Danish), and BilingBank (bilingualism).
All the data in each of these banks is formatted in the CHAT transcription format, which is designed for analysis by CLAN. The CLAN programs include facilities in five different domains:
Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference.
In linguistics, a corpus or text corpus is a language resource consisting of a large and structured set of texts. In corpus linguistics, corpora are used for statistical analysis and hypothesis testing, such as checking occurrences or validating linguistic rules within a specific language territory.
A communication disorder is any disorder that affects an individual's ability to comprehend, detect, or apply language and speech to engage in discourse effectively with others. The delays and disorders can range from simple sound substitution to the inability to understand or use one's native language.
In spoken language analysis, an utterance is the smallest unit of speech. It is a continuous piece of speech beginning and ending with a clear pause. In the case of oral languages, it is generally, but not always, bounded by silence. Utterances do not exist in written language; only their representations do. They can be represented and delineated in written language in many ways.
Transcription in the linguistic sense is the systematic representation of spoken language in written form. The source can either be utterances or preexisting text in another writing system.
Phonetic transcription is the visual representation of speech sounds by means of symbols. The most common type of phonetic transcription uses a phonetic alphabet, such as the International Phonetic Alphabet.
Conversation analysis (CA) is an approach to the study of social interaction, embracing both verbal and non-verbal conduct, in situations of everyday life. CA originated as a sociological method, but has since spread to other fields. CA began with a focus on casual conversation, but its methods were subsequently adapted to embrace more task- and institution-centered interactions, such as those occurring in doctors' offices, courts, law enforcement, helplines, educational settings, and the mass media, and to focus on nonverbal activity in interaction, including gaze, body movement, and gesture. As a consequence, the term 'conversation analysis' has become something of a misnomer, but it has continued as a term for a distinctive and successful approach to the analysis of sociolinguistic interactions. CA and ethnomethodology are sometimes considered one field and referred to as EMCA.
Transcortical motor aphasia (TMoA), also known as commissural dysphasia or white matter dysphasia, results from damage in the anterior superior frontal lobe of the language-dominant hemisphere. This damage is typically due to cerebrovascular accident (CVA). TMoA is generally characterized by reduced speech output, which is a result of dysfunction of the affected region of the brain. The left hemisphere is usually responsible for performing language functions, although left-handed individuals have been shown to perform language functions using either their left or right hemisphere depending on the individual. The anterior frontal lobes of the language-dominant hemisphere are essential for initiating and maintaining speech. Because of this, individuals with TMoA often present with difficulty in speech maintenance and initiation.
Dr. Hermann Moisl is a retired Senior Lecturer and Visiting Fellow in Linguistics at Newcastle University. He was educated at various institutes, including Trinity College Dublin and the University of Oxford.
Victoria Alexandra Fromkin was an American linguist who taught at UCLA. She studied slips of the tongue, mishearing, and other speech errors, which she applied to phonology, the study of how the sounds of a language are organized in the mind.
The Child Language Data Exchange System (CHILDES) is a corpus established in 1984 by Brian MacWhinney and Catherine Snow to serve as a central repository for first language acquisition data. Its earliest transcripts date from the 1960s, and it now contains 130 corpora in 26 languages, all of which are publicly available worldwide. CHILDES has since become a component of the larger TalkBank corpus, which also includes language data from people with aphasia, second language acquisition, conversation analysis, and classroom language learning. CHILDES is mainly used for analyzing the language of young children and the child-directed speech of adults.
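CHILDES transcripts are stored in the CHAT format, in which header lines begin with `@`, main speaker tiers begin with `*` followed by a speaker code, and dependent annotation tiers begin with `%`. The following is a minimal Python sketch, not CLAN itself, of how child and adult utterances in such a file might be separated and summarized. The sample transcript, the function names, and the words-per-utterance average are illustrative assumptions; real CHAT files contain continuation lines and special codes that this sketch ignores, and CLAN's own commands compute their measures differently.

```python
from collections import Counter, defaultdict

# Hypothetical sample in a simplified CHAT-like layout (not taken from CHILDES).
SAMPLE = """\
@Begin
@Participants:\tCHI Target_Child, MOT Mother
*MOT:\twhere is the ball ?
*CHI:\tball gone .
%com:\tchild points at the sofa
*MOT:\tyes the ball is gone .
*CHI:\tmore ball .
@End
"""

# Utterance terminators that should not be counted as words.
PUNCTUATION = {".", "?", "!"}


def utterances_by_speaker(text):
    """Group main-tier utterances (lines starting with '*') by speaker code."""
    result = defaultdict(list)
    for line in text.splitlines():
        if not line.startswith("*"):
            continue  # skip '@' headers and '%' dependent tiers
        speaker, _, utterance = line[1:].partition(":")
        words = [w for w in utterance.split() if w not in PUNCTUATION]
        result[speaker].append(words)
    return dict(result)


def mean_length_in_words(utterances):
    """Average number of words per utterance (a crude length measure)."""
    return sum(len(u) for u in utterances) / len(utterances)


def word_frequencies(utterances):
    """Count how often each word occurs across a speaker's utterances."""
    return Counter(w for u in utterances for w in u)


if __name__ == "__main__":
    by_speaker = utterances_by_speaker(SAMPLE)
    for speaker, utts in by_speaker.items():
        print(speaker, mean_length_in_words(utts), word_frequencies(utts).most_common(3))
```

Keeping the grouping step separate from the measures means the same parsed utterances can feed several analyses, for example comparing the child's speech with the child-directed speech of the adult.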
Brian James MacWhinney is a Professor of Psychology and Modern Languages at Carnegie Mellon University. He specializes in first and second language acquisition, psycholinguistics, and the neurological bases of language, and he has written and edited several books and over 100 peer-reviewed articles and book chapters on these subjects. MacWhinney is best known for his competition model of language acquisition and for creating the CHILDES and TalkBank corpora. He has also helped to develop a stream of pioneering software programs for creating and running psychological experiments, including PsyScope, an experimental control system for the Macintosh; E-Prime, an experimental control system for the Microsoft Windows platform; and System for Teaching Experimental Psychology (STEP), a database of scripts for facilitating and improving psychological and linguistic research.
The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time.
Clinical linguistics is a sub-discipline of applied linguistics involved in the description, analysis, and treatment of language disabilities, especially the application of linguistic theory to the field of Speech-Language Pathology. The study of the linguistic aspect of communication disorders is of relevance to a broader understanding of language and linguistic theory.
A speech corpus is a database of speech audio files and text transcriptions. In speech technology, speech corpora are used, among other things, to create acoustic models. In linguistics, spoken corpora are used for research in phonetics, conversation analysis, dialectology, and other fields.
TalkBank is a multilingual corpus established in 2002 and currently directed and maintained by Brian MacWhinney. The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It contains sample databases from within several subfields of communication, including first language acquisition, second language acquisition, conversation analysis, classroom discourse, and aphasic language. It uses these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary linguistic materials via networked computers.
ToBI is a set of conventions for transcribing and annotating the prosody of speech. The term "ToBI" is sometimes used to refer to the conventions used for describing American English specifically, which was the first ToBI system, developed by Mary Beckman and Janet Pierrehumbert, among others. Other ToBI systems have been defined for a number of languages; for example, J-ToBI refers to the ToBI conventions for Tokyo Japanese, and an adaptation of ToBI to describe Dutch intonation was developed by Carlos Gussenhoven, and called ToDI. Another variation of ToBI, called IViE, was established in 1998 to enable comparison between several dialects of British English.
EXMARaLDA is a set of free software tools for creating, managing, and analyzing spoken language corpora. It consists of a transcription tool, a tool for administering corpus metadata, and a tool for querying spoken language corpora. EXMARaLDA is used for conversation and discourse analysis, dialectology, phonology, and research into first and second language acquisition in children and adults. EXMARaLDA is based on the open standards XML and Unicode and is programmed in Java.
Speech translation is the process by which conversational spoken phrases are instantly translated and spoken aloud in a second language. This differs from phrase translation, in which the system translates only a fixed and finite set of phrases that have been manually entered into it. Speech translation technology enables speakers of different languages to communicate, which supports scientific collaboration, cross-cultural exchange, and global business.
ELAN is a software tool for manually and semi-automatically annotating and transcribing audio or video recordings. It has a tier-based data model that supports multi-level, multi-participant annotation of time-based media. It is applied in humanities and social sciences research for documentation and for qualitative and quantitative analysis. It is distributed as free and open-source software under the GNU General Public License, version 3.