CLAN program

CLAN (Computerized Language ANalysis)
Developer(s) Brian MacWhinney, Leonid Spektor
Stable release 28 March 2021 [1]
Operating system Microsoft Windows, Linux, macOS
Type Qualitative data analysis
License GPL2
Website dali.talkbank.org/clan/

The CLAN (Computerized Language ANalysis) program is a cross-platform program designed by Brian MacWhinney and written by Leonid Spektor for creating and analyzing transcripts in the Child Language Data Exchange System (CHILDES) database. CLAN is open-source software and can be freely downloaded.

History

From 1984 until 2000, CLAN was used exclusively for the analysis of child language data. However, beginning with the funding of the TalkBank system by the National Science Foundation (NSF) in 2000, the scope of CLAN has broadened. CLAN is now being used to create and analyze a wide variety of corpora in the context of these databanks: CHILDES for child language, [2] AphasiaBank for aphasia, [3] PhonBank for phonology, [4] FluencyBank for fluency disorders, [5] HomeBank for daylong recordings in the home, [6] and SLABank for second language acquisition. [7] The TalkBank website [8] also provides data for seven other spoken language banks dealing with CA (Conversation Analysis), RHD (right hemisphere damage), TBI (traumatic brain injury), LangBank (classical languages), ClassBank (classroom interactions), SamtaleBank (Danish), and BilingBank (bilingualism).

Features

All the data in each of these banks is formatted in the CHAT transcription format, which is designed for analysis by CLAN. The CLAN programs include facilities in five different domains:

  1. CLAN includes an editor that focuses on creating links between the words and utterances in a transcript and the corresponding segments of the related audio or video media. CLAN provides four methods to facilitate this process. The SoundWalker facility emulates the back-and-forth actions of a transcriber's foot pedal, but using keystrokes. Sonic CHAT supports precise linking of segments from a waveform. Transcriber mode lets the user press the space bar at the end of each utterance to mark it during transcription. Finally, time marks can be entered and edited by hand.
  2. CLAN provides the basic tools of corpus analysis, such as keyword-and-line search, concordances, frequency counts, partial regular expression search, and so on.
  3. CLAN provides additional analysis programs for between-speaker contingency patterns, utterance and word length, co-occurrence clusters, and so on.
  4. For qualitative data analysis, CLAN has programs such as GEM for marking special segments of text, Coder's Editor for applying a coding system, and CA format for Jeffersonian CA transcription.
  5. To support the use of TalkBank data in clinical settings, CLAN includes programs such as EVAL and KIDEVAL that compare individual subjects and groups with a large comparison database drawn from one or more of the TalkBank databases.
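The kind of analysis described in item 2 can be illustrated with a small sketch. The Python code below is not part of CLAN and does not reproduce the behaviour or output of any CLAN program; it is a minimal illustration, assuming a simplified CHAT-like layout in which header lines begin with "@", main speaker tiers begin with "*" followed by a speaker code and a colon, and dependent tiers begin with "%". The sample transcript, the function names, and the tokenization rule are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical miniature transcript in a simplified CHAT-like layout:
# "@" header lines, "*SPK:" main (speaker) tiers, "%" dependent tiers.
SAMPLE = """@Begin
@Participants:\tCHI Target_Child, MOT Mother
*MOT:\twhere is the ball ?
*CHI:\tball .
%com:\tchild points at the ball
*MOT:\tyes the ball is there .
*CHI:\tmore ball !
@End
"""

# A main tier: "*", a speaker code, ":", whitespace, then the utterance.
MAIN_TIER = re.compile(r"^\*([A-Z0-9]+):\s+(.*)$")


def main_tiers(text):
    """Yield (line_number, speaker, utterance) for each main tier line."""
    for number, line in enumerate(text.splitlines(), start=1):
        match = MAIN_TIER.match(line)
        if match:
            yield number, match.group(1), match.group(2)


def words(utterance):
    """Return lowercased alphabetic tokens, dropping punctuation marks.

    This is a deliberately naive tokenizer, not CHAT's own word rules.
    """
    return re.findall(r"[a-z']+", utterance.lower())


def frequency(text, speaker=None):
    """Count word frequencies on main tiers, optionally for one speaker."""
    counts = Counter()
    for _, spk, utterance in main_tiers(text):
        if speaker is None or spk == speaker:
            counts.update(words(utterance))
    return counts


def keyword_lines(text, keyword):
    """Return the main tier lines whose tokens include the keyword."""
    target = keyword.lower()
    return [
        (number, spk, utterance)
        for number, spk, utterance in main_tiers(text)
        if target in words(utterance)
    ]


if __name__ == "__main__":
    print(frequency(SAMPLE, speaker="CHI").most_common())
    for hit in keyword_lines(SAMPLE, "more"):
        print(hit)
```

Restricting the count to main tiers keeps comments on dependent tiers (such as the "%com" line in the sample) out of the word counts, which is why the sketch separates tier parsing from tokenization.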

Related Research Articles

Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference.

In linguistics, a corpus or text corpus is a language resource consisting of a large and structured set of texts. In corpus linguistics, they are used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory.

A communication disorder is any disorder that affects an individual's ability to comprehend, detect, or apply language and speech to engage in discourse effectively with others. The delays and disorders can range from simple sound substitution to the inability to understand or use one's native language.

In spoken language analysis, an utterance is the smallest unit of speech. It is a continuous piece of speech beginning and ending with a clear pause. In the case of oral languages, it is generally, but not always, bounded by silence. Utterances do not exist in written language; only their representations do. They can be represented and delineated in written language in many ways.

Transcription in the linguistic sense is the systematic representation of spoken language in written form. The source can either be utterances or preexisting text in another writing system.

Phonetic transcription is the visual representation of speech sounds by means of symbols. The most common type of phonetic transcription uses a phonetic alphabet, such as the International Phonetic Alphabet.

Conversation analysis (CA) is an approach to the study of social interaction, embracing both verbal and non-verbal conduct, in situations of everyday life. CA originated as a sociological method, but has since spread to other fields. CA began with a focus on casual conversation, but its methods were subsequently adapted to embrace more task- and institution-centered interactions, such as those occurring in doctors' offices, courts, law enforcement, helplines, educational settings, and the mass media, and focus on nonverbal activity in interaction, including gaze, body movement and gesture. As a consequence, the term 'conversation analysis' has become something of a misnomer, but it has continued as a term for a distinctive and successful approach to the analysis of sociolinguistic interactions. CA and ethnomethodology are sometimes considered one field and referred to as EMCA.

Transcortical motor aphasia (TMoA), also known as commissural dysphasia or white matter dysphasia, results from damage in the anterior superior frontal lobe of the language-dominant hemisphere. This damage is typically due to cerebrovascular accident (CVA). TMoA is generally characterized by reduced speech output, which is a result of dysfunction of the affected region of the brain. The left hemisphere is usually responsible for performing language functions, although left-handed individuals have been shown to perform language functions using either their left or right hemisphere depending on the individual. The anterior frontal lobes of the language-dominant hemisphere are essential for initiating and maintaining speech. Because of this, individuals with TMoA often present with difficulty in speech maintenance and initiation.

Dr. Hermann Moisl is a retired Senior Lecturer and Visiting Fellow in Linguistics at Newcastle University. He was educated at various institutes, including Trinity College Dublin and the University of Oxford.

Victoria Alexandra Fromkin was an American linguist who taught at UCLA. She studied slips of the tongue, mishearing, and other speech errors, which she applied to phonology, the study of how the sounds of a language are organized in the mind.

The Child Language Data Exchange System (CHILDES) is a corpus established in 1984 by Brian MacWhinney and Catherine Snow to serve as a central repository for data of first language acquisition. Its earliest transcripts date from the 1960s, and it now has contents in 26 languages from 130 different corpora, all of which are publicly available worldwide. Recently, CHILDES has been made into a component of the larger corpus TalkBank, which also includes language data from aphasics, second language acquisition, conversation analysis, and classroom language learning. CHILDES is mainly used for analyzing the language of young children and the child-directed speech of adults.

Brian James MacWhinney is a Professor of Psychology and Modern Languages at Carnegie Mellon University. He specializes in first and second language acquisition, psycholinguistics, and the neurological bases of language, and he has written and edited several books and over 100 peer-reviewed articles and book chapters on these subjects. MacWhinney is best known for his competition model of language acquisition and for creating the CHILDES and TalkBank corpora. He has also helped to develop a stream of pioneering software programs for creating and running psychological experiments, including PsyScope, an experimental control system for the Macintosh; E-Prime, an experimental control system for the Microsoft Windows platform; and System for Teaching Experimental Psychology (STEP), a database of scripts for facilitating and improving psychological and linguistic research.

The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time.

Clinical linguistics is a sub-discipline of applied linguistics involved in the description, analysis, and treatment of language disabilities, especially the application of linguistic theory to the field of Speech-Language Pathology. The study of the linguistic aspect of communication disorders is of relevance to a broader understanding of language and linguistic theory.

A speech corpus is a database of speech audio files and text transcriptions. In speech technology, speech corpora are used, among other things, to create acoustic models. In linguistics, spoken corpora are used for research in phonetics, conversation analysis, dialectology and other fields.

TalkBank is a multilingual corpus established in 2002 and currently directed and maintained by Brian MacWhinney. The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It contains sample databases from within several subfields of communication, including first language acquisition, second language acquisition, conversation analysis, classroom discourse, and aphasic language. It uses these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary linguistic materials via networked computers.

ToBI is a set of conventions for transcribing and annotating the prosody of speech. The term "ToBI" is sometimes used to refer to the conventions used for describing American English specifically, which was the first ToBI system, developed by Mary Beckman and Janet Pierrehumbert, among others. Other ToBI systems have been defined for a number of languages; for example, J-ToBI refers to the ToBI conventions for Tokyo Japanese, and an adaptation of ToBI to describe Dutch intonation was developed by Carlos Gussenhoven, and called ToDI. Another variation of ToBI, called IViE, was established in 1998 to enable comparison between several dialects of British English.

EXMARaLDA is a set of free software tools for creating, managing and analyzing spoken language corpora. It consists of a transcription tool, a tool for administering corpus meta data and a tool for doing queries on spoken language corpora. EXMARaLDA is used for doing conversation and discourse analysis, dialectology, phonology and research into first and second language acquisition in children and adults. EXMARaLDA is based on the open standards XML and Unicode and programmed in Java.

Speech translation is the process by which conversational spoken phrases are instantly translated and spoken aloud in a second language. This differs from phrase translation, which is where the system only translates a fixed and finite set of phrases that have been manually entered into the system. Speech translation technology enables speakers of different languages to communicate. It thus is of tremendous value for humankind in terms of science, cross-cultural exchange and global business.

ELAN is computer software, a professional tool to manually and semi-automatically annotate and transcribe audio or video recordings. It has a tier-based data model that supports multi-level, multi-participant annotation of time-based media. It is applied in humanities and social sciences research for the purpose of documentation and of qualitative and quantitative analysis. It is distributed as free and open source software under the GNU General Public License, version 3.

References

  1. "Using CLAN". TalkBank. Retrieved 27 April 2021.
  2. "CHILDES for child language". Retrieved 27 February 2017.
  3. "AphasiaBank for aphasia". Retrieved 27 February 2017.
  4. "PhonBank for phonology". Retrieved 27 February 2017.
  5. "FluencyBank for fluency disorders". Retrieved 27 February 2017.
  6. "HomeBank for daylong recordings in the home". Retrieved 27 February 2017.
  7. "SLABank for second language acquisition". Retrieved 27 February 2017.
  8. "TalkBank website". Retrieved 27 February 2017.