CHILDES

The Child Language Data Exchange System (CHILDES) is a corpus established in 1984 [1] by Brian MacWhinney and Catherine Snow to serve as a central repository for first language acquisition data. [2] [1] Its earliest transcripts date from the 1960s, and as of 2015 it held contents (transcripts, audio, and video) in 26 languages from 230 different corpora, all of which are publicly available worldwide. [3] CHILDES has since become a component of the larger corpus TalkBank, which also includes language data from aphasics, second language acquisition, conversation analysis, and classroom language learning. CHILDES is mainly used for analyzing the language of young children and the child-directed speech of adults. [4]

During the early 1990s, as computational resources capable of easily manipulating the data volumes found in CHILDES became commonly available, there was a significant increase in the number of studies of child language acquisition that made use of it. [1] CHILDES is currently directed and maintained by Brian MacWhinney at Carnegie Mellon University.

Database Format

There are a variety of languages and ages represented in the CHILDES transcripts. The majority of the transcripts are from spontaneous interactions and conversations. [3] [5] The transcriptions are coded in the CHAT (Codes for the Human Analysis of Transcripts) transcription format, which provides a standardized format for producing conversational transcripts. [6] This system can be used to transcribe conversations with any type of language learner: children, second-language learners, and recovering aphasics. In addition to discourse level transcription, the CHAT system also has options for phonological and morphological analysis. The CLAN program was developed by Leonid Spektor and aids in transcription and analysis of the child language data. [7]
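As a rough illustration of what a standardized transcript format makes possible, the sketch below reads a small CHAT-style excerpt and counts utterances per speaker. It relies on the CHAT conventions that header lines begin with `@`, main-tier (utterance) lines begin with `*` and a speaker code, and dependent tiers such as `%mor` begin with `%`. The excerpt and the names `SAMPLE`, `parse_chat`, and `utterance_counts` are invented for illustration and are not part of CLAN or of any CHILDES corpus; continuation lines and other details of real CHAT files are ignored.

```python
import re
from collections import Counter

# Hypothetical CHAT-style excerpt written for illustration only;
# it is not taken from a CHILDES corpus.
SAMPLE = """@Begin
@Languages:\teng
@Participants:\tCHI Target_Child, MOT Mother
*MOT:\twhat is that ?
*CHI:\tdoggie .
%mor:\tn|doggie .
*CHI:\tmore cookie .
@End
"""

# A main-tier line: "*", a speaker code, a colon, a tab, then the utterance.
MAIN_TIER = re.compile(r"^\*([A-Z0-9]+):\t(.*)$")


def parse_chat(text):
    """Return (speaker, utterance) pairs for every main-tier line.

    Header lines (@...) and dependent tiers (%...) are skipped.
    Assumption: each utterance fits on one line (no continuation lines).
    """
    utterances = []
    for line in text.splitlines():
        match = MAIN_TIER.match(line)
        if match:
            utterances.append((match.group(1), match.group(2)))
    return utterances


def utterance_counts(utterances):
    """Count how many utterances each speaker produced."""
    return Counter(speaker for speaker, _ in utterances)


if __name__ == "__main__":
    parsed = parse_chat(SAMPLE)
    for speaker, count in sorted(utterance_counts(parsed).items()):
        print(speaker, count)
```

For real corpora, CLAN or a dedicated CHAT reader is the safer choice, since this sketch handles only the simplest line layout.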

Use in Research

To date, over 4500 published studies cite CHILDES. [8] CHILDES reports this number in its manuals, [9] and Google Scholar listed 5451 citations as of July 2017. [10]

Related Research Articles

Language acquisition is the process by which humans acquire the capacity to perceive and comprehend language, as well as to produce and use words and sentences to communicate.

A second language (L2) is a language spoken in addition to one's first language (L1). A second language may be a neighbouring language, another language of the speaker's home country, or a foreign language. A speaker's dominant language, which is the language a speaker uses most or is most comfortable with, is not necessarily the speaker's first language. For example, the Canadian census defines first language for its purposes as "the first language learned in childhood and still spoken", recognizing that for some, the earliest language may be lost, a process known as language attrition. This can happen when young children start school or move to a new language environment.

Transcription in the linguistic sense is the systematic representation of spoken language in written form. The source can either be utterances or preexisting text in another writing system.

Conversation analysis (CA) is an approach to the study of social interaction that empirically investigates the mechanisms by which humans achieve mutual understanding. It focuses on both verbal and non-verbal conduct, especially in situations of everyday life. CA originated as a sociological method, but has since spread to other fields. CA began with a focus on casual conversation, but its methods were subsequently adapted to embrace more task- and institution-centered interactions, such as those occurring in doctors' offices, courts, law enforcement, helplines, educational settings, and the mass media, and to focus on multimodal and nonverbal activity in interaction, including gaze, body movement and gesture. As a consequence, the term conversation analysis has become something of a misnomer, but it has continued as a term for a distinctive and successful approach to the analysis of interactions. CA and ethnomethodology are sometimes considered one field and referred to as EMCA.

Elizabeth Ann Bates was a professor of cognitive science at the University of California, San Diego. She was an internationally renowned expert and leading researcher in child language acquisition, psycholinguistics, aphasia, and the neurological bases of language, and she authored 10 books and over 200 peer-reviewed articles and book chapters on these subjects. Bates was well known for her assertion that linguistic knowledge is distributed throughout the brain and is subserved by general cognitive and neurological processes.

Second-language acquisition (SLA), sometimes called second-language learning or L2 acquisition, is the process by which people learn a second language. Second-language acquisition is also the scientific discipline devoted to studying that process. The field is regarded by some, but not all, as a sub-discipline of applied linguistics, and it also receives research attention from a variety of other disciplines, such as psychology and education.

Catherine Elizabeth Snow is an educational psychologist and applied linguist. In 2009 Snow was appointed to the Patricia Albjerg Graham Professorship in the Harvard Graduate School of Education, having previously held the Henry Lee Shattuck Professorship also in the Harvard Graduate School of Education. Snow is past president of the American Educational Research Association (2000-2001). She chaired the RAND Corporation 'reading study group' from 1999.

Brian James MacWhinney is a Professor of Psychology and Modern Languages at Carnegie Mellon University. He specializes in first and second language acquisition, psycholinguistics, and the neurological bases of language, and he has written and edited several books and over 100 peer-reviewed articles and book chapters on these subjects. MacWhinney is best known for his competition model of language acquisition and for creating the CHILDES and TalkBank corpora. He has also helped to develop a stream of pioneering software programs for creating and running psychological experiments, including PsyScope, an experimental control system for the Macintosh; E-Prime, an experimental control system for the Microsoft Windows platform; and System for Teaching Experimental Psychology (STEP), a database of scripts for facilitating and improving psychological and linguistic research.

The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time. It is used in corpus linguistics for the analysis of corpora.

TalkBank is a multilingual corpus established in 2002 and currently directed and maintained by Brian MacWhinney. The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It contains sample databases from within several subfields of communication, including first language acquisition, second language acquisition, conversation analysis, classroom discourse, and aphasic language. It uses these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary linguistic materials via networked computers.

The Competition Model is a psycholinguistic theory of language acquisition and sentence processing, developed by Elizabeth Bates and Brian MacWhinney (1982). The claim in MacWhinney, Bates, and Kliegl (1984) is that "the forms of natural languages are created, governed, constrained, acquired, and used in the service of communicative functions." Furthermore, the model holds that processing is based on an online competition between these communicative functions or motives. The model focuses on competition during sentence processing, crosslinguistic competition in bilingualism, and the role of competition in language acquisition. It is an emergentist theory of language acquisition and processing, serving as an alternative to strict innatist and empiricist theories. According to the Competition Model, patterns in language arise from Darwinian competition and selection on a variety of time/process scales including phylogenetic, ontogenetic, social diffusion, and synchronic scales.

LENA is a developer of advanced technology and programs to accelerate language development of children 0–3 and to close opportunity gaps.

Social interactionist theory (SIT) is an explanation of language development emphasizing the role of social interaction between the developing child and linguistically knowledgeable adults. It is based largely on the socio-cultural theories of Soviet psychologist Lev Vygotsky.

EXMARaLDA is a set of free software tools for creating, managing and analyzing spoken language corpora. It consists of a transcription tool, a tool for administering corpus meta data and a tool for doing queries on spoken language corpora. EXMARaLDA is used for doing conversation and discourse analysis, dialectology, phonology and research into first and second language acquisition in children and adults. EXMARaLDA is based on the open standards XML and Unicode and programmed in Java.

The main purpose of theories of second-language acquisition (SLA) is to shed light on how people who already know one language learn a second language. The field of second-language acquisition draws on contributions from various disciplines, such as linguistics, sociolinguistics, psychology, cognitive science, neuroscience, and education. These contributions can be grouped into four major research strands: (a) linguistic dimensions of SLA, (b) cognitive dimensions of SLA, (c) socio-cultural dimensions of SLA, and (d) instructional dimensions of SLA. While the orientation of each research strand is distinct, they have in common that they can help identify conditions that facilitate successful language learning. Acknowledging the contributions of each perspective and the interdisciplinarity between the fields, more and more second language researchers are now trying to take a broader view of the complexities of second language acquisition.

The Cambridge English Corpus (CEC), formerly the Cambridge International Corpus (CIC), is a multi-billion-word corpus of the English language, containing both text corpus and spoken corpus data. The Cambridge English Corpus contains data from a number of sources, including written and spoken, British and American English. The CEC also contains the Cambridge Learner Corpus, a 40-million-word corpus made up of English exam responses written by English language learners.

Anat Ninio is a professor emeritus of psychology at the Hebrew University of Jerusalem, Israel. She specializes in the interactive context of language acquisition, the communicative functions of speech, pragmatic development, and syntactic development.

The following outline is provided as an overview of and topical guide to second-language acquisition:

The CLAN (Computerized Language ANalysis) program is a cross-platform program designed by Brian MacWhinney and written by Leonid Spektor for the purpose of creating and analyzing transcripts in the Child Language Data Exchange System (CHILDES) database. CLAN is open source software and can be freely downloaded.

In language acquisition, negative evidence is information concerning what is not possible in a language. Importantly, negative evidence does not show what is grammatical; that is positive evidence. In theory, negative evidence would help eliminate ungrammatical constructions by revealing what is not grammatical. Direct negative evidence refers to comments made by an adult language-user in response to a learner's ungrammatical utterance. Indirect negative evidence refers to the absence of ungrammatical sentences in the language that the child is exposed to. There is debate among linguists and psychologists about whether negative evidence can help children determine the grammar of their language. Negative evidence, if it is used, could help children rule out ungrammatical constructions in their language.

References

  1. Sanchez, Alessandro; Meylan, Stephan C; Braginsky, Mika; MacDonald, Kyle E; Yurovsky, Daniel; Frank, Michael C (2019). "childes-db: A flexible and reproducible interface to the child language data exchange system". Behavior Research Methods. 51 (4): 1928–1941. doi:10.3758/s13428-018-1176-7. PMID 30623390. S2CID 256203840. Archived from the original on February 20, 2020. Retrieved January 19, 2023.
  2. "Introduction to the Database" (PDF). Archived from the original (PDF) on 2015-09-22. Retrieved 2015-07-08.
  3. MacWhinney, Brian. "CHILDES: Child Language Data Exchange System". Language Technologies Institute. Archived from the original on September 6, 2015. Retrieved January 19, 2023.
  4. Biber, Douglas (1998). Corpus linguistics: Investigating language structure and use. Cambridge University Press. ISBN 9780521499576.
  5. "Introduction to the Database" (PDF). Archived from the original (PDF) on 2015-09-22. Retrieved 2015-07-08.
  6. MacWhinney, Brian; Snow, Catherine (1990). "The Child Language Data Exchange System: an update". J Child Lang. 17 (2): 457–72. doi:10.1017/s0305000900013866. PMC 9807025. PMID 2380278.
  7. "The CHILDES Project: Tools for Analyzing Talk" (PDF). Archived from the original (PDF) on 2009-02-20.
  8. "Articles based on usage of CHILDES" (PDF). Retrieved 2014-04-06.
  9. "The CHILDES Project: Tools for Analyzing Talk" (PDF). Archived from the original (PDF) on 2015-07-10. Retrieved 2015-07-08.
  10. "Google Scholar CHILDES Search". Retrieved 2015-07-08.