The Child Language Data Exchange System (CHILDES) is a corpus established in 1984 [1] by Brian MacWhinney and Catherine Snow to serve as a central repository for first language acquisition data. [2] [1] Its earliest transcripts date from the 1960s, and as of 2015 it holds content (transcripts, audio, and video) in 26 languages from 230 different corpora, all of which are publicly available worldwide. [3] CHILDES has since become a component of the larger corpus TalkBank, which also includes language data from aphasics, second language acquisition, conversation analysis, and classroom language learning. CHILDES is mainly used for analyzing the language of young children and the speech that adults direct to them. [4]
During the early 1990s, as computational resources capable of easily manipulating the data volumes found in CHILDES became commonly available, there was a significant increase in the number of studies of child language acquisition that made use of it. [1] CHILDES is currently directed and maintained by Brian MacWhinney at Carnegie Mellon University.
There are a variety of languages and ages represented in the CHILDES transcripts. The majority of the transcripts are from spontaneous interactions and conversations. [3] [5] The transcriptions are coded in the CHAT (Codes for the Human Analysis of Transcripts) transcription format, which provides a standardized format for producing conversational transcripts. [6] This system can be used to transcribe conversations with any type of language learner: children, second-language learners, and recovering aphasics. In addition to discourse level transcription, the CHAT system also has options for phonological and morphological analysis. The CLAN program was developed by Leonid Spektor and aids in transcription and analysis of the child language data. [7]
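The line-oriented layout of a CHAT transcript can be illustrated with a minimal Python sketch. It assumes the basic CHAT conventions that header lines begin with `@`, speaker (main) tiers begin with `*` and a speaker code, and dependent tiers such as a morphological tier begin with `%`. The sample transcript, the `Utterance` class, and the `parse_chat` function are hypothetical and written only for illustration; they are not part of CLAN or any CHILDES tool, and the sketch ignores continuation lines and the rest of the CHAT specification.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """One main-tier line plus any dependent tiers attached to it."""
    speaker: str
    text: str
    tiers: dict = field(default_factory=dict)


def parse_chat(text):
    """Split CHAT-style text into header lines and utterances.

    Simplified on purpose: continuation lines and unrecognised
    line types are skipped rather than handled.
    """
    headers = []
    utterances = []
    for line in text.splitlines():
        if line.startswith("@"):
            # Header line, e.g. @Begin or @Participants
            headers.append(line)
        elif line.startswith("*"):
            # Main tier: *SPEAKER:<tab>utterance
            speaker, _, body = line[1:].partition(":")
            utterances.append(Utterance(speaker.strip(), body.strip()))
        elif line.startswith("%") and utterances:
            # Dependent tier: attach it to the most recent utterance
            name, _, body = line[1:].partition(":")
            utterances[-1].tiers[name.strip()] = body.strip()
    return headers, utterances


# Invented sample data, not taken from any CHILDES corpus.
SAMPLE = "\n".join([
    "@Begin",
    "@Languages:\teng",
    "@Participants:\tCHI Target_Child, MOT Mother",
    "*MOT:\twhat is that ?",
    "*CHI:\tdoggie .",
    "%mor:\tn|doggie .",
    "@End",
])

headers, utterances = parse_chat(SAMPLE)
for utt in utterances:
    print(utt.speaker, "|", utt.text, "|", utt.tiers)
```

Keeping each dependent tier attached to its utterance mirrors the way CHAT pairs discourse-level transcription with optional phonological or morphological annotation.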
To date, over 4500 published studies cite CHILDES. [8] CHILDES reports this number in its manuals, [9] and Google Scholar listed 5451 citations as of July 2017. [10]
Language acquisition is the process by which humans acquire the capacity to perceive and comprehend language. In other words, it is how human beings gain the ability to be aware of language, to understand it, and to produce and use words and sentences to communicate.
A second language (L2) is a language spoken in addition to one's first language (L1). A second language may be a neighbouring language, another language of the speaker's home country, or a foreign language. A speaker's dominant language, which is the language a speaker uses most or is most comfortable with, is not necessarily the speaker's first language. For example, the Canadian census defines first language for its purposes as "the first language learned in childhood and still spoken", recognizing that for some, the earliest language may be lost, a process known as language attrition. This can happen when young children start school or move to a new language environment.
Transcription in the linguistic sense is the systematic representation of spoken language in written form. The source can either be utterances or preexisting text in another writing system.
Conversation analysis (CA) is an approach to the study of social interaction that empirically investigates the mechanisms by which humans achieve mutual understanding. It focuses on both verbal and non-verbal conduct, especially in situations of everyday life. CA originated as a sociological method, but has since spread to other fields. CA began with a focus on casual conversation, but its methods were subsequently adapted to embrace more task- and institution-centered interactions, such as those occurring in doctors' offices, courts, law enforcement, helplines, educational settings, and the mass media, and to focus on multimodal and nonverbal activity in interaction, including gaze, body movement, and gesture. As a consequence, the term conversation analysis has become something of a misnomer, but it has continued as a term for a distinctive and successful approach to the analysis of interactions. CA and ethnomethodology are sometimes considered one field and referred to as EMCA.
Elizabeth Ann Bates was a professor of cognitive science at the University of California, San Diego. She was an internationally renowned expert and leading researcher in child language acquisition, psycholinguistics, aphasia, and the neurological bases of language, and she authored 10 books and over 200 peer-reviewed articles and book chapters on these subjects. Bates was well known for her assertion that linguistic knowledge is distributed throughout the brain and is subserved by general cognitive and neurological processes.
Second-language acquisition (SLA), sometimes called second-language learning or L2 acquisition, is the process by which people learn a second language. Second-language acquisition is also the scientific discipline devoted to studying that process. This involves learning an additional language after the first language is established, typically through formal instruction or immersion. A central theme in SLA research is that of interlanguage: the idea that the language that learners use is not simply the result of differences between the languages that they already know and the language that they are learning, but a complete language system in its own right, with its own systematic rules. This interlanguage gradually develops as learners are exposed to the target language. The order in which learners acquire features of their new language stays remarkably constant, even for learners with different native languages and regardless of whether they have had language instruction. However, languages that learners already know can have a significant influence on the process of learning a new one. This influence is known as language transfer.
Catherine Elizabeth Snow is an educational psychologist and applied linguist. In 2009 Snow was appointed to the Patricia Albjerg Graham Professorship in the Harvard Graduate School of Education, having previously held the Henry Lee Shattuck Professorship also in the Harvard Graduate School of Education. Snow is past president of the American Educational Research Association (2000–2001). She chaired the RAND Corporation 'reading study group' from 1999.
Brian James MacWhinney is a Professor of Psychology and Modern Languages at Carnegie Mellon University. He specializes in first and second language acquisition, psycholinguistics, and the neurological bases of language, and he has written and edited several books and over 100 peer-reviewed articles and book chapters on these subjects. MacWhinney is best known for his competition model of language acquisition and for creating the CHILDES and TalkBank corpora. He has also helped to develop a stream of pioneering software programs for creating and running psychological experiments, including PsyScope, an experimental control system for the Macintosh; E-Prime, an experimental control system for the Microsoft Windows platform; and System for Teaching Experimental Psychology (STEP), a database of scripts for facilitating and improving psychological and linguistic research.
The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time. It is used in corpus linguistics for analysis of corpora.
TalkBank is a multilingual corpus established in 2002 and currently directed and maintained by Brian MacWhinney. The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It contains sample databases from within several subfields of communication, including first language acquisition, second language acquisition, conversation analysis, classroom discourse, and aphasic language. It uses these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary linguistic materials via networked computers.
The Competition Model is a psycholinguistic theory of language acquisition and sentence processing, developed by Elizabeth Bates and Brian MacWhinney (1982). The claim in MacWhinney, Bates, and Kliegl (1984) is that "the forms of natural languages are created, governed, constrained, acquired, and used in the service of communicative functions." Furthermore, the model holds that processing is based on an online competition between these communicative functions or motives. The model focuses on competition during sentence processing, crosslinguistic competition in bilingualism, and the role of competition in language acquisition. It is an emergentist theory of language acquisition and processing, serving as an alternative to strict innatist and empiricist theories. According to the Competition Model, patterns in language arise from Darwinian competition and selection on a variety of time/process scales including phylogenetic, ontogenetic, social diffusion, and synchronic scales.
Language Environment Analysis (LENA) is a developer of technology and programs intended to accelerate the language development of children from birth to age 3 and to close opportunity gaps.
Linguistics is the scientific study of language. The areas of linguistic analysis are syntax, semantics (meaning), morphology, phonetics, phonology, and pragmatics. Subdisciplines such as biolinguistics and psycholinguistics bridge many of these divisions.
Social interactionist theory (SIT) is an explanation of language development emphasizing the role of social interaction between the developing child and linguistically knowledgeable adults. It is based largely on the socio-cultural theories of Soviet psychologist Lev Vygotsky.
EXMARaLDA is a set of free software tools for creating, managing and analyzing spoken language corpora. It consists of a transcription tool, a tool for administering corpus metadata and a tool for querying spoken language corpora. EXMARaLDA is used for conversation and discourse analysis, dialectology, phonology and research into first and second language acquisition in children and adults. EXMARaLDA is based on the open standards XML and Unicode and programmed in Java.
The main purpose of theories of second-language acquisition (SLA) is to shed light on how people who already know one language learn a second language. The field of second-language acquisition draws on contributions from various disciplines, such as linguistics, sociolinguistics, psychology, cognitive science, neuroscience, and education. These contributions can be grouped into four major research strands: (a) linguistic dimensions of SLA, (b) cognitive dimensions of SLA, (c) socio-cultural dimensions of SLA, and (d) instructional dimensions of SLA. While the orientation of each research strand is distinct, they have in common that they can help identify conditions that facilitate successful language learning. Acknowledging the contributions of each perspective and the interdisciplinarity between the fields, more and more second language researchers are now taking a broader view of the complexities of second language acquisition.
The Cambridge International Corpus (CIC) is a collection of over 800 million words of real spoken and written English. The texts are stored in a database that can be searched to see how English is used. The CIC also contains the Cambridge Learner Corpus, a unique collection of over 60,000 exam papers from Cambridge ESOL. It shows real mistakes students make and highlights the parts of English which cause problems for students.
Anat Ninio is a professor emeritus of psychology at the Hebrew University of Jerusalem, Israel. She specializes in the interactive context of language acquisition, the communicative functions of speech, pragmatic development, and syntactic development.
The following outline is provided as an overview of and topical guide to second-language acquisition:
The CLAN (Computerized Language ANalysis) program is a cross-platform program designed by Brian MacWhinney and written by Leonid Spektor for the purpose of creating and analyzing transcripts in the Child Language Data Exchange System (CHILDES) database. CLAN is open source software and can be freely downloaded.