CHILDES

Last updated August 26, 2025

The Child Language Data Exchange System (CHILDES) is a corpus established in 1984 by Brian MacWhinney and Catherine Snow to serve as a central repository for first language acquisition data.^[1]^[2] It is currently directed and maintained by Brian MacWhinney at Carnegie Mellon University.

The earliest transcripts in CHILDES date from the 1960s. As of 2024, the repository contained transcripts, audio, and video in 48 languages from 436 corpora, all of them publicly available worldwide.^[3] CHILDES is now a component of TalkBank, a larger corpus that also includes language data from people with aphasia, second language learners, conversation analysis, and classroom language learning. CHILDES is mainly used to analyze the language of young children and the speech that adults direct to them.^[4]

During the early 1990s, as computers capable of easily handling the volume of data in CHILDES became widely available, the number of child language acquisition studies using the database rose significantly.^[2]

Database format

The CHILDES transcripts cover a variety of languages and ages, and most of them record spontaneous interactions and conversations.^[5]^[6] The transcripts are coded in CHAT (Codes for the Human Analysis of Transcripts), a transcription format that standardizes how conversational transcripts are produced.^[1] CHAT can be used to transcribe conversations with many kinds of language learners, including children, second language learners, and people recovering from aphasia. Beyond discourse-level transcription, CHAT also supports phonological and morphological analysis. The CLAN (Computerized Language Analysis) program, developed by Leonid Spektor, aids in the transcription and analysis of child language data.^[7]
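
CHAT transcripts are plain text organized into tiers, which makes the format easy to process programmatically. The sketch below is a hypothetical, simplified reader, not part of CLAN and not a full implementation of the CHAT specification. It assumes the common conventions of "@" header lines, "*" main-tier lines that begin with a participant code such as CHI, "%" dependent tiers such as %mor for morphological coding, and tab-indented continuation lines. The names Utterance and parse_chat and the sample lines are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """One main-tier line plus any dependent tiers attached to it."""
    speaker: str
    text: str
    dependent: dict[str, str] = field(default_factory=dict)


def parse_chat(lines):
    """Split simplified CHAT lines into headers and utterances.

    Assumed conventions (a simplification of real CHAT):
      "@..."          header lines, e.g. "@Participants:"
      "*SPK:<tab>..." main tier spoken by participant code SPK
      "%tier:<tab>..." dependent tier (e.g. %mor) for the preceding main tier
      "<tab>..."      continuation of the previous tier
    """
    headers = []
    utterances = []
    current = None  # ("main", None) or ("dep", tier_name), used for continuations
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("\t"):
            # Continuation line: append to whichever tier came just before.
            extra = " " + line.strip()
            if current == ("main", None):
                utterances[-1].text += extra
            elif current is not None:
                utterances[-1].dependent[current[1]] += extra
        elif line.startswith("@"):
            headers.append(line)
            current = None  # header continuations are ignored in this sketch
        elif line.startswith("*"):
            # Split "SPK:<tab>text" on the first colon only.
            speaker, _, text = line[1:].partition(":")
            utterances.append(Utterance(speaker.strip(), text.strip()))
            current = ("main", None)
        elif line.startswith("%") and utterances:
            # Attach the dependent tier to the most recent utterance.
            tier, _, text = line[1:].partition(":")
            utterances[-1].dependent[tier.strip()] = text.strip()
            current = ("dep", tier.strip())
    return headers, utterances


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        "@Begin",
        "@Participants:\tCHI Target_Child, MOT Mother",
        "*MOT:\twhat do you want ?",
        "*CHI:\tmore cookie .",
        "%mor:\tqn|more n|cookie .",
        "@End",
    ]
    headers, utterances = parse_chat(sample)
    for u in utterances:
        print(u.speaker, "|", u.text, "|", u.dependent)
```

Keeping each dependent tier attached to the main tier it annotates mirrors the way CHAT pairs a spoken line with its coding. For real analyses, CLAN and the CHAT documentation remain the authoritative tools.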

Use in research

As of June 2025, Google Scholar reported 11,221 citations of CHILDES in its database.^[8]

References

[1] MacWhinney, Brian; Snow, Catherine (1990). "The Child Language Data Exchange System: an update". J Child Lang. 17 (2): 457–472. doi:10.1017/s0305000900013866. PMC 9807025. PMID 2380278.
[2] Sanchez, Alessandro; Meylan, Stephan C.; Braginsky, Mika; MacDonald, Kyle E.; Yurovsky, Daniel; Frank, Michael C. (2019). "childes-db: A flexible and reproducible interface to the child language data exchange system". Behavior Research Methods. 51 (4): 1928–1941. doi:10.3758/s13428-018-1176-7. hdl:1721.1/131922. PMID 30623390. S2CID 256203840. Archived from the original on February 20, 2020. Retrieved January 19, 2023.
[3] Kempe, Vera; Brooks, Patricia J.; Gillis, Steven (2024). "Four Decades of Open Language Science: The CHILDES Project". Language Teaching Research Quarterly. 44: 15–30. doi:10.32038/ltrq.2024.44.04. ISSN 2667-6753.
[4] Biber, D. (1998). Corpus linguistics: Investigating language structure and use. Cambridge University Press. ISBN 9780521499576.
[5] MacWhinney, Brian. "CHILDES: Child Language Data Exchange System". Language Technologies Institute. Archived from the original on September 6, 2015. Retrieved January 19, 2023.
[6] "Introduction to the Database" (PDF). Archived from the original (PDF) on September 22, 2015. Retrieved July 8, 2015.
[7] "The CHILDES Project: Tools for Analyzing Talk" (PDF). Archived from the original (PDF) on July 10, 2015. Retrieved July 8, 2015.
[8] "Google Scholar citations for 'The CHILDES project: Tools for analyzing talk, Volume I: Transcription format and programs'". scholar.google.com. Retrieved June 30, 2025.


This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.


