TalkBank

TalkBank is a multilingual corpus established in 2002 and currently directed and maintained by Brian MacWhinney. The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It contains sample databases from within several subfields of communication, including first language acquisition, second language acquisition, conversation analysis, classroom discourse, and aphasic language. It uses these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary linguistic materials via networked computers. [1] [2] [3]

Contents

TalkBank contains CHILDES (Child Language Data Exchange System), a corpus of first language acquisition data. It also hosts CLAN (Computerized Language Analysis), the software used to transcribe recordings in the CHAT format and to handle and play the associated media. [4]
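As a rough illustration of what a CHAT transcript looks like to a program, the sketch below counts utterances per speaker. It relies only on the general CHAT convention that header lines begin with `@`, main speaker tiers with `*`, and dependent tiers with `%`. The sample transcript, the speaker codes, and the function names are invented for this example; it is not CLAN code and does not handle the full format (for instance, continuation lines are ignored).

```python
from collections import Counter

# Invented sample in the general shape of a CHAT transcript.
# It is NOT taken from any TalkBank corpus.
SAMPLE = """@Begin
@Participants:\tCHI Target_Child, MOT Mother
*MOT:\twhere is the ball ?
*CHI:\tball .
%com:\tchild points at the floor
*CHI:\tmore ball .
@End
"""


def utterances(text):
    """Yield (speaker, utterance) pairs from main tiers, which start with '*'."""
    for line in text.splitlines():
        if line.startswith("*") and ":" in line:
            # Drop the leading '*', then split at the first colon.
            speaker, _, words = line[1:].partition(":")
            yield speaker.strip(), words.strip()


def utterance_counts(text):
    """Count main-tier utterances per speaker code (hypothetical helper)."""
    return Counter(speaker for speaker, _ in utterances(text))


counts = utterance_counts(SAMPLE)
for speaker, n in sorted(counts.items()):
    print(speaker, n)
```

Header (`@`) and dependent (`%`) lines are skipped here because only the main tiers carry the utterances themselves. Real analyses would normally go through CLAN or another dedicated CHAT reader.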

Related Research Articles

Language: Communication using symbols (such as words) structured with grammar

A language is a structured system of communication used by humans, based on speech and gesture, sign, or often writing. The structure of language is its grammar and the free components are its vocabulary. Many languages, including the most widely spoken ones, have writing systems that enable sounds or signs to be recorded for later reactivation. Human language is unique among known systems of animal communication in that it is not dependent on a single mode of transmission, it is highly variable between cultures and across time, and it affords a much wider range of expression than other systems. It has the properties of productivity and displacement, and relies on social convention and learning.

Language acquisition is the process by which humans acquire the capacity to perceive and comprehend language, as well as to produce and use words and sentences to communicate.

Universal grammar: Theory in linguistics, usually credited to Noam Chomsky, proposing that the ability to learn grammar is hard-wired into the brain

Universal grammar (UG), in modern linguistics, is the theory of the genetic component of the language faculty, usually credited to Noam Chomsky. The basic postulate of UG is that a certain set of structural rules are innate to humans, independent of sensory experience. With more linguistic stimuli received in the course of psychological development, children then adopt specific syntactic rules that conform to UG. It is sometimes known as "mental grammar", and stands contrasted with other "grammars", e.g. prescriptive, descriptive and pedagogical. The advocates of this theory emphasize and partially rely on the poverty of the stimulus (POS) argument and the existence of some universal properties of natural human languages. However, the latter has not been firmly established, as some linguists have argued languages are so diverse that such universality is rare. It is a matter of empirical investigation to determine precisely what properties are universal and what linguistic capacities are innate.

Baby sign language: Signed language systems used with hearing infants/toddlers

Baby sign language is the use of manual signing that allows infants and toddlers to communicate emotions, desires, and objects prior to spoken language development. With guidance and encouragement, signing develops from a natural stage in infant development known as gesture. These gestures are taught in conjunction with speech to hearing children, and are not the same as a sign language. Some common benefits that have been found through the use of baby sign programs include an increased parent-child bond and communication, decreased frustration, and improved self-esteem for both the parent and child. Researchers have found that baby sign neither benefits nor harms the language development of infants. Promotional products and ease of information access have increased the attention that baby sign receives, making it important that caregivers become educated before deciding to use baby sign.

Baby talk is a type of speech associated with an older person speaking to a child. It is also called caretaker speech, infant-directed speech (IDS), child-directed speech (CDS), child-directed language (CDL), caregiver register, parentese, or motherese.

Language development in humans is a process starting early in life. Infants start without knowing a language, yet by 10 months, babies can distinguish speech sounds and engage in babbling. Some research has shown that the earliest learning begins in utero when the fetus starts to recognize the sounds and speech patterns of its mother's voice and differentiate them from other sounds after birth.

The Child Language Data Exchange System (CHILDES) is a corpus established in 1984 by Brian MacWhinney and Catherine Snow to serve as a central repository for data of first language acquisition. Its earliest transcripts date from the 1960s, and it now has contents in 26 languages from 130 different corpora, all of which are publicly available worldwide. CHILDES has since been made a component of the larger corpus TalkBank, which also includes language data from aphasics, second language acquisition, conversation analysis, and classroom language learning. CHILDES is mainly used for analyzing the language of young children and the child-directed speech of adults.

Catherine Elizabeth Snow is an educational psychologist and applied linguist. In 2009 Snow was appointed to the Patricia Albjerg Graham Professorship in the Harvard Graduate School of Education, having previously held the Henry Lee Shattuck Professorship also in the Harvard Graduate School of Education. Snow is past president of the American Educational Research Association (2000-2001). She chaired the RAND Corporation 'reading study group' from 1999.

Brian James MacWhinney is a Professor of Psychology and Modern Languages at Carnegie Mellon University. He specializes in first and second language acquisition, psycholinguistics, and the neurological bases of language, and he has written and edited several books and over 100 peer-reviewed articles and book chapters on these subjects. MacWhinney is best known for his competition model of language acquisition and for creating the CHILDES and TalkBank corpora. He has also helped to develop a stream of pioneering software programs for creating and running psychological experiments, including PsyScope, an experimental control system for the Macintosh; E-Prime, an experimental control system for the Microsoft Windows platform; and System for Teaching Experimental Psychology (STEP), a database of scripts for facilitating and improving psychological and linguistic research.

Singapore Sign Language, or SgSL, is the native sign language used by the deaf and hard of hearing in Singapore, developed over six decades since the setting up of the first school for the Deaf in 1954. Since Singapore's independence in 1965, the Singapore deaf community has had to adapt to many linguistic changes. Today, the local deaf community recognises Singapore Sign Language (SgSL) as a reflection of Singapore's diverse linguistic culture. SgSL is influenced by Shanghainese Sign Language (SSL), American Sign Language (ASL), Signing Exact English (SEE-II) and locally developed signs. As of 2014, 5,756 deaf clients were registered with The Singapore Association For The Deaf (SADeaf), an organisation that advocates equal opportunity for the deaf. Of these, only about one-third reported knowing sign language.

Max Planck Institute for Psycholinguistics

The Max Planck Institute for Psycholinguistics is a research institute situated on the campus of Radboud University Nijmegen located in Nijmegen, Gelderland, the Netherlands. Founded in 1980 by Pim Levelt, it is the only institution in the world entirely dedicated to psycholinguistics, and is also one of only three among a total of 90 within the Max Planck Society to be located outside Germany. The Nijmegen-based institute currently occupies 5th position in the Ranking Web of World Research Centers among all Max Planck institutes. It currently employs about 235 people.

Internet linguistics

Internet linguistics is a domain of linguistics advocated by the English linguist David Crystal. It studies new language styles and forms that have arisen under the influence of the Internet and of other new media, such as Short Message Service (SMS) text messaging. Since the beginning of human-computer interaction (HCI) leading to computer-mediated communication (CMC) and Internet-mediated communication (IMC), experts such as Gretchen McCulloch have acknowledged that linguistics has a contributing role in it, in terms of web interface and usability. Studying the emerging language on the Internet can help improve conceptual organization, translation and web usability. Such study aims to benefit both linguists and web users.

LENA is a developer of technology and programs intended to accelerate the language development of children aged 0 to 3 and to close opportunity gaps.

Virginia Yip (葉彩燕) is a Hong Kong linguist and writer. She is director of the Childhood Bilingualism Research Centre and a professor at the Chinese University of Hong Kong. Her research interests include bilingual language acquisition, second language acquisition, Cantonese, Chaozhou and comparative Sinitic grammar, psycholinguistics, and cognitive science.

Social interactionist theory (SIT) is an explanation of language development emphasizing the role of social interaction between the developing child and linguistically knowledgeable adults. It is based largely on the socio-cultural theories of Soviet psychologist Lev Vygotsky.

Anat Ninio is a professor emeritus of psychology at the Hebrew University of Jerusalem, Israel. She specializes in the interactive context of language acquisition, the communicative functions of speech, pragmatic development, and syntactic development.

The Google Ngram Viewer or Google Books Ngram Viewer is an online search engine that charts the frequencies of any set of search strings using a yearly count of n-grams found in sources printed between 1500 and 2019 in Google's text corpora in English, Chinese (simplified), French, German, Hebrew, Italian, Russian, or Spanish. There are also some specialized English corpora, such as American English, British English, and English Fiction.

The CLAN program is a cross-platform program designed by Brian MacWhinney and written by Leonid Spektor for the purpose of creating and analyzing transcripts in the Child Language Data Exchange System (CHILDES) database. CLAN is open source software and can be freely downloaded.

Hrafnhildur Hanna Ragnarsdóttir: Icelandic academic

Hrafnhildur Hanna Ragnarsdóttir is professor emerita in Developmental and Educational Science at the University of Iceland. Her research is concerned with long-term language development and its relation to cognition, social-emotional development and literacy. Her primary research focus has been on the development of vocabulary, grammar, and narratives in early childhood and the first school years, and on later language development as it appears in oral vs. written text construction and in narrative vs. expository texts from middle grades through adolescence and into adulthood.

References

  1. "From CHILDES to TalkBank". Retrieved 24 February 2009.
  2. "Linguistic Annotation". Archived from the original on 5 March 2009. Retrieved 24 February 2009.
  3. "TalkBank: Multimedia Database of Communicative Interactions". Archived from the original on 10 June 2010. Retrieved 24 February 2009.
  4. "Using CLAN". dali.talkbank.org. Retrieved 23 November 2021.