Linguistic Data Consortium

The Linguistic Data Consortium is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for linguistics research and development purposes. The University of Pennsylvania is the LDC's host institution. The LDC was founded in 1992 with a grant from the US Defense Advanced Research Projects Agency (DARPA), and is partly supported by grant IRI-9528587 from the Information and Intelligent Systems division of the National Science Foundation. The director of LDC is Mark Liberman and the executive director is Christopher Cieri.

Related Research Articles

Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others.

Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference.

In linguistics, a corpus or text corpus is a language resource consisting of a large and structured set of texts. In corpus linguistics, corpora are used for statistical analysis and hypothesis testing, for checking occurrences, and for validating linguistic rules within a specific language territory.

Linguistic Society of America, learned society in the US

The Linguistic Society of America (LSA) is a learned society for the field of linguistics. Founded in New York City in 1924, the LSA works to promote the scientific study of language. The society publishes three scholarly journals: Language, the open access journal Semantics and Pragmatics, and the open access journal Phonological Data & Analysis. Its annual meetings, held every winter, foster discussion among members through the presentation of peer-reviewed research and also serve to conduct the society's official business. Since 1928, the LSA has offered training to linguists through courses at its biennial summer Linguistic Institutes. The LSA and its 3,600 members work to raise public awareness of linguistic issues and contribute to policy debates on issues including bilingual education and the preservation of endangered languages.

The American National Corpus (ANC) is a text corpus of American English containing 22 million words of written and spoken data produced since 1990. Currently, the ANC includes a range of genres, including emerging genres such as email, tweets, and web data that are not included in earlier corpora such as the British National Corpus. It is annotated for part of speech and lemma, shallow parse, and named entities.

Janet Pierrehumbert, American linguist

Janet Pierrehumbert is Professor of Language Modelling in the Oxford e-Research Centre at the University of Oxford and a senior research fellow of Trinity College, Oxford. She developed an intonational model which includes a grammar of intonation patterns and an explicit algorithm for calculating pitch contours in speech, as well as an account of intonational meaning. It has been widely influential in speech technology, psycholinguistics, and theories of language form and meaning. Pierrehumbert is also affiliated with the New Zealand Institute of Language Brain and Behaviour at the University of Canterbury.

Brian James MacWhinney is a Professor of Psychology and Modern Languages at Carnegie Mellon University. He specializes in first and second language acquisition, psycholinguistics, and the neurological bases of language, and he has written and edited several books and over 100 peer-reviewed articles and book chapters on these subjects. MacWhinney is best known for his competition model of language acquisition and for creating the CHILDES and TalkBank corpora. He has also helped to develop a stream of pioneering software programs for creating and running psychological experiments, including PsyScope, an experimental control system for the Macintosh; E-Prime, an experimental control system for the Microsoft Windows platform; and System for Teaching Experimental Psychology (STEP), a database of scripts for facilitating and improving psychological and linguistic research.

The European Language Resources Association (ELRA) is a not-for-profit organisation established under the law of the Grand Duchy of Luxembourg. Its seat is in Luxembourg and its headquarters is in Paris, France.

Mark Yoffe Liberman is an American linguist. He has a dual appointment at the University of Pennsylvania, as Trustee Professor of Phonetics in the Department of Linguistics, and as a professor in the Department of Computer and Information Sciences. He is the founder and director of the Linguistic Data Consortium. Liberman is the Faculty Director of Ware College House at the University of Pennsylvania.

Linguistic categories include

Computational lexicology is a branch of computational linguistics concerned with the use of computers in the study of the lexicon. It has been more narrowly described by some scholars as the use of computers in the study of machine-readable dictionaries. It is distinguished from computational lexicography, which more properly would be the use of computers in the construction of dictionaries, though some researchers have used computational lexicography as a synonym.

Sandra Annear Thompson is an American linguist specializing in discourse analysis, typology, and interactional linguistics. She is Professor Emerita of Linguistics at the University of California, Santa Barbara (UCSB). She has published numerous books, her research has appeared in many linguistics journals, and she serves on the editorial board of several prominent linguistics journals.

The LINGUIST List is a major online resource for the academic field of linguistics. It was founded by Anthony Aristar in early 1990 at the University of Western Australia, and is used as a reference by the National Science Foundation in the United States. Its main and oldest feature is the premoderated electronic mailing list, now with thousands of subscribers all over the world, where queries and their summarised results, discussions, journal tables of contents, dissertation abstracts, calls for papers, book and conference announcements, software notices and other useful pieces of linguistic information are posted.

The BABEL speech corpus is a corpus of recorded speech materials from five Central and Eastern European languages. Intended for use in speech technology applications, it was funded by a grant from the European Union and completed in 1998. It is distributed by the European Language Resources Association.

In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing, and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data Cloud was conceived and is being maintained by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation, and has since been a focal point of activity for several W3C community groups, research projects, and infrastructure efforts.

Helen Aristar-Dry is an American linguist who currently serves as the series editor for SpringerBriefs in Linguistics. Most notably, from 1991 to 2013 she co-directed The LINGUIST List with Anthony Aristar. She has served as principal investigator or co-principal investigator on over $5,000,000 worth of research grants from the National Science Foundation and the National Endowment for the Humanities. She retired as Professor of English Language and Literature from Eastern Michigan University in 2013.

Dafydd Gibbon, British professor

Dafydd Gibbon is a British emeritus professor of English and General Linguistics at Bielefeld University in Germany, specialising in computational linguistics, the lexicography of spoken languages, applied phonetics and phonology. He is particularly concerned with endangered languages and has received awards from the Ivory Coast, Nigeria and Poland.

In linguistics and language technology, a language resource is a "[composition] of linguistic material used in the construction, improvement and/or evaluation of language processing applications, (...) in language and language-mediated research studies and applications."

Patience Louise "Pattie" Epps is an American linguistics professor and researcher at the University of Texas at Austin whose main research focus is on the Naduhup language family, which consists of four extant languages in the Amazon.