Linguistic Data Consortium

Founded	1992
Headquarters	Philadelphia, Pennsylvania, United States
Website	www.ldc.upenn.edu

Last updated March 28, 2025

The Linguistic Data Consortium (LDC) is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for linguistics research and development. The University of Pennsylvania is the LDC's host institution. The LDC was founded in 1992 with a grant from the US Defense Advanced Research Projects Agency (DARPA), and is partly supported by grant IRI-9528587 from the Information and Intelligent Systems division of the National Science Foundation.^[1]^[2] The director of the LDC is Mark Liberman.^[3] The LDC subsumed the earlier ACL Data Collection Initiative.

Part of the motivation for founding the LDC was to support the benchmark-oriented methodology of DARPA's Human Language Technology program. Earlier, John R. Pierce had directed the committee that produced the ALPAC report (1966), which caused a severe decrease in funding for linguistic AI for about 10 years. Charles Wayne restarted funding for speech and language research in the mid-1980s. To avoid the criticisms raised in the ALPAC report, the restarted program needed a way to demonstrate objective progress, which led to the benchmark-oriented methodology: DARPA would propose specific, quantifiable and testable score targets on benchmarks, and the funded teams would attempt to reach those targets.^[4]^[5]

By 1993, the data needed for training and benchmarking models had grown so large that, as was noted at the time, "Not even the largest companies can easily afford enough of [the needed] data... Researchers at smaller companies and in universities risk being frozen out of the process almost entirely."^[6] The LDC provided a central location for creating and distributing such data. Its membership fee has been increased once since its founding.^[4]

References

[1] "About LDC". Linguistic Data Consortium. Retrieved June 18, 2024.
[2] "NSF Award Search: Award # 9528587 - HLR: Improved Speech and Text Data Resources". www.nsf.gov. Retrieved March 27, 2025.
[3] "Staff". Linguistic Data Consortium. Retrieved June 18, 2024.
[4] Cieri, Christopher; Liberman, Mark; Cho, Sunghye; Strassel, Stephanie; Fiumara, James; Wright, Jonathan (June 2022). Calzolari, Nicoletta; Béchet, Frédéric; Blache, Philippe; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Goggi, Sara; Isahara, Hitoshi; Maegaard, Bente (eds.). "Reflections on 30 Years of Language Resource Development and Sharing". Proceedings of the Thirteenth Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association: 543–550.
[5] Liberman, Mark; Wayne, Charles (June 2020). "Human Language Technology". AI Magazine. 41 (2): 22–35. doi:10.1609/aimag.v41i2.5297. ISSN 0738-4602.
[6] Liberman, M.; Godfrey, J. (September 1993). "The Linguistic Data Consortium". In Chen, Keh-Jiann; Huang, Chu-Ren. Proc. ROCLing Computational Linguistics Conference VI. Nantou, Taiwan: Association for Computational Linguistics and Chinese Language Processing (ACLCLP).

External links

LDC Website