Buckeye Corpus

Last updated February 08, 2026

The Buckeye Corpus of conversational speech is a speech corpus created by a team of linguists and psychologists at Ohio State University led by Prof. Mark Pitt.^[1]^[2]^[3]^[4] It contains high-quality recordings from 40 speakers in Columbus, Ohio, conversing freely with an interviewer. The interviewer's voice is heard only faintly in the background of the recordings. The sessions were conducted as sociolinguistic interviews and are essentially monologues. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format suitable for speech analysis software (Xwaves and Wavesurfer). Software for searching the transcription files is also available on the project website. The corpus is available to researchers in academia and industry.

References

[1] Pitt, Mark, Keith Johnson, Elizabeth Hume, Scott Kiesling, and William Raymond. (2005). The Buckeye Corpus of Conversational Speech: Labeling Conventions and a Test of Transcriber Reliability. Speech Communication, 45, 90-95.
[2] Raymond, William D., Robin Dautricourt, and Elizabeth Hume. (2006). Word-medial /t,d/ deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors. Language Variation and Change, 18(1), 55-97.
[3] Fosler-Lussier, Eric, Laura Dilley, Na’im Tyson, and Mark Pitt. (2007). The Buckeye Corpus of Speech: Updates and Enhancements. In Proceedings of Interspeech 2007, Antwerp, Belgium.
[4] Dilley, Laura, and Mark Pitt. (2007). A study of regressive place assimilation in spontaneous speech and its implications for spoken word recognition. Journal of the Acoustical Society of America, 122(4), 2340-2353.

External links

Buckeye Speech Corpus Homepage




