Buckeye Corpus

The Buckeye Corpus of conversational speech is a speech corpus created by a team of linguists and psychologists at Ohio State University led by Prof. Mark Pitt. [1] [2] [3] [4] It contains high-quality recordings of 40 speakers from Columbus, Ohio, conversing freely with an interviewer. The interviewer's voice is heard only faintly in the background of these recordings. The sessions were conducted as sociolinguistic interviews and are essentially monologues. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format for use with speech analysis software (Xwaves and Wavesurfer). Software for searching the transcription files is also available at the project website. The corpus is available to researchers in academia and industry.
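
As an illustration of what time-aligned phonetic labels make possible, the following Python sketch reads a label file and derives the duration of each segment. It is a minimal sketch under stated assumptions, not the project's own software: it assumes an Xwaves-style layout in which a header ends at a line containing only "#" and each following line holds a time in seconds, a numeric color code, and a label, with each time taken to mark the end of its segment. The names `Segment` and `parse_labels` and the sample data are hypothetical and are not taken from the corpus.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float  # segment onset in seconds
    end: float    # segment offset in seconds
    label: str    # phonetic (or word) label


def parse_labels(text: str) -> List[Segment]:
    """Parse label-file text in the ASSUMED Xwaves-style layout.

    Assumptions (not confirmed by the article): the header ends at a line
    containing only "#", and each body line is "<time> <color> <label>",
    where <time> marks the end of the segment.
    """
    lines = text.splitlines()
    # Index of the first body line; 0 if no "#" separator line is present.
    body_start = next(
        (i for i, line in enumerate(lines) if line.strip() == "#"), -1
    ) + 1

    segments: List[Segment] = []
    previous_end = 0.0
    for line in lines[body_start:]:
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        end = float(fields[0])
        # Keep only the first field if extra ";"-separated fields follow.
        label = fields[2].split(";")[0].strip()
        segments.append(Segment(previous_end, end, label))
        previous_end = end
    return segments


# Made-up sample data for illustration only; not copied from the corpus.
SAMPLE = """signal example
type 0
separator ;
nfields 1
#
    0.2500  122 SIL
    0.3100  122 k
    0.4450  122 ae
    0.5200  122 t
"""

segments = parse_labels(SAMPLE)
for seg in segments:
    duration = seg.end - seg.start
    print(f"{seg.label:>4}  {seg.start:.3f}  {seg.end:.3f}  {duration:.3f}")
```

Treating each time as a segment end means that a segment's start is simply the previous segment's end, so durations follow from adjacent lines. Files that follow a different layout would need the parsing rules adjusted.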

The project was funded by the National Institute on Deafness and Other Communication Disorders and the Office of Research at Ohio State University.

References

  1. Pitt, Mark, Keith Johnson, Elizabeth Hume, Scott Kiesling, and William Raymond. (2005). The Buckeye Corpus of Conversational Speech: Labeling Conventions and a Test of Transcriber Reliability. Speech Communication, 45, 90-95.
  2. Raymond, William D., Robin Dautricourt, and Elizabeth Hume. (2006). Word-medial /t,d/ deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors. Language Variation and Change, 18(1), 55-97.
  3. Fosler-Lussier, Eric, Laura Dilley, Na’im Tyson, and Mark Pitt. (2007). The Buckeye Corpus of Speech: Updates and Enhancements. In Proceedings of Interspeech 2007, Antwerp, Belgium.
  4. Dilley, L., & Pitt, M. (2007). A study of regressive place assimilation in spontaneous speech and its implications for spoken word recognition. Journal of the Acoustical Society of America, 122(4), 2340-2353.

Further reading

Pitt, M.A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E., and Fosler-Lussier, E. (2007). Buckeye Corpus of Conversational Speech (2nd release). Columbus, OH: Department of Psychology, Ohio State University (Distributor).