CMU Pronouncing Dictionary

Developer(s): Carnegie Mellon University
Stable release: 0.7b / November 19, 2014
Available in: English
License: BSD
Website: www.speech.cs.cmu.edu/cgi-bin/cmudict

The CMU Pronouncing Dictionary (also known as CMUdict) is an open-source pronouncing dictionary originally created by the Speech Group at Carnegie Mellon University (CMU) for use in speech recognition research.


CMUdict provides a mapping from orthographic to phonetic representations of English words in their North American pronunciations. It is commonly used to generate pronunciations for speech recognition (ASR) systems such as CMU Sphinx and for speech synthesis (TTS) systems such as Festival. CMUdict can also serve as a training corpus for statistical grapheme-to-phoneme (g2p) models [1] that generate pronunciations for words not yet included in the dictionary.

The most recent release is 0.7b; it contains over 134,000 entries. An interactive lookup version is available. [2]

Database format

The database is distributed as a plain text file with one entry per line in the format "WORD  <pronunciation>", with a two-space separator between the parts. If a word has multiple pronunciations, the variants are identified with numbered versions (e.g. WORD(1)). The pronunciation is encoded in a modified form of the ARPABET system, with stress marks of levels 0, 1, and 2 added to the vowels. A line-initial ;;; token indicates a comment. A derived format, directly suitable for speech recognition engines, is also available as part of the distribution; it collapses the stress distinctions, which are typically not used in ASR.
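As an illustration of the format described above (this parser is not part of the distribution, and the sample entries are hypothetical excerpts), a minimal Python sketch might read the file like this:

```python
import re

def parse_cmudict(lines):
    """Parse CMUdict-format lines into {word: [pronunciations]}.

    Each entry line is "WORD  PH0 PH1 ..." with a two-space separator;
    alternate pronunciations use numbered variants like "WORD(1)";
    lines starting with ";;;" are comments.
    """
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip comments and blank lines
        word, _, phones = line.partition("  ")
        # Fold "WORD(1)", "WORD(2)" variants onto the base word.
        word = re.sub(r"\(\d+\)$", "", word)
        entries.setdefault(word, []).append(phones.split())
    return entries

sample = [
    ";;; a comment line",
    "READ  R IY1 D",
    "READ(1)  R EH1 D",
]
d = parse_cmudict(sample)
# d["READ"] == [["R", "IY1", "D"], ["R", "EH1", "D"]]
```

Collapsing the numbered variants into a list per word mirrors how the dictionary is usually consumed: a lookup returns every attested pronunciation of the word.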

The following table lists the phonemes used by the CMU Pronouncing Dictionary. [2]

Vowels

ARPABET  Respelling  IPA   Example
AA       ah          ɑ     odd
AE       a           æ     at
AH0      ə           ə     about
AH       uh          ʌ     hut
AO       aw          ɔ     ought, story
AW       ow          aʊ    cow
AY       eye         aɪ    hide
EH       eh          ɛ     Ed
ER       ur, ər      ɝ, ɚ  hurt
EY       ay          eɪ    ate
IH       i, ih       ɪ     it
IY       ee          i     eat
OW       oh          oʊ    oat
OY       oy          ɔɪ    toy
UH       uu          ʊ     hood
UW       oo          u     two

Stress

Digit  Description
0      No stress
1      Primary stress
2      Secondary stress

Consonants

ARPABET  Respelling  IPA  Example
B        b           b    be
CH       ch, tch     tʃ   cheese
D        d           d    dee
DH       dh          ð    thee
F        f           f    fee
G        g           ɡ    green
HH       h           h    he
JH       j           dʒ   gee
K        k           k    key
L        l           l    lee
M        m           m    me
N        n           n    knee
NG       ng          ŋ    ping
P        p           p    pee
R        r           r    read
S        s, ss       s    sea
SH       sh          ʃ    she
T        t           t    tea
TH       th          θ    theta
V        v           v    vee
W        w, wh       w    we
Y        y           j    yield
Z        z           z    zee
ZH       zh          ʒ    seizure
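Because the stress digits 0, 1, and 2 attach only to vowel phones, they offer a quick way to count syllables in a transcription, and stripping them yields the stress-collapsed form used in the derived ASR-oriented format. A small illustrative helper (not part of CMUdict itself):

```python
def syllable_count(phones):
    """Count syllables by counting phones that carry a stress digit,
    since in CMUdict every vowel phone ends in 0, 1, or 2."""
    return sum(1 for phone in phones if phone[-1].isdigit())

def strip_stress(phones):
    """Collapse stress distinctions, as in the derived ASR-oriented
    format: AH0 -> AH, IY1 -> IY, and so on."""
    return [phone.rstrip("012") for phone in phones]

# "dictionary": D IH1 K SH AH0 N EH2 R IY0
phones = ["D", "IH1", "K", "SH", "AH0", "N", "EH2", "R", "IY0"]
syllable_count(phones)  # 4
strip_stress(phones)    # ['D', 'IH', 'K', 'SH', 'AH', 'N', 'EH', 'R', 'IY']
```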

History

Version               Release date [3]      License
0.1                   16 September 1993     Public domain
0.2                   10 March 1994         Public domain
0.3                   28 September 1994     Public domain
0.4                   8 November 1995       Public domain
0.5                   No public release     Public domain
0.6                   11 August 1998        Public domain
0.7                   No public release     Public domain
0.7a                  18 February 2008      2-clause BSD
0.7b                  19 November 2014 [4]  2-clause BSD
GitHub (unversioned)  26 May 2021           2-clause BSD



References

  1. "Sequitur G2P - A trainable Grapheme-to-Phoneme converter".
  2. "The CMU Pronouncing Dictionary". 2015-07-16. Archived from the original on 2022-06-03. Retrieved 2022-06-04.
  3. ftp://ftp.cs.cmu.edu/project/speech/dict/
  4. http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/00README_FIRST.txt
  5. "Cmusphinx - Revision 10973: /Trunk/Logios". Archived from the original on 2011-05-20. Retrieved 2009-12-19.