SAMPA

The Speech Assessment Methods Phonetic Alphabet (SAMPA) is a computer-readable phonetic script using 7-bit printable ASCII characters, based on the International Phonetic Alphabet (IPA). It was originally developed in the late 1980s for six European languages by the EEC ESPRIT information technology research and development program. As many symbols as possible have been taken over from the IPA; where this is not possible, other signs that are available are used, e.g. [@] for schwa (IPA [ə]), [2] for the vowel sound found in French deux 'two' (IPA [ø]), and [9] for the vowel sound found in French neuf 'nine' (IPA [œ]).

Today, officially, SAMPA has been developed for all the sounds of the following languages:

The characters ["s{mp@] represent the pronunciation of the name SAMPA in English, with the initial symbol ["] indicating primary stress. Like IPA, SAMPA is usually enclosed in square brackets or slashes, which are not part of the alphabet proper and merely signify that it is phonetic as opposed to regular text.
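The symbol substitutions described above can be sketched as a small lookup table. This is a minimal illustration, not a complete SAMPA implementation: the table holds only the symbols mentioned in this article, plus `{` for the vowel [æ], which the article uses in ["s{mp@] without spelling out. The names `SAMPA_TO_IPA` and `sampa_to_ipa` are hypothetical.

```python
# Partial SAMPA-to-IPA table: only the symbols discussed in the text, plus "{"
# (an assumption added here so that the example word can be converted).
SAMPA_TO_IPA = {
    '"': "\u02c8",  # primary stress mark
    "@": "\u0259",  # schwa
    "2": "\u00f8",  # vowel of French "deux"
    "9": "\u0153",  # vowel of French "neuf"
    "{": "\u00e6",  # vowel written "{" in the English transcription of "SAMPA"
}


def sampa_to_ipa(text):
    """Replace each known SAMPA symbol; pass every other character through."""
    return "".join(SAMPA_TO_IPA.get(ch, ch) for ch in text)


print(sampa_to_ipa('"s{mp@'))  # prints ˈsæmpə
```

Plain letters such as `s`, `m` and `p` pass through unchanged because SAMPA takes them over directly from the IPA.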

Features

SAMPA was developed in the late 1980s in the European Commission-funded ESPRIT project 2589 "Speech Assessment Methods" (SAM)—hence "SAM Phonetic Alphabet"—in order to facilitate email data exchange and computational processing of transcriptions in phonetics and speech technology.

SAMPA is a partial encoding of the IPA. The first version of SAMPA was the union of the sets of phoneme codes for Danish, Dutch, English, French, German and Italian; later versions extended SAMPA to cover other European languages. Since SAMPA is based on phoneme inventories, each SAMPA table is valid only in the language it was created for. In order to make this IPA encoding technique universally applicable, X-SAMPA was created, which provides one single table without language-specific differences.
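The language-specific nature of SAMPA tables can be illustrated with a rough sketch that checks a transcription against one language's symbol set. The two tables below are tiny hypothetical excerpts chosen for this example, not the official inventories, and the check assumes every symbol is a single character, which does not hold for all SAMPA symbols.

```python
# Tiny illustrative excerpts of two per-language symbol sets (hypothetical,
# far smaller than the real tables).
TABLES = {
    "fr": {"d", "n", "f", "2", "9", "@"},
    "en": {"d", "s", "m", "p", "@", "{"},
}


def symbols_outside(transcription, lang):
    """Return the symbols in `transcription` that the table for `lang` lacks."""
    table = TABLES[lang]
    return [ch for ch in transcription if ch not in table]


print(symbols_outside("d2", "fr"))  # French "deux": prints []
print(symbols_outside("d2", "en"))  # prints ['2']
```

The same string is valid against the French excerpt but not the English one, which is the situation X-SAMPA avoids by using a single table.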

SAMPA was devised as a hack to work around the inability of text encodings to represent IPA symbols. Consequently, as Unicode support for IPA symbols becomes more widespread, the need for a separate, computer-readable system for representing the IPA in ASCII decreases. However, text input still relies on specific keyboard encodings or input devices. For this reason, SAMPA and X-SAMPA are still widely used [1] in computational phonetics and in speech technology.

Related Research Articles

International Phonetic Alphabet: System of phonetic notation

The International Phonetic Alphabet (IPA) is an alphabetic system of phonetic notation based primarily on the Latin script. It was devised by the International Phonetic Association in the late 19th century as a standardized representation of speech sounds in written form. The IPA is used by lexicographers, foreign language students and teachers, linguists, speech–language pathologists, singers, actors, constructed language creators, and translators.

In phonology and linguistics, a phoneme is a set of phones that can distinguish one word from another in a particular language.

T, or t, is the twentieth letter of the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is tee, plural tees.

The following show the typical symbols for consonants and vowels used in SAMPA, an ASCII-based system based on the International Phonetic Alphabet. SAMPA is not a universal system as it varies from language to language.

Phonetic transcription is the visual representation of speech sounds by means of symbols. The most common type of phonetic transcription uses a phonetic alphabet, such as the International Phonetic Alphabet.

Devanagari is an Indic script used for many Indo-Aryan languages of North India and Nepal, including Hindi, Marathi and Nepali, which was the script used to write Classical Sanskrit. There are several somewhat similar methods of transliteration from Devanagari to the Roman script, including the influential and lossless IAST notation. Romanised Devanagari is also called Romanagari.

Kirshenbaum, sometimes called ASCII-IPA or erkIPA, is a system used to represent the International Phonetic Alphabet (IPA) in ASCII, which allows IPA symbols to be typed on a regular keyboard. It was developed for Usenet, notably the newsgroups sci.lang and alt.usage.english. It is named after Evan Kirshenbaum, who led the collaboration that created it. The eSpeak open source software speech synthesizer uses the Kirshenbaum scheme.

The Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) is a variant of SAMPA developed in 1995 by John C. Wells, professor of phonetics at University College London. It is designed to unify the individual language SAMPA alphabets and to extend SAMPA to cover the entire range of characters in the 1993 version of the International Phonetic Alphabet (IPA). The result is a SAMPA-inspired remapping of the IPA into 7-bit ASCII.

The voiced palatal approximant, or yod, is a type of consonant used in many spoken languages. The symbol in the International Phonetic Alphabet that represents this sound is ⟨j⟩. The equivalent X-SAMPA symbol is j, and in the Americanist phonetic notation it is ⟨y⟩. Because the English name of the letter J, jay, starts with the affricate [dʒ] rather than with this sound, the approximant is sometimes instead called yod (jod), as in the phonological history terms yod-dropping and yod-coalescence.

Voiceless palatal fricative: Consonantal sound represented by ⟨ç⟩ in IPA

The voiceless palatal fricative is a type of consonantal sound used in some spoken languages. The symbol in the International Phonetic Alphabet that represents this sound is ç, and the equivalent X-SAMPA symbol is C. It is the non-sibilant equivalent of the voiceless alveolo-palatal fricative.

Voice or voicing is a term used in phonetics and phonology to characterize speech sounds. Speech sounds can be described as either voiceless or voiced.

Americanist phonetic notation, also known as the North American Phonetic Alphabet (NAPA), the Americanist Phonetic Alphabet or the American Phonetic Alphabet (APA), is a system of phonetic notation originally developed by European and American anthropologists and language scientists for the phonetic and phonemic transcription of indigenous languages of the Americas and for languages of Europe. It is still commonly used by linguists working on, among others, Slavic, Uralic, Semitic languages and for the languages of the Caucasus, of India, and of much of Africa; however, Uralists commonly use a variant known as the Uralic Phonetic Alphabet.

History of the International Phonetic Alphabet: History of the IPA phonetic representation system

The International Phonetic Alphabet was created soon after the International Phonetic Association was established in the late 19th century. It was intended as an international system of phonetic transcription for oral languages, originally for pedagogical purposes. The Association was established in Paris in 1886 by French and British language teachers led by Paul Passy. The prototype of the alphabet appeared in Phonetic Teachers' Association (1888b). The Association based their alphabet upon the Romic alphabet of Henry Sweet, which in turn was based on the Phonotypic Alphabet of Isaac Pitman and the Palæotype of Alexander John Ellis.

Extensions to the International Phonetic Alphabet: Disordered speech additions to the phonetic alphabet

The Extensions to the International Phonetic Alphabet for Disordered Speech, commonly abbreviated extIPA, are a set of letters and diacritics devised by the International Clinical Phonetics and Linguistics Association to augment the International Phonetic Alphabet for the phonetic transcription of disordered speech. Some of the symbols are used for transcribing features of normal speech in IPA transcription, and are accepted as such by the International Phonetic Association.

A pronunciation respelling for English is a notation used to convey the pronunciation of words in the English language, which does not have a phonemic orthography.

L, or l, is the twelfth letter of the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is el, plural els.

C, or c, is the third letter of the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is cee, plural cees.

ARPABET is a set of phonetic transcription codes developed by Advanced Research Projects Agency (ARPA) as a part of their Speech Understanding Research project in the 1970s. It represents phonemes and allophones of General American English with distinct sequences of ASCII characters. Two systems, one representing each segment with one character and the other with one or two (case-insensitive), were devised, the latter being far more widely adopted.

IPA Extensions is a block (U+0250–U+02AF) of the Unicode standard that contains full size letters used in the International Phonetic Alphabet (IPA). Both modern and historical characters are included, as well as former and proposed IPA signs and non-IPA phonetic letters. Additional characters employed for phonetics, like the palatalization sign, are encoded in the blocks Phonetic Extensions (1D00–1D7F) and Phonetic Extensions Supplement (1D80–1DBF). Diacritics are found in the Spacing Modifier Letters (02B0–02FF) and Combining Diacritical Marks (0300–036F) blocks. Its block name in Unicode 1.0 was Standard Phonetic.
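The block ranges listed above can be turned into a simple code point lookup. This is a minimal sketch: the ranges are copied from the text, while the names `BLOCKS` and `block_of` are hypothetical and only these five blocks are covered.

```python
# Unicode blocks named in the text, as (name, first code point, last code point).
BLOCKS = [
    ("IPA Extensions", 0x0250, 0x02AF),
    ("Spacing Modifier Letters", 0x02B0, 0x02FF),
    ("Combining Diacritical Marks", 0x0300, 0x036F),
    ("Phonetic Extensions", 0x1D00, 0x1D7F),
    ("Phonetic Extensions Supplement", 0x1D80, 0x1DBF),
]


def block_of(ch):
    """Return the name of the listed block containing `ch`, or None."""
    code_point = ord(ch)
    for name, first, last in BLOCKS:
        if first <= code_point <= last:
            return name
    return None


print(block_of("\u0259"))  # schwa: prints IPA Extensions
print(block_of("\u02c8"))  # primary stress mark: prints Spacing Modifier Letters
print(block_of("\u00f8"))  # o with stroke: prints None
```

The last call returns `None` because some letters used in the IPA, such as ø (U+00F8), are encoded in general Latin blocks rather than in any of the phonetic blocks listed here.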

The International Phonetic Alphabet (IPA) consists of more than 100 letters and diacritics. Before Unicode became widely available, several ASCII-based encoding systems of the IPA were proposed. The alphabet went through a large revision at the Kiel Convention of 1989, and the vowel symbols again in 1993. Systems devised before these revisions inevitably lack support for the additions they introduced.

References

  1. "Project Euphonia's Personalized Speech Recognition for Non-Standard Speech". Google AI Blog. Retrieved 2019-08-16.