X-SAMPA

The Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) is a variant of SAMPA developed in 1995 by John C. Wells, professor of phonetics at University College London.[1] It is designed to unify the individual language SAMPA alphabets and to extend SAMPA to cover the entire range of characters in the 1993 version of the International Phonetic Alphabet (IPA). The result is a SAMPA-inspired remapping of the IPA into 7-bit ASCII.

SAMPA was devised as a hack to work around the inability of text encodings to represent IPA symbols. Later, as Unicode support for IPA symbols became more widespread, the necessity for a separate, computer-readable system for representing the IPA in ASCII decreased. However, X-SAMPA is still useful as the basis for an input method for true IPA.

Summary

Lower-case symbols

X-SAMPA    IPA    Unicode    Description    Examples
a    a    U+0061    open front unrounded vowel    French dame [dam]
b    b    U+0062    voiced bilabial plosive    English bed [bEd], French bon [bO~]
b_<    ɓ    U+0253    voiced bilabial implosive    Sindhi ɓarʊ [b_<arU]
c    c    U+0063    voiceless palatal plosive    Hungarian latyak ["lQcQk]
d    d    U+0064    voiced alveolar plosive    English dig [dIg], French doigt [dwa]
d`    ɖ    U+0256    voiced retroflex plosive    Swedish hord [hu:d`]
d_<    ɗ    U+0257    voiced alveolar implosive    Sindhi ɗarʊ [d_<arU]
e    e    U+0065    close-mid front unrounded vowel    French blé [ble]
f    f    U+0066    voiceless labiodental fricative    English five [faIv], French femme [fam]
g    ɡ    U+0261    voiced velar plosive    English game [geIm], French longue [lO~g]
g_<    ɠ    U+0260    voiced velar implosive    Sindhi ɠəro [g_<@ro]
h    h    U+0068    voiceless glottal fricative    English house [haUs]
h\    ɦ    U+0266    voiced glottal fricative    Czech hrad [h\rat]
i    i    U+0069    close front unrounded vowel    English be [bi:], French oui [wi], Spanish si [si]
j    j    U+006A    palatal approximant    English yes [jEs], French yeux [j2]
j\    ʝ    U+029D    voiced palatal fricative    Greek γειά [j\a]
k    k    U+006B    voiceless velar plosive    English skip [skIp], Spanish carro ["karo]
l    l    U+006C    alveolar lateral approximant    English lay [leI], French mal [mal]
l`    ɭ    U+026D    retroflex lateral approximant    Svealand Swedish sorl [so:l`]
l\    ɺ    U+027A    alveolar lateral flap    Wayuu püülükü [pM:l\MkM]
m    m    U+006D    bilabial nasal    English mouse [maUs], French homme [Om]
n    n    U+006E    alveolar nasal    English nap [n{p], French non [nO~]
n`    ɳ    U+0273    retroflex nasal    Swedish hörn [h2:n`]
o    o    U+006F    close-mid back rounded vowel    French veau [vo]
p    p    U+0070    voiceless bilabial plosive    English speak [spik], French pose [poz], Spanish perro ["pero]
p\    ɸ    U+0278    voiceless bilabial fricative    Japanese fuku [p\M_0kM]
q    q    U+0071    voiceless uvular plosive    Arabic qasbah ["qQs_Gba]
r    r    U+0072    alveolar trill    Spanish perro ["pero]
r`    ɽ    U+027D    retroflex flap    Bengali gari [gar`i:]
r\    ɹ    U+0279    alveolar approximant    English red [r\Ed]
r\`    ɻ    U+027B    retroflex approximant    Malayalam വഴി ["v@r\`i]
s    s    U+0073    voiceless alveolar fricative    English seem [si:m], French session [sE"sjO~]
s`    ʂ    U+0282    voiceless retroflex fricative    Swedish mars [mas`]
s\    ɕ    U+0255    voiceless alveolo-palatal fricative    Polish świerszcz [s\v'ers`ts`]
t    t    U+0074    voiceless alveolar plosive    English stew [stju:], French raté [Ra"te]
t`    ʈ    U+0288    voiceless retroflex plosive    Swedish mört [m2t`]
u    u    U+0075    close back rounded vowel    English boom [bu:m], Spanish su [su]
v    v    U+0076    voiced labiodental fricative    English vest [vEst], French voix [vwa]
v\ (or P)    ʋ    U+028B    labiodental approximant    Dutch west [v\Est]/[PEst]
w    w    U+0077    labial-velar approximant    English west [wEst], French oui [wi]
x    x    U+0078    voiceless velar fricative    Scots loch [lOx] or [5Ox]; German Buch, Dach; Spanish caja, gestión
x\    ɧ    U+0267    voiceless palatal-velar fricative    Swedish sjal [x\A:l]
y    y    U+0079    close front rounded vowel    French tu [ty], German über ["y:b6]
z    z    U+007A    voiced alveolar fricative    English zoo [zu:], French azote [a"zOt]
z`    ʐ    U+0290    voiced retroflex fricative    Mandarin Chinese rang [z`aN]
z\    ʑ    U+0291    voiced alveolo-palatal fricative    Polish źrebak ["z\rEbak]

Capital symbols

X-SAMPA    IPA    Unicode    Description    Example
A    ɑ    U+0251    open back unrounded vowel    English father ["fA:D@(r\)] (RP and Gen.Am.)
B    β    U+03B2    voiced bilabial fricative    Spanish lavar [la"Ba4]
B\    ʙ    U+0299    bilabial trill    Reminiscent of shivering ("brrr")
C    ç    U+00E7    voiceless palatal fricative    German ich [IC], English human ["Cjum@n] (broad transcription uses [hj-])
D    ð    U+00F0    voiced dental fricative    English then [DEn]
E    ɛ    U+025B    open-mid front unrounded vowel    French même [mE:m], English met [mEt] (RP and Gen.Am.)
F    ɱ    U+0271    labiodental nasal    English emphasis ["EFf@sIs] (spoken quickly, otherwise uses [Emf-])
G    ɣ    U+0263    voiced velar fricative    Greek γωνία [Go"nia]
G\    ɢ    U+0262    voiced uvular plosive    Inuktitut nirivvik [niG\ivvik]
G\_<    ʛ    U+029B    voiced uvular implosive    Mam ʛa [G\_<a]
H    ɥ    U+0265    labial-palatal approximant    French huit [Hit]
H\    ʜ    U+029C    voiceless epiglottal fricative    Agul мехӀ [mEH\]
I    ɪ    U+026A    near-close front unrounded vowel    English kit [kIt]
I\    ᵻ    U+1D7B    near-close central unrounded vowel (non-IPA)    Polish ryba [rI\bA]
J    ɲ    U+0272    palatal nasal    Spanish año ["aJo], English canyon ["k{J@n] (broad transcription uses [-nj-])
J\    ɟ    U+025F    voiced palatal plosive    Hungarian egy [EJ\]
J\_<    ʄ    U+0284    voiced palatal implosive    Sindhi ʄaro [J\_<aro]
K    ɬ    U+026C    voiceless alveolar lateral fricative    Welsh llaw [KaU]
K\    ɮ    U+026E    voiced alveolar lateral fricative    Mongolian долоо [tOK\O:]
L    ʎ    U+028E    palatal lateral approximant    Italian famiglia [fa"miLLa], Castilian llamar [La"mar]
L\    ʟ    U+029F    velar lateral approximant    Korean 달구지 [t6L\gudz\i]
M    ɯ    U+026F    close back unrounded vowel    Korean 음식 [M:ms\_hik_}]
M\    ɰ    U+0270    velar approximant    Spanish fuego ["fweM\o]
N    ŋ    U+014B    velar nasal    English thing [TIN]
N\    ɴ    U+0274    uvular nasal    Japanese san [saN\]
O    ɔ    U+0254    open-mid back rounded vowel    American English off [O:f]
O\    ʘ    U+0298    bilabial click
P (or v\)    ʋ    U+028B    labiodental approximant    Dutch west [PEst]/[v\Est], allophone of English phoneme /r\/
Q    ɒ    U+0252    open back rounded vowel    RP lot [lQt]
R    ʁ    U+0281    voiced uvular fricative    German rein [RaIn]
R\    ʀ    U+0280    uvular trill    French roi [R\wa]
S    ʃ    U+0283    voiceless postalveolar fricative    English ship [SIp]
T    θ    U+03B8    voiceless dental fricative    English thin [TIn]
U    ʊ    U+028A    near-close back rounded vowel    English foot [fUt]
U\    ᵿ    U+1D7F    near-close central rounded vowel (non-IPA)    English euphoria [jU\"fO@r\i@]
V    ʌ    U+028C    open-mid back unrounded vowel    Scottish English strut [str\Vt]
W    ʍ    U+028D    voiceless labial-velar fricative    Scots when [WEn]
X    χ    U+03C7    voiceless uvular fricative    Klallam sχaʔqʷaʔ [sXa?q_wa?]
X\    ħ    U+0127    voiceless pharyngeal fricative    Arabic ح ḥāʾ [X\A:]
Y    ʏ    U+028F    near-close front rounded vowel    German hübsch [hYpS]
Z    ʒ    U+0292    voiced postalveolar fricative    English vision ["vIZ@n]
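
Because several X-SAMPA symbols share a prefix (for example r, r\, r` and r\`), a converter has to try the longest symbol first. The following is a minimal sketch of that idea, not a reference implementation: the dictionary holds only a handful of rows from the tables above, and the names XSAMPA_TO_IPA and to_ipa are hypothetical.

```python
# Illustrative subset of the symbol tables above; not the full inventory.
XSAMPA_TO_IPA = {
    "r\\`": "ɻ",  # retroflex approximant
    "r\\": "ɹ",   # alveolar approximant
    "r`": "ɽ",    # retroflex flap
    "r": "r",     # alveolar trill
    "d": "d",
    "p": "p",
    "E": "ɛ",
    "I": "ɪ",
    "N": "ŋ",
    "S": "ʃ",
    "T": "θ",
}

# Longest keys first, so "r\`" is tried before "r\" and "r".
_KEYS = sorted(XSAMPA_TO_IPA, key=len, reverse=True)


def to_ipa(text):
    """Convert an X-SAMPA string to IPA by greedy longest-match lookup."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(XSAMPA_TO_IPA[key])
                i += len(key)
                break
        else:
            # Symbol not in the subset: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_ipa("r\\Ed"))  # English "red"
print(to_ipa("TIN"))    # English "thing"
```

Characters that are missing from the subset are copied through unchanged, so a real tool would need the complete tables before its output could be trusted.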

Other symbols

X-SAMPA    IPA    Unicode    Description    Example
.    .    U+002E    syllable break
"    ˈ    U+02C8    primary stress
%    ˌ    U+02CC    secondary stress    American English pronunciation [pr\@%nVn.si."eI.S@n]
' (or _j)    ʲ    U+02B2    palatalized    Russian Земля (Earth) [z'I"ml'a] or [z_jI"ml_ja]
:    ː    U+02D0    long
:\    ˑ    U+02D1    half long    Estonian differentiates three vowel lengths
-    (none)    (none)    separator    Polish trzy [t-S1] vs. czy [tS1] (affricate)
@    ə    U+0259    schwa    English arena [@"r\i:n@]
@\    ɘ    U+0258    close-mid central unrounded vowel    Paicĩ kɘ̄ɾɘ [k@\_M4@\_M]
@`    ɚ    U+025A    r-coloured schwa    American English color ["kVl@`]
{    æ    U+00E6    near-open front unrounded vowel    English trap [tr\{p]
}    ʉ    U+0289    close central rounded vowel    Swedish sju [x\}:]; AuE/NZE boot [b}:t]
1    ɨ    U+0268    close central unrounded vowel    Welsh tu [t1], American English rose's ["r\oUz1z]
2    ø    U+00F8    close-mid front rounded vowel    Danish købe ["k2:b@], French deux [d2]
3    ɜ    U+025C    open-mid central unrounded vowel    English nurse [n3:s] (RP) or [n3`s] (Gen.Am.)
3\    ɞ    U+025E    open-mid central rounded vowel    Irish tomhail [t3\:l']
4    ɾ    U+027E    alveolar flap    Spanish pero ["pe4o], American English better ["bE4@`]
5    ɫ    U+026B    velarized alveolar lateral approximant; also see _e    English milk [mI5k], Portuguese livro ["5iv4u]
6    ɐ    U+0250    near-open central vowel    German besser ["bEs6], Australian English mud [m6d]
7    ɤ    U+0264    close-mid back unrounded vowel    Estonian kõik [k7ik], Vietnamese mơ [m7_M]
8    ɵ    U+0275    close-mid central rounded vowel    Swedish buss [b8s]
9    œ    U+0153    open-mid front rounded vowel    French neuf [n9f], Danish drømme [dR9m@]
&    ɶ    U+0276    open front rounded vowel    Swedish skörd [x\&d`]
?    ʔ    U+0294    glottal stop    Cockney English bottle ["bQ?o]
?\    ʕ    U+0295    voiced pharyngeal fricative    Arabic ع ʿayn [?\Ajn]
*    (none)    (none)    undefined escape character, SAMPA's "conjunctor"
/    /    U+002F    (a) French vowel archiphonemes or indeterminacies; (b) delimiter of phonemic transcriptions    maison /mE/zO~/
<    ⟨    U+27E8    begin nonsegmental notation, e.g., SAMPROSA [3]
<\    ʢ    U+02A2    voiced epiglottal fricative    Siwi arˤbˤəʢa (four) [ar_?\b_?\@<\a]
>    ⟩    U+27E9    end nonsegmental notation
>\    ʡ    U+02A1    epiglottal plosive    Archi гӀарз (complaint) [>\arz]
^    ꜛ    U+A71B    upstep
!    ꜜ    U+A71C    downstep
!\    ǃ    U+01C3    postalveolar click    Zulu iqaqa (polecat) [i:!\a:!\a]
|    |    U+007C    minor (foot) group
|\    ǀ    U+01C0    dental click    Zulu icici (earring) [i:|\i:|\i]
||    ‖    U+2016    major (intonation) group
|\|\    ǁ    U+01C1    alveolar lateral click    Zulu xoxa (to converse) [|\|\O:|\|\a]
=\    ǂ    U+01C2    palatal click
-\    ‿    U+203F    linking mark
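
The separator row above implies that a converter must distinguish tS (an affricate) from t-S (two separate segments). The sketch below shows one way to honour that distinction. Rendering affricates with the IPA tie bar (U+0361) is an assumption of this example rather than something X-SAMPA prescribes, and the names render, SEGMENTS and AFFRICATES are hypothetical.

```python
TIE_BAR = "\u0361"  # combining double inverted breve; assumed rendering choice

# Small illustrative subset of the symbol tables.
SEGMENTS = {"t": "t", "d": "d", "S": "ʃ", "Z": "ʒ", "1": "ɨ"}
AFFRICATES = {"tS": ("t", "ʃ"), "dZ": ("d", "ʒ")}


def render(text):
    """Convert X-SAMPA to IPA, tying affricates and honouring the '-' separator."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in AFFRICATES:
            first, second = AFFRICATES[pair]
            out.append(first + TIE_BAR + second)
            i += 2
        elif text[i] == "-":
            # The separator keeps its neighbours apart and emits nothing itself.
            i += 1
        else:
            out.append(SEGMENTS.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(render("tS1"))   # Polish "czy": affricate, rendered with a tie bar
print(render("t-S1"))  # Polish "trzy": plosive followed by fricative, no tie bar
```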

Diacritics

X-SAMPA    IPA    Unicode    Description
_"    ◌̈    U+0308    centralized
_+    ◌̟    U+031F    advanced
_-    ◌̠    U+0320    retracted
_/    ◌̌    U+030C    rising tone
_0    ◌̥    U+0325    voiceless
_<    (none)    (none)    implosive (IPA uses separate symbols for implosives)
= (or _=)    ◌̩    U+0329    syllabic
_>    ʼ    U+02BC    ejective
_?\    ˤ    U+02C1    pharyngealized
_\    ◌̂    U+0302    falling tone
_^    ◌̯    U+032F    non-syllabic
_}    ◌̚    U+031A    no audible release
`    ˞    U+02DE    rhotacization in vowels, retroflexion in consonants (IPA uses separate symbols for consonants, see t` for an example)
~ (or _~)    ◌̃    U+0303    nasalization
_A    ◌̘    U+0318    advanced tongue root
_a    ◌̺    U+033A    apical
_B    ◌̏    U+030F    extra low tone
_B_L    ◌᷅    U+1DC5    low rising tone
_c    ◌̜    U+031C    less rounded
_d    ◌̪    U+032A    dental
_e    ◌̴    U+0334    velarized or pharyngealized; also see 5
<F>    ↘    U+2198    global fall
_F    ◌̂    U+0302    falling tone
_G    ˠ    U+02E0    velarized
_H    ◌́    U+0301    high tone
_H_T    ◌᷄    U+1DC4    high rising tone
_h    ʰ    U+02B0    aspirated
_j (or ')    ʲ    U+02B2    palatalized
_k    ◌̰    U+0330    creaky voice
_L    ◌̀    U+0300    low tone
_l    ˡ    U+02E1    lateral release
_M    ◌̄    U+0304    mid tone
_m    ◌̻    U+033B    laminal
_N    ◌̼    U+033C    linguolabial
_n    ⁿ    U+207F    nasal release
_O    ◌̹    U+0339    more rounded
_o    ◌̞    U+031E    lowered
_q    ◌̙    U+0319    retracted tongue root
<R>    ↗    U+2197    global rise
_R    ◌̌    U+030C    rising tone
_R_F    ◌᷈    U+1DC8    rising falling tone
_r    ◌̝    U+031D    raised
_T    ◌̋    U+030B    extra high tone
_t    ◌̤    U+0324    breathy voice
_v    ◌̬    U+032C    voiced
_w    ʷ    U+02B7    labialized
_X    ◌̆    U+0306    extra-short
_x    ◌̽    U+033D    mid-centralized
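
X-SAMPA writes each diacritic after the symbol it modifies, and Unicode combining characters likewise follow their base letter, so a plain left-to-right substitution is enough to attach them. The sketch below uses a regular expression with longest-first alternation. It covers only a few rows of the tables in this article, and the names DIACRITICS, BASES and to_ipa are hypothetical.

```python
import re

# Code points copied from the diacritics table above (subset only).
DIACRITICS = {
    "_0": "\u0325",  # voiceless
    "_d": "\u032A",  # dental
    "_h": "\u02B0",  # aspirated
    "_}": "\u031A",  # no audible release
    "~": "\u0303",   # nasalization
    "=": "\u0329",   # syllabic
}

# Minimal base-symbol subset, just enough for the examples below.
BASES = {"p\\": "ɸ", "M": "ɯ", "O": "ɔ", "k": "k", "b": "b", "t": "t"}

SYMBOLS = {**BASES, **DIACRITICS}

# Regex alternation takes the first alternative that matches,
# so the longest keys are listed first.
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(SYMBOLS, key=len, reverse=True))
)


def to_ipa(text):
    """Replace every known X-SAMPA symbol or diacritic with its IPA form."""
    return _PATTERN.sub(lambda match: SYMBOLS[match.group(0)], text)


print(to_ipa("p\\M_0kM"))  # Japanese "fuku"
print(to_ipa("bO~"))       # French "bon"
```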

Charts

Consonants

Consonants (pulmonic)
Place of articulation (groups): Labial, Coronal, Dorsal, Laryngeal
Place of articulation (columns): Bilabial, Labiodental, Dental, Alveolar, Postalveolar, Retroflex, Palatal, Velar, Uvular, Pharyngeal, Epiglottal, Glottal
Manner of articulation (rows):
Nasal     m     F     n     n`     J     N     N\
Plosive p b p_d b_d t d t` d` c J\ k g q G\ >\ ?
Fricative p\ B f v T D s z S Z s` z` C j\ x G X R X\ ?\ H\ <\ h h\
Approximant     B_o     v\     r\     r\`     j     M\
Trill     B\     r     *     R\     *
Tap or Flap     *     *     4     r`     *
Lateral Fricative K K\ *     *     *    
Lateral Approximant     l     l`     L     L\
Lateral Flap     l\     *     *     *
Coarticulated
W Voiceless labialized velar approximant
w Voiced labialized velar approximant
H Voiced labialized palatal approximant
s\ Voiceless palatalized postalveolar (alveolo-palatal) fricative
z\ Voiced palatalized postalveolar (alveolo-palatal) fricative
x\ Voiceless "palatal-velar" fricative
Affricates and double articulation
ts voiceless alveolar affricate
dz voiced alveolar affricate
tS voiceless postalveolar affricate
dZ voiced postalveolar affricate
ts\ voiceless alveolo-palatal affricate
dz\ voiced alveolo-palatal affricate
tK voiceless alveolar lateral affricate
kp voiceless labial-velar plosive
gb voiced labial-velar plosive
Nm labial-velar nasal stop
Consonants (non-pulmonic)
Clicks
O\ Bilabial
|\ Laminal alveolar ("dental")
!\ Apical (post-)alveolar ("retroflex")
=\ Laminal postalveolar ("palatal")
|\|\ Lateral coronal ("lateral")
Implosives
b_< Bilabial
d_< Alveolar
J\_< Palatal
g_< Velar
G\_< Uvular
Ejectives (marked with _>), for example:
p_> Bilabial
t_> Alveolar
k_> Velar
s_> Alveolar fricative

Vowels

Where two symbols share a cell, the first is unrounded and the second rounded.

Height       Front       Central     Back
Close        i  y        1  }        M  u
Near-close   I  Y        I\  U\      U
Close-mid    e  2        @\  8       7  o
Mid          e_o  2_o    @           o_o
Open-mid     E  9        3  3\       V  O
Near-open    {           6
Open         a  &        a_"         A  Q

References

  1. Wells, J.C. "Computer-coding the IPA: a proposed extension of SAMPA" (PDF). UCL Phonetics and Linguistics. University College London. Retrieved 16 March 2016.
  2. "Language Subtag Registry" (text). IETF. 2022-08-08. Retrieved 12 November 2022.
  3. For a summary of SAMPROSA, see Wells, J.C. (19 September 1995). "SAMPROSA (SAM Prosodic Transcription)". UCL Phonetics and Linguistics. University College London. Retrieved 23 October 2021.