Unicode subscripts and superscripts

The difference between superscript/subscript and numerator/denominator glyphs. In many popular fonts the Unicode "superscript" and "subscript" characters are actually numerator and denominator glyphs.

Unicode has subscript and superscript versions of a number of characters, including a full set of Arabic numerals. [1] These characters allow any polynomial, chemical formula, and certain other equations to be represented in plain text without using any form of markup such as HTML or TeX.


The World Wide Web Consortium and the Unicode Consortium have made recommendations on the choice between using markup and using superscript and subscript characters:

When used in mathematical context (MathML) it is recommended to consistently use style markup for superscripts and subscripts […] However, when super and sub-scripts are to reflect semantic distinctions, it is easier to work with these meanings encoded in text rather than markup, for example, in phonetic or phonemic transcription. [2]

Uses

The intended use [2] when these characters were added to Unicode was to produce true superscripts and subscripts, so that chemical and algebraic formulas could be written without markup. Thus "H₂O" (using a subscript 2 character) is supposed to be identical to "H<sub>2</sub>O" (an ordinary 2 with subscript markup).
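The following Python snippet is a minimal sketch of this plain-text approach. It rewrites the digits of a simple formula using the subscript digits U+2080 to U+2089. The function name `to_subscript_formula` is hypothetical. The sketch assumes every digit in the input is an atom count, so a leading coefficient (as in 2H2O) would also be subscripted. Whether the result actually looks like true subscripts still depends on the font, as the next paragraph explains.

```python
# Translation table: ASCII digits 0-9 -> SUBSCRIPT ZERO..NINE (U+2080..U+2089).
SUBSCRIPT_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x2080 + d) for d in range(10))
)


def to_subscript_formula(formula: str) -> str:
    """Return `formula` with every ASCII digit replaced by its subscript form.

    Hypothetical helper: it assumes every digit is an atom count, so a
    leading coefficient (the first 2 in "2H2O") is subscripted as well.
    """
    return formula.translate(SUBSCRIPT_DIGITS)


print(to_subscript_formula("H2O"))      # H₂O
print(to_subscript_formula("C6H12O6"))  # C₆H₁₂O₆
```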

In reality, many fonts that include these characters ignore the Unicode definition. They instead design the digits as mathematical numerator and denominator glyphs, [3] [4] which are aligned with the cap line and the baseline, respectively. When used with the solidus, these glyphs are a common substitute for diagonal fractions, such as ³/₄ for the ¾ glyph. This design choice was made because markup does not give a good graphic approximation of fractions (compare <sup>3</sup>/<sub>4</sub> in markup with the super/subscript characters ³/₄). It also makes the superscript letters useful as ordinal indicators, more closely matching the ª and º characters. However, it makes them incorrect for normal superscripts and subscripts, so chemical and algebraic formulas are better rendered using markup.

Unicode intended diagonal fractions to be rendered by a different mechanism. The fraction slash U+2044 is visually similar to the solidus, but when it is used with the ordinary digits (not the superscripts and subscripts), it instructs the layout system to render a fraction such as ¾ using automatic glyph substitution. [5] [a] Support in end-user software was poor for a number of years, but fonts, browsers, [b] word processors, [c] desktop publishing software [d] and others increasingly support the intended Unicode behavior.
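The three encodings compared in the table below can be built directly as strings. This Python sketch uses a hypothetical helper, `slash_fraction`, to assemble a fraction from ordinary digits and U+2044, as Unicode intends. It then prints the code points and character names of each variant. How each result is drawn still depends on the font and layout engine.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH, not U+002F SOLIDUS


def slash_fraction(numerator: int, denominator: int) -> str:
    """Ordinary digits separated by U+2044, the encoding Unicode intends for fractions."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


variants = {
    "precomposed": "\u00bd",                  # ½ VULGAR FRACTION ONE HALF
    "fraction slash": slash_fraction(1, 2),   # 1⁄2
    "super/subscript": "\u00b9\u2044\u2082",  # ¹⁄₂
}

for label, text in variants.items():
    codes = ", ".join(f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text)
    print(f"{label}: {codes}")
```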

A selection of supporting fonts is displayed in the table below. (These will not display properly if you do not have the fonts installed, or if your browser does not support this behavior.)

Comparison of encodings of simple fractions, superscripts and subscripts
Font | U+00BD VULGAR FRACTION ONE HALF | U+0031 DIGIT ONE, U+2044 FRACTION SLASH, U+0032 DIGIT TWO | U+00B9 SUPERSCRIPT ONE, U+2044 FRACTION SLASH, U+2082 SUBSCRIPT TWO
Browser default font | ½ | 1⁄2 | ¹⁄₂
Andika | ½ | 1⁄2 | ¹⁄₂
Arno Pro | ½ | 1⁄2 | ¹⁄₂
URW Bookman | ½ | 1⁄2 | ¹⁄₂
Brill | ½ | 1⁄2 | ¹⁄₂
Brioso Pro | ½ | 1⁄2 | ¹⁄₂
Calibri | ½ | 1⁄2 | ¹⁄₂
Candara | ½ | 1⁄2 | ¹⁄₂
Carlito | ½ | 1⁄2 | ¹⁄₂
Cantarell | ½ | 1⁄2 | ¹⁄₂
FiraGO | ½ | 1⁄2 | ¹⁄₂
EB Garamond | ½ | 1⁄2 | ¹⁄₂
Gentium Book | ½ | 1⁄2 | ¹⁄₂
URW Gothic | ½ | 1⁄2 | ¹⁄₂
Lato | ½ | 1⁄2 | ¹⁄₂
Linux Libertine | ½ | 1⁄2 | ¹⁄₂
Nimbus Roman | ½ | 1⁄2 | ¹⁄₂
Nimbus Sans | ½ | 1⁄2 | ¹⁄₂
Noto Sans | ½ | 1⁄2 | ¹⁄₂
Noto Serif | ½ | 1⁄2 | ¹⁄₂
Open Sans | ½ | 1⁄2 | ¹⁄₂
Yrsa | ½ | 1⁄2 | ¹⁄₂

Superscripts and subscripts block

The most common superscript digits (1, 2, and 3) were in ISO-8859-1 and were therefore carried over into the same positions in the Latin-1 range of Unicode. The rest were placed in a dedicated block at U+2070 to U+209F. The two tables below show these characters. Each superscript or subscript character is preceded by a normal x to show the subscripting or superscripting. The first table contains the actual Unicode characters; the second contains the equivalents using HTML markup for the subscript or superscript.

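Unicode characters (each cell is labeled with the last hexadecimal digit of its code point)
U+00Bx | 2: x² | 3: x³ | 9: x¹ | other positions: Latin-1 characters not related to super- or subscripts
U+207x | 0: x⁰ | 1: xⁱ | 2–3: reserved | 4: x⁴ | 5: x⁵ | 6: x⁶ | 7: x⁷ | 8: x⁸ | 9: x⁹ | A: x⁺ | B: x⁻ | C: x⁼ | D: x⁽ | E: x⁾ | F: xⁿ
U+208x | 0: x₀ | 1: x₁ | 2: x₂ | 3: x₃ | 4: x₄ | 5: x₅ | 6: x₆ | 7: x₇ | 8: x₈ | 9: x₉ | A: x₊ | B: x₋ | C: x₌ | D: x₍ | E: x₎ | F: reserved
U+209x | 0: xₐ | 1: xₑ | 2: xₒ | 3: xₓ | 4: xₔ | 5: xₕ | 6: xₖ | 7: xₗ | 8: xₘ | 9: xₙ | A: xₚ | B: xₛ | C: xₜ | D–F: reserved

HTML formatting using <sup> and <sub> tags
U+00Bx | 2: x<sup>2</sup> | 3: x<sup>3</sup> | 9: x<sup>1</sup>
U+207x | 0: x<sup>0</sup> | 1: x<sup>i</sup> | 4: x<sup>4</sup> | 5: x<sup>5</sup> | 6: x<sup>6</sup> | 7: x<sup>7</sup> | 8: x<sup>8</sup> | 9: x<sup>9</sup> | A: x<sup>+</sup> | B: x<sup>−</sup> | C: x<sup>=</sup> | D: x<sup>(</sup> | E: x<sup>)</sup> | F: x<sup>n</sup>
U+208x | 0: x<sub>0</sub> | 1: x<sub>1</sub> | 2: x<sub>2</sub> | 3: x<sub>3</sub> | 4: x<sub>4</sub> | 5: x<sub>5</sub> | 6: x<sub>6</sub> | 7: x<sub>7</sub> | 8: x<sub>8</sub> | 9: x<sub>9</sub> | A: x<sub>+</sub> | B: x<sub>−</sub> | C: x<sub>=</sub> | D: x<sub>(</sub> | E: x<sub>)</sub>
U+209x | 0: x<sub>a</sub> | 1: x<sub>e</sub> | 2: x<sub>o</sub> | 3: x<sub>x</sub> | 4: x<sub>ə</sub> | 5: x<sub>h</sub> | 6: x<sub>k</sub> | 7: x<sub>l</sub> | 8: x<sub>m</sub> | 9: x<sub>n</sub> | A: x<sub>p</sub> | B: x<sub>s</sub> | C: x<sub>t</sub>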
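Superscript one, two and three kept their Latin-1 positions. As a result, code that maps digits to superscripts cannot simply add an offset to U+2070: U+2071 is superscript i, and U+2072 and U+2073 are reserved. The ten subscript digits, by contrast, are contiguous. Below is a minimal Python sketch of both mappings; the function names are hypothetical.

```python
import unicodedata

# Superscript 1, 2 and 3 sit at their ISO-8859-1 positions; the other
# superscript digits are at U+2070 + digit.
LATIN1_SUPERSCRIPTS = {1: 0x00B9, 2: 0x00B2, 3: 0x00B3}


def superscript_digit(d: int) -> str:
    """Return the superscript character for a single decimal digit."""
    if not 0 <= d <= 9:
        raise ValueError("expected a single decimal digit")
    return chr(LATIN1_SUPERSCRIPTS.get(d, 0x2070 + d))


def subscript_digit(d: int) -> str:
    """Return the subscript character; all ten are contiguous at U+2080..U+2089."""
    if not 0 <= d <= 9:
        raise ValueError("expected a single decimal digit")
    return chr(0x2080 + d)


for d in range(10):
    sup, sub = superscript_digit(d), subscript_digit(d)
    print(d, f"U+{ord(sup):04X} {unicodedata.name(sup)}", f"U+{ord(sub):04X} {unicodedata.name(sub)}")
```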

Other superscript and subscript characters

Unicode version 16.0 also includes subscript and superscript characters that are intended for semantic usage, in the following blocks: [1] [6]

Superscript
Combining superscript
Subscript
Combining subscript

Latin, Greek, Cyrillic, and IPA tables

Taken together, the Unicode Standard contains superscript and subscript versions of a subset of Latin, Greek and Cyrillic letters. Here they are arranged in alphabetical order for comparison (or for copy-and-paste convenience). These characters are spread across different Unicode blocks, so font substitution in the browser may make them appear to differ in size or position. Shaded cells mark two kinds of letters that would not be expected to be supported by Unicode: small capitals that are not very distinct from minuscules, and Greek letters that are indistinguishable from Latin ones.

Little punctuation is encoded. Parentheses are shown in the basic block above, and the exclamation mark is shown in the IPA table below. A question mark may be created from a superscript gelded question mark and a combining dot below: ˀ̣. Some fonts do not render this properly.

Latin superscript and subscript letters
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Superscript capitalᴿ
Superscript small cap𐞄𐞒𐞖𐞪𐞲
Superscript minusculeʰʲˡ𐞥ʳˢʷˣʸ
Overscript small cap◌ᷛ◌ᷞ◌ᷟ◌ᷡ◌ᷢ
Overscript minuscule◌ͣ◌ᷨ◌ͨ◌ͩ◌ͤ◌ᷫ◌ᷚ◌ͪ◌ͥ◌ᷜ◌ᷝ◌ͫ◌ᷠ◌ͦ◌ᷮ◌ͬ◌ᷤ◌ͭ◌ͧ◌ͮ◌ᷱ◌ͯ◌ᷦ
Subscript minuscule
Underscript minuscule◌᷊◌ᪿ

Additional superscript capitals are ᴭ ᴯ ᴲ ᴻ. Some of these are small caps in the source documents in the Unicode proposals.
Superscript capital S has been proposed for a future version of the Unicode Standard. [8] [9]
Superscript versions of small capital A and E have been proposed for a future version of the Unicode Standard. [10] [11] [9]

Greek superscript and subscript letters
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
Superscript minuscule [A] ᶿ [A]
Overscript minuscule◌ᷧ◌ᷩ
Subscript minusculeͺ [12]
Underscript minuscule◌ͅ◌̫ [13]
  A. In some fonts, Latin alpha ᵅ and upsilon ᶹ can be used as superscript Greek alpha and upsilon. ᵋ and ᶥ are also officially Latin letters, but display the same as Greek.

Superscript versions of Greek psi and omega have been proposed for a future version of the Unicode Standard. [10] [9]

Cyrillic superscript and subscript letters
А Ә Б В Г Ґ Д Е Є Ж З Ѕ И І Ї Ј К Л М Н О Ө П Р С Ҫ
Superscript𞀰𞁋𞀱𞀲𞀳𞀴𞀵𞀶𞀷𞁊𞀸𞁌𞁍𞀹𞀺𞀻𞀼𞁎𞀽𞀾𞀿𞁫
Overscript◌ⷶ◌ⷠ◌ⷡ◌ⷢ◌ⷣ◌ⷷ◌ꙴ◌ⷤ◌ⷥ◌ꙵ𞂏◌ꙶ◌ⷦ◌ⷧ◌ⷨ◌ⷩ◌ⷪ◌ⷫ◌ⷬ◌ⷭ
Subscript𞁑𞁒𞁓𞁔𞁧𞁕𞁖𞁗𞁘𞁩𞁙𞁨𞁚𞁛𞁜𞁝𞁞
Т У Ү Ұ Ф Х Ѡ Ц Ч Џ Ш Щ Ъ Ы Ь Ѣ Э Ю Ѥ Ѧ Ѫ Ѭ Ѳ Ӏ
Superscript𞁀𞁁𞁏𞁭𞁂𞁃𞁄𞁅𞁆𞁬𞁇𞁈𞁉𞁐
Overscript◌ⷮ◌ꙷ◌ⷹ◌ꚞ◌ⷯ◌ꙻ◌ⷰ◌ⷱ◌ⷲ◌ⷳ◌ꙸ◌ꙹ◌ꙺ◌ⷺ◌ⷻ◌ⷼ◌ꚟ◌ⷽ◌ⷾ◌ⷿ◌ⷴ
Subscript𞁟𞁠𞁡𞁢𞁣𞁪𞁤𞁥𞁦

Many of these Cyrillic characters are in the Cyrillic Extended-D block. Support for that block was added to the free Gentium Plus and Andika fonts in version 6.2 (February 2023).

See also small caps in Unicode.

Superscript IPA

The Latin Extended-F block was created for the remaining superscript IPA letters. They are supported by the free Gentium Plus and Andika fonts. Additional superscript characters for historical and para-IPA letters have been proposed for future versions of the Unicode Standard. [11] [9]

Consonant letters

The Unicode characters for superscript (modifier) IPA and extIPA consonant letters are as follows. The entire Latin Extended-F block is dedicated to superscript IPA. Characters for sounds with secondary articulation are set off in parentheses and placed below the base letters.

IPA and extIPA consonants, along with superscript variants and their Unicode code points
Bilabial | Labiodental | Dental | Alveolar | Postalveolar | Retroflex | Palatal | Velar | Uvular | Pharyngeal | Glottal
Nasalm 
1D50
ɱ 
1DAC
n 
207F
()
 
 
(ȵ)
ɳ 
1DAF
ɲ 
1DAE
ŋ 
1D51
ɴ 
1DB0
Plosivep 
1D56
b 
1D47
t 
1D57
(ƫ )
1DB5
d 
1D48
()
 
 
(ȶ)
 
 
(ȡ)
ʈ 𐞯
107AF
ɖ 𐞋
1078B
c 
1D9C
ɟ 
1DA1
k 
1D4F
ɡ /g 
1DA2/1D4D
q 𐞥
107A5
ɢ 𐞒
10792
ʡ 𐞳
107B3
ʔ ˀ
02C0
Affricateʦ 𐞬
107AC
ʣ 𐞇
10787
ʧ 𐞮
107AE
(ʨ𐞫)
107AB
ʤ 𐞊
1078A
(ʥ𐞉)
10789
 𐞭
107AD
(𝼜)
 𐞈
10788
(𝼙)
Fricativeɸ 
1DB2
β 
1D5D
f 
1DA0
v 
1D5B
θ ᶿ
1DBF
ð 
1D9E
s ˢ
02E2
()
z 
1DBB
()
ʃ 
1DB4
(ɕ )
1D9D
ʒ 
1DBE
(ʑ )
1DBD
ʂ 
1DB3
()
ʐ 
1DBC
()
ç ᶜ̧
1D9C + 0327 [e]
ʝ 
1DA8
x ˣ
02E3
(ɧ 𐞗)
10797
ɣ ˠ
02E0
χ 
1D61
ʁ ʶ
02B6
ħ 𐞕
10795
(ʩ 𐞐)
10790
ʕ ˤ
02E4 [f]
h ʰ
02B0
()
ɦ ʱ
02B1
Approximantʋ 
1DB9
ɹ ʴ
02B4
ɻ ʵ
02B5
j ʲ
02B2
(ɥ )
1DA3
 
 
(ʍ )
AB69
ɰ 
1DAD
(w ʷ)
02B7
Tap/flap 𐞰
107B0
ɾ 𐞩
107A9
ɽ 𐞨
107A8
Trillʙ 𐞄
10784
r ʳ
02B3
ʀ 𐞪
107AA
ʜ 𐞖
10796
ʢ 𐞴
107B4
Lateral fricativeɬ 𐞛
1079B
(ʪ 𐞙)
10799
ɮ 𐞞
1079E
(ʫ 𐞚)
1079A
 𐞝
1079D
𝼅 𐞟
1079F
𝼆 𐞡
107A1
𝼄 𐞜
1079C
Lateral approximantl ˡ
02E1
( )
1DAA
 
 
(ȴ)
ɭ 
1DA9
ʎ 𐞠
107A0
ʟ 
1DAB
(ɫ ) [g]
AB5E
Lateral tap/flapɺ 𐞦
107A6
𝼈 𐞧
107A7
Implosiveƥɓ 𐞅
10785
ƭɗ 𐞌
1078C
𝼉 𐞍
1078D
ƈʄ 𐞘
10798
ƙɠ 𐞓
10793
ʠʛ 𐞔
10794
Click releaseʘ 𐞵
107B5
ǀ 𐞶
107B6
ʇǃ 
A71D
ʗ𝼊 𐞹
107B9
ψǂ 𐞸
107B8
𝼋(ʞ)
Lateral click
release
ǁ 𐞷
107B7
ʖ
Percussive¡ 
A71E [h]

The spacing diacritic for ejective consonants, U+02BC ʼ, works with superscript letters despite not being superscript itself: ᵖʼ ᵗʼ ᶜʼ ᵏˣʼ. If a distinction needs to be made, the combining apostrophe U+0315 may be used instead: ᵖ̕ ᵗ̕ ᶜ̕ ᵏˣ̕.

The two apostrophes are used in different situations:
- The spacing diacritic should be used for a baseline letter with a superscript release, such as [tˢʼ] or [kˣʼ], where the scope of the apostrophe includes the non-superscript letter.
- The combining apostrophe U+0315 might be used to indicate a weakly articulated ejective consonant like [ᵗ̕] or [ᵏ̕], where the whole consonant is written as a superscript.
- The two can also be combined when separate apostrophes have scope over the base and modifier letters, as in pʼᵏˣ̕. [14]
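The difference between the two apostrophes shows up in their Unicode properties. U+02BC is a spacing modifier letter (general category Lm), while U+0315 COMBINING COMMA ABOVE RIGHT is a nonspacing mark (Mn) that attaches to the preceding character. This Python sketch uses a hypothetical helper, `mark_ejective`, to build the examples from the paragraph above and print each code point's category.

```python
import unicodedata

MODIFIER_APOSTROPHE = "\u02bc"   # ʼ spacing modifier letter
COMBINING_APOSTROPHE = "\u0315"  # combining comma above right, attaches to the previous character


def mark_ejective(transcription: str, combining: bool = False) -> str:
    """Append the spacing (default) or the combining ejective apostrophe."""
    return transcription + (COMBINING_APOSTROPHE if combining else MODIFIER_APOSTROPHE)


examples = [
    mark_ejective("t\u02e2"),                              # [tˢʼ]: baseline t, superscript s release
    mark_ejective("\u1d57", combining=True),               # [ᵗ̕]: whole consonant superscript
    mark_ejective("p\u02bc\u1d4f\u02e3", combining=True),  # pʼᵏˣ̕: both apostrophes
]

for text in examples:
    info = " ".join(f"U+{ord(c):04X}/{unicodedata.category(c)}" for c in text)
    print(text, info)
```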

Spacing diacritics, as in [tʲ], cannot be further superscripted in plain text: ᵗʲ. (In this instance, the old IPA letter for [tʲ], ƫ, has a superscript variant in Unicode, U+1DB5 ᶵ, but that is not generally the case.)

Among older letters, ꜧ (U+A727) was a graphic variant of ɧ. Its superscript is supported at ꭜ (U+AB5C). The most common letters with palatal hook are also supported; they are displayed in the table above. IPA once had an idiosyncratic curl on some of the palatalized letters, namely the fricative letters ʆ ʓ. Their superscript forms have been proposed for a future version of the Unicode Standard. [11] [9] Superscripts of the retired letters ƞ and ɼ have also been proposed for a future version of the Unicode Standard. [11] [9]

Among para-IPA letters, superscript forms of the Sinological letters ȡ ȴ ȵ ȶ have been proposed for a future version of the Unicode Standard. [10] [9] Superscripts of the Bantuist labiodental plosives ȹ and ȸ have also been proposed. [10] [9] The central semivowels ɉ, ɥ̶, and have also been proposed for a future version of the Unicode Standard. [10] [9]

Old-style click letters have been proposed for a future version of the Unicode Standard. [15] [9]

Vowel letters

The Unicode characters for superscript (modifier) IPA vowel letters, plus a pair of extended letters ᵻ ᵿ found in English dictionaries, are as follows. Recently retired alternative letters such as ɩ ɷ are also supported; they are set off in parentheses and placed below the standard IPA letters:

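IPA vowels and superscript variants
Each cell gives the base letter, its superscript form and the code point of the superscript. Cells run from front to central to back. Letters in parentheses are extended or retired alternatives, and entries without a code point have no superscript form.
Close | i ⁱ U+2071 | y ʸ U+02B8 | ɨ ᶤ U+1DA4 | ʉ ᶶ U+1DB6 | ɯ ᵚ U+1D5A | u ᵘ U+1D58
Near-close | ɪ ᶦ U+1DA6 | (ɩ ᶥ U+1DA5) | ʏ 𐞲 U+107B2 | (ᵻ ᶧ U+1DA7) | (ᵿ) | (ω) | ʊ ᶷ U+1DB7 | (ɷ 𐞤 U+107A4)
Close-mid | e ᵉ U+1D49 | ø 𐞢 U+107A2 | ɘ 𐞎 U+1078E | ɵ ᶱ U+1DB1 | ɤ 𐞑 U+10791 | o ᵒ U+1D52
Mid | ə ᵊ U+1D4A
Open-mid | ɛ ᵋ U+1D4B | œ ꟹ U+A7F9 | ɜ ᶟ U+1D9F | (ᴈ ᵌ U+1D4C) | ɞ 𐞏 U+1078F | ʌ ᶺ U+1DBA | ɔ ᵓ U+1D53
Near-open | æ 𐞃 U+10783 | ɐ ᵄ U+1D44
Open | a ᵃ U+1D43 | ɶ 𐞣 U+107A3 | ɑ ᵅ U+1D45 | ɒ ᶛ U+1D9B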

The precomposed Unicode rhotic vowel letters ɚ ɝ are not directly supported. The rhotic diacritic U+02DE ◌˞ should be used instead: ᵊ˞ ᶟ˞. [16]
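Because there is no precomposed superscript ɚ, the superscript form is a sequence of two code points: U+1D4A ᵊ followed by the spacing rhotic hook U+02DE. Superscript ç is built the same way, from U+1D9C ᶜ and the combining cedilla U+0327. This Python sketch checks that canonical normalization (NFC) leaves both sequences unchanged, since there is no precomposed character to combine them into.

```python
import unicodedata

SUPERSCRIPT_SCHWA = "\u1d4a"  # ᵊ MODIFIER LETTER SMALL SCHWA
RHOTIC_HOOK = "\u02de"        # ˞ MODIFIER LETTER RHOTIC HOOK (a spacing diacritic)
SUPERSCRIPT_C = "\u1d9c"      # ᶜ MODIFIER LETTER SMALL C
COMBINING_CEDILLA = "\u0327"  # COMBINING CEDILLA

superscript_rhotic_schwa = SUPERSCRIPT_SCHWA + RHOTIC_HOOK  # ᵊ˞, stands in for superscript ɚ
superscript_c_cedilla = SUPERSCRIPT_C + COMBINING_CEDILLA   # ᶜ̧, superscript ç

for seq in (superscript_rhotic_schwa, superscript_c_cedilla):
    nfc = unicodedata.normalize("NFC", seq)
    # Both stay as two code points: nothing precomposed exists to compose into.
    print(seq, [unicodedata.name(c) for c in nfc], "unchanged by NFC:", nfc == seq)
```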

ɜ, with its superscript ᶟ (U+1D9F), is a reversed ɛ. The older IPA turned ɛ, ᴈ, is also supported, at U+1D4C ᵌ. However, the briefly resurrected vowel letter ʚ (U+029A) is not supported; only its reversed replacement ɞ is.

Among older letters, ᴜ (U+1D1C), a graphic variant of ʊ, is supported at ᶸ (U+1DB8).

Among para-IPA letters, superscript forms of the Sinological letters ɿ ʅ ʮ ʯ have been proposed for a future version of the Unicode Standard. [10] [9]

Length marks

The two length marks are also supported:

Length marks
Long | ː | 𐞁 U+10781
Half-long | ˑ | 𐞂 U+10782

These are used to add length to another superscript, such as Cʰ𐞁 or Cʰ𐞂 for long aspiration.

Wildcards

Superscript wildcards (full caps) are largely supported: e.g. ᴺC (prenasalized consonant), ꟲN (prestopped nasal), Pꟳ (fricative release), NᴾF (epenthetic plosive), CVNᵀ (tone-bearing syllable), Cᴸ (liquid or lateral release), Cᴿ (rhotic or resonant release), Vᴳ (off-glide/diphthong), Cⱽ (fleeting vowel). Superscript S for sibilant release has been proposed for a future version of the Unicode Standard; [8] [9] superscript for fleeting/epenthetic click has not. Other basic Latin superscript wildcards for tone and weak indeterminate sounds, as described in the article on the International Phonetic Alphabet, are mostly supported. (See table in previous section.)

Combining marks and subscripts

In addition, a very few IPA letters beyond the basic Latin alphabet have combining forms or are supported as subscripts:

Additional IPA characters
äɑæçðəʃʍʔʼ
Overscript◌ᷲ◌ᷧ◌ᷔ◌ᷗ◌ᷙ◌ᷪ◌ᷯ◌̉ [i] ◌̓
Subscript
Underscript◌ᫀ◌̦

Composite characters

Primarily for compatibility with earlier character sets, Unicode contains a number of characters that compose super- and subscripts with other symbols. [1] In most fonts these render much better than attempts to construct these symbols from the above characters or by using markup.
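Most of these characters, including the superscript and subscript digits, carry compatibility decompositions in the Unicode Character Database, tagged <super>, <sub> or <fraction>. As a result, compatibility normalization (NFKC) folds them back to ordinary characters and discards the super- and subscripting. That loss is one practical reason the W3C/Unicode recommendation quoted earlier matters. This Python sketch uses a hypothetical helper, `describe`, to print those decompositions. The trade mark sign ™ (U+2122) is chosen here only as an illustrative composite.

```python
import unicodedata


def describe(ch: str) -> str:
    """Code point, name and (compatibility) decomposition of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)}"


for ch in ("\u00b2", "\u2082", "\u00bd", "\u2122"):  # ², ₂, ½, ™
    print(describe(ch))

# NFKC replaces each compatibility character with its decomposition.
for text in ("x\u00b2", "H\u2082O", "\u00bd", "\u2122"):
    print(f"{text} -> {unicodedata.normalize('NFKC', text)}")
```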

Notes

  a. For a general overview and technical information on glyph substitution (though not specifically for fractions), see GSUB (Glyph Substitution Table) in the OpenType specification on the Microsoft Typography site.
  b. Such as Chrome, Firefox and Falkon.
  c. Such as LibreOffice Writer.
  d. Such as Adobe InDesign and Scribus.
  e. Superscript ç is composed of superscript c and a combining cedilla, which should display properly in a good font. Superscript c was specifically requested for this purpose in Unicode proposal L2/03-180.
  f. U+02E4 ˤ MODIFIER LETTER SMALL REVERSED GLOTTAL STOP is the superscript variant of U+0295 ʕ LATIN LETTER PHARYNGEAL VOICED FRICATIVE and is defined for IPA use. The similar character U+02C1 ˁ MODIFIER LETTER REVERSED GLOTTAL STOP is a reversed U+02C0 ˀ MODIFIER LETTER GLOTTAL STOP, perhaps a gelded reversed question mark. Fonts are inconsistent in whether the two look different and in what the difference is.
  g. In Microsoft fonts, superscript ɫ was erroneously designed as a superscript .
  h. U+A71D and U+A71E were adopted as the Africanist equivalents of the IPA characters downstep and upstep. The correspondence of U+A71D to the IPA click letter ǃ is thus accidental. Coincidentally, U+A71E serves as the superscript variant of the extIPA percussive consonant ¡; the other percussive letters, ʬ and ʭ, do not have superscript support in Unicode.
  i. This is actually the Vietnamese diacritic dấu hỏi, not specifically IPA, but graphically both are gelded question marks.


References

  1. "UCD: UnicodeData.txt". The Unicode Standard. Retrieved 2016-05-14.
  2. Martin Dürst, Asmus Freytag (16 May 2007). "Unicode in XML and other Markup Languages". W3C. Retrieved 13 September 2010.
  3. "fraction | Dart Package". Dart packages. 27 December 2021. Retrieved 21 September 2022.
  4. "MathML | General layout elements | Fractions". data2type GmbH (in German). 30 March 2021. Retrieved 13 January 2022. (Dead link.)
  5. Martin Dürst, Asmus Freytag (16 May 2007). "Fraction Slash". W3C. Retrieved 13 September 2010.
  6. "UCD: Scripts.txt". The Unicode Standard. Retrieved 2022-09-21.
  7. Everson, Michael; West, Andrew (2020-10-05). "L2/20-268: Revised proposal to add ten characters for Middle English to the UCS" (PDF).
  8. Kirk Miller (2024-01-30). "L2/24-081: Unicode request for modifier capital S" (PDF).
  9. "Proposed New Characters: Pipeline Table". Unicode Consortium. 2024-09-10. Retrieved 2024-09-21.
  10. Kirk Miller (2024-06-14). "L2/24-147: Modifier Sinological extensions to the IPA" (PDF).
  11. Kirk Miller (2024-06-06). "L2/24-171: Miscellaneous historical and para-IPA modifier letters" (PDF).
  12. ͺ is set lower than a normal subscript. It is equivalent to underscript ◌ͅ on a space.
  13. ◌̫ is traditionally typeset as an omega.
  14. Kirk Miller & Michael Ashby. "L2/20-253R: Unicode request for IPA modifier letters (b), non-pulmonic".
  15. Kirk Miller (2024-04-26). "L2/24-052R: Unicode request for modifier pre-Kiel click letters" (PDF).
  16. Kirk Miller & Michael Ashby. "L2/20-252R: Unicode request for IPA modifier-letters (a), pulmonic".
  17. Silva, Eduardo Marín (2017-03-01). "L2/17-066R: Proposal to encode the Marca Registrada sign" (PDF).