Latin Extended-B

Range: U+0180..U+024F (208 code points)
Plane: BMP
Scripts: Latin
Major alphabets: Africa alphabet, Americanist, Azerbaijani, Khoisan, Pan-Nigerian, Pinyin, Romanian
Assigned: 208 code points
Unused: 0 reserved code points
Unicode version history:
1.0.0 (1991): 113 (+113)
1.1 (1993): 148 (+35)
3.0 (1999): 178 (+30)
3.2 (2002): 179 (+1)
4.0 (2003): 183 (+4)
4.1 (2005): 194 (+11)
5.0 (2006): 208 (+14)
Note: Block range was extended by 80 code points in Unicode 1.1 during the unification with ISO 10646. [1] [2]

Latin Extended-B is the fourth block (U+0180..U+024F) of the Unicode Standard. It has been included since version 1.0, when it covered only the code points U+0180..U+01FF and contained 113 characters. During unification with ISO 10646 for version 1.1, the block range was extended by 80 code points and another 35 characters were assigned. In version 3.0 and later, the remaining 60 code points in the block were assigned. Its block name in Unicode 1.0 was Extended Latin. [3]
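The arithmetic behind these figures can be checked with a short Python sketch. The constant names are illustrative, and the final count relies on the Unicode database bundled with the local Python build, which is assumed to be version 5.0 or later.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x0180, 0x024F      # current block range
V1_0_END = 0x01FF                            # end of the Unicode 1.0 allocation

block_size = BLOCK_END - BLOCK_START + 1     # code points in the block today
v1_0_size = V1_0_END - BLOCK_START + 1       # code points allocated in 1.0
extension = BLOCK_END - V1_0_END             # code points added to the range in 1.1

# Characters added per version, as listed in the version history.
added = {"1.0.0": 113, "1.1": 35, "3.0": 30, "3.2": 1,
         "4.0": 4, "4.1": 11, "5.0": 14}
total_assigned = sum(added.values())
added_after_1_1 = total_assigned - added["1.0.0"] - added["1.1"]

# Count the code points that the local Unicode database knows by name.
named = sum(
    1 for cp in range(BLOCK_START, BLOCK_END + 1)
    if unicodedata.name(chr(cp), None) is not None
)

print(block_size, v1_0_size, extension, total_assigned, added_after_1_1, named)
```

The per-version additions sum to the block size, which is consistent with the block having no reserved code points left.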

Character table

Code Glyph Decimal Description
Non-European and historic Latin
U+0180
ƀ
ƀLatin Small Letter B with Stroke
U+0181
Ɓ
ƁLatin Capital Letter B with Hook
U+0182
Ƃ
ƂLatin Capital Letter B with Top Bar
U+0183
ƃ
ƃLatin Small Letter B with Top Bar
U+0184
Ƅ
ƄLatin Capital Letter Tone Six
U+0185
ƅ
ƅLatin Small Letter Tone Six
U+0186
Ɔ
ƆLatin Capital Letter Open O
U+0187
Ƈ
ƇLatin Capital Letter C with Hook
U+0188
ƈ
ƈLatin Small Letter C with Hook
U+0189
Ɖ
ƉLatin Capital Letter African D
U+018A
Ɗ
ƊLatin Capital Letter D with Hook
U+018B
Ƌ
ƋLatin Capital Letter D with Top Bar
U+018C
ƌ
ƌLatin Small Letter D with Top Bar
U+018D
ƍ
ƍLatin Small Letter Turned Delta
U+018E
Ǝ
ƎLatin Capital Letter Reversed E
U+018F
Ə
ƏLatin Capital Letter Schwa
U+0190
Ɛ
ƐLatin Capital Letter Open E (= Latin Capital Letter Epsilon)
U+0191
Ƒ
ƑLatin Capital Letter F with Hook
U+0192
ƒ
ƒLatin Small Letter F with Hook
U+0193
Ɠ
ƓLatin Capital Letter G with Hook
U+0194
Ɣ
ƔLatin Capital Letter Gamma
U+0195
ƕ
ƕLatin Small Letter HV
U+0196
Ɩ
ƖLatin Capital Letter Iota
U+0197
Ɨ
ƗLatin Capital Letter I with Stroke
U+0198
Ƙ
ƘLatin Capital Letter K with Hook
U+0199
ƙ
ƙLatin Small Letter K with Hook
U+019A
ƚ
ƚLatin Small Letter L with Bar
U+019B
ƛ
ƛLatin Small Letter Lambda with Stroke
U+019C
Ɯ
ƜLatin Capital Letter Turned M
U+019D
Ɲ
ƝLatin Capital Letter N with Left Hook
U+019E
ƞ
ƞLatin Small Letter N with Long Right Leg
U+019F
Ɵ
ƟLatin Capital Letter O with Middle Tilde
U+01A0
Ơ
ƠLatin Capital Letter O with Horn
U+01A1
ơ
ơLatin Small Letter O with Horn
U+01A2
Ƣ
ƢLatin Capital Letter OI (= Latin Capital Letter Gha)
U+01A3
ƣ
ƣLatin Small Letter OI (= Latin Small Letter Gha)
U+01A4
Ƥ
ƤLatin Capital Letter P with Hook
U+01A5
ƥ
ƥLatin Small Letter P with Hook
U+01A6
Ʀ
ƦLatin Letter YR
U+01A7
Ƨ
ƧLatin Capital Letter Tone Two
U+01A8
ƨ
ƨLatin Small Letter Tone Two
U+01A9
Ʃ
ƩLatin Capital Letter Esh
U+01AA
ƪ
ƪLatin Letter Reversed Esh Loop
U+01AB
ƫ
ƫLatin Small Letter T with Palatal Hook
U+01AC
Ƭ
ƬLatin Capital Letter T with Hook
U+01AD
ƭ
ƭLatin Small Letter T with Hook
U+01AE
Ʈ
ƮLatin Capital Letter T with Retroflex Hook
U+01AF
Ư
ƯLatin Capital Letter U with Horn
U+01B0
ư
ưLatin Small Letter U with Horn
U+01B1
Ʊ
ƱLatin Capital Letter Upsilon
U+01B2
Ʋ
ƲLatin Capital Letter V with Hook
U+01B3
Ƴ
ƳLatin Capital Letter Y with Hook
U+01B4
ƴ
ƴLatin Small Letter Y with Hook
U+01B5
Ƶ
ƵLatin Capital Letter Z with Stroke
U+01B6
ƶ
ƶLatin Small Letter Z with Stroke
U+01B7
Ʒ
ƷLatin Capital Letter Ezh
U+01B8
Ƹ
ƸLatin Capital Letter Ezh Reversed
U+01B9
ƹ
ƹLatin Small Letter Ezh Reversed
U+01BA
ƺ
ƺLatin Small Letter Ezh with Tail
U+01BB
ƻ
ƻLatin Letter Two with Stroke
U+01BC
Ƽ
ƼLatin Capital Letter Tone Five
U+01BD
ƽ
ƽLatin Small Letter Tone Five
U+01BE
ƾ
ƾLatin Letter Inverted Glottal Stop with Stroke
U+01BF
ƿ
ƿLatin Letter Wynn
African letters for clicks
U+01C0
ǀ
ǀLatin Letter Dental Click
U+01C1
ǁ
ǁLatin Letter Lateral Click
U+01C2
ǂ
ǂLatin Letter Alveolar Click
U+01C3
ǃ
ǃLatin Letter Retroflex Click
Croatian digraphs matching Serbian Cyrillic letters
U+01C4
DŽ
DŽLatin Capital Letter DZ with Caron
U+01C5
Dž
DžLatin Capital Letter D with Small Letter Z with Caron
U+01C6
dž
džLatin Small Letter DZ with Caron
U+01C7
LJ
LJLatin Capital Letter LJ
U+01C8
Lj
LjLatin Capital Letter L with Small Letter J
U+01C9
lj
ljLatin Small Letter LJ
U+01CA
NJ
NJLatin Capital Letter NJ
U+01CB
Nj
NjLatin Capital Letter N with Small Letter J
U+01CC
nj
njLatin Small Letter NJ
Pinyin diacritic-vowel combinations
U+01CD
Ǎ
ǍLatin Capital Letter A with Caron
U+01CE
ǎ
ǎLatin Small Letter A with Caron
U+01CF
Ǐ
ǏLatin Capital Letter I with Caron
U+01D0
ǐ
ǐLatin Small Letter I with Caron
U+01D1
Ǒ
ǑLatin Capital Letter O with Caron
U+01D2
ǒ
ǒLatin Small Letter O with Caron
U+01D3
Ǔ
ǓLatin Capital Letter U with Caron
U+01D4
ǔ
ǔLatin Small Letter U with Caron
U+01D5
Ǖ
ǕLatin Capital Letter U with Diaeresis and Macron
U+01D6
ǖ
ǖLatin Small Letter U with Diaeresis and Macron
U+01D7
Ǘ
ǗLatin Capital Letter U with Diaeresis and Acute
U+01D8
ǘ
ǘLatin Small Letter U with Diaeresis and Acute
U+01D9
Ǚ
ǙLatin Capital Letter U with Diaeresis and Caron
U+01DA
ǚ
ǚLatin Small Letter U with Diaeresis and Caron
U+01DB
Ǜ
ǛLatin Capital Letter U with Diaeresis and Grave
U+01DC
ǜ
ǜLatin Small Letter U with Diaeresis and Grave
Phonetic and historic letters
U+01DD
ǝ
ǝLatin Small Letter Turned E
U+01DE
Ǟ
ǞLatin Capital Letter A with Diaeresis and Macron
U+01DF
ǟ
ǟLatin Small Letter A with Diaeresis and Macron
U+01E0
Ǡ
ǠLatin Capital Letter A with Dot Above and Macron
U+01E1
ǡ
ǡLatin Small Letter A with Dot Above and Macron
U+01E2
Ǣ
ǢLatin Capital Letter AE with Macron
U+01E3
ǣ
ǣLatin Small Letter AE with Macron
U+01E4
Ǥ
ǤLatin Capital Letter G with Stroke
U+01E5
ǥ
ǥLatin Small Letter G with Stroke
U+01E6
Ǧ
ǦLatin Capital Letter G with Caron
U+01E7
ǧ
ǧLatin Small Letter G with Caron
U+01E8
Ǩ
ǨLatin Capital Letter K with Caron
U+01E9
ǩ
ǩLatin Small Letter K with Caron
U+01EA
Ǫ
ǪLatin Capital Letter O with Ogonek
U+01EB
ǫ
ǫLatin Small Letter O with Ogonek
U+01EC
Ǭ
ǬLatin Capital Letter O with Ogonek and Macron (=Latin Capital Letter O with Macron and Ogonek)
U+01ED
ǭ
ǭLatin Small Letter O with Ogonek and Macron (=Latin Small Letter O with Macron and Ogonek)
U+01EE
Ǯ
ǮLatin Capital Letter Ezh with Caron
U+01EF
ǯ
ǯLatin Small Letter Ezh with Caron
U+01F0
ǰ
ǰLatin Small Letter J with Caron
U+01F1
DZ
DZLatin Capital Letter DZ
U+01F2
Dz
DzLatin Capital Letter D with Small Letter Z
U+01F3
dz
dzLatin Small Letter DZ
U+01F4
Ǵ
ǴLatin Capital Letter G with Acute
U+01F5
ǵ
ǵLatin Small Letter G with Acute
U+01F6
Ƕ
ǶLatin Capital Letter Hwair
U+01F7
Ƿ
ǷLatin Capital Letter Wynn
U+01F8
Ǹ
ǸLatin Capital Letter N with Grave
U+01F9
ǹ
ǹLatin Small Letter N with Grave
U+01FA
Ǻ
ǺLatin Capital Letter A with Ring Above and Acute
U+01FB
ǻ
ǻLatin Small Letter A with Ring Above and Acute
U+01FC
Ǽ
ǼLatin Capital Letter AE with Acute
U+01FD
ǽ
ǽLatin Small Letter AE with Acute
U+01FE
Ǿ
ǾLatin Capital Letter O with Stroke and Acute
U+01FF
ǿ
ǿLatin Small Letter O with Stroke and Acute
Additions for Slovenian and Croatian
U+0200
Ȁ
ȀLatin Capital Letter A with Double Grave
U+0201
ȁ
ȁLatin Small Letter A with Double Grave
U+0202
Ȃ
ȂLatin Capital Letter A with Inverted Breve
U+0203
ȃ
ȃLatin Small Letter A with Inverted Breve
U+0204
Ȅ
ȄLatin Capital Letter E with Double Grave
U+0205
ȅ
ȅLatin Small Letter E with Double Grave
U+0206
Ȇ
ȆLatin Capital Letter E with Inverted Breve
U+0207
ȇ
ȇLatin Small Letter E with Inverted Breve
U+0208
Ȉ
ȈLatin Capital Letter I with Double Grave
U+0209
ȉ
ȉLatin Small Letter I with Double Grave
U+020A
Ȋ
ȊLatin Capital Letter I with Inverted Breve
U+020B
ȋ
ȋLatin Small Letter I with Inverted Breve
U+020C
Ȍ
ȌLatin Capital Letter O with Double Grave
U+020D
ȍ
ȍLatin Small Letter O with Double Grave
U+020E
Ȏ
ȎLatin Capital Letter O with Inverted Breve
U+020F
ȏ
ȏLatin Small Letter O with Inverted Breve
U+0210
Ȑ
ȐLatin Capital Letter R with Double Grave
U+0211
ȑ
ȑLatin Small Letter R with Double Grave
U+0212
Ȓ
ȒLatin Capital Letter R with Inverted Breve
U+0213
ȓ
ȓLatin Small Letter R with Inverted Breve
U+0214
Ȕ
ȔLatin Capital Letter U with Double Grave
U+0215
ȕ
ȕLatin Small Letter U with Double Grave
U+0216
Ȗ
ȖLatin Capital Letter U with Inverted Breve
U+0217
ȗ
ȗLatin Small Letter U with Inverted Breve
Additions for Romanian
U+0218
Ș
ȘLatin Capital Letter S with Comma Below
U+0219
ș
șLatin Small Letter S with Comma Below
U+021A
Ț
ȚLatin Capital Letter T with Comma Below
U+021B
ț
țLatin Small Letter T with Comma Below
Miscellaneous additions
U+021C
Ȝ
ȜLatin Capital Letter Yogh
U+021D
ȝ
ȝLatin Small Letter Yogh
U+021E
Ȟ
ȞLatin Capital Letter H with Caron
U+021F
ȟ
ȟLatin Small Letter H with Caron
U+0220
Ƞ
ȠLatin Capital Letter N with Long Right Leg
U+0221
ȡ
ȡLatin Small Letter D with Curl
U+0222
Ȣ
ȢLatin Capital Letter OU
U+0223
ȣ
ȣLatin Small Letter OU
U+0224
Ȥ
ȤLatin Capital Letter Z with Hook
U+0225
ȥ
ȥLatin Small Letter Z with Hook
U+0226
Ȧ
ȦLatin Capital Letter A with Dot Above
U+0227
ȧ
ȧLatin Small Letter A with Dot Above
U+0228
Ȩ
ȨLatin Capital Letter E with Cedilla
U+0229
ȩ
ȩLatin Small Letter E with Cedilla
Additions for Livonian
U+022A
Ȫ
ȪLatin Capital Letter O with Diaeresis and Macron
U+022B
ȫ
ȫLatin Small Letter O with Diaeresis and Macron
U+022C
Ȭ
ȬLatin Capital Letter O with Tilde and Macron
U+022D
ȭ
ȭLatin Small Letter O with Tilde and Macron
U+022E
Ȯ
ȮLatin Capital Letter O with Dot Above
U+022F
ȯ
ȯLatin Small Letter O with Dot Above
U+0230
Ȱ
ȰLatin Capital Letter O with Dot Above and Macron
U+0231
ȱ
ȱLatin Small Letter O with Dot Above and Macron
U+0232
Ȳ
ȲLatin Capital Letter Y with Macron
U+0233
ȳ
ȳLatin Small Letter Y with Macron
Additions for Sinology
U+0234
ȴ
ȴLatin Small Letter L with Curl
U+0235
ȵ
ȵLatin Small Letter N with Curl
U+0236
ȶ
ȶLatin Small Letter T with Curl
Miscellaneous addition
U+0237
ȷ
ȷLatin Small Letter Dotless J
Additions for Africanist linguistics
U+0238
ȸ
ȸLatin Small Letter DB Digraph
U+0239
ȹ
ȹLatin Small Letter QP Digraph
Additions for Sencoten
U+023A
Ⱥ
ȺLatin Capital Letter A with Stroke
U+023B
Ȼ
ȻLatin Capital Letter C with Stroke
U+023C
ȼ
ȼLatin Small Letter C with Stroke
U+023D
Ƚ
ȽLatin Capital Letter L with Bar
U+023E
Ⱦ
ȾLatin Capital Letter T with Diagonal Stroke
Additions for Africanist linguistics
U+023F
ȿ
ȿLatin Small Letter S with Swash Tail
U+0240
ɀ
ɀLatin Small Letter Z with Swash Tail
Miscellaneous additions
U+0241
Ɂ
ɁLatin Capital Letter Glottal Stop
U+0242
ɂ
ɂLatin Small Letter Glottal Stop
U+0243
Ƀ
ɃLatin Capital Letter B with Stroke
U+0244
Ʉ
ɄLatin Capital Letter U Bar
U+0245
Ʌ
ɅLatin Capital Letter Turned V
U+0246
Ɇ
ɆLatin Capital Letter E with Stroke
U+0247
ɇ
ɇLatin Small Letter E with Stroke
U+0248
Ɉ
ɈLatin Capital Letter J with Stroke
U+0249
ɉ
ɉLatin Small Letter J with Stroke
U+024A
Ɋ
ɊLatin Capital Letter Q with Hook Tail
U+024B
ɋ
ɋLatin Small Letter Q with Hook Tail
U+024C
Ɍ
ɌLatin Capital Letter R with Stroke
U+024D
ɍ
ɍLatin Small Letter R with Stroke
U+024E
Ɏ
ɎLatin Capital Letter Y with Stroke
U+024F
ɏ
ɏLatin Small Letter Y with Stroke

Subheadings

The Latin Extended-B block groups its characters under the following subheadings: Non-European and historic Latin, African letters for clicks, Croatian digraphs matching Serbian Cyrillic letters, Pinyin diacritic-vowel combinations, Phonetic and historic letters, Additions for Slovenian and Croatian, Additions for Romanian, Miscellaneous additions, Additions for Livonian, Additions for Sinology, Additions for Africanist linguistics, and Additions for Sencoten. The Non-European and historic Latin letters, the African click letters, the Croatian digraphs, the Pinyin combinations, and the first part of the Phonetic and historic letters were present in Unicode 1.0. Further Phonetic and historic letters and the Additions for Slovenian and Croatian were added in version 1.1, and the remaining Phonetic and historic letters were added in version 3.0. The other subheadings cover characters assigned in version 3.0 and later.

Non-European and historic Latin

The Non-European and historic Latin subheading contains the first 64 characters of the block (U+0180 to U+01BF). It includes variant letters for use in Zhuang, Americanist phonetic transcription, African languages, and other Latin-script alphabets. It does not contain any standard letters with diacritics.

African letters for clicks

The four African letters for clicks are used in Khoisan orthography.

Croatian digraphs matching Serbian Cyrillic letters

The Croatian digraphs matching Serbian Cyrillic letters are three sets of three case mappings (lower case, upper case, and title case) of Latin digraphs used for compatibility with Cyrillic texts, Serbo-Croatian being a digraphic language.
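The three-way case mapping can be observed with Python's built-in string methods. The following is a minimal sketch; `case_forms` is a hypothetical helper name, and the behaviour shown depends on the Unicode data shipped with the interpreter.

```python
import unicodedata

# (upper case, title case, lower case) for each digraph in U+01C4..U+01CC
DIGRAPHS = [
    ("\u01C4", "\u01C5", "\u01C6"),   # DZ with caron
    ("\u01C7", "\u01C8", "\u01C9"),   # LJ
    ("\u01CA", "\u01CB", "\u01CC"),   # NJ
]

def case_forms(ch):
    """Return the (upper, title, lower) forms Python derives for one character."""
    return ch.upper(), ch.title(), ch.lower()

for upper, title, lower in DIGRAPHS:
    # Every member of a set maps to the same triple of case forms.
    assert case_forms(upper) == (upper, title, lower)
    assert case_forms(title) == (upper, title, lower)
    assert case_forms(lower) == (upper, title, lower)
    # The title-case form has its own general category, Lt.
    assert unicodedata.category(title) == "Lt"

# Compatibility normalization splits a digraph into two ordinary letters.
split = unicodedata.normalize("NFKC", "\u01C6")
print([f"U+{ord(c):04X}" for c in split])
```

Because the digraphs carry compatibility decompositions, NFKC or NFKD normalization replaces them with two-letter sequences, so text that must round-trip to Cyrillic should avoid those normalization forms.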

Pinyin diacritic-vowel combinations

The 16 Pinyin diacritic-vowel combinations are used to represent the standard Mandarin Chinese vowel sounds with tone marks.
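Each of these letters is a precomposed form of a base vowel plus one or two combining marks. The Python sketch below illustrates this with canonical normalization; `decompose` is a hypothetical helper name.

```python
import unicodedata

def decompose(ch):
    """List the code points of the canonical decomposition (NFD) of ch."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

# U+01CE: a with caron
a_caron = decompose("\u01CE")
# U+01DA: u with diaeresis and caron
u_diaeresis_caron = decompose("\u01DA")

# Composing the sequence again (NFC) yields the precomposed letter.
recomposed = unicodedata.normalize("NFC", "u\u0308\u030C")

print(a_caron, u_diaeresis_caron, f"U+{ord(recomposed):04X}")
```

Comparing Pinyin strings after normalizing both sides to the same form (NFC or NFD) avoids mismatches between precomposed letters and combining sequences.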

Phonetic and historic letters

The 35 Phonetic and historic letters are mostly standard and variant Latin letters with diacritical marks.

Additions for Slovenian and Croatian

The 24 Additions for Slovenian and Croatian are all standard Latin letters with unusual diacritics, like the double grave and inverted breve.

Additions for Romanian

The Additions for Romanian are four letters, S and T with comma below in upper and lower case, that had earlier been erroneously unified with the corresponding letters with cedilla. The conflation of S and T with cedilla and with comma below continues to affect Romanian-language implementations. [4]
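One common way to deal with the conflation is to map the cedilla letters to their comma-below counterparts. The following Python sketch assumes the input is known to be Romanian, since S with cedilla is the correct letter in other languages such as Turkish. The helper name `to_comma_below` is hypothetical, and combining sequences such as s followed by U+0327 are not handled.

```python
# Cedilla letters (Latin Extended-A) -> comma-below letters (Latin Extended-B).
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # capital S with cedilla -> capital S with comma below
    "\u015F": "\u0219",  # small s with cedilla   -> small s with comma below
    "\u0162": "\u021A",  # capital T with cedilla -> capital T with comma below
    "\u0163": "\u021B",  # small t with cedilla   -> small t with comma below
})

def to_comma_below(text):
    """Replace S/T with cedilla by S/T with comma below (hypothetical helper)."""
    return text.translate(CEDILLA_TO_COMMA)

# The Romanian word for "science", typed with cedilla letters.
sample = "\u015Ftiin\u0163\u0103"
fixed = to_comma_below(sample)
print(sample, "->", fixed)
```

Whether to convert at all is a policy choice: older fonts and legacy encodings may only support the cedilla forms.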

Miscellaneous additions

The Miscellaneous additions subheadings contain 30 characters of various description and origin: 14 at U+021C to U+0229, one at U+0237, and 15 at U+0241 to U+024F.

Additions for Livonian

The Additions for Livonian are 10 letters with diacritics for writing the Livonian language.

Additions for Sinology

The Additions for Sinology are three lowercase letters with curls used in the study of classical Chinese language.

Additions for Africanist linguistics

The Additions for Africanist linguistics are four lowercase letters used in Africanist linguistics: the DB and QP digraphs (U+0238 and U+0239) and the letters S and Z with swash tails (U+023F and U+0240).

Additions for Sencoten

The Additions for Sencoten are 5 letters with strokes for writing Saanich.

Number of letters

The following table shows the number of letters in the Latin Extended-B block.

Type of subheading | Number of symbols | Range of characters
Non-European and historic Latin | 64 various letters for use in Zhuang, Americanist phonetic transcription, African languages, and other Latin script alphabets. | U+0180 to U+01BF
African letters for clicks | Four African letters for clicks, used in Khoisan orthography. | U+01C0 to U+01C3
Croatian digraphs matching Serbian Cyrillic letters | Three sets of three case mappings (lower case, upper case, and title case) of Latin digraphs used for compatibility with Cyrillic texts. | U+01C4 to U+01CC
Pinyin diacritic-vowel combinations | Sixteen diacritic-vowel combinations used to represent the standard Mandarin Chinese vowel sounds with tone marks. | U+01CD to U+01DC
Phonetic and historic letters | 35 letters, largely standard and variant Latin letters with diacritic marks. | U+01DD to U+01FF
Additions for Slovenian and Croatian | 24 standard Latin letters with unusual diacritics, like the double grave and inverted breve. | U+0200 to U+0217
Additions for Romanian | 4 characters that were erroneously unified as having a cedilla, when they have a comma below. | U+0218 to U+021B
Miscellaneous additions | 14 characters of various description and origin. | U+021C to U+0229
Additions for Livonian | 10 letters with diacritics for writing the Livonian language. | U+022A to U+0233
Additions for Sinology | Three lowercase letters with curls used in the study of classical Chinese language. | U+0234 to U+0236
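The counts in the table follow directly from the ranges. As a rough check, the Python sketch below recomputes them; the dictionary simply transcribes the ranges listed above, and the names `RANGES` and `size` are illustrative.

```python
# Subheading -> (first, last) code point, transcribed from the table above.
RANGES = {
    "Non-European and historic Latin": (0x0180, 0x01BF),
    "African letters for clicks": (0x01C0, 0x01C3),
    "Croatian digraphs matching Serbian Cyrillic letters": (0x01C4, 0x01CC),
    "Pinyin diacritic-vowel combinations": (0x01CD, 0x01DC),
    "Phonetic and historic letters": (0x01DD, 0x01FF),
    "Additions for Slovenian and Croatian": (0x0200, 0x0217),
    "Additions for Romanian": (0x0218, 0x021B),
    "Miscellaneous additions": (0x021C, 0x0229),
    "Additions for Livonian": (0x022A, 0x0233),
    "Additions for Sinology": (0x0234, 0x0236),
}

def size(first, last):
    """Number of code points in an inclusive range."""
    return last - first + 1

counts = {name: size(first, last) for name, (first, last) in RANGES.items()}
covered = sum(counts.values())
remaining = 0x024F - 0x0236      # code points after the last range in the table

for name, n in counts.items():
    print(f"{n:3d}  {name}")
print(covered, remaining, covered + remaining)
```

The ten ranges cover U+0180 to U+0236; the rest of the block, U+0237 to U+024F, falls under the later subheadings that the table does not list.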

Compact table

Latin Extended-B [1]
Official Unicode Consortium code chart (PDF)
       0 1 2 3 4 5 6 7 8 9 A B C D E F
U+018x ƀ Ɓ Ƃ ƃ Ƅ ƅ Ɔ Ƈ ƈ Ɖ Ɗ Ƌ ƌ ƍ Ǝ Ə
U+019x Ɛ Ƒ ƒ Ɠ Ɣ ƕ Ɩ Ɨ Ƙ ƙ ƚ ƛ Ɯ Ɲ ƞ Ɵ
U+01Ax Ơ ơ Ƣ ƣ Ƥ ƥ Ʀ Ƨ ƨ Ʃ ƪ ƫ Ƭ ƭ Ʈ Ư
U+01Bx ư Ʊ Ʋ Ƴ ƴ Ƶ ƶ Ʒ Ƹ ƹ ƺ ƻ Ƽ ƽ ƾ ƿ
U+01Cx ǀ ǁ ǂ ǃ DŽ Dž dž LJ Lj lj NJ Nj nj Ǎ ǎ Ǐ
U+01Dx ǐ Ǒ ǒ Ǔ ǔ Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ ǝ Ǟ ǟ
U+01Ex Ǡ ǡ Ǣ ǣ Ǥ ǥ Ǧ ǧ Ǩ ǩ Ǫ ǫ Ǭ ǭ Ǯ ǯ
U+01Fx ǰ DZ Dz dz Ǵ ǵ Ƕ Ƿ Ǹ ǹ Ǻ ǻ Ǽ ǽ Ǿ ǿ
U+020x Ȁ ȁ Ȃ ȃ Ȅ ȅ Ȇ ȇ Ȉ ȉ Ȋ ȋ Ȍ ȍ Ȏ ȏ
U+021x Ȑ ȑ Ȓ ȓ Ȕ ȕ Ȗ ȗ Ș ș Ț ț Ȝ ȝ Ȟ ȟ
U+022x Ƞ ȡ Ȣ ȣ Ȥ ȥ Ȧ ȧ Ȩ ȩ Ȫ ȫ Ȭ ȭ Ȯ ȯ
U+023x Ȱ ȱ Ȳ ȳ ȴ ȵ ȶ ȷ ȸ ȹ Ⱥ Ȼ ȼ Ƚ Ⱦ ȿ
U+024x ɀ Ɂ ɂ Ƀ Ʉ Ʌ Ɇ ɇ Ɉ ɉ Ɋ ɋ Ɍ ɍ Ɏ ɏ
Notes
1. As of Unicode version 16.0
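A grid in this layout can be generated from the block range alone. The Python sketch below shows one way to do it; `compact_rows` is a hypothetical helper, and how the glyphs render depends on the fonts available.

```python
def compact_rows(start=0x0180, end=0x024F):
    """Yield one text row per 16 code points, labelled like 'U+018x'."""
    for row_start in range(start, end + 1, 16):
        # Drop the low hex digit to form the row label.
        label = f"U+{row_start >> 4:03X}x"
        cells = " ".join(chr(cp) for cp in range(row_start, row_start + 16))
        yield f"{label} {cells}"

rows = list(compact_rows())
for row in rows:
    print(row)
```

The sketch assumes the range starts and ends on a 16-code-point boundary, which holds for this block.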


References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. "3.8: Block-by-Block Charts" (PDF). The Unicode Standard. version 1.0. Unicode Consortium.
  4. Kaplan, Michael. "The history of messing up Romanian on computers". Sorting it all out.