In Unicode, a script is a collection of letters and other written signs used to represent textual information in one or more writing systems. [1] Some scripts support one and only one writing system and language, for example, Armenian. Other scripts support many different writing systems; for example, the Latin script supports English, French, German, Italian, Vietnamese, Latin itself, and several other languages. Some languages have been written in more than one script over time; for example, Turkish was written in the Arabic script until the early 20th century, when it switched to the Latin script. More or less complementary to scripts are symbols and Unicode control characters.
The unified diacritical characters and unified punctuation characters frequently have the "common" or "inherited" script property. However, the individual scripts often have their own punctuation and diacritics, so that many scripts include not only letters but also diacritic and other marks, punctuation, numerals and even their own idiosyncratic symbols and space characters.
Unicode 16.0 defines 168 separate scripts, including 99 modern scripts and 69 ancient or historic scripts. [2] [3] More scripts are in the process of being encoded or have been tentatively allocated for encoding in roadmaps. [4]
When multiple languages make use of the same script, there are frequently some differences, particularly in diacritics and other marks. For example, Swedish and English both use the Latin script. However, Swedish includes the character å (sometimes called a Swedish O), while English has no such character. Nor does English make use of the diacritic combining ring above for any character. In general, the languages sharing the same scripts share many of the same characters. Despite these peripheral differences in the Swedish and English writing systems, they are said to use the same Latin script. Thus, the Unicode abstraction of scripts is a basic organizing technique. The differences among different alphabets or writing systems remain and are supported through Unicode’s flexible scripts, combining marks and collation algorithms.
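The relationship between the Swedish letter å and the combining ring above can be illustrated with Unicode normalization. The following is a minimal sketch using Python's standard `unicodedata` module; the variable names are chosen for illustration only.

```python
import unicodedata

# Precomposed Swedish letter å (U+00E5).
precomposed = "\u00e5"

# Canonical decomposition (NFD) splits it into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)

for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Canonical composition (NFC) restores the single precomposed code point.
recomposed = unicodedata.normalize("NFC", decomposed)
```

The decomposed form consists of the Latin letter a followed by the combining ring above, which is how a shared script can serve alphabets that differ in their diacritics.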
The term "writing system" is sometimes treated as a synonym for "script". However, it can also refer to a specific concrete writing system supported by a script. For example, the Vietnamese writing system is supported by the Latin script. A writing system may also cover more than one script; for example, the Japanese writing system makes use of the Han, Hiragana and Katakana scripts.
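The difference between a writing system and a script can be sketched in code. Python's standard `unicodedata` module does not appear to expose the Script property directly, so the example below guesses a script from the first word of each character name. This is a rough heuristic for illustration only, not the Unicode Script property, and the mapping table and function name are hypothetical.

```python
import unicodedata

# Hypothetical mapping from the first word of a character name to a script label.
# This is a heuristic for illustration, NOT the Unicode Script property.
NAME_PREFIX_TO_SCRIPT = {
    "CJK": "Han",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "LATIN": "Latin",
}

def guess_scripts(text):
    """Return the set of script labels guessed from character names."""
    scripts = set()
    for ch in text:
        name = unicodedata.name(ch, "")
        prefix = name.split(" ")[0] if name else ""
        scripts.add(NAME_PREFIX_TO_SCRIPT.get(prefix, "Other"))
    return scripts

# A Japanese phrase mixing ideographs, hiragana and katakana.
japanese = "\u65e5\u672c\u8a9e\u306e\u30ab\u30bf\u30ab\u30ca"  # 日本語のカタカナ
# A Vietnamese word written with precomposed Latin letters.
vietnamese = "Vi\u1ec7t"  # Việt

print(guess_scripts(japanese))
print(guess_scripts(vietnamese))
```

Under these assumptions the Japanese sample yields three script labels and the Vietnamese sample yields one. A production implementation would read the Script property from the Unicode Character Database rather than from character names.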
Most writing systems can be broadly divided into several categories: logographic, syllabic, alphabetic (or segmental), abugida, abjad and featural; however, all features of any of these may be found in any given writing system in varying proportions, often making it difficult to purely categorize a system. The term complex system is sometimes used to describe those where the admixture makes classification problematic.
Unicode supports all of these types of writing systems through its numerous scripts. Unicode also adds further properties to characters to help differentiate the various characters and the ways they behave within Unicode text-processing algorithms.
In addition to explicit or specific script properties, Unicode uses three special values: Common, Inherited and Unknown. [5]
Unicode provides a general category property for each character, so in addition to belonging to a script, every character also has a general category. Scripts typically include letter characters: uppercase letters, lowercase letters and modifier letters. A few precomposed ligatures, such as Dz (U+01F2), are categorized as titlecase letters. Such titlecase ligatures are all in the Latin and Greek scripts and are all compatibility characters, and therefore Unicode discourages their use by authors. It is unlikely that new titlecase letters will be added in the future.
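The general category of a character can be inspected with Python's standard `unicodedata` module. The sketch below looks at the three case forms of the DZ digraph around U+01F2; the variable names are illustrative, and the two-letter codes are the general category abbreviations returned by `unicodedata.category`.

```python
import unicodedata

# Uppercase, titlecase and lowercase forms of the DZ digraph.
DZ_CODE_POINTS = (0x01F1, 0x01F2, 0x01F3)

dz_categories = []
for cp in DZ_CODE_POINTS:
    ch = chr(cp)
    category = unicodedata.category(ch)  # "Lu", "Lt" or "Ll" for cased letters
    dz_categories.append(category)
    print(f"U+{cp:04X} {category} {unicodedata.name(ch)}")
```

Here `Lu` stands for uppercase letter, `Lt` for titlecase letter and `Ll` for lowercase letter.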
Most writing systems do not differentiate between uppercase and lowercase letters. For those scripts all letters are categorized as "other letter" or "modifier letter". Ideographs such as Unihan ideographs are also categorized as "other letters". A few scripts do differentiate between uppercase and lowercase however: Latin, Cyrillic, Greek, Armenian, Georgian, and Deseret. Even for these scripts there are some letters that are neither uppercase nor lowercase.
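As a rough illustration of the caseless categories, the following sketch queries the general category of a few sample characters with Python's standard `unicodedata` module. The sample selection and labels are assumptions made for this example.

```python
import unicodedata

# Sample characters; the labels are informal descriptions for this sketch.
SAMPLES = {
    "Han ideograph": "\u6f22",            # 漢
    "Hiragana letter a": "\u3042",        # あ
    "Arabic letter alef": "\u0627",       # ا
    "Modifier letter small h": "\u02b0",  # ʰ
    "Latin dental click": "\u01c0",       # ǀ, a Latin letter without case
    "Latin capital A": "A",
    "Latin small a": "a",
}

letter_categories = {
    label: unicodedata.category(ch) for label, ch in SAMPLES.items()
}

for label, category in letter_categories.items():
    print(f"{category}  {label}")
```

`Lo` is the code for "other letter" and `Lm` for "modifier letter". The Latin dental click shows that even a cased script can contain letters that are neither uppercase nor lowercase.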
Scripts can also contain any other general category character such as marks (diacritic and otherwise), numbers (numerals), punctuation, separators (word separators such as spaces), symbols and non-graphical format characters. These are included in a particular script when they are unique to that script. Other such characters are generally unified and included in the punctuation or diacritic blocks. However, the bulk of characters in any script (other than the common and inherited scripts) are letters.
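Script-specific characters that are not letters can be checked in the same way. The sketch below uses Python's standard `unicodedata` module on a handful of characters that are assumed here to be representative of marks, numbers, punctuation, separators, symbols and format characters encoded alongside particular scripts.

```python
import unicodedata

# Non-letter characters that are encoded alongside a specific script.
SAMPLES = {
    "Devanagari virama": "\u094d",        # mark
    "Arabic-Indic digit three": "\u0663", # number
    "Arabic comma": "\u060c",             # punctuation
    "Ogham space mark": "\u1680",         # separator
    "Tamil day sign": "\u0bf3",           # symbol
    "Arabic number sign": "\u0600",       # format character
}

categories = {label: unicodedata.category(ch) for label, ch in SAMPLES.items()}

for label, category in categories.items():
    print(f"{category}  {label}")
```

The first letter of each code gives the major class: `M` for marks, `N` for numbers, `P` for punctuation, `Z` for separators, `S` for symbols and `C` for control and format characters.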
As of version 16.0, Unicode defines 168 scripts (called "Alias" or "Property value alias") based on the ISO 15924 list. In addition, Unicode assigns the name "Common" to ISO 15924's Zyyy code for undetermined scripts, "Inherited" to ISO 15924's Zinh code for inherited scripts, and "Unknown" to ISO 15924's Zzzz code for uncoded scripts. Some script codes defined by ISO 15924 are not used in Unicode, including Zsym (Symbols) and Zmth (Mathematical notation).
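The correspondence between the special ISO 15924 codes and Unicode's special script values can be written as a small lookup table. This is a minimal sketch; the dictionary, set and function names are hypothetical and cover only the codes named in the paragraph above.

```python
# ISO 15924 codes that Unicode maps to special script property values.
SPECIAL_VALUES = {
    "Zyyy": "Common",     # undetermined script
    "Zinh": "Inherited",  # inherited script
    "Zzzz": "Unknown",    # uncoded script
}

# ISO 15924 codes mentioned above that have no counterpart in Unicode.
NOT_USED_IN_UNICODE = {"Zsym", "Zmth"}

def special_value(iso_code):
    """Return Unicode's special script value for an ISO 15924 code, or None."""
    return SPECIAL_VALUES.get(iso_code)

for code in ("Zyyy", "Zinh", "Zzzz", "Zsym", "Zmth"):
    print(code, special_value(code))
```

Codes for ordinary scripts, such as Latn, are deliberately left out of this sketch; they appear in the table that follows.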
ISO 15924 code | ISO number | ISO formal name | Directionality | Unicode alias | Unicode version | Characters | Notes | Description |
---|---|---|---|---|---|---|---|---|
Adlm | 166 | Adlam | right-to-left script | Adlam | 9.0 | 88 | Ch 19.9 | |
Afak | 439 | Afaka | varies | — Not in Unicode, proposal is explored [lower-roman 1] | ||||
Aghb | 239 | Caucasian Albanian | left-to-right | Caucasian Albanian | 7.0 | 53 | Ancient/historic | Ch 8.11 |
Ahom | 338 | Ahom, Tai Ahom | left-to-right | Ahom | 8.0 | 65 | Ancient/historic | Ch 15.16 |
Arab | 160 | Arabic | right-to-left script | Arabic | 1.0 | 1,373 | Ch 9.2 | |
Aran | 161 | Arabic (Nastaliq variant) | mixed | — Typographic variant of Arabic (see § Arab) | ||||
Armi | 124 | Imperial Aramaic | right-to-left script | Imperial Aramaic | 5.2 | 31 | Ancient/historic | Ch 10.4 |
Armn | 230 | Armenian | left-to-right | Armenian | 1.0 | 96 | Ch 7.6 | |
Avst | 134 | Avestan | right-to-left script | Avestan | 5.2 | 61 | Ancient/historic | Ch 10.7 |
Bali | 360 | Balinese | left-to-right | Balinese | 5.0 | 127 | Ch 17.3 | |
Bamu | 435 | Bamum | left-to-right | Bamum | 5.2 | 657 | Ch 19.6 | |
Bass | 259 | Bassa Vah | left-to-right | Bassa Vah | 7.0 | 36 | Ancient/historic | Ch 19.7 |
Batk | 365 | Batak | left-to-right | Batak | 6.0 | 56 | Ch 17.6 | |
Beng | 325 | Bengali (Bangla) | left-to-right | Bengali | 1.0 | 96 | Ch 12.2 | |
Bhks | 334 | Bhaiksuki | left-to-right | Bhaiksuki | 9.0 | 97 | Ancient/historic | Ch 14.3 |
Blis | 550 | Blissymbols | varies | — Not in Unicode, proposal is explored [lower-roman 1] | ||||
Bopo | 285 | Bopomofo | left-to-right, right-to-left script | Bopomofo | 1.0 | 77 | Ch 18.3 | |
Brah | 300 | Brahmi | left-to-right | Brahmi | 6.0 | 115 | Ancient/historic | Ch 14.1 |
Brai | 570 | Braille | left-to-right | Braille | 3.0 | 256 | Ch 21.1 | |
Bugi | 367 | Buginese | left-to-right | Buginese | 4.1 | 30 | Ch 17.2 | |
Buhd | 372 | Buhid | left-to-right | Buhid | 3.2 | 20 | Ch 17.1 | |
Cakm | 349 | Chakma | left-to-right | Chakma | 6.1 | 71 | Ch 13.11 | |
Cans | 440 | Unified Canadian Aboriginal Syllabics | left-to-right | Canadian Aboriginal | 3.0 | 726 | Ch 20.2 | |
Cari | 201 | Carian | left-to-right, right-to-left script | Carian | 5.1 | 49 | Ancient/historic | Ch 8.5 |
Cham | 358 | Cham | left-to-right | Cham | 5.1 | 83 | Ch 16.10 | |
Cher | 445 | Cherokee | left-to-right | Cherokee | 3.0 | 172 | Ch 20.1 | |
Chis | 298 | Chisoi | left-to-right | — Not in Unicode, proposal is mature [lower-roman 2] | ||||
Chrs | 109 | Chorasmian | right-to-left script, top-to-bottom | Chorasmian | 13.0 | 28 | Ancient/historic | Ch 10.8 |
Cirt | 291 | Cirth | varies | — Not in Unicode | ||||
Copt | 204 | Coptic | left-to-right | Coptic | 1.0 | 137 | Ancient/historic, disunified from Greek in 4.1 | Ch 7.3 |
Cpmn | 402 | Cypro-Minoan | left-to-right | Cypro Minoan | 14.0 | 99 | Ancient/historic | Ch 8.4 |
Cprt | 403 | Cypriot syllabary | right-to-left script | Cypriot | 4.0 | 55 | Ancient/historic | Ch 8.3 |
Cyrl | 220 | Cyrillic | left-to-right | Cyrillic | 1.0 | 508 | Includes typographic variant Old Church Slavonic (see § Cyrs) | Ch 7.4 |
Cyrs | 221 | Cyrillic (Old Church Slavonic variant) | varies | — Typographic variant of Cyrillic (see § Cyrl); Ancient/historic | ||||
Deva | 315 | Devanagari (Nagari) | left-to-right | Devanagari | 1.0 | 164 | Ch 12.1 | |
Diak | 342 | Dives Akuru | left-to-right | Dives Akuru | 13.0 | 72 | Ancient/historic | Ch 15.15 |
Dogr | 328 | Dogra | left-to-right | Dogra | 11.0 | 60 | Ancient/historic | Ch 15.18 |
Dsrt | 250 | Deseret (Mormon) | left-to-right | Deseret | 3.1 | 80 | Ch 20.4 | |
Dupl | 755 | Duployan shorthand, Duployan stenography | left-to-right | Duployan | 7.0 | 143 | Ch 21.6 | |
Egyd | 070 | Egyptian demotic | mixed | — Not in Unicode | ||||
Egyh | 060 | Egyptian hieratic | mixed | — Not in Unicode | ||||
Egyp | 050 | Egyptian hieroglyphs | right-to-left script, left-to-right | Egyptian Hieroglyphs | 5.2 | 5,105 | Ancient/historic | Ch 11.4 |
Elba | 226 | Elbasan | left-to-right | Elbasan | 7.0 | 40 | Ancient/historic | Ch 8.10 |
Elym | 128 | Elymaic | right-to-left script | Elymaic | 12.0 | 23 | Ancient/historic | Ch 10.9 |
Ethi | 430 | Ethiopic (Geʻez) | left-to-right | Ethiopic | 3.0 | 523 | Ch 19.1 | |
Gara | 164 | Garay | right-to-left | Garay | 16.0 | 69 | ||
Geok | 241 | Khutsuri (Asomtavruli and Nuskhuri) | left-to-right | Georgian | Unicode groups Khutsuri, Asomtavruli and Nuskhuri into 'Georgian' (see § Geok). Similarly, Mkhedruli and Mtavruli are 'Georgian' (see § Geor) | Ch 7.7 | ||
Geor | 240 | Georgian (Mkhedruli and Mtavruli) | left-to-right | Georgian | 1.0 | 173 | In Unicode this also includes Nuskhuri (Geok) | Ch 7.7 |
Glag | 225 | Glagolitic | left-to-right | Glagolitic | 4.1 | 134 | Ancient/historic | Ch 7.5 |
Gong | 312 | Gunjala Gondi | left-to-right | Gunjala Gondi | 11.0 | 63 | Ch 13.15 | |
Gonm | 313 | Masaram Gondi | left-to-right | Masaram Gondi | 10.0 | 75 | Ch 13.14 | |
Goth | 206 | Gothic | left-to-right | Gothic | 3.1 | 27 | Ancient/historic | Ch 8.9 |
Gran | 343 | Grantha | left-to-right | Grantha | 7.0 | 85 | Ancient/historic | Ch 15.14 |
Grek | 200 | Greek | left-to-right | Greek | 1.0 | 518 | Directionality sometimes as boustrophedon | Ch 7.2 |
Gujr | 320 | Gujarati | left-to-right | Gujarati | 1.0 | 91 | Ch 12.4 | |
Gukh | 397 | Gurung Khema | left-to-right | Gurung Khema | 16.0 | 58 | ||
Guru | 310 | Gurmukhi | left-to-right | Gurmukhi | 1.0 | 80 | Ch 12.3 | |
Hanb | 503 | Han with Bopomofo (alias for Han + Bopomofo) | mixed | — See § Hani, § Bopo | ||||
Hang | 286 | Hangul (Hangŭl, Hangeul) | left-to-right, vertical right-to-left | Hangul | 1.0 | 11,739 | Hangul syllables relocated in 2.0 | Ch 18.6 |
Hani | 500 | Han (Hanzi, Kanji, Hanja) | top-to-bottom, columns right-to-left (historically) | Han | 1.0 | 99,030 | Ch 18.1 | |
Hano | 371 | Hanunoo (Hanunóo) | left-to-right, bottom-to-top | Hanunoo | 3.2 | 21 | Ch 17.1 | |
Hans | 501 | Han (Simplified variant) | varies | — Subset of Han (Hanzi, Kanji, Hanja) (see § Hani) | ||||
Hant | 502 | Han (Traditional variant) | varies | — Subset of § Hani | ||||
Hatr | 127 | Hatran | right-to-left script | Hatran | 8.0 | 26 | Ancient/historic | Ch 10.12 |
Hebr | 125 | Hebrew | right-to-left script | Hebrew | 1.0 | 134 | Ch 9.1 | |
Hira | 410 | Hiragana | vertical right-to-left, left-to-right | Hiragana | 1.0 | 381 | Ch 18.4 | |
Hluw | 080 | Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs) | left-to-right | Anatolian Hieroglyphs | 8.0 | 583 | Ancient/historic | Ch 11.6 |
Hmng | 450 | Pahawh Hmong | left-to-right | Pahawh Hmong | 7.0 | 127 | Ch 16.11 | |
Hmnp | 451 | Nyiakeng Puachue Hmong | left-to-right | Nyiakeng Puachue Hmong | 12.0 | 71 | Ch 16.12 | |
Hrkt | 412 | Japanese syllabaries (alias for Hiragana + Katakana) | vertical right-to-left, left-to-right | Katakana or Hiragana | See § Hira, § Kana | Ch 18.4 | ||
Hung | 176 | Old Hungarian (Hungarian Runic) | right-to-left script | Old Hungarian | 8.0 | 108 | Ancient/historic | Ch 8.8 |
Inds | 610 | Indus (Harappan) | mixed | — Not in Unicode, proposal is explored [lower-roman 1] | ||||
Ital | 210 | Old Italic (Etruscan, Oscan, etc.) | right-to-left script, left-to-right | Old Italic | 3.1 | 39 | Ancient/historic | Ch 8.6 |
Jamo | 284 | Jamo (alias for Jamo subset of Hangul) | varies | — Subset of § Hang | ||||
Java | 361 | Javanese | left-to-right | Javanese | 5.2 | 90 | Ch 17.4 | |
Jpan | 413 | Japanese (alias for Han + Hiragana + Katakana) | varies | — See § Hani, § Hira and § Kana | ||||
Jurc | 510 | Jurchen | left-to-right | — Not in Unicode | ||||
Kali | 357 | Kayah Li | left-to-right | Kayah Li | 5.1 | 47 | Ch 16.9 | |
Kana | 411 | Katakana | vertical right-to-left, left-to-right | Katakana | 1.0 | 321 | Ch 18.4 | |
Kawi | 368 | Kawi | left-to-right | Kawi | 15.0 | 87 | Ancient/historic | Ch 17.9 |
Khar | 305 | Kharoshthi | right-to-left script | Kharoshthi | 4.1 | 68 | Ancient/historic | Ch 14.2 |
Khmr | 355 | Khmer | left-to-right | Khmer | 3.0 | 146 | Ch 16.4 | |
Khoj | 322 | Khojki | left-to-right | Khojki | 7.0 | 65 | Ancient/historic | Ch 15.7 |
Kitl | 505 | Khitan large script | left-to-right | — Not in Unicode | ||||
Kits | 288 | Khitan small script | vertical right-to-left | Khitan Small Script | 13.0 | 472 | Ancient/historic | Ch 18.12 |
Knda | 345 | Kannada | left-to-right | Kannada | 1.0 | 91 | Ch 12.8 | |
Kore | 287 | Korean (alias for Hangul + Han) | left-to-right | — See § Hani, § Hang | ||||
Kpel | 436 | Kpelle | left-to-right | — Not in Unicode, proposal is explored [lower-roman 1] | ||||
Krai | 396 | Kirat Rai | left-to-right | Kirat Rai | 16.0 | 58 | ||
Kthi | 317 | Kaithi | left-to-right | Kaithi | 5.2 | 68 | Ancient/historic | Ch 15.2 |
Lana | 351 | Tai Tham (Lanna) | left-to-right | Tai Tham | 5.2 | 127 | Ch 16.7 | |
Laoo | 356 | Lao | left-to-right | Lao | 1.0 | 83 | Ch 16.2 | |
Latf | 217 | Latin (Fraktur variant) | varies | — Typographic variant of Latin (see § Latn) | ||||
Latg | 216 | Latin (Gaelic variant) | left-to-right | — Typographic variant of Latin (see § Latn) | ||||
Latn | 215 | Latin | left-to-right | Latin | 1.0 | 1,487 | See also: Latin script in Unicode | Ch 7.1 |
Leke | 364 | Leke | left-to-right | — Not in Unicode | ||||
Lepc | 335 | Lepcha (Róng) | left-to-right | Lepcha | 5.1 | 74 | Ch 13.12 | |
Limb | 336 | Limbu | left-to-right | Limbu | 4.0 | 68 | Ch 13.6 | |
Lina | 400 | Linear A | left-to-right | Linear A | 7.0 | 341 | Ancient/historic | Ch 8.1 |
Linb | 401 | Linear B | left-to-right | Linear B | 4.0 | 211 | Ancient/historic | Ch 8.2 |
Lisu | 399 | Lisu (Fraser) | left-to-right | Lisu | 5.2 | 49 | Ch 18.9 | |
Loma | 437 | Loma | left-to-right | — Not in Unicode, proposal is explored [lower-roman 1] | ||||
Lyci | 202 | Lycian | left-to-right | Lycian | 5.1 | 29 | Ancient/historic | Ch 8.5 |
Lydi | 116 | Lydian | right-to-left script | Lydian | 5.1 | 27 | Ancient/historic | Ch 8.5 |
Mahj | 314 | Mahajani | left-to-right | Mahajani | 7.0 | 39 | Ancient/historic | Ch 15.6 |
Maka | 366 | Makasar | left-to-right | Makasar | 11.0 | 25 | Ancient/historic | Ch 17.8 |
Mand | 140 | Mandaic, Mandaean | right-to-left script | Mandaic | 6.0 | 29 | Ch 9.5 | |
Mani | 139 | Manichaean | right-to-left script | Manichaean | 7.0 | 51 | Ancient/historic | Ch 10.5 |
Marc | 332 | Marchen | left-to-right | Marchen | 9.0 | 68 | Ancient/historic | Ch 14.5 |
Maya | 090 | Mayan hieroglyphs | mixed | — Not in Unicode | ||||
Medf | 265 | Medefaidrin (Oberi Okaime, Oberi Ɔkaimɛ) | left-to-right | Medefaidrin | 11.0 | 91 | Ch 19.10 | |
Mend | 438 | Mende Kikakui | right-to-left script | Mende Kikakui | 7.0 | 213 | Ch 19.8 | |
Merc | 101 | Meroitic Cursive | right-to-left script | Meroitic Cursive | 6.1 | 90 | Ancient/historic | Ch 11.5 |
Mero | 100 | Meroitic Hieroglyphs | right-to-left script | Meroitic Hieroglyphs | 6.1 | 32 | Ancient/historic | Ch 11.5 |
Mlym | 347 | Malayalam | left-to-right | Malayalam | 1.0 | 118 | Ch 12.9 | |
Modi | 324 | Modi, Moḍī | left-to-right | Modi | 7.0 | 79 | Ancient/historic | Ch 15.12 |
Mong | 145 | Mongolian | vertical left-to-right, left-to-right | Mongolian | 3.0 | 168 | Mong includes Clear and Manchu scripts | Ch 13.5 |
Moon | 218 | Moon (Moon code, Moon script, Moon type) | mixed | — Not in Unicode, proposal is explored [lower-roman 1] | ||||
Mroo | 264 | Mro, Mru | left-to-right | Mro | 7.0 | 43 | Ch 13.8 | |
Mtei | 337 | Meitei Mayek (Meithei, Meetei) | left-to-right | Meetei Mayek | 5.2 | 79 | Ch 13.7 | |
Mult | 323 | Multani | left-to-right | Multani | 8.0 | 38 | Ancient/historic | Ch 15.10 |
Mymr | 350 | Myanmar (Burmese) | left-to-right | Myanmar | 3.0 | 243 | Ch 16.3 | |
Nagm | 295 | Nag Mundari | left-to-right | Nag Mundari | 15.0 | 42 | ||
Nand | 311 | Nandinagari | left-to-right | Nandinagari | 12.0 | 65 | Ancient/historic | Ch 15.13 |
Narb | 106 | Old North Arabian (Ancient North Arabian) | right-to-left script | Old North Arabian | 7.0 | 32 | Ancient/historic | Ch 10.1 |
Nbat | 159 | Nabataean | right-to-left script | Nabataean | 7.0 | 40 | Ancient/historic | Ch 10.10 |
Newa | 333 | Newa, Newar, Newari, Nepāla lipi | left-to-right | Newa | 9.0 | 97 | Ch 13.3 | |
Nkdb | 085 | Naxi Dongba (na²¹ɕi³³ to³³ba²¹, Nakhi Tomba) | left-to-right | — Not in Unicode | ||||
Nkgb | 420 | Naxi Geba (na²¹ɕi³³ gʌ²¹ba²¹, 'Na-'Khi ²Ggŏ-¹baw, Nakhi Geba) | left-to-right | — Not in Unicode, proposal is explored [lower-roman 1] | ||||
Nkoo | 165 | N’Ko | right-to-left script | NKo | 5.0 | 62 | Ch 19.4 | |
Nshu | 499 | Nüshu | vertical right-to-left | Nushu | 10.0 | 397 | Ch 18.8 | |
Ogam | 212 | Ogham | bottom-to-top, left-to-right | Ogham | 3.0 | 29 | Ancient/historic | Ch 8.14 |
Olck | 261 | Ol Chiki (Ol Cemet’, Ol, Santali) | left-to-right | Ol Chiki | 5.1 | 48 | Ch 13.10 | |
Onao | 296 | Ol Onal | left-to-right | Ol Onal | 16.0 | 44 | ||
Orkh | 175 | Old Turkic, Orkhon Runic | right-to-left script | Old Turkic | 5.2 | 73 | Ancient/historic | Ch 14.8 |
Orya | 327 | Oriya (Odia) | left-to-right | Oriya | 1.0 | 91 | Ch 12.5 | |
Osge | 219 | Osage | left-to-right | Osage | 9.0 | 72 | Ch 20.3 | |
Osma | 260 | Osmanya | left-to-right | Osmanya | 4.0 | 40 | Ch 19.2 | |
Ougr | 143 | Old Uyghur | mixed | Old Uyghur | 14.0 | 26 | Ancient/historic | Ch 14.11 |
Palm | 126 | Palmyrene | right-to-left script | Palmyrene | 7.0 | 32 | Ancient/historic | Ch 10.11 |
Pauc | 263 | Pau Cin Hau | left-to-right | Pau Cin Hau | 7.0 | 57 | Ch 16.13 | |
Pcun | 015 | Proto-Cuneiform | left-to-right | — Not in Unicode | ||||
Pelm | 016 | Proto-Elamite | left-to-right | — Not in Unicode | ||||
Perm | 227 | Old Permic | left-to-right | Old Permic | 7.0 | 43 | Ancient/historic | Ch 8.13 |
Phag | 331 | Phags-pa | vertical left-to-right | Phags-pa | 5.0 | 56 | Ancient/historic | Ch 14.4 |
Phli | 131 | Inscriptional Pahlavi | right-to-left script | Inscriptional Pahlavi | 5.2 | 27 | Ancient/historic | Ch 10.6 |
Phlp | 132 | Psalter Pahlavi | right-to-left script | Psalter Pahlavi | 7.0 | 29 | Ancient/historic | Ch 10.6 |
Phlv | 133 | Book Pahlavi | mixed | — Not in Unicode | ||||
Phnx | 115 | Phoenician | right-to-left script | Phoenician | 5.0 | 29 | Ancient/historic [g] | Ch 10.3 |
Piqd | 293 | Klingon (KLI pIqaD) | left-to-right | — Rejected for inclusion in Unicode [lower-roman 3] [lower-roman 4] | ||||
Plrd | 282 | Miao (Pollard) | left-to-right | Miao | 6.1 | 149 | Ch 18.10 | |
Prti | 130 | Inscriptional Parthian | right-to-left script | Inscriptional Parthian | 5.2 | 30 | Ancient/historic | Ch 10.6 |
Psin | 103 | Proto-Sinaitic | mixed | — Not in Unicode | ||||
Qaaa-Qabx | 900-949 | Reserved for private use (range) | — Not in Unicode | |||||
Ranj | 303 | Ranjana | left-to-right | — Not in Unicode | ||||
Rjng | 363 | Rejang (Redjang, Kaganga) | left-to-right | Rejang | 5.1 | 37 | Ch 17.5 | |
Rohg | 167 | Hanifi Rohingya | right-to-left script | Hanifi Rohingya | 11.0 | 50 | Ch 16.14 | |
Roro | 620 | Rongorongo | mixed | — Not in Unicode, proposal is explored [lower-roman 1] | ||||
Runr | 211 | Runic | left-to-right, boustrophedon | Runic | 3.0 | 86 | Ancient/historic | Ch 8.7 |
Samr | 123 | Samaritan | right-to-left script, top-to-bottom | Samaritan | 5.2 | 61 | Ch 9.4 | |
Sara | 292 | Sarati | mixed | — Not in Unicode | ||||
Sarb | 105 | Old South Arabian | right-to-left script | Old South Arabian | 5.2 | 32 | Ancient/historic | Ch 10.2 |
Saur | 344 | Saurashtra | left-to-right | Saurashtra | 5.1 | 82 | Ch 13.13 | |
Sgnw | 095 | SignWriting | vertical left-to-right | SignWriting | 8.0 | 672 | Ch 21.7 | |
Shaw | 281 | Shavian (Shaw) | left-to-right | Shavian | 4.0 | 48 | Ch 8.15 | |
Shrd | 319 | Sharada, Śāradā | left-to-right | Sharada | 6.1 | 96 | Ch 15.3 | |
Shui | 530 | Shuishu | left-to-right | — Not in Unicode | ||||
Sidd | 302 | Siddham, Siddhaṃ, Siddhamātṛkā | left-to-right | Siddham | 7.0 | 92 | Ancient/historic | Ch 15.5 |
Sidt | 180 | Sidetic | right-to-left | — Not in Unicode, proposal is mature [lower-roman 2] | ||||
Sind | 318 | Khudawadi, Sindhi | left-to-right | Khudawadi | 7.0 | 69 | Ch 15.9 | |
Sinh | 348 | Sinhala | left-to-right | Sinhala | 3.0 | 111 | Ch 13.2 | |
Sogd | 141 | Sogdian | horizontal and vertical writing in East Asian scripts, top-to-bottom | Sogdian | 11.0 | 42 | Ancient/historic | Ch 14.10 |
Sogo | 142 | Old Sogdian | right-to-left script | Old Sogdian | 11.0 | 40 | Ancient/historic | Ch 14.9 |
Sora | 398 | Sora Sompeng | left-to-right | Sora Sompeng | 6.1 | 35 | Ch 15.17 | |
Soyo | 329 | Soyombo | left-to-right | Soyombo | 10.0 | 83 | Ancient/historic | Ch 14.7 |
Sund | 362 | Sundanese | left-to-right | Sundanese | 5.1 | 72 | Ch 17.7 | |
Sunu | 274 | Sunuwar | left-to-right | Sunuwar | 16.0 | 44 | ||
Sylo | 316 | Syloti Nagri | left-to-right | Syloti Nagri | 4.1 | 45 | Ancient/historic | Ch 15.1 |
Syrc | 135 | Syriac | right-to-left script | Syriac | 3.0 | 88 | Includes typographic variants Estrangelo (see § Syre), Western (§ Syrj), and Eastern (§ Syrn) | Ch 9.3 |
Syre | 138 | Syriac (Estrangelo variant) | mixed | — Typographic variant of Syriac (see § Syrc) | ||||
Syrj | 137 | Syriac (Western variant) | mixed | — Typographic variant of Syriac (see § Syrc) | ||||
Syrn | 136 | Syriac (Eastern variant) | mixed | — Typographic variant of Syriac (see § Syrc) | ||||
Tagb | 373 | Tagbanwa | left-to-right | Tagbanwa | 3.2 | 18 | Ch 17.1 | |
Takr | 321 | Takri, Ṭākrī, Ṭāṅkrī | left-to-right | Takri | 6.1 | 68 | Ch 15.4 | |
Tale | 353 | Tai Le | left-to-right | Tai Le | 4.0 | 35 | Ch 16.5 | |
Talu | 354 | New Tai Lue | left-to-right | New Tai Lue | 4.1 | 83 | Ch 16.6 | |
Taml | 346 | Tamil | left-to-right | Tamil | 1.0 | 123 | Ch 12.6 | |
Tang | 520 | Tangut | vertical right-to-left, left-to-right | Tangut | 9.0 | 6,914 | Ancient/historic | Ch 18.11 |
Tavt | 359 | Tai Viet | left-to-right | Tai Viet | 5.2 | 72 | Ch 16.8 | |
Tayo | 380 | Tai Yo | top-to-bottom, columns right-to-left | — Not in Unicode, proposal is mature [lower-roman 2] | ||||
Telu | 340 | Telugu | left-to-right | Telugu | 1.0 | 100 | Ch 12.7 | |
Teng | 290 | Tengwar | left-to-right | — Not in Unicode | ||||
Tfng | 120 | Tifinagh (Berber) | left-to-right, right-to-left script, top-to-bottom, bottom-to-top | Tifinagh | 4.1 | 59 | Ch 19.3 | |
Tglg | 370 | Tagalog (Baybayin, Alibata) | left-to-right | Tagalog | 3.2 | 23 | Ch 17.1 | |
Thaa | 170 | Thaana | right-to-left script | Thaana | 3.0 | 50 | Ch 13.1 | |
Thai | 352 | Thai | left-to-right | Thai | 1.0 | 86 | Ch 16.1 | |
Tibt | 330 | Tibetan | left-to-right | Tibetan | 2.0 | 207 | Added in 1.0, removed in 1.1 and reintroduced in 2.0 | Ch 13.4 |
Tirh | 326 | Tirhuta | left-to-right | Tirhuta | 7.0 | 82 | Ch 15.11 | |
Tnsa | 275 | Tangsa | left-to-right | Tangsa | 14.0 | 89 | Ch 13.18 | |
Todr | 229 | Todhri | right-to-left | Todhri | 16.0 | 52 | ||
Tols | 299 | Tolong Siki | left-to-right | — Not in Unicode, proposal is mature [lower-roman 2] | ||||
Toto | 294 | Toto | left-to-right | Toto | 14.0 | 31 | Ch 13.17 | |
Tutg | 341 | Tulu-Tigalari | left-to-right | Tulu Tigalari | 16.0 | 80 | ||
Ugar | 040 | Ugaritic | left-to-right | Ugaritic | 4.0 | 31 | Ancient/historic | Ch 11.2 |
Vaii | 470 | Vai | left-to-right | Vai | 5.1 | 300 | Ch 19.5 | |
Visp | 280 | Visible Speech | left-to-right | — Not in Unicode | ||||
Vith | 228 | Vithkuqi | left-to-right | Vithkuqi | 14.0 | 70 | Ancient/historic | Ch 8.12 |
Wara | 262 | Warang Citi (Varang Kshiti) | left-to-right | Warang Citi | 7.0 | 84 | Ch 13.9 | |
Wcho | 283 | Wancho | left-to-right | Wancho | 12.0 | 59 | Ch 13.16 | |
Wole | 480 | Woleai | mixed | — Not in Unicode, proposal is explored [lower-roman 1] | ||||
Xpeo | 030 | Old Persian | left-to-right | Old Persian | 4.1 | 50 | Ancient/historic | Ch 11.3 |
Xsux | 020 | Cuneiform, Sumero-Akkadian | left-to-right | Cuneiform | 5.0 | 1,234 | Ancient/historic | Ch 11.1 |
Yezi | 192 | Yezidi | right-to-left script | Yezidi | 13.0 | 47 | Ancient/historic | Ch 9.6 |
Yiii | 460 | Yi | left-to-right | Yi | 3.0 | 1,220 | Ch 18.7 | |
Zanb | 339 | Zanabazar Square (Zanabazarin Dörböljin Useg, Xewtee Dörböljin Bicig, Horizontal Square Script) | left-to-right | Zanabazar Square | 10.0 | 72 | Ancient/historic | Ch 14.6 |
Zinh | 994 | Code for inherited script | Inherited | 657 | ||||
Zmth | 995 | Mathematical notation | — Not a 'script' in Unicode | |||||
Zsym | 996 | Symbols | — Not a 'script' in Unicode | |||||
Zsye | 993 | Symbols (emoji variant) | — Not a 'script' in Unicode | |||||
Zxxx | 997 | Code for unwritten documents | — Not a 'script' in Unicode | |||||
Zyyy | 998 | Code for undetermined script | Common | 9,053 | ||||
Zzzz | 999 | Code for uncoded script | Unknown | 959,049 | In Unicode: All other code points | |||
With each new version of Unicode, new writing systems are added to the international character code. According to a statement by linguist Dr Deborah Anderson of UC Berkeley, there are over 100 writing systems that have not yet been included in Unicode.
According to the Missing Scripts project, a collaboration of the University of Applied Sciences Mainz (Germany), the ANRT Nancy (France) and UC Berkeley (USA), there were 294 known writing systems as of January 2022. Of these, 131 had not yet been encoded in Unicode and therefore could not yet be used on a computer or mobile phone.