Script (Unicode)

In Unicode, a script is a collection of letters and other written signs used to represent textual information in one or more writing systems. [1] Some scripts support only one writing system and language, for example, Armenian. Other scripts support many different writing systems; for example, the Latin script supports English, French, German, Italian, Vietnamese, Latin itself, and several other languages. Some languages have used more than one writing system and thus more than one script; for example, Turkish was written in the Arabic script before the 20th century and switched to the Latin script in the early part of that century. More or less complementary to scripts are symbols and Unicode control characters.

The unified diacritical characters and unified punctuation characters frequently have the "common" or "inherited" script property. However, the individual scripts often have their own punctuation and diacritics, so that many scripts include not only letters but also diacritic and other marks, punctuation, numerals and even their own idiosyncratic symbols and space characters.

Unicode 16.0 defines 168 separate scripts, including 99 modern scripts and 69 ancient or historic scripts. [2] [3] More scripts are in the process of being encoded or have been tentatively allocated for encoding in roadmaps. [4]

Definition and classification

When multiple languages make use of the same script, there are frequently some differences, particularly in diacritics and other marks. For example, Swedish and English both use the Latin script. However, Swedish includes the character å (sometimes called a Swedish O), while English has no such character, nor does English use the combining ring above diacritic on any letter. In general, languages that share a script share many of the same characters. Despite these peripheral differences between the Swedish and English writing systems, both are said to use the same Latin script. The Unicode abstraction of scripts is thus a basic organizing technique. The differences among alphabets or writing systems remain and are supported through Unicode’s scripts, combining marks and collation algorithms.
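The relationship between å and the combining ring above can be checked with Python's standard `unicodedata` module. This is only a small sketch: it uses canonical decomposition (NFD) to split the precomposed letter into a Latin base letter plus a combining mark, then recomposes it with NFC.

```python
import unicodedata

a_ring = "\u00e5"  # å, LATIN SMALL LETTER A WITH RING ABOVE

# Canonical decomposition (NFD) splits the letter into base letter + combining mark.
decomposed = unicodedata.normalize("NFD", a_ring)
parts = [unicodedata.name(ch) for ch in decomposed]

# Canonical composition (NFC) joins the pair back into the single code point.
recomposed = unicodedata.normalize("NFC", decomposed)

print(parts)
print(recomposed == a_ring)
```

The base letter is an ordinary Latin "a", which is one way to see why Swedish and English are both treated as users of the same Latin script.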

Script versus writing system

The term "writing system" is sometimes treated as a synonym for "script". However, it can also refer to the specific concrete writing system supported by a script. For example, the Vietnamese writing system is supported by the Latin script. A writing system may also cover more than one script; for example, the Japanese writing system makes use of the Han, Hiragana and Katakana scripts.
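The Japanese and Vietnamese examples can be illustrated with a rough sketch. Python's standard library does not expose the Unicode Script property, so the code below guesses a script from the first word of each character name; this is a heuristic assumed here for illustration, not how Unicode assigns scripts. The `NAME_PREFIX_TO_SCRIPT` table and the `scripts_used` helper are hypothetical names.

```python
import unicodedata

# Heuristic assumption: the first word of a character name hints at its script.
# The authoritative source is the Script property in the Unicode Character Database.
NAME_PREFIX_TO_SCRIPT = {
    "CJK": "Han",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "LATIN": "Latin",
}

def scripts_used(text):
    """Return the set of scripts guessed for the characters in text."""
    found = set()
    for ch in text:
        prefix = unicodedata.name(ch, "").split(" ")[0]
        script = NAME_PREFIX_TO_SCRIPT.get(prefix)
        if script is not None:
            found.add(script)
    return found

japanese = "\u65e5\u672c\u8a9e\u306e\u30ab\u30bf\u30ab\u30ca"  # 日本語のカタカナ
vietnamese = "Ti\u1ebfng Vi\u1ec7t"                            # Tiếng Việt

print(scripts_used(japanese))
print(scripts_used(vietnamese))
```

Characters whose names start with another word (spaces, combining marks, punctuation) are simply skipped by this sketch.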

Most writing systems can be broadly divided into several categories: logographic, syllabic, alphabetic (or segmental), abugida, abjad and featural; however, features of any of these may be found in a given writing system in varying proportions, often making it difficult to assign a system to a single category. The term complex system is sometimes used to describe those where the admixture makes classification problematic.

Unicode supports all of these types of writing systems through its numerous scripts. Unicode also adds further properties to characters to help differentiate the various characters and the ways they behave within Unicode text-processing algorithms.
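A few of these per-character properties can be inspected with Python's standard `unicodedata` module. The sketch below is illustrative only; the `describe` helper is a hypothetical name, and the module exposes only a subset of the properties defined by Unicode (it does not include the Script property).

```python
import unicodedata

def describe(ch):
    """Collect a few Unicode properties of a single character."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),             # general category, e.g. "Ll"
        "bidi_class": unicodedata.bidirectional(ch),      # used by the bidirectional algorithm
        "combining_class": unicodedata.combining(ch),     # canonical combining class
    }

# Latin a, Hebrew alef, Arabic ain, combining diaeresis
for ch in ("a", "\u05d0", "\u0639", "\u0308"):
    print(describe(ch))
```

Text-processing algorithms such as bidirectional layout and normalization consult properties like these rather than the script alone.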

Special script property values

In addition to explicit or specific script properties, Unicode uses three special values: [5]

Common
Unicode can assign a character in the UCS to a single script only. However, many characters—those that are not part of a formal natural-language writing system or are unified across many writing systems—may be used in more than one script (for example, currency signs, symbols, numerals and punctuation marks). In these cases Unicode defines them as belonging to the "common" script (ISO 15924 code "Zyyy").
Inherited
Many diacritics and non-spacing combining characters may be applied to characters from more than one script. In these cases Unicode assigns them to the "inherited" script (ISO 15924 code "Zinh"), which means that they have the same script class as the base character with which they combine, and so in different contexts they may be treated as belonging to different scripts. For example, U+0308 ̈ COMBINING DIAERESIS may combine either with U+0065 e LATIN SMALL LETTER E to create a Latin ë or with U+0435 е CYRILLIC SMALL LETTER IE for the Cyrillic ё. In the former case, it inherits the Latin script of the base character, whereas in the latter case, it inherits the Cyrillic script of the base character.
Unknown
The value of "unknown" script (ISO 15924 code Zzzz) is given to unassigned, private-use, noncharacter, and surrogate code points.
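The way an "inherited" character takes on the script of its base character can be sketched as follows. The `SAMPLE_SCRIPT` table is a tiny hand-written excerpt assumed for illustration (real values come from the Unicode Character Database), and `resolve_scripts` is a hypothetical helper that simply reuses the script of the preceding character; characters missing from the sample table fall back to "Unknown" only as a simplification.

```python
# Hypothetical, hand-written excerpt of Script property values, for illustration only.
SAMPLE_SCRIPT = {
    "e": "Latin",
    "\u0435": "Cyrillic",   # CYRILLIC SMALL LETTER IE
    "\u0308": "Inherited",  # COMBINING DIAERESIS
    "$": "Common",
}

def resolve_scripts(text):
    """Return (character, script) pairs, replacing Inherited with the preceding script."""
    resolved = []
    previous = "Unknown"
    for ch in text:
        script = SAMPLE_SCRIPT.get(ch, "Unknown")
        if script == "Inherited":
            script = previous  # take the script of the character it combines with
        previous = script
        resolved.append((ch, script))
    return resolved

print(resolve_scripts("e\u0308"))       # diaeresis after a Latin base letter
print(resolve_scripts("\u0435\u0308"))  # diaeresis after a Cyrillic base letter
```

The same combining diaeresis resolves to Latin in the first call and to Cyrillic in the second, matching the ë and ё example above.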

Character categories within scripts

Unicode provides a general category property for each character, so in addition to belonging to a script, every character also has a general category. Scripts typically include letter characters of several kinds: uppercase letters, lowercase letters and modifier letters. A few precomposed ligatures, such as Dz (U+01F2), are categorized as titlecase letters. Such titlecase ligatures are all in the Latin and Greek scripts and are all compatibility characters, so Unicode discourages their use by authors. It is unlikely that new titlecase letters will be added in the future.

Most writing systems do not differentiate between uppercase and lowercase letters. For those scripts, all letters are categorized as "other letter" or "modifier letter". Ideographs such as Unihan ideographs are also categorized as "other letter". A few scripts do differentiate between uppercase and lowercase, however, including Latin, Cyrillic, Greek, Armenian, Georgian, and Deseret. Even in these scripts, some letters are neither uppercase nor lowercase.
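The letter categories described above correspond to the two-letter general category codes Lu, Ll, Lt, Lm and Lo, which Python's standard `unicodedata.category` function reports. The sample characters below are chosen here purely for illustration.

```python
import unicodedata

# Sample characters chosen for illustration, with a short description of each.
samples = {
    "A": "uppercase Latin letter",
    "a": "lowercase Latin letter",
    "\u01f2": "titlecase ligature Dz",
    "\u02b0": "modifier letter small h",
    "\u05d0": "Hebrew alef, from a script without case",
    "\u4e2d": "Han ideograph",
}

# Map each sample character to its general category code.
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, description in samples.items():
    print(f"U+{ord(ch):04X} {categories[ch]} {description}")
```

The Hebrew letter and the Han ideograph both report Lo ("other letter"), since neither script distinguishes case.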

Scripts can also contain any other general category character such as marks (diacritic and otherwise), numbers (numerals), punctuation, separators (word separators such as spaces), symbols and non-graphical format characters. These are included in a particular script when they are unique to that script. Other such characters are generally unified and included in the punctuation or diacritic blocks. However, the bulk of characters in any script (other than the common and inherited scripts) are letters.
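As a small sketch of how one script can contribute characters from several general categories, the following counts the major category classes in a short Thai string containing letters, vowel marks, a space and Thai digits. The `MAJOR_CLASS` labels and the `class_counts` helper are names assumed here for illustration; the grouping relies on the first letter of the general category code.

```python
import unicodedata
from collections import Counter

# First letter of the general category code -> informal label (labels are illustrative).
MAJOR_CLASS = {
    "L": "letter",
    "M": "mark",
    "N": "number",
    "P": "punctuation",
    "S": "symbol",
    "Z": "separator",
    "C": "other",
}

def class_counts(text):
    """Count characters in text by major general category class."""
    return Counter(MAJOR_CLASS[unicodedata.category(ch)[0]] for ch in text)

# Thai greeting followed by the Thai digits one, two, three.
thai = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35 \u0e51\u0e52\u0e53"
print(class_counts(thai))
```

Here the marks and digits are specific to Thai, while the space between the words is a unified character shared across scripts.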

List of encoded scripts

As of version 16.0, Unicode defines 168 scripts, each identified by a name (called the "Alias" or "Property value alias") and based on the ISO 15924 list. In addition, Unicode assigns the name "Common" to ISO 15924's Zyyy code for undetermined scripts, "Inherited" to ISO 15924's Zinh code for inherited scripts, and "Unknown" to ISO 15924's Zzzz code for uncoded scripts. Some script codes defined by ISO 15924 are not used in Unicode, including Zsym (Symbols) and Zmth (Mathematical notation).
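The correspondence between ISO 15924 codes and Unicode aliases can be sketched as a simple lookup. The dictionary below is a hand-picked excerpt of the codes mentioned in this article, not the full list, and `unicode_alias` is a hypothetical helper; a value of `None` marks an ISO 15924 code that has no script in Unicode.

```python
# Hand-picked excerpt of ISO 15924 codes and their Unicode aliases (not the full list).
ISO_TO_UNICODE_ALIAS = {
    "Latn": "Latin",
    "Arab": "Arabic",
    "Armn": "Armenian",
    "Zyyy": "Common",     # code for undetermined script
    "Zinh": "Inherited",  # code for inherited script
    "Zzzz": "Unknown",    # code for uncoded script
    "Zsym": None,         # Symbols: not a script in Unicode
    "Zmth": None,         # Mathematical notation: not a script in Unicode
}

def unicode_alias(iso_code):
    """Return the Unicode alias for an ISO 15924 code, or None if Unicode has no such script."""
    if iso_code not in ISO_TO_UNICODE_ALIAS:
        raise KeyError(f"ISO 15924 code not in this excerpt: {iso_code}")
    return ISO_TO_UNICODE_ALIAS[iso_code]

print(unicode_alias("Zyyy"))
print(unicode_alias("Zsym"))
```

Raising an error for codes outside the excerpt keeps "not listed here" distinct from "listed but not used in Unicode".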

ISO 15924 columns: Code, ISO number, ISO formal name, Directionality. Script in Unicode [e] columns: Unicode alias [f], Version, Characters, Notes, Description.

| Code | ISO number | ISO formal name | Directionality | Unicode alias [f] | Version | Characters | Notes | Description |
|---|---|---|---|---|---|---|---|---|
| Adlm | 166 | Adlam | right-to-left | Adlam | 9.0 | 88 | | Ch 19.9 |
| Afak | 439 | Afaka | varies | | | | Not in Unicode, proposal is explored [lower-roman 1] | |
| Aghb | 239 | Caucasian Albanian | left-to-right | Caucasian Albanian | 7.0 | 53 | Ancient/historic | Ch 8.11 |
| Ahom | 338 | Ahom, Tai Ahom | left-to-right | Ahom | 8.0 | 65 | Ancient/historic | Ch 15.16 |
| Arab | 160 | Arabic | right-to-left | Arabic | 1.0 | 1,373 | | Ch 9.2 |
| Aran | 161 | Arabic (Nastaliq variant) | mixed | | | | Typographic variant of Arabic (see § Arab) | |
| Armi | 124 | Imperial Aramaic | right-to-left | Imperial Aramaic | 5.2 | 31 | Ancient/historic | Ch 10.4 |
| Armn | 230 | Armenian | left-to-right | Armenian | 1.0 | 96 | | Ch 7.6 |
| Avst | 134 | Avestan | right-to-left | Avestan | 5.2 | 61 | Ancient/historic | Ch 10.7 |
| Bali | 360 | Balinese | left-to-right | Balinese | 5.0 | 127 | | Ch 17.3 |
| Bamu | 435 | Bamum | left-to-right | Bamum | 5.2 | 657 | | Ch 19.6 |
| Bass | 259 | Bassa Vah | left-to-right | Bassa Vah | 7.0 | 36 | Ancient/historic | Ch 19.7 |
| Batk | 365 | Batak | left-to-right | Batak | 6.0 | 56 | | Ch 17.6 |
| Beng | 325 | Bengali (Bangla) | left-to-right | Bengali | 1.0 | 96 | | Ch 12.2 |
| Bhks | 334 | Bhaiksuki | left-to-right | Bhaiksuki | 9.0 | 97 | Ancient/historic | Ch 14.3 |
| Blis | 550 | Blissymbols | varies | | | | Not in Unicode, proposal is explored [lower-roman 1] | |
| Bopo | 285 | Bopomofo | left-to-right, right-to-left | Bopomofo | 1.0 | 77 | | Ch 18.3 |
| Brah | 300 | Brahmi | left-to-right | Brahmi | 6.0 | 115 | Ancient/historic | Ch 14.1 |
| Brai | 570 | Braille | left-to-right | Braille | 3.0 | 256 | | Ch 21.1 |
| Bugi | 367 | Buginese | left-to-right | Buginese | 4.1 | 30 | | Ch 17.2 |
| Buhd | 372 | Buhid | left-to-right | Buhid | 3.2 | 20 | | Ch 17.1 |
| Cakm | 349 | Chakma | left-to-right | Chakma | 6.1 | 71 | | Ch 13.11 |
| Cans | 440 | Unified Canadian Aboriginal Syllabics | left-to-right | Canadian Aboriginal | 3.0 | 726 | | Ch 20.2 |
| Cari | 201 | Carian | left-to-right, right-to-left | Carian | 5.1 | 49 | Ancient/historic | Ch 8.5 |
| Cham | 358 | Cham | left-to-right | Cham | 5.1 | 83 | | Ch 16.10 |
| Cher | 445 | Cherokee | left-to-right | Cherokee | 3.0 | 172 | | Ch 20.1 |
| Chis | 298 | Chisoi | left-to-right | | | | Not in Unicode, proposal is mature [lower-roman 2] | |
| Chrs | 109 | Chorasmian | right-to-left, top-to-bottom | Chorasmian | 13.0 | 28 | Ancient/historic | Ch 10.8 |
| Cirt | 291 | Cirth | varies | | | | Not in Unicode | |
| Copt | 204 | Coptic | left-to-right | Coptic | 1.0 | 137 | Ancient/historic, disunified from Greek in 4.1 | Ch 7.3 |
| Cpmn | 402 | Cypro-Minoan | left-to-right | Cypro Minoan | 14.0 | 99 | Ancient/historic | Ch 8.4 |
| Cprt | 403 | Cypriot syllabary | right-to-left | Cypriot | 4.0 | 55 | Ancient/historic | Ch 8.3 |
| Cyrl | 220 | Cyrillic | left-to-right | Cyrillic | 1.0 | 508 | Includes typographic variant Old Church Slavonic (see § Cyrs) | Ch 7.4 |
| Cyrs | 221 | Cyrillic (Old Church Slavonic variant) | varies | | | | Typographic variant of Cyrillic (see § Cyrl); Ancient/historic | |
| Deva | 315 | Devanagari (Nagari) | left-to-right | Devanagari | 1.0 | 164 | | Ch 12.1 |
| Diak | 342 | Dives Akuru | left-to-right | Dives Akuru | 13.0 | 72 | Ancient/historic | Ch 15.15 |
| Dogr | 328 | Dogra | left-to-right | Dogra | 11.0 | 60 | Ancient/historic | Ch 15.18 |
| Dsrt | 250 | Deseret (Mormon) | left-to-right | Deseret | 3.1 | 80 | | Ch 20.4 |
| Dupl | 755 | Duployan shorthand, Duployan stenography | left-to-right | Duployan | 7.0 | 143 | | Ch 21.6 |
| Egyd | 070 | Egyptian demotic | mixed | | | | Not in Unicode | |
| Egyh | 060 | Egyptian hieratic | mixed | | | | Not in Unicode | |
| Egyp | 050 | Egyptian hieroglyphs | right-to-left, left-to-right | Egyptian Hieroglyphs | 5.2 | 5,105 | Ancient/historic | Ch 11.4 |
| Elba | 226 | Elbasan | left-to-right | Elbasan | 7.0 | 40 | Ancient/historic | Ch 8.10 |
| Elym | 128 | Elymaic | right-to-left | Elymaic | 12.0 | 23 | Ancient/historic | Ch 10.9 |
| Ethi | 430 | Ethiopic (Geʻez) | left-to-right | Ethiopic | 3.0 | 523 | | Ch 19.1 |
| Gara | 164 | Garay | right-to-left | Garay | 16.0 | 69 | | |
| Geok | 241 | Khutsuri (Asomtavruli and Nuskhuri) | left-to-right | Georgian | | | Unicode groups Khutsori, Asomtavruli and Nuskhuri into 'Georgian' (see § Geok). Similarly, Mkhedruli and Mtavruli are 'Georgian' (see § Geor) | Ch 7.7 |
| Geor | 240 | Georgian (Mkhedruli and Mtavruli) | left-to-right | Georgian | 1.0 | 173 | In Unicode this also includes Nuskhuri (Geok) | Ch 7.7 |
| Glag | 225 | Glagolitic | left-to-right | Glagolitic | 4.1 | 134 | Ancient/historic | Ch 7.5 |
| Gong | 312 | Gunjala Gondi | left-to-right | Gunjala Gondi | 11.0 | 63 | | Ch 13.15 |
| Gonm | 313 | Masaram Gondi | left-to-right | Masaram Gondi | 10.0 | 75 | | Ch 13.14 |
| Goth | 206 | Gothic | left-to-right | Gothic | 3.1 | 27 | Ancient/historic | Ch 8.9 |
| Gran | 343 | Grantha | left-to-right | Grantha | 7.0 | 85 | Ancient/historic | Ch 15.14 |
| Grek | 200 | Greek | left-to-right | Greek | 1.0 | 518 | Directionality sometimes as boustrophedon | Ch 7.2 |
| Gujr | 320 | Gujarati | left-to-right | Gujarati | 1.0 | 91 | | Ch 12.4 |
| Gukh | 397 | Gurung Khema | left-to-right | Gurung Khema | 16.0 | 58 | | |
| Guru | 310 | Gurmukhi | left-to-right | Gurmukhi | 1.0 | 80 | | Ch 12.3 |
| Hanb | 503 | Han with Bopomofo (alias for Han + Bopomofo) | mixed | | | | See § Hani, § Bopo | |
| Hang | 286 | Hangul (Hangŭl, Hangeul) | left-to-right, vertical right-to-left | Hangul | 1.0 | 11,739 | Hangul syllables relocated in 2.0 | Ch 18.6 |
| Hani | 500 | Han (Hanzi, Kanji, Hanja) | top-to-bottom, columns right-to-left (historically) | Han | 1.0 | 99,030 | | Ch 18.1 |
| Hano | 371 | Hanunoo (Hanunóo) | left-to-right, bottom-to-top | Hanunoo | 3.2 | 21 | | Ch 17.1 |
| Hans | 501 | Han (Simplified variant) | varies | | | | Subset of Han (Hanzi, Kanji, Hanja) (see § Hani) | |
| Hant | 502 | Han (Traditional variant) | varies | | | | Subset of § Hani | |
| Hatr | 127 | Hatran | right-to-left | Hatran | 8.0 | 26 | Ancient/historic | Ch 10.12 |
| Hebr | 125 | Hebrew | right-to-left | Hebrew | 1.0 | 134 | | Ch 9.1 |
| Hira | 410 | Hiragana | vertical right-to-left, left-to-right | Hiragana | 1.0 | 381 | | Ch 18.4 |
| Hluw | 080 | Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs) | left-to-right | Anatolian Hieroglyphs | 8.0 | 583 | Ancient/historic | Ch 11.6 |
| Hmng | 450 | Pahawh Hmong | left-to-right | Pahawh Hmong | 7.0 | 127 | | Ch 16.11 |
| Hmnp | 451 | Nyiakeng Puachue Hmong | left-to-right | Nyiakeng Puachue Hmong | 12.0 | 71 | | Ch 16.12 |
| Hrkt | 412 | Japanese syllabaries (alias for Hiragana + Katakana) | vertical right-to-left, left-to-right | Katakana or Hiragana | | | See § Hira, § Kana | Ch 18.4 |
| Hung | 176 | Old Hungarian (Hungarian Runic) | right-to-left | Old Hungarian | 8.0 | 108 | Ancient/historic | Ch 8.8 |
| Inds | 610 | Indus (Harappan) | mixed | | | | Not in Unicode, proposal is explored [lower-roman 1] | |
| Ital | 210 | Old Italic (Etruscan, Oscan, etc.) | right-to-left, left-to-right | Old Italic | 3.1 | 39 | Ancient/historic | Ch 8.6 |
| Jamo | 284 | Jamo (alias for Jamo subset of Hangul) | varies | | | | Subset of § Hang | |
| Java | 361 | Javanese | left-to-right | Javanese | 5.2 | 90 | | Ch 17.4 |
| Jpan | 413 | Japanese (alias for Han + Hiragana + Katakana) | varies | | | | See § Hani, § Hira and § Kana | |
| Jurc | 510 | Jurchen | left-to-right | | | | Not in Unicode | |
| Kali | 357 | Kayah Li | left-to-right | Kayah Li | 5.1 | 47 | | Ch 16.9 |
| Kana | 411 | Katakana | vertical right-to-left, left-to-right | Katakana | 1.0 | 321 | | Ch 18.4 |
| Kawi | 368 | Kawi | left-to-right | Kawi | 15.0 | 87 | Ancient/historic | Ch 17.9 |
| Khar | 305 | Kharoshthi | right-to-left | Kharoshthi | 4.1 | 68 | Ancient/historic | Ch 14.2 |
| Khmr | 355 | Khmer | left-to-right | Khmer | 3.0 | 146 | | Ch 16.4 |
| Khoj | 322 | Khojki | left-to-right | Khojki | 7.0 | 65 | Ancient/historic | Ch 15.7 |
| Kitl | 505 | Khitan large script | left-to-right | | | | Not in Unicode | |
| Kits | 288 | Khitan small script | vertical right-to-left | Khitan Small Script | 13.0 | 472 | Ancient/historic | Ch 18.12 |
| Knda | 345 | Kannada | left-to-right | Kannada | 1.0 | 91 | | Ch 12.8 |
| Kore | 287 | Korean (alias for Hangul + Han) | left-to-right | | | | See § Hani, § Hang | |
| Kpel | 436 | Kpelle | left-to-right | | | | Not in Unicode, proposal is explored [lower-roman 1] | |
| Krai | 396 | Kirat Rai | left-to-right | Kirat Rai | 16.0 | 58 | | |
| Kthi | 317 | Kaithi | left-to-right | Kaithi | 5.2 | 68 | Ancient/historic | Ch 15.2 |
| Lana | 351 | Tai Tham (Lanna) | left-to-right | Tai Tham | 5.2 | 127 | | Ch 16.7 |
| Laoo | 356 | Lao | left-to-right | Lao | 1.0 | 83 | | Ch 16.2 |
| Latf | 217 | Latin (Fraktur variant) | varies | | | | Typographic variant of Latin (see § Latn) | |
| Latg | 216 | Latin (Gaelic variant) | left-to-right | | | | Typographic variant of Latin (see § Latn) | |
| Latn | 215 | Latin | left-to-right | Latin | 1.0 | 1,487 | See also: Latin script in Unicode | Ch 7.1 |
| Leke | 364 | Leke | left-to-right | | | | Not in Unicode | |
| Lepc | 335 | Lepcha (Róng) | left-to-right | Lepcha | 5.1 | 74 | | Ch 13.12 |
| Limb | 336 | Limbu | left-to-right | Limbu | 4.0 | 68 | | Ch 13.6 |
| Lina | 400 | Linear A | left-to-right | Linear A | 7.0 | 341 | Ancient/historic | Ch 8.1 |
| Linb | 401 | Linear B | left-to-right | Linear B | 4.0 | 211 | Ancient/historic | Ch 8.2 |
| Lisu | 399 | Lisu (Fraser) | left-to-right | Lisu | 5.2 | 49 | | Ch 18.9 |
| Loma | 437 | Loma | left-to-right | | | | Not in Unicode, proposal is explored [lower-roman 1] | |
| Lyci | 202 | Lycian | left-to-right | Lycian | 5.1 | 29 | Ancient/historic | Ch 8.5 |
| Lydi | 116 | Lydian | right-to-left | Lydian | 5.1 | 27 | Ancient/historic | Ch 8.5 |
| Mahj | 314 | Mahajani | left-to-right | Mahajani | 7.0 | 39 | Ancient/historic | Ch 15.6 |
| Maka | 366 | Makasar | left-to-right | Makasar | 11.0 | 25 | Ancient/historic | Ch 17.8 |
| Mand | 140 | Mandaic, Mandaean | right-to-left | Mandaic | 6.0 | 29 | | Ch 9.5 |
| Mani | 139 | Manichaean | right-to-left | Manichaean | 7.0 | 51 | Ancient/historic | Ch 10.5 |
| Marc | 332 | Marchen | left-to-right | Marchen | 9.0 | 68 | Ancient/historic | Ch 14.5 |
| Maya | 090 | Mayan hieroglyphs | mixed | | | | Not in Unicode | |
| Medf | 265 | Medefaidrin (Oberi Okaime, Oberi Ɔkaimɛ) | left-to-right | Medefaidrin | 11.0 | 91 | | Ch 19.10 |
| Mend | 438 | Mende Kikakui | right-to-left | Mende Kikakui | 7.0 | 213 | | Ch 19.8 |
| Merc | 101 | Meroitic Cursive | right-to-left | Meroitic Cursive | 6.1 | 90 | Ancient/historic | Ch 11.5 |
| Mero | 100 | Meroitic Hieroglyphs | right-to-left | Meroitic Hieroglyphs | 6.1 | 32 | Ancient/historic | Ch 11.5 |
| Mlym | 347 | Malayalam | left-to-right | Malayalam | 1.0 | 118 | | Ch 12.9 |
| Modi | 324 | Modi, Moḍī | left-to-right | Modi | 7.0 | 79 | Ancient/historic | Ch 15.12 |
| Mong | 145 | Mongolian | vertical left-to-right, left-to-right | Mongolian | 3.0 | 168 | Mong includes Clear and Manchu scripts | Ch 13.5 |
| Moon | 218 | Moon (Moon code, Moon script, Moon type) | mixed | | | | Not in Unicode, proposal is explored [lower-roman 1] | |
| Mroo | 264 | Mro, Mru | left-to-right | Mro | 7.0 | 43 | | Ch 13.8 |
| Mtei | 337 | Meitei Mayek (Meithei, Meetei) | left-to-right | Meetei Mayek | 5.2 | 79 | | Ch 13.7 |
| Mult | 323 | Multani | left-to-right | Multani | 8.0 | 38 | Ancient/historic | Ch 15.10 |
| Mymr | 350 | Myanmar (Burmese) | left-to-right | Myanmar | 3.0 | 243 | | Ch 16.3 |
| Nagm | 295 | Nag Mundari | left-to-right | Nag Mundari | 15.0 | 42 | | |
| Nand | 311 | Nandinagari | left-to-right | Nandinagari | 12.0 | 65 | Ancient/historic | Ch 15.13 |
| Narb | 106 | Old North Arabian (Ancient North Arabian) | right-to-left | Old North Arabian | 7.0 | 32 | Ancient/historic | Ch 10.1 |
| Nbat | 159 | Nabataean | right-to-left | Nabataean | 7.0 | 40 | Ancient/historic | Ch 10.10 |
| Newa | 333 | Newa, Newar, Newari, Nepāla lipi | left-to-right | Newa | 9.0 | 97 | | Ch 13.3 |
| Nkdb | 085 | Naxi Dongba (na²¹ɕi³³ to³³ba²¹, Nakhi Tomba) | left-to-right | | | | Not in Unicode | |
| Nkgb | 420 | Naxi Geba (na²¹ɕi³³ gʌ²¹ba²¹, 'Na-'Khi ²Ggŏ-¹baw, Nakhi Geba) | left-to-right | | | | Not in Unicode, proposal is explored [lower-roman 1] | |
| Nkoo | 165 | N’Ko | right-to-left | NKo | 5.0 | 62 | | Ch 19.4 |
| Nshu | 499 | Nüshu | vertical right-to-left | Nushu | 10.0 | 397 | | Ch 18.8 |
| Ogam | 212 | Ogham | bottom-to-top, left-to-right | Ogham | 3.0 | 29 | Ancient/historic | Ch 8.14 |
| Olck | 261 | Ol Chiki (Ol Cemet’, Ol, Santali) | left-to-right | Ol Chiki | 5.1 | 48 | | Ch 13.10 |
| Onao | 296 | Ol Onal | left-to-right | Ol Onal | 16.0 | 44 | | |
| Orkh | 175 | Old Turkic, Orkhon Runic | right-to-left | Old Turkic | 5.2 | 73 | Ancient/historic | Ch 14.8 |
| Orya | 327 | Oriya (Odia) | left-to-right | Oriya | 1.0 | 91 | | Ch 12.5 |
| Osge | 219 | Osage | left-to-right | Osage | 9.0 | 72 | | Ch 20.3 |
| Osma | 260 | Osmanya | left-to-right | Osmanya | 4.0 | 40 | | Ch 19.2 |
| Ougr | 143 | Old Uyghur | mixed | Old Uyghur | 14.0 | 26 | Ancient/historic | Ch 14.11 |
| Palm | 126 | Palmyrene | right-to-left | Palmyrene | 7.0 | 32 | Ancient/historic | Ch 10.11 |
| Pauc | 263 | Pau Cin Hau | left-to-right | Pau Cin Hau | 7.0 | 57 | | Ch 16.13 |
| Pcun | 015 | Proto-Cuneiform | left-to-right | | | | Not in Unicode | |
| Pelm | 016 | Proto-Elamite | left-to-right | | | | Not in Unicode | |
| Perm | 227 | Old Permic | left-to-right | Old Permic | 7.0 | 43 | Ancient/historic | Ch 8.13 |
| Phag | 331 | Phags-pa | vertical left-to-right | Phags-pa | 5.0 | 56 | Ancient/historic | Ch 14.4 |
| Phli | 131 | Inscriptional Pahlavi | right-to-left | Inscriptional Pahlavi | 5.2 | 27 | Ancient/historic | Ch 10.6 |
| Phlp | 132 | Psalter Pahlavi | right-to-left | Psalter Pahlavi | 7.0 | 29 | Ancient/historic | Ch 10.6 |
| Phlv | 133 | Book Pahlavi | mixed | | | | Not in Unicode | |
| Phnx | 115 | Phoenician | right-to-left | Phoenician | 5.0 | 29 | Ancient/historic [g] | Ch 10.3 |
| Piqd | 293 | Klingon (KLI pIqaD) | left-to-right | | | | Rejected for inclusion in Unicode [lower-roman 3] [lower-roman 4] | |
| Plrd | 282 | Miao (Pollard) | left-to-right | Miao | 6.1 | 149 | | Ch 18.10 |
| Prti | 130 | Inscriptional Parthian | right-to-left | Inscriptional Parthian | 5.2 | 30 | Ancient/historic | Ch 10.6 |
| Psin | 103 | Proto-Sinaitic | mixed | | | | Not in Unicode | |
| Qaaa-Qabx | 900-949 | Reserved for private use (range) | | | | | Not in Unicode | |
| Ranj | 303 | Ranjana | left-to-right | | | | Not in Unicode | |
| Rjng | 363 | Rejang (Redjang, Kaganga) | left-to-right | Rejang | 5.1 | 37 | | Ch 17.5 |
| Rohg | 167 | Hanifi Rohingya | right-to-left | Hanifi Rohingya | 11.0 | 50 | | Ch 16.14 |
| Roro | 620 | Rongorongo | mixed | | | | Not in Unicode, proposal is explored [lower-roman 1] | |
| Runr | 211 | Runic | left-to-right, boustrophedon | Runic | 3.0 | 86 | Ancient/historic | Ch 8.7 |
| Samr | 123 | Samaritan | right-to-left, top-to-bottom | Samaritan | 5.2 | 61 | | Ch 9.4 |
| Sara | 292 | Sarati | mixed | | | | Not in Unicode | |
| Sarb | 105 | Old South Arabian | right-to-left | Old South Arabian | 5.2 | 32 | Ancient/historic | Ch 10.2 |
| Saur | 344 | Saurashtra | left-to-right | Saurashtra | 5.1 | 82 | | Ch 13.13 |
| Sgnw | 095 | SignWriting | vertical left-to-right | SignWriting | 8.0 | 672 | | Ch 21.7 |
| Shaw | 281 | Shavian (Shaw) | left-to-right | Shavian | 4.0 | 48 | | Ch 8.15 |
| Shrd | 319 | Sharada, Śāradā | left-to-right | Sharada | 6.1 | 96 | | Ch 15.3 |
| Shui | 530 | Shuishu | left-to-right | | | | Not in Unicode | |
| Sidd | 302 | Siddham, Siddhaṃ, Siddhamātṛkā | left-to-right | Siddham | 7.0 | 92 | Ancient/historic | Ch 15.5 |
| Sidt | 180 | Sidetic | right-to-left | | | | Not in Unicode, proposal is mature [lower-roman 2] | |
| Sind | 318 | Khudawadi, Sindhi | left-to-right | Khudawadi | 7.0 | 69 | | Ch 15.9 |
| Sinh | 348 | Sinhala | left-to-right | Sinhala | 3.0 | 111 | | Ch 13.2 |
| Sogd | 141 | Sogdian | horizontal and vertical writing in East Asian scripts, top-to-bottom | Sogdian | 11.0 | 42 | Ancient/historic | Ch 14.10 |
| Sogo | 142 | Old Sogdian | right-to-left | Old Sogdian | 11.0 | 40 | Ancient/historic | Ch 14.9 |
| Sora | 398 | Sora Sompeng | left-to-right | Sora Sompeng | 6.1 | 35 | | Ch 15.17 |
| Soyo | 329 | Soyombo | left-to-right | Soyombo | 10.0 | 83 | Ancient/historic | Ch 14.7 |
| Sund | 362 | Sundanese | left-to-right | Sundanese | 5.1 | 72 | | Ch 17.7 |
| Sunu | 274 | Sunuwar | left-to-right | Sunuwar | 16.0 | 44 | | |
| Sylo | 316 | Syloti Nagri | left-to-right | Syloti Nagri | 4.1 | 45 | Ancient/historic | Ch 15.1 |
| Syrc | 135 | Syriac | right-to-left | Syriac | 3.0 | 88 | Includes typographic variants Estrangelo (see § Syre), Western (§ Syrj), and Eastern (§ Syrn) | Ch 9.3 |
| Syre | 138 | Syriac (Estrangelo variant) | mixed | | | | Typographic variant of Syriac (see § Syrc) | |
| Syrj | 137 | Syriac (Western variant) | mixed | | | | Typographic variant of Syriac (see § Syrc) | |
| Syrn | 136 | Syriac (Eastern variant) | mixed | | | | Typographic variant of Syriac (see § Syrc) | |
| Tagb | 373 | Tagbanwa | left-to-right | Tagbanwa | 3.2 | 18 | | Ch 17.1 |
| Takr | 321 | Takri, Ṭākrī, Ṭāṅkrī | left-to-right | Takri | 6.1 | 68 | | Ch 15.4 |
| Tale | 353 | Tai Le | left-to-right | Tai Le | 4.0 | 35 | | Ch 16.5 |
| Talu | 354 | New Tai Lue | left-to-right | New Tai Lue | 4.1 | 83 | | Ch 16.6 |
| Taml | 346 | Tamil | left-to-right | Tamil | 1.0 | 123 | | Ch 12.6 |
| Tang | 520 | Tangut | vertical right-to-left, left-to-right | Tangut | 9.0 | 6,914 | Ancient/historic | Ch 18.11 |
| Tavt | 359 | Tai Viet | left-to-right | Tai Viet | 5.2 | 72 | | Ch 16.8 |
| Tayo | 380 | Tai Yo | top-to-bottom, columns right-to-left | | | | Not in Unicode, proposal is mature [lower-roman 2] | |
| Telu | 340 | Telugu | left-to-right | Telugu | 1.0 | 100 | | Ch 12.7 |
| Teng | 290 | Tengwar | left-to-right | | | | Not in Unicode | |
| Tfng | 120 | Tifinagh (Berber) | left-to-right, right-to-left, top-to-bottom, bottom-to-top | Tifinagh | 4.1 | 59 | | Ch 19.3 |
| Tglg | 370 | Tagalog (Baybayin, Alibata) | left-to-right | Tagalog | 3.2 | 23 | | Ch 17.1 |
| Thaa | 170 | Thaana | right-to-left | Thaana | 3.0 | 50 | | Ch 13.1 |
| Thai | 352 | Thai | left-to-right | Thai | 1.0 | 86 | | Ch 16.1 |
| Tibt | 330 | Tibetan | left-to-right | Tibetan | 2.0 | 207 | Added in 1.0, removed in 1.1 and reintroduced in 2.0 | Ch 13.4 |
| Tirh | 326 | Tirhuta | left-to-right | Tirhuta | 7.0 | 82 | | Ch 15.11 |
| Tnsa | 275 | Tangsa | left-to-right | Tangsa | 14.0 | 89 | | Ch 13.18 |
| Todr | 229 | Todhri | right-to-left | Todhri | 16.0 | 52 | | |
| Tols | 299 | Tolong Siki | left-to-right | | | | Not in Unicode, proposal is mature [lower-roman 2] | |
| Toto | 294 | Toto | left-to-right | Toto | 14.0 | 31 | | Ch 13.17 |
| Tutg | 341 | Tulu-Tigalari | left-to-right | Tulu Tigalari | 16.0 | 80 | | |
| Ugar | 040 | Ugaritic | left-to-right | Ugaritic | 4.0 | 31 | Ancient/historic | Ch 11.2 |
| Vaii | 470 | Vai | left-to-right | Vai | 5.1 | 300 | | Ch 19.5 |
| Visp | 280 | Visible Speech | left-to-right | | | | Not in Unicode | |
| Vith | 228 | Vithkuqi | left-to-right | Vithkuqi | 14.0 | 70 | Ancient/historic | Ch 8.12 |
| Wara | 262 | Warang Citi (Varang Kshiti) | left-to-right | Warang Citi | 7.0 | 84 | | Ch 13.9 |
| Wcho | 283 | Wancho | left-to-right | Wancho | 12.0 | 59 | | Ch 13.16 |
| Wole | 480 | Woleai | mixed | | | | Not in Unicode, proposal is explored [lower-roman 1] | |
| Xpeo | 030 | Old Persian | left-to-right | Old Persian | 4.1 | 50 | Ancient/historic | Ch 11.3 |
| Xsux | 020 | Cuneiform, Sumero-Akkadian | left-to-right | Cuneiform | 5.0 | 1,234 | Ancient/historic | Ch 11.1 |
| Yezi | 192 | Yezidi | right-to-left | Yezidi | 13.0 | 47 | Ancient/historic | Ch 9.6 |
| Yiii | 460 | Yi | left-to-right | Yi | 3.0 | 1,220 | | Ch 18.7 |
| Zanb | 339 | Zanabazar Square (Zanabazarin Dörböljin Useg, Xewtee Dörböljin Bicig, Horizontal Square Script) | left-to-right | Zanabazar Square | 10.0 | 72 | Ancient/historic | Ch 14.6 |
| Zinh | 994 | Code for inherited script | | Inherited | | 657 | | |
| Zmth | 995 | Mathematical notation | | | | | Not a 'script' in Unicode | |
| Zsym | 996 | Symbols | | | | | Not a 'script' in Unicode | |
| Zsye | 993 | Symbols (emoji variant) | | | | | Not a 'script' in Unicode | |
| Zxxx | 997 | Code for unwritten documents | | | | | Not a 'script' in Unicode | |
| Zyyy | 998 | Code for undetermined script | | Common | | 9,053 | | |
| Zzzz | 999 | Code for uncoded script | | Unknown | | 959,049 | In Unicode: All other code points | |

Notes
  a. ISO 15924 publications, as of 12 September 2023.
  b. ISO 15924 normative text file, as of 12 September 2023.
  c. ISO 15924 changes (including aliases for Unicode), as of 12 September 2023.
  d. Unicode version 16.0.
  e.
  f. Unicode uses the "Property Value Alias" (Alias) as the script name. These Alias names are part of Unicode and are published informatively next to ISO 15924. An alias script name may be used in a character name: Palm, Palmyrene U+10860 𐡠 PALMYRENE LETTER ALEPH.
  g. In Unicode, the Phoenician script is intended for the representation of text in Paleo-Hebrew, Archaic Phoenician, Phoenician, Early Aramaic, Late Phoenician cursive, Phoenician papyri, Siloam Hebrew, Hebrew seals, Ammonite, Moabite, and Punic. [lower-roman 5]
References
  1. "SEI List of Scripts Not Yet Encoded". Unicode Consortium. March 2023. Retrieved 2023-09-25.
  2. "Unicode Pipeline § Code Points Provisionally Assigned for Mature Proposals". Unicode Consortium. 2023-09-12. Retrieved 2023-09-25.
  3. Michael Everson (1997-09-18). "Proposal to encode Klingon in Plane 1 of ISO/IEC 10646-2".
  4. The Unicode Consortium (2001-08-14). "Approved Minutes of the UTC 87 / L2 184 Joint Meeting".
  5. "Middle East-II, Ancient Scripts" (PDF). 15.0.0. The Unicode Consortium. Retrieved 2023-09-25.

Missing scripts in Unicode

With each new version of Unicode, new writing systems are added to the international character code. According to a statement by linguist Dr Deborah Anderson of UC Berkeley, there are over 100 writing systems that have not yet been included in Unicode.

According to a list compiled by the Missing Scripts project of the University of Applied Sciences Mainz (Germany), ANRT Nancy (France) and UC Berkeley (USA), 294 writing systems were known as of January 2022. Of these, 131 have not yet been encoded in Unicode and so cannot yet be represented in standard Unicode text on a computer or mobile phone.

See also

References

  1. "Glossary". unicode.org.
  2. "Unicode Character Database: Scripts". unicode.org.
  3. "Chapter 14: Additional Ancient and Historic Scripts". The Unicode Standard, Version 15.0 (PDF). Mountain View, CA: Unicode, Inc. September 2022. ISBN 978-1-936213-32-0.
  4. "Roadmaps to Unicode". unicode.org. https://www.unicode.org/roadmaps/
  5. "UAX #24: Unicode Script Property". www.unicode.org.