Unicode block

A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.

Each block is generally, but not always, meant to supply glyphs used by one or more specific languages, or in some general application area such as mathematics, surveying, decorative typesetting, social forums, etc.

Design and implementation

Unicode blocks are identified by unique names, which use only ASCII characters and are usually descriptive of the nature of the symbols, in English, such as "Tibetan" or "Supplemental Arrows-A". (When comparing block names, one is supposed to equate uppercase with lowercase letters, and to ignore any whitespace, hyphens, and underscores; so the last name is equivalent to "supplemental_arrows__a" and "SUPPLEMENTALARROWSA".) [1]
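The comparison rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the Consortium's reference algorithm: it implements only the case folding and the removal of whitespace, hyphens, and underscores mentioned in the text, and the function names `normalize_block_name` and `same_block_name` are hypothetical.

```python
import re


def normalize_block_name(name: str) -> str:
    """Drop whitespace, hyphens, and underscores, then fold to lowercase."""
    return re.sub(r"[\s\-_]+", "", name).lower()


def same_block_name(a: str, b: str) -> bool:
    """Compare two block names under the loose rule described in the text."""
    return normalize_block_name(a) == normalize_block_name(b)


print(same_block_name("Supplemental Arrows-A", "supplemental_arrows__a"))  # True
print(same_block_name("Supplemental Arrows-A", "SUPPLEMENTALARROWSA"))     # True
print(same_block_name("Supplemental Arrows-A", "Supplemental Arrows-B"))   # False
```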

Blocks are pairwise disjoint; that is, they do not overlap. The starting code point and the size (number of code points) of each block are always multiples of 16; therefore, in hexadecimal notation, the starting (smallest) point is U+xxx0 and the ending (largest) point is U+yyyF, where xxx and yyy are three or more hexadecimal digits. (These constraints are intended to simplify the display of glyphs in Unicode Consortium documents, as tables with 16 rows labeled with the last hexadecimal digit of the code point.) [1] The size of a block may range from the minimum of 16 to a maximum of 65,536 code points.
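As a rough illustration of these constraints, the following Python sketch checks whether an inclusive range could be a block and whether two ranges overlap. The helper names `satisfies_block_constraints` and `overlaps` are hypothetical, and the range U+0100..U+0108 is an invented counterexample, not a real block.

```python
def satisfies_block_constraints(start: int, end: int) -> bool:
    """Check the alignment and size rules for an inclusive range."""
    size = end - start + 1
    return start % 16 == 0 and size % 16 == 0 and 16 <= size <= 65536


def overlaps(a: tuple, b: tuple) -> bool:
    """True if two inclusive (start, end) ranges share any code point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Supplemental Arrows-A: the minimum size, starting at ...0 and ending at ...F.
print(satisfies_block_constraints(0x27F0, 0x27FF))    # True
# Supplementary Private Use Area-A: the maximum size of 65,536.
print(satisfies_block_constraints(0xF0000, 0xFFFFF))  # True
# Invented range of 9 code points: not a multiple of 16.
print(satisfies_block_constraints(0x0100, 0x0108))    # False
# Adjacent blocks touch but do not overlap.
print(overlaps((0x0000, 0x007F), (0x0080, 0x00FF)))   # False
```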

Every assigned code point has a property called "Block", whose value is a character string naming the unique block that owns that point. [2] However, a block may also contain unassigned code points, usually reserved for future additions of characters that "logically" should belong to that block. Code points not belonging to any of the named blocks, e.g. in the unassigned planes 4–13, have the value block="No_Block". [1]
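Because blocks are disjoint, sorted ranges, the Block value of a code point can be found by binary search. The sketch below assumes a small hand-copied sample of rows from the list of blocks in this article; `SAMPLE_BLOCKS` and `block_of` are hypothetical names, and a real lookup would load every block from the Unicode Blocks data file. With this partial table, code points in omitted blocks are also reported as "No_Block".

```python
from bisect import bisect_right

# (start, end, name) rows sorted by start; only a small sample of real blocks.
SAMPLE_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0F00, 0x0FFF, "Tibetan"),
    (0x27F0, 0x27FF, "Supplemental Arrows-A"),
    (0xFB50, 0xFDFF, "Arabic Presentation Forms-A"),
]
STARTS = [start for start, _, _ in SAMPLE_BLOCKS]


def block_of(code_point: int) -> str:
    """Return the block name for a code point, or "No_Block" if none matches."""
    index = bisect_right(STARTS, code_point) - 1
    if index >= 0:
        start, end, name = SAMPLE_BLOCKS[index]
        if code_point <= end:
            return name
    return "No_Block"


print(block_of(ord("A")))  # Basic Latin
print(block_of(0x0F40))    # Tibetan
print(block_of(0xFDD0))    # Arabic Presentation Forms-A
print(block_of(0x40000))   # No_Block (plane 4 has no blocks)
```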

Simply belonging to a particular Unicode block does not guarantee that a character has any particular property. The identity of any character is determined by its properties as stated in the Unicode Character Database. For example, the contiguous range of 32 noncharacter code points U+FDD0..U+FDEF shares none of the properties common to the other characters in the Arabic Presentation Forms-A block: they are not Arabic script characters or "right-to-left noncharacters", and were assigned there as filler, given that it has been agreed that no further Arabic compatibility characters will be encoded. [3]
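The noncharacter example can be checked with Python's standard `unicodedata` module, which exposes the General Category of each code point (but not the Block property). This is a small sketch; U+FDF2 is chosen here as one assigned Arabic character from the same block for contrast.

```python
import unicodedata

# The 32 noncharacter code points inside Arabic Presentation Forms-A.
NONCHARACTERS = range(0xFDD0, 0xFDEF + 1)

categories = {unicodedata.category(chr(cp)) for cp in NONCHARACTERS}

print(len(NONCHARACTERS))              # 32
print(categories)                      # {'Cn'}: no character is assigned
print(unicodedata.category("\uFDF2"))  # Lo: an assigned letter in the same block
```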

Other classifications

Each Unicode code point also has a property called "General Category", which attempts to describe the role of the corresponding symbol in the languages or applications for whose sake it was included in the system. Examples of General Categories are "Lu" (uppercase letter), "Nd" (decimal digit), "Pi" (initial quote punctuation), and "Mn" (non-spacing mark, such as a diacritic that combines with the preceding character). This division is completely independent of blocks: the code points with a given General Category generally span many blocks, and do not have to be consecutive, not even within each block. [4]
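The independence of General Category and block can be seen with Python's standard `unicodedata.category` function. The sample characters below are illustrative choices, one per category named in the text.

```python
import unicodedata

# One sample character for each category mentioned above.
samples = {
    "A": "Lu",       # LATIN CAPITAL LETTER A
    "7": "Nd",       # DIGIT SEVEN
    "\u00AB": "Pi",  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u0301": "Mn",  # COMBINING ACUTE ACCENT
}
for char, expected in samples.items():
    print(f"U+{ord(char):04X}", unicodedata.category(char), expected)

# One category spans several blocks: Latin and Cyrillic capitals are both "Lu".
print(unicodedata.category("A"), unicodedata.category("\u0410"))

# One block holds several categories: collect those found in Basic Latin.
basic_latin_categories = sorted(
    {unicodedata.category(chr(cp)) for cp in range(0x0000, 0x0080)}
)
print(basic_latin_categories)
```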

Each code point also has a script property, specifying which writing system it is intended for, or whether it is intended for multiple writing systems. This, also, is independent of block.

In descriptions of the Unicode system, a block may be subdivided into more specific subgroups, such as the "Chess symbols" in the Miscellaneous Symbols block (not to be confused with the separate Chess Symbols block). Those subgroups are not "blocks" in the technical sense used by the Unicode Consortium, and are named only for the convenience of users.

List of blocks

Unicode 15.1 defines 328 blocks: [1]

Plane | Block range | Block name | Code points [a] | Assigned characters | Scripts [b] [c] [d] [e] [f]
0 BMP | U+0000..U+007F | Basic Latin [g] | 128 | 128 | Latin (52 characters), Common (76 characters)
0 BMP | U+0080..U+00FF | Latin-1 Supplement [h] | 128 | 128 | Latin (64 characters), Common (64 characters)
0 BMP | U+0100..U+017F | Latin Extended-A | 128 | 128 | Latin
0 BMP | U+0180..U+024F | Latin Extended-B | 208 | 208 | Latin
0 BMP | U+0250..U+02AF | IPA Extensions | 96 | 96 | Latin
0 BMP | U+02B0..U+02FF | Spacing Modifier Letters | 80 | 80 | Bopomofo (2 characters), Latin (14 characters), Common (64 characters)
0 BMP | U+0300..U+036F | Combining Diacritical Marks | 112 | 112 | Inherited
0 BMP | U+0370..U+03FF | Greek and Coptic | 144 | 135 | Coptic (14 characters), Greek (117 characters), Common (4 characters)
0 BMP | U+0400..U+04FF | Cyrillic | 256 | 256 | Cyrillic (254 characters), Inherited (2 characters)
0 BMP | U+0500..U+052F | Cyrillic Supplement | 48 | 48 | Cyrillic
0 BMP | U+0530..U+058F | Armenian | 96 | 91 | Armenian
0 BMP | U+0590..U+05FF | Hebrew | 112 | 88 | Hebrew
0 BMP | U+0600..U+06FF | Arabic | 256 | 256 | Arabic (238 characters), Common (6 characters), Inherited (12 characters)
0 BMP | U+0700..U+074F | Syriac | 80 | 77 | Syriac
0 BMP | U+0750..U+077F | Arabic Supplement | 48 | 48 | Arabic
0 BMP | U+0780..U+07BF | Thaana | 64 | 50 | Thaana
0 BMP | U+07C0..U+07FF | NKo | 64 | 62 | N’Ko
0 BMP | U+0800..U+083F | Samaritan | 64 | 61 | Samaritan
0 BMP | U+0840..U+085F | Mandaic | 32 | 29 | Mandaic
0 BMP | U+0860..U+086F | Syriac Supplement | 16 | 11 | Syriac
0 BMP | U+0870..U+089F | Arabic Extended-B | 48 | 41 | Arabic
0 BMP | U+08A0..U+08FF | Arabic Extended-A | 96 | 96 | Arabic (95 characters), Common (1 character)
0 BMP | U+0900..U+097F | Devanagari | 128 | 128 | Devanagari (122 characters), Common (2 characters), Inherited (4 characters)
0 BMP | U+0980..U+09FF | Bengali | 128 | 96 | Bengali
0 BMP | U+0A00..U+0A7F | Gurmukhi | 128 | 80 | Gurmukhi
0 BMP | U+0A80..U+0AFF | Gujarati | 128 | 91 | Gujarati
0 BMP | U+0B00..U+0B7F | Oriya | 128 | 91 | Oriya
0 BMP | U+0B80..U+0BFF | Tamil | 128 | 72 | Tamil
0 BMP | U+0C00..U+0C7F | Telugu | 128 | 100 | Telugu
0 BMP | U+0C80..U+0CFF | Kannada | 128 | 91 | Kannada
0 BMP | U+0D00..U+0D7F | Malayalam | 128 | 118 | Malayalam
0 BMP | U+0D80..U+0DFF | Sinhala | 128 | 91 | Sinhala
0 BMP | U+0E00..U+0E7F | Thai | 128 | 87 | Thai (86 characters), Common (1 character)
0 BMP | U+0E80..U+0EFF | Lao | 128 | 83 | Lao
0 BMP | U+0F00..U+0FFF | Tibetan | 256 | 211 | Tibetan (207 characters), Common (4 characters)
0 BMP | U+1000..U+109F | Myanmar | 160 | 160 | Myanmar
0 BMP | U+10A0..U+10FF | Georgian | 96 | 88 | Georgian (87 characters), Common (1 character)
0 BMP | U+1100..U+11FF | Hangul Jamo | 256 | 256 | Hangul
0 BMP | U+1200..U+137F | Ethiopic | 384 | 358 | Ethiopic
0 BMP | U+1380..U+139F | Ethiopic Supplement | 32 | 26 | Ethiopic
0 BMP | U+13A0..U+13FF | Cherokee | 96 | 92 | Cherokee
0 BMP | U+1400..U+167F | Unified Canadian Aboriginal Syllabics | 640 | 640 | Canadian Aboriginal
0 BMP | U+1680..U+169F | Ogham | 32 | 29 | Ogham
0 BMP | U+16A0..U+16FF | Runic | 96 | 89 | Runic (86 characters), Common (3 characters)
0 BMP | U+1700..U+171F | Tagalog | 32 | 23 | Tagalog
0 BMP | U+1720..U+173F | Hanunoo | 32 | 23 | Hanunoo (21 characters), Common (2 characters)
0 BMP | U+1740..U+175F | Buhid | 32 | 20 | Buhid
0 BMP | U+1760..U+177F | Tagbanwa | 32 | 18 | Tagbanwa
0 BMP | U+1780..U+17FF | Khmer | 128 | 114 | Khmer
0 BMP | U+1800..U+18AF | Mongolian | 176 | 158 | Mongolian (155 characters), Common (3 characters)
0 BMP | U+18B0..U+18FF | Unified Canadian Aboriginal Syllabics Extended | 80 | 70 | Canadian Aboriginal
0 BMP | U+1900..U+194F | Limbu | 80 | 68 | Limbu
0 BMP | U+1950..U+197F | Tai Le | 48 | 35 | Tai Le
0 BMP | U+1980..U+19DF | New Tai Lue | 96 | 83 | New Tai Lue
0 BMP | U+19E0..U+19FF | Khmer Symbols | 32 | 32 | Khmer
0 BMP | U+1A00..U+1A1F | Buginese | 32 | 30 | Buginese
0 BMP | U+1A20..U+1AAF | Tai Tham | 144 | 127 | Tai Tham
0 BMP | U+1AB0..U+1AFF | Combining Diacritical Marks Extended | 80 | 31 | Inherited
0 BMP | U+1B00..U+1B7F | Balinese | 128 | 124 | Balinese
0 BMP | U+1B80..U+1BBF | Sundanese | 64 | 64 | Sundanese
0 BMP | U+1BC0..U+1BFF | Batak | 64 | 56 | Batak
0 BMP | U+1C00..U+1C4F | Lepcha | 80 | 74 | Lepcha
0 BMP | U+1C50..U+1C7F | Ol Chiki | 48 | 48 | Ol Chiki
0 BMP | U+1C80..U+1C8F | Cyrillic Extended-C | 16 | 9 | Cyrillic
0 BMP | U+1C90..U+1CBF | Georgian Extended | 48 | 46 | Georgian
0 BMP | U+1CC0..U+1CCF | Sundanese Supplement | 16 | 8 | Sundanese
0 BMP | U+1CD0..U+1CFF | Vedic Extensions | 48 | 43 | Common (16 characters), Inherited (27 characters)
0 BMP | U+1D00..U+1D7F | Phonetic Extensions | 128 | 128 | Cyrillic (2 characters), Greek (15 characters), Latin (111 characters)
0 BMP | U+1D80..U+1DBF | Phonetic Extensions Supplement | 64 | 64 | Greek (1 character), Latin (63 characters)
0 BMP | U+1DC0..U+1DFF | Combining Diacritical Marks Supplement | 64 | 64 | Inherited
0 BMP | U+1E00..U+1EFF | Latin Extended Additional | 256 | 256 | Latin
0 BMP | U+1F00..U+1FFF | Greek Extended | 256 | 233 | Greek
0 BMP | U+2000..U+206F | General Punctuation | 112 | 111 | Common (109 characters), Inherited (2 characters)
0 BMP | U+2070..U+209F | Superscripts and Subscripts | 48 | 42 | Latin (15 characters), Common (27 characters)
0 BMP | U+20A0..U+20CF | Currency Symbols | 48 | 33 | Common
0 BMP | U+20D0..U+20FF | Combining Diacritical Marks for Symbols | 48 | 33 | Inherited
0 BMP | U+2100..U+214F | Letterlike Symbols | 80 | 80 | Greek (1 character), Latin (4 characters), Common (75 characters)
0 BMP | U+2150..U+218F | Number Forms | 64 | 60 | Latin (41 characters), Common (19 characters)
0 BMP | U+2190..U+21FF | Arrows | 112 | 112 | Common
0 BMP | U+2200..U+22FF | Mathematical Operators | 256 | 256 | Common
0 BMP | U+2300..U+23FF | Miscellaneous Technical | 256 | 256 | Common
0 BMP | U+2400..U+243F | Control Pictures | 64 | 39 | Common
0 BMP | U+2440..U+245F | Optical Character Recognition | 32 | 11 | Common
0 BMP | U+2460..U+24FF | Enclosed Alphanumerics | 160 | 160 | Common
0 BMP | U+2500..U+257F | Box Drawing | 128 | 128 | Common
0 BMP | U+2580..U+259F | Block Elements | 32 | 32 | Common
0 BMP | U+25A0..U+25FF | Geometric Shapes | 96 | 96 | Common
0 BMP | U+2600..U+26FF | Miscellaneous Symbols | 256 | 256 | Common
0 BMP | U+2700..U+27BF | Dingbats | 192 | 192 | Common
0 BMP | U+27C0..U+27EF | Miscellaneous Mathematical Symbols-A | 48 | 48 | Common
0 BMP | U+27F0..U+27FF | Supplemental Arrows-A | 16 | 16 | Common
0 BMP | U+2800..U+28FF | Braille Patterns | 256 | 256 | Braille
0 BMP | U+2900..U+297F | Supplemental Arrows-B | 128 | 128 | Common
0 BMP | U+2980..U+29FF | Miscellaneous Mathematical Symbols-B | 128 | 128 | Common
0 BMP | U+2A00..U+2AFF | Supplemental Mathematical Operators | 256 | 256 | Common
0 BMP | U+2B00..U+2BFF | Miscellaneous Symbols and Arrows | 256 | 253 | Common
0 BMP | U+2C00..U+2C5F | Glagolitic | 96 | 96 | Glagolitic
0 BMP | U+2C60..U+2C7F | Latin Extended-C | 32 | 32 | Latin
0 BMP | U+2C80..U+2CFF | Coptic | 128 | 123 | Coptic
0 BMP | U+2D00..U+2D2F | Georgian Supplement | 48 | 40 | Georgian
0 BMP | U+2D30..U+2D7F | Tifinagh | 80 | 59 | Tifinagh
0 BMP | U+2D80..U+2DDF | Ethiopic Extended | 96 | 79 | Ethiopic
0 BMP | U+2DE0..U+2DFF | Cyrillic Extended-A | 32 | 32 | Cyrillic
0 BMP | U+2E00..U+2E7F | Supplemental Punctuation | 128 | 94 | Common
0 BMP | U+2E80..U+2EFF | CJK Radicals Supplement | 128 | 115 | Han
0 BMP | U+2F00..U+2FDF | Kangxi Radicals | 224 | 214 | Han
0 BMP | U+2FF0..U+2FFF | Ideographic Description Characters | 16 | 16 | Common
0 BMP | U+3000..U+303F | CJK Symbols and Punctuation | 64 | 64 | Han (15 characters), Hangul (2 characters), Common (43 characters), Inherited (4 characters)
0 BMP | U+3040..U+309F | Hiragana | 96 | 93 | Hiragana (89 characters), Common (2 characters), Inherited (2 characters)
0 BMP | U+30A0..U+30FF | Katakana | 96 | 96 | Katakana (93 characters), Common (3 characters)
0 BMP | U+3100..U+312F | Bopomofo | 48 | 43 | Bopomofo
0 BMP | U+3130..U+318F | Hangul Compatibility Jamo | 96 | 94 | Hangul
0 BMP | U+3190..U+319F | Kanbun | 16 | 16 | Common
0 BMP | U+31A0..U+31BF | Bopomofo Extended | 32 | 32 | Bopomofo
0 BMP | U+31C0..U+31EF | CJK Strokes | 48 | 37 | Common
0 BMP | U+31F0..U+31FF | Katakana Phonetic Extensions | 16 | 16 | Katakana
0 BMP | U+3200..U+32FF | Enclosed CJK Letters and Months | 256 | 255 | Hangul (62 characters), Katakana (47 characters), Common (146 characters)
0 BMP | U+3300..U+33FF | CJK Compatibility | 256 | 256 | Katakana (88 characters), Common (168 characters)
0 BMP | U+3400..U+4DBF | CJK Unified Ideographs Extension A | 6,592 | 6,592 | Han
0 BMP | U+4DC0..U+4DFF | Yijing Hexagram Symbols | 64 | 64 | Common
0 BMP | U+4E00..U+9FFF | CJK Unified Ideographs | 20,992 | 20,992 | Han
0 BMP | U+A000..U+A48F | Yi Syllables | 1,168 | 1,165 | Yi
0 BMP | U+A490..U+A4CF | Yi Radicals | 64 | 55 | Yi
0 BMP | U+A4D0..U+A4FF | Lisu | 48 | 48 | Lisu
0 BMP | U+A500..U+A63F | Vai | 320 | 300 | Vai
0 BMP | U+A640..U+A69F | Cyrillic Extended-B | 96 | 96 | Cyrillic
0 BMP | U+A6A0..U+A6FF | Bamum | 96 | 88 | Bamum
0 BMP | U+A700..U+A71F | Modifier Tone Letters | 32 | 32 | Common
0 BMP | U+A720..U+A7FF | Latin Extended-D | 224 | 193 | Latin (188 characters), Common (5 characters)
0 BMP | U+A800..U+A82F | Syloti Nagri | 48 | 45 | Syloti Nagri
0 BMP | U+A830..U+A83F | Common Indic Number Forms | 16 | 10 | Common
0 BMP | U+A840..U+A87F | Phags-pa | 64 | 56 | Phags Pa
0 BMP | U+A880..U+A8DF | Saurashtra | 96 | 82 | Saurashtra
0 BMP | U+A8E0..U+A8FF | Devanagari Extended | 32 | 32 | Devanagari
0 BMP | U+A900..U+A92F | Kayah Li | 48 | 48 | Kayah Li (47 characters), Common (1 character)
0 BMP | U+A930..U+A95F | Rejang | 48 | 37 | Rejang
0 BMP | U+A960..U+A97F | Hangul Jamo Extended-A | 32 | 29 | Hangul
0 BMP | U+A980..U+A9DF | Javanese | 96 | 91 | Javanese (90 characters), Common (1 character)
0 BMP | U+A9E0..U+A9FF | Myanmar Extended-B | 32 | 31 | Myanmar
0 BMP | U+AA00..U+AA5F | Cham | 96 | 83 | Cham
0 BMP | U+AA60..U+AA7F | Myanmar Extended-A | 32 | 32 | Myanmar
0 BMP | U+AA80..U+AADF | Tai Viet | 96 | 72 | Tai Viet
0 BMP | U+AAE0..U+AAFF | Meetei Mayek Extensions | 32 | 23 | Meetei Mayek
0 BMP | U+AB00..U+AB2F | Ethiopic Extended-A | 48 | 32 | Ethiopic
0 BMP | U+AB30..U+AB6F | Latin Extended-E | 64 | 60 | Latin (56 characters), Greek (1 character), Common (3 characters)
0 BMP | U+AB70..U+ABBF | Cherokee Supplement | 80 | 80 | Cherokee
0 BMP | U+ABC0..U+ABFF | Meetei Mayek | 64 | 56 | Meetei Mayek
0 BMP | U+AC00..U+D7AF | Hangul Syllables | 11,184 | 11,172 | Hangul
0 BMP | U+D7B0..U+D7FF | Hangul Jamo Extended-B | 80 | 72 | Hangul
0 BMP | U+D800..U+DB7F | High Surrogates | 896 | 0 | Unknown
0 BMP | U+DB80..U+DBFF | High Private Use Surrogates | 128 | 0 | Unknown
0 BMP | U+DC00..U+DFFF | Low Surrogates | 1,024 | 0 | Unknown
0 BMP | U+E000..U+F8FF | Private Use Area | 6,400 | 6,400 | Unknown
0 BMP | U+F900..U+FAFF | CJK Compatibility Ideographs | 512 | 472 | Han
0 BMP | U+FB00..U+FB4F | Alphabetic Presentation Forms | 80 | 58 | Armenian (5 characters), Hebrew (46 characters), Latin (7 characters)
0 BMP | U+FB50..U+FDFF | Arabic Presentation Forms-A | 688 | 631 | Arabic (629 characters), Common (2 characters)
0 BMP | U+FE00..U+FE0F | Variation Selectors | 16 | 16 | Inherited
0 BMP | U+FE10..U+FE1F | Vertical Forms | 16 | 10 | Common
0 BMP | U+FE20..U+FE2F | Combining Half Marks | 16 | 16 | Cyrillic (2 characters), Inherited (14 characters)
0 BMP | U+FE30..U+FE4F | CJK Compatibility Forms | 32 | 32 | Common
0 BMP | U+FE50..U+FE6F | Small Form Variants | 32 | 26 | Common
0 BMP | U+FE70..U+FEFF | Arabic Presentation Forms-B | 144 | 141 | Arabic (140 characters), Common (1 character)
0 BMP | U+FF00..U+FFEF | Halfwidth and Fullwidth Forms | 240 | 225 | Hangul (52 characters), Katakana (55 characters), Latin (52 characters), Common (66 characters)
0 BMP | U+FFF0..U+FFFF | Specials | 16 | 5 | Common
1 SMP | U+10000..U+1007F | Linear B Syllabary | 128 | 88 | Linear B
1 SMP | U+10080..U+100FF | Linear B Ideograms | 128 | 123 | Linear B
1 SMP | U+10100..U+1013F | Aegean Numbers | 64 | 57 | Common
1 SMP | U+10140..U+1018F | Ancient Greek Numbers | 80 | 79 | Greek
1 SMP | U+10190..U+101CF | Ancient Symbols | 64 | 14 | Greek (1 character), Common (13 characters)
1 SMP | U+101D0..U+101FF | Phaistos Disc | 48 | 46 | Common (45 characters), Inherited (1 character)
1 SMP | U+10280..U+1029F | Lycian | 32 | 29 | Lycian
1 SMP | U+102A0..U+102DF | Carian | 64 | 49 | Carian
1 SMP | U+102E0..U+102FF | Coptic Epact Numbers | 32 | 28 | Common (27 characters), Inherited (1 character)
1 SMP | U+10300..U+1032F | Old Italic | 48 | 39 | Old Italic
1 SMP | U+10330..U+1034F | Gothic | 32 | 27 | Gothic
1 SMP | U+10350..U+1037F | Old Permic | 48 | 43 | Old Permic
1 SMP | U+10380..U+1039F | Ugaritic | 32 | 31 | Ugaritic
1 SMP | U+103A0..U+103DF | Old Persian | 64 | 50 | Old Persian
1 SMP | U+10400..U+1044F | Deseret | 80 | 80 | Deseret
1 SMP | U+10450..U+1047F | Shavian | 48 | 48 | Shavian
1 SMP | U+10480..U+104AF | Osmanya | 48 | 40 | Osmanya
1 SMP | U+104B0..U+104FF | Osage | 80 | 72 | Osage
1 SMP | U+10500..U+1052F | Elbasan | 48 | 40 | Elbasan
1 SMP | U+10530..U+1056F | Caucasian Albanian | 64 | 53 | Caucasian Albanian
1 SMP | U+10570..U+105BF | Vithkuqi | 80 | 70 | Vithkuqi
1 SMP | U+10600..U+1077F | Linear A | 384 | 341 | Linear A
1 SMP | U+10780..U+107BF | Latin Extended-F | 64 | 57 | Latin
1 SMP | U+10800..U+1083F | Cypriot Syllabary | 64 | 55 | Cypriot
1 SMP | U+10840..U+1085F | Imperial Aramaic | 32 | 31 | Imperial Aramaic
1 SMP | U+10860..U+1087F | Palmyrene | 32 | 32 | Palmyrene
1 SMP | U+10880..U+108AF | Nabataean | 48 | 40 | Nabataean
1 SMP | U+108E0..U+108FF | Hatran | 32 | 26 | Hatran
1 SMP | U+10900..U+1091F | Phoenician | 32 | 29 | Phoenician
1 SMP | U+10920..U+1093F | Lydian | 32 | 27 | Lydian
1 SMP | U+10980..U+1099F | Meroitic Hieroglyphs | 32 | 32 | Meroitic Hieroglyphs
1 SMP | U+109A0..U+109FF | Meroitic Cursive | 96 | 90 | Meroitic Cursive
1 SMP | U+10A00..U+10A5F | Kharoshthi | 96 | 68 | Kharoshthi
1 SMP | U+10A60..U+10A7F | Old South Arabian | 32 | 32 | Old South Arabian
1 SMP | U+10A80..U+10A9F | Old North Arabian | 32 | 32 | Old North Arabian
1 SMP | U+10AC0..U+10AFF | Manichaean | 64 | 51 | Manichaean
1 SMP | U+10B00..U+10B3F | Avestan | 64 | 61 | Avestan
1 SMP | U+10B40..U+10B5F | Inscriptional Parthian | 32 | 30 | Inscriptional Parthian
1 SMP | U+10B60..U+10B7F | Inscriptional Pahlavi | 32 | 27 | Inscriptional Pahlavi
1 SMP | U+10B80..U+10BAF | Psalter Pahlavi | 48 | 29 | Psalter Pahlavi
1 SMP | U+10C00..U+10C4F | Old Turkic | 80 | 73 | Old Turkic
1 SMP | U+10C80..U+10CFF | Old Hungarian | 128 | 108 | Old Hungarian
1 SMP | U+10D00..U+10D3F | Hanifi Rohingya | 64 | 50 | Hanifi Rohingya
1 SMP | U+10E60..U+10E7F | Rumi Numeral Symbols | 32 | 31 | Arabic
1 SMP | U+10E80..U+10EBF | Yezidi | 64 | 47 | Yezidi
1 SMP | U+10EC0..U+10EFF | Arabic Extended-C | 64 | 3 | Arabic
1 SMP | U+10F00..U+10F2F | Old Sogdian | 48 | 40 | Old Sogdian
1 SMP | U+10F30..U+10F6F | Sogdian | 64 | 42 | Sogdian
1 SMP | U+10F70..U+10FAF | Old Uyghur | 64 | 26 | Old Uyghur
1 SMP | U+10FB0..U+10FDF | Chorasmian | 48 | 28 | Chorasmian
1 SMP | U+10FE0..U+10FFF | Elymaic | 32 | 23 | Elymaic
1 SMP | U+11000..U+1107F | Brahmi | 128 | 115 | Brahmi
1 SMP | U+11080..U+110CF | Kaithi | 80 | 68 | Kaithi
1 SMP | U+110D0..U+110FF | Sora Sompeng | 48 | 35 | Sora Sompeng
1 SMP | U+11100..U+1114F | Chakma | 80 | 71 | Chakma
1 SMP | U+11150..U+1117F | Mahajani | 48 | 39 | Mahajani
1 SMP | U+11180..U+111DF | Sharada | 96 | 96 | Sharada
1 SMP | U+111E0..U+111FF | Sinhala Archaic Numbers | 32 | 20 | Sinhala
1 SMP | U+11200..U+1124F | Khojki | 80 | 65 | Khojki
1 SMP | U+11280..U+112AF | Multani | 48 | 38 | Multani
1 SMP | U+112B0..U+112FF | Khudawadi | 80 | 69 | Khudawadi
1 SMP | U+11300..U+1137F | Grantha | 128 | 86 | Grantha (85 characters), Inherited (1 character)
1 SMP | U+11400..U+1147F | Newa | 128 | 97 | Newa
1 SMP | U+11480..U+114DF | Tirhuta | 96 | 82 | Tirhuta
1 SMP | U+11580..U+115FF | Siddham | 128 | 92 | Siddham
1 SMP | U+11600..U+1165F | Modi | 96 | 79 | Modi
1 SMP | U+11660..U+1167F | Mongolian Supplement | 32 | 13 | Mongolian
1 SMP | U+11680..U+116CF | Takri | 80 | 68 | Takri
1 SMP | U+11700..U+1174F | Ahom | 80 | 65 | Ahom
1 SMP | U+11800..U+1184F | Dogra | 80 | 60 | Dogra
1 SMP | U+118A0..U+118FF | Warang Citi | 96 | 84 | Warang Citi
1 SMP | U+11900..U+1195F | Dives Akuru | 96 | 72 | Dives Akuru
1 SMP | U+119A0..U+119FF | Nandinagari | 96 | 65 | Nandinagari
1 SMP | U+11A00..U+11A4F | Zanabazar Square | 80 | 72 | Zanabazar Square
1 SMP | U+11A50..U+11AAF | Soyombo | 96 | 83 | Soyombo
1 SMP | U+11AB0..U+11ABF | Unified Canadian Aboriginal Syllabics Extended-A | 16 | 16 | Canadian Aboriginal
1 SMP | U+11AC0..U+11AFF | Pau Cin Hau | 64 | 57 | Pau Cin Hau
1 SMP | U+11B00..U+11B5F | Devanagari Extended-A | 96 | 10 | Devanagari
1 SMP | U+11C00..U+11C6F | Bhaiksuki | 112 | 97 | Bhaiksuki
1 SMP | U+11C70..U+11CBF | Marchen | 80 | 68 | Marchen
1 SMP | U+11D00..U+11D5F | Masaram Gondi | 96 | 75 | Masaram Gondi
1 SMP | U+11D60..U+11DAF | Gunjala Gondi | 80 | 63 | Gunjala Gondi
1 SMP | U+11EE0..U+11EFF | Makasar | 32 | 25 | Makasar
1 SMP | U+11F00..U+11F5F | Kawi | 96 | 86 | Kawi
1 SMP | U+11FB0..U+11FBF | Lisu Supplement | 16 | 1 | Lisu
1 SMP | U+11FC0..U+11FFF | Tamil Supplement | 64 | 51 | Tamil
1 SMP | U+12000..U+123FF | Cuneiform | 1,024 | 922 | Cuneiform
1 SMP | U+12400..U+1247F | Cuneiform Numbers and Punctuation | 128 | 116 | Cuneiform
1 SMP | U+12480..U+1254F | Early Dynastic Cuneiform | 208 | 196 | Cuneiform
1 SMP | U+12F90..U+12FFF | Cypro-Minoan | 112 | 99 | Cypro Minoan
1 SMP | U+13000..U+1342F | Egyptian Hieroglyphs | 1,072 | 1,072 | Egyptian Hieroglyphs
1 SMP | U+13430..U+1345F | Egyptian Hieroglyph Format Controls | 48 | 38 | Egyptian Hieroglyphs
1 SMP | U+14400..U+1467F | Anatolian Hieroglyphs | 640 | 583 | Anatolian Hieroglyphs
1 SMP | U+16800..U+16A3F | Bamum Supplement | 576 | 569 | Bamum
1 SMP | U+16A40..U+16A6F | Mro | 48 | 43 | Mro
1 SMP | U+16A70..U+16ACF | Tangsa | 96 | 89 | Tangsa
1 SMP | U+16AD0..U+16AFF | Bassa Vah | 48 | 36 | Bassa Vah
1 SMP | U+16B00..U+16B8F | Pahawh Hmong | 144 | 127 | Pahawh Hmong
1 SMP | U+16E40..U+16E9F | Medefaidrin | 96 | 91 | Medefaidrin
1 SMP | U+16F00..U+16F9F | Miao | 160 | 149 | Miao
1 SMP | U+16FE0..U+16FFF | Ideographic Symbols and Punctuation | 32 | 7 | Han (4 characters), Khitan Small Script (1 character), Nushu (1 character), Tangut (1 character)
1 SMP | U+17000..U+187FF | Tangut | 6,144 | 6,136 | Tangut
1 SMP | U+18800..U+18AFF | Tangut Components | 768 | 768 | Tangut
1 SMP | U+18B00..U+18CFF | Khitan Small Script | 512 | 470 | Khitan Small Script
1 SMP | U+18D00..U+18D7F | Tangut Supplement | 128 | 9 | Tangut
1 SMP | U+1AFF0..U+1AFFF | Kana Extended-B | 16 | 13 | Katakana
1 SMP | U+1B000..U+1B0FF | Kana Supplement | 256 | 256 | Hiragana (255 characters), Katakana (1 character)
1 SMP | U+1B100..U+1B12F | Kana Extended-A | 48 | 35 | Hiragana (32 characters), Katakana (3 characters)
1 SMP | U+1B130..U+1B16F | Small Kana Extension | 64 | 9 | Hiragana (4 characters), Katakana (5 characters)
1 SMP | U+1B170..U+1B2FF | Nushu | 400 | 396 | Nüshu
1 SMP | U+1BC00..U+1BC9F | Duployan | 160 | 143 | Duployan
1 SMP | U+1BCA0..U+1BCAF | Shorthand Format Controls | 16 | 4 | Common
1 SMP | U+1CF00..U+1CFCF | Znamenny Musical Notation | 208 | 185 | Common (116 characters), Inherited (69 characters)
1 SMP | U+1D000..U+1D0FF | Byzantine Musical Symbols | 256 | 246 | Common
1 SMP | U+1D100..U+1D1FF | Musical Symbols | 256 | 233 | Common (211 characters), Inherited (22 characters)
1 SMP | U+1D200..U+1D24F | Ancient Greek Musical Notation | 80 | 70 | Greek
1 SMP | U+1D2C0..U+1D2DF | Kaktovik Numerals | 32 | 20 | Common
1 SMP | U+1D2E0..U+1D2FF | Mayan Numerals | 32 | 20 | Common
1 SMP | U+1D300..U+1D35F | Tai Xuan Jing Symbols | 96 | 87 | Common
1 SMP | U+1D360..U+1D37F | Counting Rod Numerals | 32 | 25 | Common
1 SMP | U+1D400..U+1D7FF | Mathematical Alphanumeric Symbols | 1,024 | 996 | Common
1 SMP | U+1D800..U+1DAAF | Sutton SignWriting | 688 | 672 | SignWriting
1 SMP | U+1DF00..U+1DFFF | Latin Extended-G | 256 | 37 | Latin
1 SMP | U+1E000..U+1E02F | Glagolitic Supplement | 48 | 38 | Glagolitic
1 SMP | U+1E030..U+1E08F | Cyrillic Extended-D | 96 | 63 | Cyrillic
1 SMP | U+1E100..U+1E14F | Nyiakeng Puachue Hmong | 80 | 71 | Nyiakeng Puachue Hmong
1 SMP | U+1E290..U+1E2BF | Toto | 48 | 31 | Toto
1 SMP | U+1E2C0..U+1E2FF | Wancho | 64 | 59 | Wancho
1 SMP | U+1E4D0..U+1E4FF | Nag Mundari | 48 | 42 | Mundari
1 SMP | U+1E7E0..U+1E7FF | Ethiopic Extended-B | 32 | 28 | Ethiopic
1 SMP | U+1E800..U+1E8DF | Mende Kikakui | 224 | 213 | Mende Kikakui
1 SMP | U+1E900..U+1E95F | Adlam | 96 | 88 | Adlam
1 SMP | U+1EC70..U+1ECBF | Indic Siyaq Numbers | 80 | 68 | Common
1 SMP | U+1ED00..U+1ED4F | Ottoman Siyaq Numbers | 80 | 61 | Common
1 SMP | U+1EE00..U+1EEFF | Arabic Mathematical Alphabetic Symbols | 256 | 143 | Arabic
1 SMP | U+1F000..U+1F02F | Mahjong Tiles | 48 | 44 | Common
1 SMP | U+1F030..U+1F09F | Domino Tiles | 112 | 100 | Common
1 SMP | U+1F0A0..U+1F0FF | Playing Cards | 96 | 82 | Common
1 SMP | U+1F100..U+1F1FF | Enclosed Alphanumeric Supplement | 256 | 200 | Common
1 SMP | U+1F200..U+1F2FF | Enclosed Ideographic Supplement | 256 | 64 | Hiragana (1 character), Common (63 characters)
1 SMP | U+1F300..U+1F5FF | Miscellaneous Symbols and Pictographs | 768 | 768 | Common
1 SMP | U+1F600..U+1F64F | Emoticons | 80 | 80 | Common
1 SMP | U+1F650..U+1F67F | Ornamental Dingbats | 48 | 48 | Common
1 SMP | U+1F680..U+1F6FF | Transport and Map Symbols | 128 | 118 | Common
1 SMP | U+1F700..U+1F77F | Alchemical Symbols | 128 | 124 | Common
1 SMP | U+1F780..U+1F7FF | Geometric Shapes Extended | 128 | 103 | Common
1 SMP | U+1F800..U+1F8FF | Supplemental Arrows-C | 256 | 150 | Common
1 SMP | U+1F900..U+1F9FF | Supplemental Symbols and Pictographs | 256 | 256 | Common
1 SMP | U+1FA00..U+1FA6F | Chess Symbols | 112 | 98 | Common
1 SMP | U+1FA70..U+1FAFF | Symbols and Pictographs Extended-A | 144 | 107 | Common
1 SMP | U+1FB00..U+1FBFF | Symbols for Legacy Computing | 256 | 212 | Common
2 SIP | U+20000..U+2A6DF | CJK Unified Ideographs Extension B | 42,720 | 42,720 | Han
2 SIP | U+2A700..U+2B73F | CJK Unified Ideographs Extension C | 4,160 | 4,154 | Han
2 SIP | U+2B740..U+2B81F | CJK Unified Ideographs Extension D | 224 | 222 | Han
2 SIP | U+2B820..U+2CEAF | CJK Unified Ideographs Extension E | 5,776 | 5,762 | Han
2 SIP | U+2CEB0..U+2EBEF | CJK Unified Ideographs Extension F | 7,488 | 7,473 | Han
2 SIP | U+2EBF0..U+2EE5F | CJK Unified Ideographs Extension I | 624 | 622 | Han
2 SIP | U+2F800..U+2FA1F | CJK Compatibility Ideographs Supplement | 544 | 542 | Han
3 TIP | U+30000..U+3134F | CJK Unified Ideographs Extension G | 4,944 | 4,939 | Han
3 TIP | U+31350..U+323AF | CJK Unified Ideographs Extension H | 4,192 | 4,192 | Han
14 SSP | U+E0000..U+E007F | Tags | 128 | 97 | Common
14 SSP | U+E0100..U+E01EF | Variation Selectors Supplement | 240 | 240 | Inherited
15 PUA-A | U+F0000..U+FFFFF | Supplementary Private Use Area-A | 65,536 | 65,534 | Unknown
16 PUA-B | U+100000..U+10FFFF | Supplementary Private Use Area-B | 65,536 | 65,534 | Unknown

Notes:
  a. The code point count includes unassigned code points (noncharacters, reserved code points, etc.).
  b. The script has one or more characters in the block, as defined by the Script property. This is independent of the block name.
  c. "Common" (Zyyy), "Inherited" (Zinh or Qaai), and "Unknown" (Zzzz) refer to script codes in ISO 15924.
  d. Unicode Blocks data file, as of Unicode version 15.1.
  e. UAX 24: Unicode Script Property (4-letter code).
  f. UAX 24: Script data file.
  g. Called "C0 Controls and Basic Latin" in ISO/IEC 10646.
  h. Called "C1 Controls and Latin-1 Supplement" in ISO/IEC 10646.
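The "Plane" and "Code points" columns above follow directly from each block's range. The sketch below parses a few lines written in the style of the Unicode Blocks data file (`start..end; name`, with `#` comments) and derives both values. The embedded sample and the function name `parse_blocks` are illustrative; a real program would read the full data file instead.

```python
SAMPLE = """\
# Sample lines in the style of the Unicode Blocks data file
0000..007F; Basic Latin
27F0..27FF; Supplemental Arrows-A
4E00..9FFF; CJK Unified Ideographs
1F600..1F64F; Emoticons
100000..10FFFF; Supplementary Private Use Area-B
"""


def parse_blocks(text: str) -> list:
    """Parse 'start..end; name' lines into dictionaries with derived fields."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        range_part, name = line.split(";", 1)
        start_hex, end_hex = range_part.strip().split("..")
        start, end = int(start_hex, 16), int(end_hex, 16)
        rows.append({
            "plane": start >> 16,           # each plane holds 65,536 code points
            "start": start,
            "end": end,
            "name": name.strip(),
            "code_points": end - start + 1,  # inclusive range
        })
    return rows


for row in parse_blocks(SAMPLE):
    print(row["plane"], f"U+{row['start']:04X}..U+{row['end']:04X}",
          row["name"], row["code_points"])
```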

Moved blocks

The Unicode Stability Policy requires that a character, once assigned, may not be moved or removed, although it may be deprecated. This applies to Unicode 2.0 and all subsequent versions.

Prior to this, the following former blocks were moved:

Former Unicode blocks from before Unicode 2.0
Block range | Historical block name | Version when added | Version when removed | Range now occupied by | Superseded by block | Code points | Assigned characters | Scripts
U+1000..U+105F | Tibetan [5] | 1.0.0 | 1.0.1 | Myanmar | Tibetan | 96 | 71 | Tibetan
U+3400..U+3D2D | Hangul [6] | 1.0.0 | 2.0 | CJK Unified Ideographs Extension A | Hangul Syllables | 2350 | 2350 | Hangul
U+3D2E..U+44B7 | Hangul Supplementary-A [6] | 1.1 | 2.0 | CJK Unified Ideographs Extension A | Hangul Syllables | 1930 | 1930 | Hangul
U+44B8..U+4DFF | Hangul Supplementary-B [6] | 1.1 | 2.0 | CJK Unified Ideographs Extension A and Yijing Hexagram Symbols | Hangul Syllables | 2376 | 2376 | Hangul

References

  1. "Unicode Blocks data file, Unicode version 15.1". Unicode Consortium. Retrieved 2023-09-12.
  2. "Glossary". www.unicode.org. Retrieved 2022-08-07.
  3. "Private-Use Characters, Noncharacters & Sentinels FAQ". www.unicode.org. Retrieved 2023-07-24.
  4. "Unicode Core Specification, Chapter 4: Character Properties" (PDF). Retrieved 2021-09-15.
  5. "3.8: Block-by-Block Charts" (PDF). The Unicode Standard. Version 1.0. Unicode Consortium.
  6. "Appendix E: Block Names" (PDF). The Unicode Standard. Version 1.1. Unicode Consortium.