Variation Selectors Supplement | |
---|---|
Range | U+E0100..U+E01EF (240 code points) |
Plane | SSP |
Scripts | Inherited |
Assigned | 240 code points |
Unused | 0 reserved code points |
Unicode version history | |
4.0 (2003) | 240 (+240) |
Variation Selectors Supplement is a Unicode block containing additional variation selectors beyond those found in the Variation Selectors block.
These combining characters are named VARIATION SELECTOR-17 (U+E0100) through VARIATION SELECTOR-256 (U+E01EF), abbreviated VS17 – VS256.
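The naming scheme implies a fixed offset between a selector's number and its code point. The following is a minimal sketch of that arithmetic; the function names `vs_to_code_point` and `code_point_to_vs` are illustrative and not part of any standard library.

```python
# Sketch of the VS number <-> code point arithmetic for this block.
# Function names are hypothetical; only the range U+E0100..U+E01EF and the
# numbering VS17..VS256 come from the text above.

BLOCK_FIRST = 0xE0100   # VS17
BLOCK_LAST = 0xE01EF    # VS256
FIRST_VS_NUMBER = 17


def vs_to_code_point(number: int) -> int:
    """Return the code point of variation selector `number` (17..256)."""
    if not 17 <= number <= 256:
        raise ValueError("this block only covers VS17 through VS256")
    return BLOCK_FIRST + (number - FIRST_VS_NUMBER)


def code_point_to_vs(code_point: int) -> int:
    """Return the selector number for a code point in this block."""
    if not BLOCK_FIRST <= code_point <= BLOCK_LAST:
        raise ValueError("code point is outside U+E0100..U+E01EF")
    return code_point - BLOCK_FIRST + FIRST_VS_NUMBER


if __name__ == "__main__":
    for n in (17, 48, 256):
        print(f"VS{n} = U+{vs_to_code_point(n):05X}")
```

VS1 through VS16 live in the separate Variation Selectors block, so the sketch rejects numbers below 17 rather than extending the offset downward.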
As of 12 December 2017, VS17 (U+E0100) to VS48 (U+E011F) are used in ideographic variation sequences in the Unicode Ideographic Variation Database (IVD). [3] [4] In this role the selectors are known as Ideographic Variation Selectors (IVS). Their sequences are not included in the list of standardized variation sequences; instead, they are registered separately in the IVD. [3]
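In text, an ideographic variation sequence is simply a base ideograph immediately followed by one of these selectors. The sketch below shows how such sequences might be built, located, and stripped in a string. The helper names are hypothetical, and U+845B is used only as a sample base character; whether a given pair is actually registered has to be checked against the IVD itself.

```python
# Sketch: composing and detecting ideographic variation sequences.
# Helper names are hypothetical; the code does not consult the IVD, so it
# cannot tell registered sequences from unregistered ones.

VS_SUP_FIRST = 0xE0100  # VS17
VS_SUP_LAST = 0xE01EF   # VS256


def is_supplement_selector(ch: str) -> bool:
    """True if `ch` is a selector from the Variation Selectors Supplement."""
    return VS_SUP_FIRST <= ord(ch) <= VS_SUP_LAST


def make_sequence(base: str, vs_number: int) -> str:
    """Append variation selector `vs_number` (17..256) to a base character."""
    if not 17 <= vs_number <= 256:
        raise ValueError("selector number must be in 17..256")
    return base + chr(VS_SUP_FIRST + vs_number - 17)


def find_sequences(text: str) -> list[tuple[str, int]]:
    """Return (base character, selector number) for each sequence in `text`."""
    found = []
    for prev, ch in zip(text, text[1:]):
        if is_supplement_selector(ch) and not is_supplement_selector(prev):
            found.append((prev, ord(ch) - VS_SUP_FIRST + 17))
    return found


def strip_selectors(text: str) -> str:
    """Drop the selectors, leaving only the default (selector-less) text."""
    return "".join(ch for ch in text if not is_supplement_selector(ch))


if __name__ == "__main__":
    seq = make_sequence("\u845b", 18)
    print([f"U+{ord(c):04X}" for c in seq])
    print(find_sequences("a" + seq + "b"))
```

Stripping the selectors is a lossy fallback: the text remains readable, but the request for a specific glyph is gone.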
The following IVS collections are currently registered in the IVD: [3]
Region | Name | Purpose | First registered | Last updated | Number of sequences | Chart |
---|---|---|---|---|---|---|
Japan | Adobe-Japan1 | CID-keyed Japanese OpenType fonts. Defines at least one sequence for every Japanese kanji from the Adobe-Japan1 collection present in Unicode, even for those with only one glyph, both as future-proofing and to allow that (Japan-region) glyph to be uniquely referenced. [5] | 2007-12-14 | 2022-09-13 | 14684 | |
Japan | Hanyo-Denshi | Unicode characters corresponding to more than one glyph collected by the Han'yō Denshi programme, a union of the character repertoires of the legacy kanji character sets used by multiple administrative systems in Japan (precursor to Moji Jōhō Kiban). [6] Approximately 60% of the initial registration matches Adobe-Japan1 glyphs, but the existing Adobe-Japan1 variation sequences are not used for them. [7] | 2010-11-14 | 2012-03-02 | 13045 | All Han'yō Denshi sequences |
Japan | Moji_Joho | Unicode characters corresponding to more than one entry in the Moji Jōhō Kiban, a database of kanji used for administrative purposes in Japan. Supersedes and deprecates the Hanyo-Denshi collection, from which it retains 9866 of the existing IVSes. [6] | 2014-05-16 | 2017-12-12 | 11384 | All Moji Jōhō sequences |
Macau | MSARG | Macao Supplementary Character Set (MSCS) | 2016-08-15 | 2020-11-06 | 154 | All MSCS sequences |
South Korea | KRName | Standard character variants permitted in personal names in South Korea | 2017-12-12 | | 36 | All Korean name sequences |
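The sequence counts in a table like this can in principle be recomputed from the IVD's plain-text sequence list. The sketch below assumes each data line has three semicolon-separated fields (the base and selector code points in hexadecimal, the collection name, and a collection-specific identifier) and that comment lines start with `#`. The sample lines and their identifiers are invented for illustration and are not copied from the real data file.

```python
# Sketch: tallying IVD-style sequence lines per collection.
# Assumed line format: "<base hex> <selector hex>; <collection>; <identifier>".
# SAMPLE is hypothetical illustration data, not an excerpt of the real IVD.
from collections import Counter

SAMPLE = """\
# hypothetical excerpt
845B E0100; Adobe-Japan1; EXAMPLE-ID-1
845B E0101; Adobe-Japan1; EXAMPLE-ID-2
845B E0102; Moji_Joho; EXAMPLE-ID-3
"""


def parse_line(line: str):
    """Parse one line into (base, selector, collection, identifier) or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    sequence, collection, identifier = (field.strip() for field in line.split(";"))
    base_hex, selector_hex = sequence.split()
    return int(base_hex, 16), int(selector_hex, 16), collection, identifier


def count_by_collection(text: str) -> Counter:
    """Count parsed sequences per collection name."""
    counts = Counter()
    for line in text.splitlines():
        entry = parse_line(line)
        if entry is not None:
            counts[entry[2]] += 1
    return counts


if __name__ == "__main__":
    for name, total in count_by_collection(SAMPLE).items():
        print(name, total)
```

Because the same base and selector pair belongs to exactly one collection, counting lines per collection name is enough; no deduplication step is included in this sketch.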
Similarly to the Moji Jōhō Kiban's role in Japan, the character repertoire of CNS 11643 (including draft revisions) is used for administrative purposes in Taiwan. [8] In some cases, several CNS 11643 characters correspond to a single Unicode character. [9] Many of these cases are currently handled with mappings to the Supplementary Private Use Area. [9] However, the Taipei Computer Association, which represents the interests of Taiwan in the Ideographic Research Group, has been evaluating the feasibility of registering an additional IVD collection in the future. [9] [10]
Variation Selectors Supplement [1] Official Unicode Consortium code chart (PDF)
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+E010x | VS 17 | VS 18 | VS 19 | VS 20 | VS 21 | VS 22 | VS 23 | VS 24 | VS 25 | VS 26 | VS 27 | VS 28 | VS 29 | VS 30 | VS 31 | VS 32 |
U+E011x | VS 33 | VS 34 | VS 35 | VS 36 | VS 37 | VS 38 | VS 39 | VS 40 | VS 41 | VS 42 | VS 43 | VS 44 | VS 45 | VS 46 | VS 47 | VS 48 |
U+E012x | VS 49 | VS 50 | VS 51 | VS 52 | VS 53 | VS 54 | VS 55 | VS 56 | VS 57 | VS 58 | VS 59 | VS 60 | VS 61 | VS 62 | VS 63 | VS 64 |
U+E013x | VS 65 | VS 66 | VS 67 | VS 68 | VS 69 | VS 70 | VS 71 | VS 72 | VS 73 | VS 74 | VS 75 | VS 76 | VS 77 | VS 78 | VS 79 | VS 80 |
U+E014x | VS 81 | VS 82 | VS 83 | VS 84 | VS 85 | VS 86 | VS 87 | VS 88 | VS 89 | VS 90 | VS 91 | VS 92 | VS 93 | VS 94 | VS 95 | VS 96 |
U+E015x | VS 97 | VS 98 | VS 99 | VS 100 | VS 101 | VS 102 | VS 103 | VS 104 | VS 105 | VS 106 | VS 107 | VS 108 | VS 109 | VS 110 | VS 111 | VS 112 |
U+E016x | VS 113 | VS 114 | VS 115 | VS 116 | VS 117 | VS 118 | VS 119 | VS 120 | VS 121 | VS 122 | VS 123 | VS 124 | VS 125 | VS 126 | VS 127 | VS 128 |
U+E017x | VS 129 | VS 130 | VS 131 | VS 132 | VS 133 | VS 134 | VS 135 | VS 136 | VS 137 | VS 138 | VS 139 | VS 140 | VS 141 | VS 142 | VS 143 | VS 144 |
U+E018x | VS 145 | VS 146 | VS 147 | VS 148 | VS 149 | VS 150 | VS 151 | VS 152 | VS 153 | VS 154 | VS 155 | VS 156 | VS 157 | VS 158 | VS 159 | VS 160 |
U+E019x | VS 161 | VS 162 | VS 163 | VS 164 | VS 165 | VS 166 | VS 167 | VS 168 | VS 169 | VS 170 | VS 171 | VS 172 | VS 173 | VS 174 | VS 175 | VS 176 |
U+E01Ax | VS 177 | VS 178 | VS 179 | VS 180 | VS 181 | VS 182 | VS 183 | VS 184 | VS 185 | VS 186 | VS 187 | VS 188 | VS 189 | VS 190 | VS 191 | VS 192 |
U+E01Bx | VS 193 | VS 194 | VS 195 | VS 196 | VS 197 | VS 198 | VS 199 | VS 200 | VS 201 | VS 202 | VS 203 | VS 204 | VS 205 | VS 206 | VS 207 | VS 208 |
U+E01Cx | VS 209 | VS 210 | VS 211 | VS 212 | VS 213 | VS 214 | VS 215 | VS 216 | VS 217 | VS 218 | VS 219 | VS 220 | VS 221 | VS 222 | VS 223 | VS 224 |
U+E01Dx | VS 225 | VS 226 | VS 227 | VS 228 | VS 229 | VS 230 | VS 231 | VS 232 | VS 233 | VS 234 | VS 235 | VS 236 | VS 237 | VS 238 | VS 239 | VS 240 |
U+E01Ex | VS 241 | VS 242 | VS 243 | VS 244 | VS 245 | VS 246 | VS 247 | VS 248 | VS 249 | VS 250 | VS 251 | VS 252 | VS 253 | VS 254 | VS 255 | VS 256 |
The following Unicode-related documents record the purpose and process of defining specific characters in the Variation Selectors Supplement block:
Version | Final code points [a] | Count | L2 ID | WG2 ID | Document |
---|---|---|---|---|---|
4.0 | U+E0100..E01EF | 240 | L2/97-260 | | Hiura, Hideki; Kobayashi, Tatsuo (1997-12-01), Plane 14 Variant Tag |
 | | | L2/98-039 | | Aliprand, Joan; Winkler, Arnold (1998-02-24), "2.D.4 Variant Tag Mechanism", Preliminary Minutes - UTC #74 & L2 #171, Mountain View, CA - December 5, 1997 |
 | | | L2/98-277 | | Hiura, Hideki; Kobayashi, Tatsuo (1998-07-29), Plane 14 Variant tag |
 | | | L2/98-281R (pdf, html) | | Aliprand, Joan (1998-07-31), "III.E.3 Variant Tagging (III.E.3)", Unconfirmed Minutes – UTC #77 & NCITS Subgroup L2 # 174 JOINT MEETING, Redmond, WA -- July 29-31, 1998 |
 | | | L2/01-268 | | Freytag, Asmus (2001-06-27), Variant selector |
 | | | L2/01-309 | | Jenkins, John (2001-08-08), Variation selectors and Han |
 | | | L2/01-324R | | Davis, Mark (2001-08-17), Variation Selectors [document has incorrect L2 ID number] |
 | | | L2/01-295R | | Moore, Lisa (2001-11-06), "88-M5", Minutes from the UTC/L2 meeting #88 |
 | | | L2/02-154 | N2403 | Umamaheswaran, V. S. (2002-04-22), "7.12", Draft minutes of WG 2 meeting 41, Hotel Phoenix, Singapore, 2001-10-15/19 |
 | | | L2/02-372 | N2453 (pdf, doc) | Umamaheswaran, V. S. (2002-10-30), "M42.21 (Amendment 1 to 10646-2)", Unconfirmed minutes of WG 2 meeting 42 |
Note that all Adobe-Japan1-6 kanji, except those twenty seven pointed out above, are given IVS assignments, including those that have only one form assigned. This is to ensure that each Adobe-Japan1-6 kanji can be uniquely and explicitly identified without referencing their default (IVS-less) encoding, and because kanji may be added in future Adobe-Japan1 Supplements that may be variants of such kanji.