Variation Selectors (Unicode block)

Variation Selectors
Range: U+FE00..U+FE0F (16 code points)
Plane: BMP
Scripts: Inherited
Assigned: 16 code points
Unused: 0 reserved code points
Unicode version history: 3.2 (2002): 16 (+16)
Unicode documentation: Code chart ∣ Web page
Note: [1] [2]

Variation Selectors is a Unicode block containing 16 variation selectors, each of which specifies a glyph variant for the character that precedes it. They are currently used in standardized variation sequences for mathematical symbols, emoji symbols, 'Phags-pa letters, and CJK unified ideographs corresponding to CJK compatibility ideographs. At present, standardized variation sequences have been defined only with VS1, VS2, VS3, VS15 and VS16; VS15 and VS16 are reserved to request that a character be displayed as text or as an emoji, respectively. [3] [4]
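As an illustration of the text and emoji presentation requests, the following minimal sketch appends VS15 or VS16 to a base character. The helper name `with_presentation` is hypothetical, and the choice of U+2764 as the base is only an example; whether the two results actually render differently depends on the font and platform, and only base characters listed in the emoji variation sequence data have a defined meaning with these selectors.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: requests text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation


def with_presentation(base: str, emoji: bool) -> str:
    """Return `base` followed by VS16 (emoji) or VS15 (text).

    Hypothetical helper: it does not check that the resulting
    sequence is one of the defined emoji variation sequences.
    """
    if len(base) != 1:
        raise ValueError("expected a single base character")
    return base + (VS16 if emoji else VS15)


heart = "\u2764"  # HEAVY BLACK HEART, used here only as an example base
text_heart = with_presentation(heart, emoji=False)
emoji_heart = with_presentation(heart, emoji=True)

# The selector is a separate code point that follows the base character.
print([f"U+{ord(c):04X}" for c in emoji_heart])  # ['U+2764', 'U+FE0F']
```

Both strings contain two code points; the selector adds no visible glyph of its own and only influences how the preceding character is drawn.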


These combining characters are named variation selector-1 (U+FE00) through variation selector-16 (U+FE0F), abbreviated VS1 to VS16. Each applies to the immediately preceding character.
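The numbering and the "applies to the preceding character" rule can be sketched in a few lines of Python. The function names `vs_number` and `variation_pairs` are hypothetical, and the sample string is only an illustration; the sketch pairs each selector with the single code point before it and does not validate that the pair is a standardized sequence.

```python
import unicodedata

VS_FIRST, VS_LAST = 0xFE00, 0xFE0F  # range of the Variation Selectors block


def vs_number(ch: str):
    """Return n for VSn if `ch` is in U+FE00..U+FE0F, else None."""
    cp = ord(ch)
    if VS_FIRST <= cp <= VS_LAST:
        return cp - VS_FIRST + 1
    return None


def variation_pairs(text: str):
    """List (base, n) for every selector VSn and the code point just before it.

    A selector at the very start of `text` has no preceding
    character and is therefore skipped.
    """
    pairs = []
    for prev, cur in zip(text, text[1:]):
        n = vs_number(cur)
        if n is not None:
            pairs.append((prev, n))
    return pairs


# Sample input: U+2268 followed by VS1, then U+263A followed by VS16.
sample = "\u2268\uFE00 \u263A\uFE0F"
print([(f"U+{ord(b):04X}", n) for b, n in variation_pairs(sample)])
# [('U+2268', 1), ('U+263A', 16)]

# The formal character names carry the same number.
print(unicodedata.name("\uFE0F"))  # VARIATION SELECTOR-16
```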

As of Unicode 13.0: [5]

Variation Selectors [1]
Official Unicode Consortium code chart (PDF)
        0     1     2     3     4     5     6     7     8     9     A     B     C     D     E     F
U+FE0x  VS1   VS2   VS3   VS4   VS5   VS6   VS7   VS8   VS9   VS10  VS11  VS12  VS13  VS14  VS15  VS16
Notes
1. As of Unicode version 15.1

This list is continued in the Variation Selectors Supplement.
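A hedged sketch of how the numbering carries over into the supplement: the function name `selector_number` is hypothetical, and the supplement range U+E0100..U+E01EF (VS17 to VS256) is taken from the Unicode Standard rather than from this article.

```python
def selector_number(ch: str):
    """Return the VS number (1-256) for a variation selector, else None."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:      # Variation Selectors: VS1 to VS16
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:    # Variation Selectors Supplement: VS17 to VS256
        return cp - 0xE0100 + 17
    return None


print(selector_number("\uFE0F"))      # 16
print(selector_number("\U000E0100"))  # 17
```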


History

The following Unicode-related documents record the purpose and process of defining specific characters in the Variation Selectors block:

Version 3.2, final code points [a] U+FE00..FE0F, count 16:

L2 ID | WG2 ID | Document
L2/97-260 |  | Hiura, Hideki; Kobayashi, Tatsuo (1997-12-01), Plane 14 Variant Tag
L2/98-039 |  | Aliprand, Joan; Winkler, Arnold (1998-02-24), "2.D.4 Variant Tag Mechanism", Preliminary Minutes - UTC #74 & L2 #171, Mountain View, CA - December 5, 1997
L2/98-277 |  | Hiura, Hideki; Kobayashi, Tatsuo (1998-07-29), Plane 14 Variant tag
L2/98-281R (pdf, html) |  | Aliprand, Joan (1998-07-31), "III.E.3 Variant Tagging (III.E.3)", Unconfirmed Minutes - UTC #77 & NCITS Subgroup L2 # 174 JOINT MEETING, Redmond, WA -- July 29-31, 1998
L2/00-187 |  | Moore, Lisa (2000-08-23), "Variation Selector", UTC minutes -- Boston, August 8-11, 2000
L2/01-268 |  | Freytag, Asmus (2001-06-27), Variant selector
L2/01-309 |  | Jenkins, John (2001-08-08), Variation selectors and Han
L2/01-324R |  | Davis, Mark (2001-08-17), Variation Selectors [document has incorrect L2 ID number]
L2/01-295R |  | Moore, Lisa (2001-11-06), "88-M5", Minutes from the UTC/L2 meeting #88
L2/02-154 | N2403 | Umamaheswaran, V. S. (2002-04-22), "7.12", Draft minutes of WG 2 meeting 41, Hotel Phoenix, Singapore, 2001-10-15/19
L2/17-086 |  | Burge, Jeremy; et al. (2017-03-27), Add ZWJ, VS-16, Keycaps & Tags to Emoji_Component
L2/17-103 |  | Moore, Lisa (2017-05-18), "E.1.7 Add ZWJ, VS-16, Keycaps & Tags to Emoji_Component", UTC #151 Minutes

  a. Proposed code points and character names may differ from final code points and names.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. "StandardizedVariants.txt". Unicode Consortium. 2015-11-20. Retrieved 2016-08-28.
  4. "Emoji Variation Sequences". Unicode Consortium. 2020-09-18. Retrieved 2020-11-18.
  5. "UCD: Standardized Variation Sequences". Unicode Consortium.