Ideographic Description Characters | |
---|---|
Range | U+2FF0..U+2FFF (16 code points) |
Plane | BMP |
Scripts | Common |
Assigned | 16 code points |
Unused | 0 reserved code points |
Source standards | GBK (U+2FF0–U+2FFB only) |
Unicode version history | |
3.0 (1999) | 12 (+12) |
15.1 (2023) | 16 (+4) |
Unicode documentation | |
Code chart ∣ Web page | |
Note: [1] [2] |
Ideographic Description Characters is a Unicode block containing graphic characters used for describing CJK ideographs. They are used in Ideographic Description Sequences (IDS) to describe an ideograph in terms of the ideographs that compose it and how those components are laid out relative to one another. [3] An IDS gives the reader a description of an ideograph that cannot be represented properly, usually because it is not encoded in Unicode. Rendering systems are not intended to compose the pieces automatically into a complete ideograph, and the descriptions are not standardized.
U+2FF0 to U+2FFB were introduced from GBK; U+2FFC to U+2FFF were devised later and introduced in Unicode 15.1 (2023).
Ideographic Description Characters [1] (Official Unicode Consortium code chart, PDF)
Code point | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+2FFx | ⿰ | ⿱ | ⿲ | ⿳ | ⿴ | ⿵ | ⿶ | ⿷ | ⿸ | ⿹ | ⿺ | ⿻ | ⿼ | ⿽ | ⿾ | ⿿ |
Ideographic Description Sequences are sequences of characters that describe the structure of a Chinese character, as defined by the Unicode Standard.
Below are the 16 characters as defined by Unicode in this block:
Unicode | Char | Meaning | Example 1 | IDS | Example 2 | IDS |
---|---|---|---|---|---|---|
U+2FF0 | ⿰ | Two components combined left to right | 相 | ⿰木目 | 𠁢 | ⿰丨㇍ |
U+2FF1 | ⿱ | Two components combined above to below | 杏 | ⿱木口 | 𠚤 | ⿱𠂊丶 |
U+2FF2 | ⿲ | Three components combined left to middle and right | 衍 | ⿲彳氵亍 | 𠂗 | ⿲丿夕乚 |
U+2FF3 | ⿳ | Three components combined above to middle and below | 京 | ⿳亠口小 | 𠋑 | ⿳亼目口 |
U+2FF4 | ⿴ | One component fully wrapping another component | 回 | ⿴囗口 | 𠀬 | ⿴㐁人 |
U+2FF5 | ⿵ | One component surrounds three sides of another component (opening at bottom) | 凰 | ⿵几皇 | 𧓉 | ⿵齊虫 |
U+2FF6 | ⿶ | One component surrounds three sides of another component (opening at top) | 凶 | ⿶凵㐅 | 义 | ⿶乂丶 |
U+2FF7 | ⿷ | One component surrounds three sides of another component (opening at right) | 匠 | ⿷匚斤 | 𧆬 | ⿷虎九 |
U+2FF8 | ⿸ | One component surrounds the top and left side of another component | 病 | ⿸疒丙 | 𤆯 | ⿸耂火 |
U+2FF9 | ⿹ | One component surrounds the top and right side of another component | 戒 | ⿹戈廾 | 𢧌 | ⿹或壬 |
U+2FFA | ⿺ | One component surrounds the bottom and left side of another component | 超 | ⿺走召 | 𥘶 | ⿺礼分 |
U+2FFB | ⿻ | Two components overlapped | 巫 | ⿻工从 | 𣏃 | ⿻木⿻コ一 |
U+2FFC | ⿼ | One component surrounds three sides of another component (opening at left) | 㕚 | ⿼叉丶 | 𬺹 | ⿼コ二 |
U+2FFD | ⿽ | One component surrounds the bottom and right side of another component | 氷 | ⿽水丶 | 斗 | ⿽⺀十 |
U+2FFE | ⿾ | Horizontal reflection | 卐 | ⿾卍 | 𣥄 | ⿾正 |
U+2FFF | ⿿ | Rotation | 𠕄 | ⿿凹 | 𠄔 | ⿿予 |
Two other related ideographic description characters are not encoded in this Unicode block but may be used in ideographic description sequences:
Unicode | Char | Block | Meaning | Example 1 | IDS | Example 2 | IDS |
---|---|---|---|---|---|---|---|
U+303E | 〾 | CJK Symbols and Punctuation | Variant but not equivalent | 㬵 (U+3B35) | 〾胶 (U+80F6) [4] | 𫜵 | 〾爫 [5] |
U+31EF | ㇯ | CJK Strokes | Subtraction | 乒 | ㇯兵丶 | 𧰨 | ㇯豕一 |
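Because an IDS is an ordinary sequence of code points, a program can inspect it without any rendering support. The following is a minimal Python sketch, not an official API: the helper names `is_description_character` and `describe` are hypothetical, and the set of description characters is taken only from the two tables above (the U+2FF0..U+2FFF block plus U+303E and U+31EF).

```python
# Code points treated as description characters, taken from the tables above.
IDC_BLOCK = range(0x2FF0, 0x3000)   # U+2FF0..U+2FFF
EXTRA_IDCS = {0x303E, 0x31EF}       # encoded in other blocks


def is_description_character(ch):
    """Return True if ch is one of the description characters listed above."""
    cp = ord(ch)
    return cp in IDC_BLOCK or cp in EXTRA_IDCS


def describe(ids):
    """List each character of an IDS as (code point label, is-description-character)."""
    return [(f"U+{ord(ch):04X}", is_description_character(ch)) for ch in ids]


print(describe("⿰木目"))
```

For the sequence ⿰木目, only the first character is flagged; the two components are ordinary ideographs.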
This is the syntax of IDS in EBNF:
IDS := Ideographic | Radical | CJK_Stroke | Private Use | U+FF1F | IDS_UnaryOperator IDS | IDS_BinaryOperator IDS IDS | IDS_TrinaryOperator IDS IDS IDS
CJK_Stroke := U+31C0 | U+31C1 | ... | U+31E3
IDS_UnaryOperator := U+2FFE | U+2FFF
IDS_BinaryOperator := U+2FF0 | U+2FF1 | U+2FF4 | ... | U+2FFD | U+31EF
IDS_TrinaryOperator := U+2FF2 | U+2FF3
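The grammar is in prefix form, so a small recursive-descent routine can check whether a string is a well-formed IDS. The sketch below is illustrative only. The function names `arity` and `parse_ids` are hypothetical, and it makes one simplifying assumption: any character that is not an operator is accepted as a terminal, rather than being checked against the Ideographic, Radical, CJK_Stroke, and Private Use classes.

```python
# Operator arities as given by the grammar above.
UNARY = {0x2FFE, 0x2FFF}
BINARY = {0x2FF0, 0x2FF1, 0x31EF} | set(range(0x2FF4, 0x2FFE))  # U+2FF4..U+2FFD
TRINARY = {0x2FF2, 0x2FF3}


def arity(ch):
    """Number of operands the character takes (0 for a terminal)."""
    cp = ord(ch)
    if cp in UNARY:
        return 1
    if cp in BINARY:
        return 2
    if cp in TRINARY:
        return 3
    return 0  # assumption: every non-operator character is a valid terminal


def parse_ids(text):
    """Parse an IDS into a nested tuple (operator, operand, ...).

    Raises ValueError if the sequence is incomplete or has trailing characters.
    """
    def parse(pos):
        if pos >= len(text):
            raise ValueError("unexpected end of sequence")
        ch = text[pos]
        pos += 1
        n = arity(ch)
        if n == 0:
            return ch, pos
        operands = []
        for _ in range(n):
            operand, pos = parse(pos)
            operands.append(operand)
        return (ch, *operands), pos

    tree, end = parse(0)
    if end != len(text):
        raise ValueError(f"trailing characters at index {end}")
    return tree


print(parse_ids("⿻木⿻コ一"))
```

Applied to the nested example ⿻木⿻コ一 from the table, the result is a tree whose second operand is itself an overlap of コ and 一. A stricter validator would replace the catch-all terminal rule with explicit range checks.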
The following Unicode-related documents record the purpose and process of defining specific characters in the Ideographic Description Characters block:
Version | Final code points | Count | UTC ID | L2 ID | WG2 ID | IRG ID | Document |
---|---|---|---|---|---|---|---|
3.0 | U+2FF0..2FFB | 12 | | X3L2/95-111 | N1284 | | Ideographic Structure Symbol (additional request), 1995-11-07 |
3.0 | U+2FF0..2FFB | 12 | | | N1303 (html, doc) | | Umamaheswaran, V. S.; Ksar, Mike (1996-01-26), "8.13 Ideographic structure symbols", Minutes of Meeting 29, Tokyo |
3.0 | U+2FF0..2FFB | 12 | | | N1348 | | Ideographic Components and Composition Scheme, 1996-02-05 |
3.0 | U+2FF0..2FFB | 12 | | | N1357 | | Revised Ideographic Structure Symbols, 1996-04-12 |
3.0 | U+2FF0..2FFB | 12 | | | N1353 | | Umamaheswaran, V. S.; Ksar, Mike (1996-06-25), "9", Draft minutes of WG2 Copenhagen Meeting # 30 |
3.0 | U+2FF0..2FFB | 12 | | L2/97-026 | N1494 | | IRG proposal: Ideographic structure character, 1996-06-27 |
3.0 | U+2FF0..2FFB | 12 | | | N1430 | N365 | Proposal Summary Form: Ideographic Structure Character, 1996-08-01 |
3.0 | U+2FF0..2FFB | 12 | | | N1453 | | Ksar, Mike; Umamaheswaran, V. S. (1996-12-06), "9.6 Ideographic Structure Characters", WG 2 Minutes - Quebec Meeting 31 |
3.0 | U+2FF0..2FFB | 12 | | L2/97-023 | N1486 | N437 | IRG #8 Resolutions, 1997-01-16 |
3.0 | U+2FF0..2FFB | 12 | | | N1489 | | Supplement to Ideographic Components and Composition Schemes, 1997-01-16 |
3.0 | U+2FF0..2FFB | 12 | | | N1490 | N436 | Response to WG2 question on Ideographic Structure Characters, 1997-01-16 |
3.0 | U+2FF0..2FFB | 12 | | L2/97-030 | N1503 (pdf, doc) | | Umamaheswaran, V. S.; Ksar, Mike (1997-04-01), "9.6", Unconfirmed Minutes of WG 2 Meeting #32, Singapore; 1997-01-20--24 |
3.0 | U+2FF0..2FFB | 12 | | L2/97-114 | N1544 (html, doc) | N453 | Sato, T. K. (1997-04-08), Questions on the "Han structure method" described in WG2 N1490 (IRG N436) |
3.0 | U+2FF0..2FFB | 12 | | L2/97-255R | | | Aliprand, Joan (1997-12-03), "4.B.2 Ideographic Structure Characters", Approved Minutes – UTC #73 & L2 #170 joint meeting, Palo Alto, CA – August 4-5, 1997 |
3.0 | U+2FF0..2FFB | 12 | | | N1680 | | Project Sub-Division Proposal on Scheme of Ideograph Description Sequence, 1997-12-18 |
3.0 | U+2FF0..2FFB | 12 | | | N1782 | | Clause X Ideographic Description Sequence (IDS) – IRG N575, 1998-05-06 |
3.0 | U+2FF0..2FFB | 12 | | L2/98-158 | | | Aliprand, Joan; Winkler, Arnold (1998-05-26), "SC2 SC2 Action re Ideographic Description Sequences", Draft Minutes – UTC #76 & NCITS Subgroup L2 #173 joint meeting, Tredyffrin, Pennsylvania, April 20-22, 1998 |
3.0 | U+2FF0..2FFB | 12 | | | N1842 | | Proposed text for a Draft for amendment 28 - Ideographic Description Sequences, 1998-06-03 |
3.0 | U+2FF0..2FFB | 12 | | L2/98-286 | N1703 | | Umamaheswaran, V. S.; Ksar, Mike (1998-07-02), "9.5", Unconfirmed Meeting Minutes, WG 2 Meeting #34, Redmond, WA, USA; 1998-03-16--20, The original proposal was to use character composition. It has changed from being composition to description over its three year development. |
3.0 | U+2FF0..2FFB | 12 | | L2/98-317 | N1892 (pdf, doc) | | Combined CD registration and consideration ballot on WD for 10646-1/Amd. 28, AMENDMENT 28: Ideographic description characters, 1998-10-22 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-010 | N1903 (pdf, html, doc) | | Umamaheswaran, V. S. (1998-12-30), "10.3", Minutes of WG 2 meeting 35, London, U.K.; 1998-09-21--25 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-072.1 | N1971 | | Irish Comments on SC 2 N 3186, 1999-01-19 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-072 | N1970 (html, doc) | | Summary of Voting on SC 2 N 3186, PDAM ballot on WD for 10646-1/Amd. 28: Ideographic description characters, 1999-02-05 |
3.0 | U+2FF0..2FFB | 12 | | | N2023 | | Paterson, Bruce (1999-04-06), FPDAM 28 Text - Ideographic Description Characters |
3.0 | U+2FF0..2FFB | 12 | | L2/99-120 | | | Text for FPDAM ballot of ISO/IEC 10646, Amd. 28 - Ideographic description characters, 1999-04-07 |
3.0 | U+2FF0..2FFB | 12 | UTC/1999-014 | | | | Jenkins, John (1999-06-01), Recursion depth limit for IDC's |
3.0 | U+2FF0..2FFB | 12 | UTC/1999-015 | | | | Whistler, Ken (1999-06-01), Re: Brief note on length of ideograph descriptions |
3.0 | U+2FF0..2FFB | 12 | UTC/1999-020 | | | | Jenkins, John (1999-06-04), Diagram and language [for Ideograph Description Sequences] |
3.0 | U+2FF0..2FFB | 12 | | L2/99-176R | | | Moore, Lisa (1999-11-04), "Recursion Limit for Ideographic Description Characters", Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-232 | N2003 | | Umamaheswaran, V. S. (1999-08-03), "6.1.2 PDAM28 - Ideographic Description Characters", Minutes of WG 2 meeting 36, Fukuoka, Japan, 1999-03-09--15 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-253 | N2067 | | Summary of Voting on SC 2 N 3312, ISO 10646-1/FPDAM 28 - Ideographic description characters, 1999-08-19 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-301 | N2123 | | Disposition of Comments Report on SC 2 N 3312, ISO/IEC 10646-1/FPDAM 28, AMENDMENT 28: Ideographic description characters, 1999-09-20 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-302 | N2124 | | Paterson, Bruce (1999-09-24), Revised Text for FDAM ballot of ISO/IEC 10646-1/FDAM 28, AMENDMENT 28: Ideographic description characters |
3.0 | U+2FF0..2FFB | 12 | | L2/00-010 | N2103 | | Umamaheswaran, V. S. (2000-01-05), "6.4.3", Minutes of WG 2 meeting 37, Copenhagen, Denmark: 1999-09-13--16 |
3.0 | U+2FF0..2FFB | 12 | | L2/00-045 | | | Summary of FDAM voting: ISO 10646 Amd. 28: Ideographic description characters, 2000-01-31 |
3.0 | U+2FF0..2FFB | 12 | | L2/02-221 | N2480 | | Cook, Richard (2002-05-18), Proposal to add Ideographic Description Characters (IDC) to the UCS |
3.0 | U+2FF0..2FFB | 12 | | L2/02-436 | N2534 | N955 | IRG Radical Classification, 2002-11-21 |
3.0 | U+2FF0..2FFB | 12 | | L2/12-087 | | | Proposed Changes to ISO/IEC 10646 Annex I, Ideographic Description Characters, 2012-02-09 |
3.0 | U+2FF0..2FFB | 12 | | L2/12-007 | | | Moore, Lisa (2012-02-14), "Consensus 130-C13", UTC #130 / L2 #227 Minutes, Submit L2/12-087 on extensions to ideographic description sequences to WG2. |
3.0 | U+2FF0..2FFB | 12 | | L2/15-065 | | | Jenkins, John (2015-02-02), Proposal to Add IDS Links to Online Unihan Database |
3.0 | U+2FF0..2FFB | 12 | | L2/15-070 | | | Davis, Mark (2015-02-03), IDS in Unihan |
3.0 | U+2FF0..2FFB | 12 | | L2/15-313 | | | Lunde, Ken (2015-11-03), Request for IDS Data |
15.1 | U+2FFC..2FFF | 4 | | L2/17-386 | | N2273 | Yang, Tao; Chan, Eiso; Wang, Yifan (2017-10-13), Submission of 3 IDCes |
15.1 | U+2FFC..2FFF | 4 | | L2/17-379 | | | Lunde, Ken (2017-10-20), "Proposed Ideographic Description Characters (IDCs)", IRG #49 Liaison Report |
15.1 | U+2FFC..2FFF | 4 | | L2/18-012 | | | Yang, Tao; Chan, Eiso; Wang, Yifan (2018-01-05), Proposal of Four IDCs |
15.1 | U+2FFC..2FFF | 4 | | L2/18-168 | | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai; Chapman, Chris; Cook, Richard (2018-04-28), "22. IDCs", Recommendations to UTC #155 April-May 2018 on Script Proposals |
15.1 | U+2FFC..2FFF | 4 | | L2/21-118R | | N2492 | Lunde, Ken; Jenkins, John H. (2021-08-11), Preliminary proposal to add a new provisional kIDS property (Unihan) |
15.1 | U+2FFC..2FFF | 4 | | L2/22-136 | | | West, Andrew (2022-07-08), Feedback on Proposals to Encode New Ideographic Description Characters |
15.1 | U+2FFC..2FFF | 4 | | L2/22-191 | | N2572 | Lunde, Ken; Jenkins, John; West, Andrew (2022-08-24), Proposal to encode five new Ideographic Description Characters |
15.1 | U+2FFC..2FFF | 4 | | L2/22-227 | | | SAT Feedback to "Preliminary proposal to add a new provisional kIDS property (Unihan)" (IRGN2492) and "Proposal to encode five new Ideographic Description Characters" (IRGN2572), 2022-08-29 |
15.1 | U+2FFC..2FFF | 4 | | L2/22-228 | | | Fan, Ming (2022-09-02), Feedback on IRGN2572 "Proposal to encode 5 new ideograph description characters" |
15.1 | U+2FFC..2FFF | 4 | | L2/22-247 | | | Lunde, Ken (2022-11-01), "29", CJK & Unihan Group Recommendations for UTC #173 Meeting |
15.1 | U+2FFC..2FFF | 4 | | L2/22-241 | | | Constable, Peter (2022-11-09), "E.1 29", Approved Minutes of UTC Meeting 173 |