Ideographic Description Characters

Range: U+2FF0..U+2FFF (16 code points)
Plane: BMP
Scripts: Common
Assigned: 16 code points
Unused: 0 reserved code points
Source standards: GBK (U+2FF0–U+2FFB only)
Unicode version history:
  3.0 (1999): 12 (+12)
  15.1 (2023): 16 (+4)
Note: [1] [2]

Ideographic Description Characters is a Unicode block containing graphic characters used for describing CJK ideographs. They are used in Ideographic Description Sequences (IDS) to provide a description of an ideograph, in terms of what other ideographs make it up and how they are laid out relative to one another. [3] An IDS provides the reader with a description of an ideograph that cannot be represented properly, usually because it is not encoded in Unicode; rendering systems are not intended to automatically compose the pieces into a complete ideograph, and the descriptions are not standardized.

U+2FF0 to U+2FFB were introduced from GBK; U+2FFC to U+2FFF were devised later and introduced in Unicode 15.1 (2023).
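
The block range and the two version boundaries stated above can be expressed as a small lookup. This is only a sketch built from the figures given in this article; the names `in_idc_block` and `introduced_in` are hypothetical helpers, not part of any library.

```python
# Range and version boundaries as stated in this article.
BLOCK_START, BLOCK_END = 0x2FF0, 0x2FFF
LAST_UNICODE_3_0 = 0x2FFB  # U+2FF0..U+2FFB came from GBK (Unicode 3.0)


def in_idc_block(ch: str) -> bool:
    """Return True if ch lies in the Ideographic Description Characters block."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


def introduced_in(ch: str) -> str:
    """Return the Unicode version that added ch, per the ranges above."""
    if not in_idc_block(ch):
        raise ValueError(f"U+{ord(ch):04X} is outside the block")
    return "3.0" if ord(ch) <= LAST_UNICODE_3_0 else "15.1"
```

For example, `introduced_in("\u2ffb")` gives `"3.0"` and `introduced_in("\u2ffc")` gives `"15.1"`, while a character outside the block such as U+303E raises `ValueError`.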

Block

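Ideographic Description Characters [1]
Official Unicode Consortium code chart (PDF)
       | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F
U+2FFx | ⿰ | ⿱ | ⿲ | ⿳ | ⿴ | ⿵ | ⿶ | ⿷ | ⿸ | ⿹ | ⿺ | ⿻ | ⿼ | ⿽ | ⿾ | ⿿
Notes
1. As of Unicode version 15.1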

Ideographic Description Sequences

An Ideographic Description Sequence (IDS) is a sequence of characters, defined by the Unicode Standard, that represents the structure of a Chinese character.

Below are the 16 characters as defined by Unicode in this block:

Unicode | Char | Meaning | Example 1 | IDS | Example 2 | IDS
U+2FF0 | ⿰ | Two components combined left to right |  | ⿰木目 | 𠁢 | ⿰丨㇍
U+2FF1 | ⿱ | Two components combined above to below |  | ⿱木口 | 𠚤 | ⿱𠂊丶
U+2FF2 | ⿲ | Three components combined left to middle and right |  | ⿲彳氵亍 | 𠂗 | ⿲丿夕乚
U+2FF3 | ⿳ | Three components combined above to middle and below |  | ⿳亠口小 | 𠋑 | ⿳亼目口
U+2FF4 | ⿴ | One component fully wrapping another component |  | ⿴囗口 | 𠀬 | ⿴㐁人
U+2FF5 | ⿵ | One component surrounding three sides of another component (opening at bottom) |  | ⿵几皇 | 𧓉 | ⿵齊虫
U+2FF6 | ⿶ | One component surrounding three sides of another component (opening at top) |  | ⿶凵㐅 |  | ⿶乂丶
U+2FF7 | ⿷ | One component surrounding three sides of another component (opening at right) |  | ⿷匚斤 | 𧆬 | ⿷虎九
U+2FF8 | ⿸ | One component surrounding the top and left side of another component |  | ⿸疒丙 | 𤆯 | ⿸耂火
U+2FF9 | ⿹ | One component surrounding the top and right side of another component |  | ⿹戈廾 | 𢧌 | ⿹或壬
U+2FFA | ⿺ | One component surrounding the bottom and left side of another component |  | ⿺走召 | 𥘶 | ⿺礼分
U+2FFB | ⿻ | Two components overlapped |  | ⿻工从 | 𣏃 | ⿻木⿻コ一
U+2FFC | ⿼ | One component surrounding three sides of another component (opening at left) |  | ⿼叉丶 | 𬺹 | ⿼コ二
U+2FFD | ⿽ | One component surrounding the bottom and right side of another component |  | ⿽水丶 |  | ⿽⺀十
U+2FFE | ⿾ | Horizontal reflection |  | ⿾卍 | 𣥄 | ⿾正
U+2FFF | ⿿ | Rotation | 𠕄 | ⿿凹 | 𠄔 | ⿿予

Two other related ideographic description characters are not encoded in this Unicode block but may also be used in ideographic description sequences:

Unicode | Char | Block | Meaning | Example 1 | IDS | Example 2 | IDS
U+303E | 〾 | CJK Symbols and Punctuation | Variant but not equivalent | 㬵 (U+3B35) | 〾胶 (U+80F6) [4] | 𫜵 | 〾爫 [5]
U+31EF | ㇯ | CJK Strokes | Subtraction |  | ㇯兵丶 | 𧰨 | ㇯豕一


This is the syntax of IDS in EBNF:

IDS                 := Ideographic | Radical | CJK_Stroke | Private Use | U+FF1F
                     | IDS_UnaryOperator IDS
                     | IDS_BinaryOperator IDS IDS
                     | IDS_TrinaryOperator IDS IDS IDS
CJK_Stroke          := U+31C0 | U+31C1 | ... | U+31E3
IDS_UnaryOperator   := U+2FFE | U+2FFF
IDS_BinaryOperator  := U+2FF0 | U+2FF1 | U+2FF4 | ... | U+2FFD | U+31EF
IDS_TrinaryOperator := U+2FF2 | U+2FF3
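
The grammar above is a prefix notation, so it can be checked with a short recursive-descent routine. The following is a minimal sketch rather than a reference implementation: the names `arity` and `parse_ids` are hypothetical, and as a simplifying assumption every character that is not an operator is accepted as a leaf, instead of being tested against the Ideographic, Radical, CJK_Stroke and Private Use classes.

```python
# Operator sets copied from the grammar above.
UNARY = {0x2FFE, 0x2FFF}
BINARY = {0x2FF0, 0x2FF1, *range(0x2FF4, 0x2FFE), 0x31EF}  # U+2FF4..U+2FFD
TRINARY = {0x2FF2, 0x2FF3}


def arity(ch: str) -> int:
    """Return the number of operands ch takes, or 0 if it is not an operator."""
    cp = ord(ch)
    if cp in UNARY:
        return 1
    if cp in BINARY:
        return 2
    if cp in TRINARY:
        return 3
    return 0


def parse_ids(text: str):
    """Parse an IDS into nested tuples of the form (operator, operand, ...).

    Leaves are single-character strings. Assumption: any non-operator
    character is treated as a valid leaf.
    """
    def parse(pos: int):
        if pos >= len(text):
            raise ValueError("IDS ended before all operands were supplied")
        ch = text[pos]
        n = arity(ch)
        if n == 0:
            return ch, pos + 1
        node = [ch]
        pos += 1
        for _ in range(n):
            child, pos = parse(pos)
            node.append(child)
        return tuple(node), pos

    tree, end = parse(0)
    if end != len(text):
        raise ValueError(f"unexpected trailing characters at index {end}")
    return tree
```

With this sketch, `parse_ids("⿰木目")` yields `("⿰", "木", "目")`, and the nested sequence `⿻木⿻コ一` from the table yields `("⿻", "木", ("⿻", "コ", "一"))`. A sequence with too few or too many operands, such as `⿰木`, raises `ValueError`.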

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Ideographic Description Characters block:

Version 3.0, final code points [a] U+2FF0..2FFB (count: 12)

Document IDs (UTC, L2, WG2, IRG) | Document
X3L2/95-111, N1284 | Ideographic Structure Symbol (additional request), 1995-11-07
N1303 | Umamaheswaran, V. S.; Ksar, Mike (1996-01-26), "8.13 Ideographic structure symbols", Minutes of Meeting 29, Tokyo
N1348 | Ideographic Components and Composition Scheme, 1996-02-05
N1357 | Revised Ideographic Structure Symbols, 1996-04-12
N1353 | Umamaheswaran, V. S.; Ksar, Mike (1996-06-25), "9", Draft minutes of WG2 Copenhagen Meeting # 30
L2/97-026, N1494 | IRG proposal: Ideographic structure character, 1996-06-27
N1430, N365 | Proposal Summary Form: Ideographic Structure Character, 1996-08-01
N1453 | Ksar, Mike; Umamaheswaran, V. S. (1996-12-06), "9.6 Ideographic Structure Characters", WG 2 Minutes - Quebec Meeting 31
L2/97-023, N1486, N437 | IRG #8 Resolutions, 1997-01-16
N1489 | Supplement to Ideographic Components and Composition Schemes, 1997-01-16
N1490, N436 | Response to WG2 question on Ideographic Structure Characters, 1997-01-16
L2/97-030, N1503 | Umamaheswaran, V. S.; Ksar, Mike (1997-04-01), "9.6", Unconfirmed Minutes of WG 2 Meeting #32, Singapore; 1997-01-20--24
L2/97-114, N1544, N453 | Sato, T. K. (1997-04-08), Questions on the "Han structure method" described in WG2 N1490 (IRG N436)
L2/97-255R | Aliprand, Joan (1997-12-03), "4.B.2 Ideographic Structure Characters", Approved Minutes – UTC #73 & L2 #170 joint meeting, Palo Alto, CA – August 4-5, 1997
N1680 | Project Sub-Division Proposal on Scheme of Ideograph Description Sequence, 1997-12-18
N1782 | Clause X Ideographic Description Sequence (IDS) – IRG N575, 1998-05-06
L2/98-158 | Aliprand, Joan; Winkler, Arnold (1998-05-26), "SC2 SC2 Action re Ideographic Description Sequences", Draft Minutes – UTC #76 & NCITS Subgroup L2 #173 joint meeting, Tredyffrin, Pennsylvania, April 20-22, 1998
N1842 | Proposed text for a Draft for amendment 28 - Ideographic Description Sequences, 1998-06-03
L2/98-286, N1703 | Umamaheswaran, V. S.; Ksar, Mike (1998-07-02), "9.5", Unconfirmed Meeting Minutes, WG 2 Meeting #34, Redmond, WA, USA; 1998-03-16--20, The original proposal was to use character composition. It has changed from being composition to description over its three year development.
L2/98-317, N1892 | Combined CD registration and consideration ballot on WD for 10646-1/Amd. 28, AMENDMENT 28: Ideographic description characters, 1998-10-22
L2/99-010, N1903 | Umamaheswaran, V. S. (1998-12-30), "10.3", Minutes of WG 2 meeting 35, London, U.K.; 1998-09-21--25
L2/99-072.1, N1971 | Irish Comments on SC 2 N 3186, 1999-01-19
L2/99-072, N1970 | Summary of Voting on SC 2 N 3186, PDAM ballot on WD for 10646-1/Amd. 28: Ideographic description characters, 1999-02-05
N2023 | Paterson, Bruce (1999-04-06), FPDAM 28 Text - Ideographic Description Characters
L2/99-120 | Text for FPDAM ballot of ISO/IEC 10646, Amd. 28 - Ideographic description characters, 1999-04-07
UTC/1999-014 | Jenkins, John (1999-06-01), Recursion depth limit for IDC's
UTC/1999-015 | Whistler, Ken (1999-06-01), Re: Brief note on length of ideograph descriptions
UTC/1999-020 | Jenkins, John (1999-06-04), Diagram and language [for Ideograph Description Sequences]
L2/99-176R | Moore, Lisa (1999-11-04), "Recursion Limit for Ideographic Description Characters", Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999
L2/99-232, N2003 | Umamaheswaran, V. S. (1999-08-03), "6.1.2 PDAM28 - Ideographic Description Characters", Minutes of WG 2 meeting 36, Fukuoka, Japan, 1999-03-09--15
L2/99-253, N2067 | Summary of Voting on SC 2 N 3312, ISO 10646-1/FPDAM 28 - Ideographic description characters, 1999-08-19
L2/99-301, N2123 | Disposition of Comments Report on SC 2 N 3312, ISO/IEC 10646-1/FPDAM 28, AMENDMENT 28: Ideographic description characters, 1999-09-20
L2/99-302, N2124 | Paterson, Bruce (1999-09-24), Revised Text for FDAM ballot of ISO/IEC 10646-1/FDAM 28, AMENDMENT 28: Ideographic description characters
L2/00-010, N2103 | Umamaheswaran, V. S. (2000-01-05), "6.4.3", Minutes of WG 2 meeting 37, Copenhagen, Denmark: 1999-09-13--16
L2/00-045 | Summary of FDAM voting: ISO 10646 Amd. 28: Ideographic description characters, 2000-01-31
L2/02-221, N2480 | Cook, Richard (2002-05-18), Proposal to add Ideographic Description Characters (IDC) to the UCS
L2/02-436, N2534, N955 | IRG Radical Classification, 2002-11-21
L2/12-087 | Proposed Changes to ISO/IEC 10646 Annex I, Ideographic Description Characters, 2012-02-09
L2/12-007 | Moore, Lisa (2012-02-14), "Consensus 130-C13", UTC #130 / L2 #227 Minutes, Submit L2/12-087 on extensions to ideographic description sequences to WG2.
L2/15-065 | Jenkins, John (2015-02-02), Proposal to Add IDS Links to Online Unihan Database
L2/15-070 | Davis, Mark (2015-02-03), IDS in Unihan
L2/15-313 | Lunde, Ken (2015-11-03), Request for IDS Data

Version 15.1, final code points [a] U+2FFC..2FFF (count: 4)

Document IDs (UTC, L2, WG2, IRG) | Document
L2/17-386, N2273 | Yang, Tao; Chan, Eiso; Wang, Yifan (2017-10-13), Submission of 3 IDCes
L2/17-379 | Lunde, Ken (2017-10-20), "Proposed Ideographic Description Characters (IDCs)", IRG #49 Liaison Report
L2/18-012 | Yang, Tao; Chan, Eiso; Wang, Yifan (2018-01-05), Proposal of Four IDCs
L2/18-168 | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai; Chapman, Chris; Cook, Richard (2018-04-28), "22. IDCs", Recommendations to UTC #155 April-May 2018 on Script Proposals
L2/21-118R, N2492 | Lunde, Ken; Jenkins, John H. (2021-08-11), Preliminary proposal to add a new provisional kIDS property (Unihan)
L2/22-136 | West, Andrew (2022-07-08), Feedback on Proposals to Encode New Ideographic Description Characters
L2/22-191, N2572 | Lunde, Ken; Jenkins, John; West, Andrew (2022-08-24), Proposal to encode five new Ideographic Description Characters
L2/22-227 | SAT Feedback to "Preliminary proposal to add a new provisional kIDS property (Unihan)" (IRGN2492) and "Proposal to encode five new Ideographic Description Characters" (IRGN2572), 2022-08-29
L2/22-228 | Fan, Ming (2022-09-02), Feedback on IRGN2572 "Proposal to encode 5 new ideograph description characters"
L2/22-247 | Lunde, Ken (2022-11-01), "29", CJK & Unihan Group Recommendations for UTC #173 Meeting
L2/22-241 | Constable, Peter (2022-11-09), "E.1 29", Approved Minutes of UTC Meeting 173

  a. Proposed code points and character names may differ from the final code points and names.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. IDS are described in chapter 18.2 of the Unicode Standard 9.0 on pages 689 through 692.
  4. "「㬵(U+3B35)」和「胶(U+80F6)」为什么在《康熙字典》收录了两次? - 知乎". www.zhihu.com. Retrieved 2023-09-21.
  5. "基本集扩充字考(五・完结)附扩充块新增字考". 知乎专栏 (in Chinese). Retrieved 2023-09-21.