CJK Unified Ideographs Extension I

Range: U+2EBF0..U+2EE5F (624 code points)
Plane: SIP
Scripts: Han
Assigned: 622 code points
Unused: 2 reserved code points
Unicode version history: 15.1 (2023): 622 (+622)
Unicode documentation: Code chart ∣ Web page
Note: [1] [2]

CJK Unified Ideographs Extension I is a Unicode block comprising CJK Unified Ideographs that were included in drafts of an amendment to China's GB 18030 standard circulated in 2022 and 2023. The characters were fast-tracked into Unicode in 2023.

Background

Unlike most other sets of CJK unified ideographs, Extension I was not prepared and submitted by the Ideographic Research Group (IRG). [3]

GB 18030 is a mandatory national standard of the People's Republic of China (PRC). It defines a Unicode Transformation Format which retains compatibility with existing data in the earlier GBK and EUC-CN character encodings, and specifies particular Unicode characters which devices sold in China must support. [4] Its 2022 edition, GB 18030-2022, changed a number of required characters to map to standard Unicode code points, rather than to private use area code points.
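As a rough illustration of GB 18030 acting as a Unicode Transformation Format, the sketch below round-trips a single supplementary-plane code point through Python's built-in `gb18030` codec. It assumes that codec is an acceptable stand-in for the standard; its mapping tables may predate GB 18030-2022, and the sample code point (the first one in the Extension I range) is chosen only for illustration.

```python
# Round-trip one supplementary-plane code point through GB 18030.
# Assumption: Python's built-in "gb18030" codec stands in for the standard;
# its tables may not reflect the 2022 edition.
sample = chr(0x2EBF0)  # illustrative choice: first code point of Extension I

encoded = sample.encode("gb18030")
decoded = encoded.decode("gb18030")

print(len(encoded))       # number of GB 18030 bytes used for this code point
print(decoded == sample)  # True if the round trip is lossless
```

A lossless round trip is what lets GB 18030 carry any Unicode code point, whether or not a given device has a glyph for it.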

In late 2022, the PRC released a draft of a further amendment to GB 18030 for public consultation. This draft would have placed 897 new sinographic characters in Plane 10 (hexadecimal: 0A), an as-yet-untitled astral Unicode plane. [5] This was motivated by a "strong need of citizen real-name certification in China". [6] Since it would impact ISO/IEC 10646 (the Universal Coded Character Set, the ISO standard synchronised with Unicode), the draft was circulated in ISO/IEC JTC 1/SC 2, the ISO subcommittee responsible for ISO 10646. The Chinese national body maintained that "ISO/IEC 10646 do not specify the purpose of the 0A plane", which ISO 10646 denotes as "reserved for future standardization", and that this use was therefore "not inappropriate". [5]

However, ISO 10646 intended Plane 10 to be reserved for future allocation by ISO 10646 and Unicode via their usual ballot process, not to be allocated unilaterally by national standards bodies. Experts and other national bodies therefore criticised the proposed move as one which would "destabilize the synchronization" between GB 18030 and ISO/IEC 10646 (and thus Unicode), and which would make it impossible to conform to both with a single implementation, [5] effectively forking Unicode. At its meeting in March 2023, the IRG emphasised the importance of providing any subsequent GB 18030 amendment drafts to IRG experts in a timely manner, and of not "using the ISO/IEC 10646 standard inappropriately". [7]

As an alternative, the repertoire (eventually reduced to 622 characters after expert review) was fast-tracked into Unicode version 15.1 in September 2023, as the CJK Unified Ideographs Extension I block. [5] The characters constitute the "GIDC23" Unihan source, [8] defined as sourced from the "ID system of the Ministry of Public Security of China, 2023". [9] The CJK Unified Ideographs Extension D block was cited as a precedent: it comprised a repertoire of urgently needed characters (UNCs) from IRG member bodies, and the IRG working set initially slated to become Extension D instead became Extension E. [10] For compactness, the block was allocated to the available space in the Supplementary Ideographic Plane after CJK Unified Ideographs Extension F, rather than to the Tertiary Ideographic Plane after CJK Unified Ideographs Extension H; this means that the CJK extension blocks are no longer in alphabetical order by extension letter. [11] Following this, the draft GB 18030 amendment was modified to use the Extension I code points. [6]
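The ordering consequence can be sketched in a few lines of Python. Only the Extension I range is stated in this article; the other block ranges are filled in from the Unicode 15.1 block list as an assumption and should be checked against `Blocks.txt`. The names `CJK_EXTENSION_BLOCKS` and `extension_of` are hypothetical helpers, not part of any library.

```python
# Hypothetical lookup table of CJK extension blocks (Unicode 15.1).
# Only Extension I's range comes from this article; the rest are assumed
# from the Unicode block list and should be verified against Blocks.txt.
CJK_EXTENSION_BLOCKS = {
    "A": (0x3400, 0x4DBF),
    "B": (0x20000, 0x2A6DF),
    "C": (0x2A700, 0x2B73F),
    "D": (0x2B740, 0x2B81F),
    "E": (0x2B820, 0x2CEAF),
    "F": (0x2CEB0, 0x2EBEF),
    "G": (0x30000, 0x3134F),
    "H": (0x31350, 0x323AF),
    "I": (0x2EBF0, 0x2EE5F),
}


def extension_of(cp):
    """Return the extension letter whose block contains cp, or None."""
    for letter, (start, end) in CJK_EXTENSION_BLOCKS.items():
        if start <= cp <= end:
            return letter
    return None


# Sort the extension letters by where their blocks start in code point order.
letters_by_code_point = [
    letter
    for letter, _ in sorted(CJK_EXTENSION_BLOCKS.items(), key=lambda kv: kv[1][0])
]

print(letters_by_code_point)  # Extension I sorts between F and G
print(extension_of(0x2EBF0))
```

Looking blocks up by explicit range, rather than inferring position from the extension letter, avoids the assumption that the Unicode 15.1 release notes warn implementers about.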

At its next meeting in October 2023, the IRG expressed concerns about bypassing the IRG for large collections of CJK characters, and noted that two of the characters in Extension I had, for the purposes of other regions' character sources, previously been unified with existing characters under IRG unification rules: U+2ED9D and U+2EDE0. [3] [12]

In response, the IRG recommended that, in future, submitters of proposed CJK characters be required to provide information about the impact on other CJK character sources of any disunifications proposed by the submission, and that the IRG be given time to review all large submissions of CJK characters. The IRG encouraged the Chinese body to propose solutions to the issues caused by the addition of these two characters at the next IRG meeting. [3]

Block

CJK Unified Ideographs Extension I [1] [2]
Official Unicode Consortium code chart (PDF)
 0123456789ABCDEF
U+2EBFx𮯰𮯱𮯲𮯳𮯴𮯵𮯶𮯷𮯸𮯹𮯺𮯻𮯼𮯽𮯾𮯿
U+2EC0x𮰀𮰁𮰂𮰃𮰄𮰅𮰆𮰇𮰈𮰉𮰊𮰋𮰌𮰍𮰎𮰏
U+2EC1x𮰐𮰑𮰒𮰓𮰔𮰕𮰖𮰗𮰘𮰙𮰚𮰛𮰜𮰝𮰞𮰟
U+2EC2x𮰠𮰡𮰢𮰣𮰤𮰥𮰦𮰧𮰨𮰩𮰪𮰫𮰬𮰭𮰮𮰯
U+2EC3x𮰰𮰱𮰲𮰳𮰴𮰵𮰶𮰷𮰸𮰹𮰺𮰻𮰼𮰽𮰾𮰿
U+2EC4x𮱀𮱁𮱂𮱃𮱄𮱅𮱆𮱇𮱈𮱉𮱊𮱋𮱌𮱍𮱎𮱏
U+2EC5x𮱐𮱑𮱒𮱓𮱔𮱕𮱖𮱗𮱘𮱙𮱚𮱛𮱜𮱝𮱞𮱟
U+2EC6x𮱠𮱡𮱢𮱣𮱤𮱥𮱦𮱧𮱨𮱩𮱪𮱫𮱬𮱭𮱮𮱯
U+2EC7x𮱰𮱱𮱲𮱳𮱴𮱵𮱶𮱷𮱸𮱹𮱺𮱻𮱼𮱽𮱾𮱿
U+2EC8x𮲀𮲁𮲂𮲃𮲄𮲅𮲆𮲇𮲈𮲉𮲊𮲋𮲌𮲍𮲎𮲏
U+2EC9x𮲐𮲑𮲒𮲓𮲔𮲕𮲖𮲗𮲘𮲙𮲚𮲛𮲜𮲝𮲞𮲟
U+2ECAx𮲠𮲡𮲢𮲣𮲤𮲥𮲦𮲧𮲨𮲩𮲪𮲫𮲬𮲭𮲮𮲯
U+2ECBx𮲰𮲱𮲲𮲳𮲴𮲵𮲶𮲷𮲸𮲹𮲺𮲻𮲼𮲽𮲾𮲿
U+2ECCx𮳀𮳁𮳂𮳃𮳄𮳅𮳆𮳇𮳈𮳉𮳊𮳋𮳌𮳍𮳎𮳏
U+2ECDx𮳐𮳑𮳒𮳓𮳔𮳕𮳖𮳗𮳘𮳙𮳚𮳛𮳜𮳝𮳞𮳟
U+2ECEx𮳠𮳡𮳢𮳣𮳤𮳥𮳦𮳧𮳨𮳩𮳪𮳫𮳬𮳭𮳮𮳯
U+2ECFx𮳰𮳱𮳲𮳳𮳴𮳵𮳶𮳷𮳸𮳹𮳺𮳻𮳼𮳽𮳾𮳿
U+2ED0x𮴀𮴁𮴂𮴃𮴄𮴅𮴆𮴇𮴈𮴉𮴊𮴋𮴌𮴍𮴎𮴏
U+2ED1x𮴐𮴑𮴒𮴓𮴔𮴕𮴖𮴗𮴘𮴙𮴚𮴛𮴜𮴝𮴞𮴟
U+2ED2x𮴠𮴡𮴢𮴣𮴤𮴥𮴦𮴧𮴨𮴩𮴪𮴫𮴬𮴭𮴮𮴯
U+2ED3x𮴰𮴱𮴲𮴳𮴴𮴵𮴶𮴷𮴸𮴹𮴺𮴻𮴼𮴽𮴾𮴿
U+2ED4x𮵀𮵁𮵂𮵃𮵄𮵅𮵆𮵇𮵈𮵉𮵊𮵋𮵌𮵍𮵎𮵏
U+2ED5x𮵐𮵑𮵒𮵓𮵔𮵕𮵖𮵗𮵘𮵙𮵚𮵛𮵜𮵝𮵞𮵟
U+2ED6x𮵠𮵡𮵢𮵣𮵤𮵥𮵦𮵧𮵨𮵩𮵪𮵫𮵬𮵭𮵮𮵯
U+2ED7x𮵰𮵱𮵲𮵳𮵴𮵵𮵶𮵷𮵸𮵹𮵺𮵻𮵼𮵽𮵾𮵿
U+2ED8x𮶀𮶁𮶂𮶃𮶄𮶅𮶆𮶇𮶈𮶉𮶊𮶋𮶌𮶍𮶎𮶏
U+2ED9x𮶐𮶑𮶒𮶓𮶔𮶕𮶖𮶗𮶘𮶙𮶚𮶛𮶜𮶝𮶞𮶟
U+2EDAx𮶠𮶡𮶢𮶣𮶤𮶥𮶦𮶧𮶨𮶩𮶪𮶫𮶬𮶭𮶮𮶯
U+2EDBx𮶰𮶱𮶲𮶳𮶴𮶵𮶶𮶷𮶸𮶹𮶺𮶻𮶼𮶽𮶾𮶿
U+2EDCx𮷀𮷁𮷂𮷃𮷄𮷅𮷆𮷇𮷈𮷉𮷊𮷋𮷌𮷍𮷎𮷏
U+2EDDx𮷐𮷑𮷒𮷓𮷔𮷕𮷖𮷗𮷘𮷙𮷚𮷛𮷜𮷝𮷞𮷟
U+2EDEx𮷠𮷡𮷢𮷣𮷤𮷥𮷦𮷧𮷨𮷩𮷪𮷫𮷬𮷭𮷮𮷯
U+2EDFx𮷰𮷱𮷲𮷳𮷴𮷵𮷶𮷷𮷸𮷹𮷺𮷻𮷼𮷽𮷾𮷿
U+2EE0x𮸀𮸁𮸂𮸃𮸄𮸅𮸆𮸇𮸈𮸉𮸊𮸋𮸌𮸍𮸎𮸏
U+2EE1x𮸐𮸑𮸒𮸓𮸔𮸕𮸖𮸗𮸘𮸙𮸚𮸛𮸜𮸝𮸞𮸟
U+2EE2x𮸠𮸡𮸢𮸣𮸤𮸥𮸦𮸧𮸨𮸩𮸪𮸫𮸬𮸭𮸮𮸯
U+2EE3x𮸰𮸱𮸲𮸳𮸴𮸵𮸶𮸷𮸸𮸹𮸺𮸻𮸼𮸽𮸾𮸿
U+2EE4x𮹀𮹁𮹂𮹃𮹄𮹅𮹆𮹇𮹈𮹉𮹊𮹋𮹌𮹍𮹎𮹏
U+2EE5x𮹐𮹑𮹒𮹓𮹔𮹕𮹖𮹗𮹘𮹙𮹚𮹛𮹜𮹝
Notes
1. As of Unicode version 15.1
2. Grey areas indicate non-assigned code points
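The figures for this block can be cross-checked with a small Python sketch. The range is the one given above; the two reserved code points are read off the end of the last chart row and are an assumption that holds only as of Unicode 15.1. The function names are hypothetical.

```python
# Block range as given in this article.
BLOCK_START = 0x2EBF0
BLOCK_END = 0x2EE5F

# Assumption: the two unassigned cells are the last two in row U+2EE5x,
# as of Unicode 15.1. Later versions may assign them.
RESERVED = frozenset({0x2EE5E, 0x2EE5F})


def plane(cp):
    """Return the Unicode plane number of a code point (2 is the SIP)."""
    return cp >> 16


def is_assigned_in_extension_i(cp):
    """True if cp lies in the block and is not one of the reserved cells."""
    return BLOCK_START <= cp <= BLOCK_END and cp not in RESERVED


block_size = BLOCK_END - BLOCK_START + 1
assigned_count = sum(
    is_assigned_in_extension_i(cp) for cp in range(BLOCK_START, BLOCK_END + 1)
)

print(block_size, assigned_count, plane(BLOCK_START))
```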

History

The following Unicode-related documents record the purpose and process of defining specific characters in the CJK Unified Ideographs Extension I block:

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-09-12.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-09-12.
  3. Ideographic Research Group (2023-10-20). "Recommendation IRG M61.12: Issue of Extension I to Other CJK Source Characters (IRGN2635 & Feedback, IRGN2622)" (PDF). IRG Meeting #61 Recommendations and Action Items. ISO/IEC JTC1/SC2 N4885, WG2 N5243, IRG N2620; UTC L2/23-250.
  4. Kaplan, Michael S (2013-03-28). "You call it GB18030, I call it UTF-GBK..." Sorting it all out.
  5. United States National Body (2023-05-01). "USNB Comments on Draft 2 of GB 18030-2022 Amendment 1 and recommendation for ISO/IEC 10646:2020 Amendment 2" (PDF). ISO/IEC JTC1/SC2 N4852, WG2 N5222; UTC L2/23-115.
  6. China National Body (2023-10-13). "IRG #61 Activity Report" (PDF). ISO/IEC JTC1/SC2/WG2/IRG N2623; UTC L2/23-240.
  7. Ideographic Research Group (2023-03-24). "Recommendation IRG M60.7: Draft GB18030-2022 Amendment Feedback (IRGN2591, IRGN2605)" (PDF). IRG Meeting #60 Recommendations and Action Items. ISO/IEC JTC1/SC2 N4840, WG2 N5205, IRG N2600; UTC L2/23-087.
  8. "CJK Unified Ideographs Extension I" (PDF). The Unicode Standard, Version 15.1. Unicode Consortium. 2023.
  9. Lunde, Ken; Cook, Richard, eds. (2023-09-01). "kIRG_GSource". Unicode Han Database (Unihan). Unicode 15.1.0. UAX #38.
  10. Lunde, Ken (2023-04-22). "03) L2/23-100: GB 18030-2022 Amendment, Draft 2 + Disposition of Comments, Draft 1" (PDF). CJK & Unihan Group Recommendations for UTC #175 Meeting. UTC L2/23-082.
  11. "CJK/Unihan Changes". Unicode 15.1.0. Unicode Consortium. 2023-09-12. To keep the CJK block ranges as compact as possible, Extension I has been added to Plane 2, instead of directly after Extension H on Plane 3. Implementers should also check that their code does not assume that CJK extensions all occur in alphabetic order by the extension letter.
  12. Sim, Cheon-hyeong (2023-05-17). "2. Newly introduced half-duplicated characters" (PDF). Application for Horizontal Extensions of Multiple Sources in CJK-ExtI. pp. 3–5. ISO/IEC JTC1/SC2/WG2/IRG N2635. (Note: the referenced document refers to an earlier draft of Extension I with code points that differ from those in the final version accepted into Unicode. U+2ED90 in the referenced document corresponds to U+2ED9D 𮶝 in the final version, while U+2EDD1 in the referenced document corresponds to U+2EDE0 𮷠 in the final version.)
  13. "CJK Unified Ideographs" (PDF). The Unicode Standard, Version 15.0. Unicode Consortium. p. 823.
  14. Japan National Body (2023-04-24). "WG2 n5221 data file: Proposed Horizontal Extension" (PDF). Request for Horizontal Extension in the J-column of ISO/IEC 10646 (PDF). p. 414. ISO/IEC JTC1/SC2/WG2 N5221; UTC L2/23-144.
  15. Japan National Body (2023-04-24). "WG2 n5221 data file: Proposed Horizontal Extension" (PDF). Request for Horizontal Extension in the J-column of ISO/IEC 10646 (PDF). p. 458. ISO/IEC JTC1/SC2/WG2 N5221; UTC L2/23-144.
  16. Suignard, Michel, ed. (2024-01-03). "Disposition of comments on CDAM2.3 to ISO/IEC 10646 6th edition" (PDF). ISO/IEC JTC1/SC2/WG2 N5245, UTC L2/24-016.
