The Chinese, Japanese and Korean (CJK) scripts share a common background, and their characters are collectively known as CJK characters. During the process called Han unification, the common (shared) characters were identified and named CJK Unified Ideographs. As of Unicode 16.0, Unicode defines a total of 97,680 CJK unified ideographs. [1]
The term ideographs is a misnomer, as the Chinese script is not ideographic but rather logographic.
Until the early 20th century, Vietnam also used Chinese characters (Chữ Nôm), so sometimes the abbreviation CJKV is used.
The Ideographic Research Group (IRG) is responsible for developing extensions to the encoded repertoires of CJK unified ideographs. The IRG processes proposals for new CJK unified ideographs submitted by its member bodies and, after several rounds of expert review, submits a consolidated set of characters to ISO/IEC JTC 1/SC 2 Working Group 2 (WG2) and the Unicode Technical Committee (UTC) for consideration for inclusion in the ISO/IEC 10646 and Unicode standards. The following IRG member bodies have been involved in the standardization of CJK unified ideographs:
The ideographs submitted by the UTC and the United Kingdom are not specific to any particular region, but are characters which have been suggested for encoding by individual experts. The ideographs submitted by SAT are required for the SAT Daizōkyō text database.
The table below gives the numbers of encoded CJK unified ideographs for each IRG source for Unicode 16.0. [2] The total number of characters (260,840) far exceeds the number of encoded CJK unified ideographs (97,680) as many characters have more than one source.
Country or region | Character count |
---|---|
China | 66,564 |
Hong Kong | 17,654 |
Macau | 344 |
Taiwan (TCA) | 58,601 |
Japan | 52,560 |
South Korea | 20,874 |
North Korea | 23,975 |
Vietnam | 13,284 |
United Kingdom | 2,503 |
SAT | 3,455 |
UTC | 1,026 |
Total | 260,840 |
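The relationship between the per-source counts and the number of encoded ideographs can be checked with a few lines of Python. This is only a sketch: the figures are copied from the table above, and the names `IRG_SOURCE_COUNTS`, `total_source_references` and `mean_sources_per_ideograph` are hypothetical helpers introduced for illustration.

```python
# Character counts per IRG source, copied from the table above (Unicode 16.0).
IRG_SOURCE_COUNTS = {
    "China": 66_564,
    "Hong Kong": 17_654,
    "Macau": 344,
    "Taiwan (TCA)": 58_601,
    "Japan": 52_560,
    "South Korea": 20_874,
    "North Korea": 23_975,
    "Vietnam": 13_284,
    "United Kingdom": 2_503,
    "SAT": 3_455,
    "UTC": 1_026,
}

# Number of encoded CJK unified ideographs quoted in the text.
ENCODED_IDEOGRAPHS = 97_680


def total_source_references(counts):
    """Sum the per-source counts (a character with n sources is counted n times)."""
    return sum(counts.values())


def mean_sources_per_ideograph(counts, encoded):
    """Average number of IRG sources attached to one encoded ideograph."""
    return total_source_references(counts) / encoded


if __name__ == "__main__":
    print(total_source_references(IRG_SOURCE_COUNTS))
    print(round(mean_sources_per_ideograph(IRG_SOURCE_COUNTS, ENCODED_IDEOGRAPHS), 2))
```

On these figures each encoded ideograph has, on average, between two and three IRG sources, which is why the column total is so much larger than the repertoire itself.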
The majority of characters submitted by the UTC to the IRG are derived from Unicode Technical Committee (UTC) documents. [3] Other sources include:
The basic block named CJK Unified Ideographs (4E00–9FFF) contains 20,992 basic Chinese characters in the range U+4E00 through U+9FFF. The block includes not only characters used in the Chinese writing system but also kanji used in the Japanese writing system, hanja used in Korean, and chữ Nôm characters used in Vietnamese. Many characters in this block are used in all of these writing systems, while others are used in only some of them. The first 20,902 characters in the block are arranged according to the Kangxi Dictionary ordering of radicals; within each radical, the characters written with the fewest strokes are listed first. The remaining characters were added later, and so are not in radical order.
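The figures for the basic block follow directly from code point arithmetic. The sketch below assumes that the 20,902 radical-ordered characters occupy consecutive code points starting at U+4E00; the constant and function names are hypothetical and exist only for this illustration.

```python
# Bounds of the basic CJK Unified Ideographs block, as given in the text.
BLOCK_START = 0x4E00
BLOCK_END = 0x9FFF

# Number of characters arranged in Kangxi radical order, as given in the text.
KANGXI_ORDERED_COUNT = 20_902


def block_size(start, end):
    """Number of code points in an inclusive range."""
    return end - start + 1


def last_kangxi_ordered_code_point():
    """Last code point of the radical-ordered run.

    Assumption: the run is contiguous and begins at BLOCK_START.
    """
    return BLOCK_START + KANGXI_ORDERED_COUNT - 1


def in_basic_block(ch):
    """True if the single character ch lies in the basic block."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


if __name__ == "__main__":
    print(block_size(BLOCK_START, BLOCK_END))        # 20992
    print(hex(last_kangxi_ordered_code_point()))     # 0x9fa5
    print(in_basic_block("\u4e2d"), in_basic_block("A"))
```

Under that assumption the radical-ordered run ends at U+9FA5, leaving 90 code points at the end of the block for later additions.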
The block is the result of Han unification, [4] which was somewhat controversial within East Asia. [5] Since Chinese, Japanese and Korean characters were coded in the same location, the appearance of a selected glyph could depend on the particular font being used. However, the source separation rule states that characters encoded separately in an earlier character set would remain separate in the new Unicode encoding. [6]
Using variation selectors, it is possible to specify certain variant CJK ideographs within Unicode. [7] The Adobe-Japan1 character set, which has 14,684 ideographic variation sequences, [8] is an extreme example of the use of variation selectors. [9]
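An ideographic variation sequence is simply a base ideograph followed by a variation selector from the Variation Selectors Supplement (U+E0100 through U+E01EF). The sketch below shows how such sequences might be recognized in a string. The helper names are hypothetical, and the base character U+845B is chosen purely for illustration; whether a particular sequence is actually registered has to be checked against the Ideographic Variation Database, which this code does not consult.

```python
import unicodedata

# Variation Selectors Supplement: VARIATION SELECTOR-17 .. VARIATION SELECTOR-256.
IVS_FIRST = 0xE0100
IVS_LAST = 0xE01EF


def is_ideographic_variation_selector(ch):
    """True if ch is one of the supplementary variation selectors."""
    return IVS_FIRST <= ord(ch) <= IVS_LAST


def split_variation_sequences(text):
    """Group each character with a directly following variation selector."""
    units = []
    for ch in text:
        if is_ideographic_variation_selector(ch) and units:
            units[-1] += ch  # attach the selector to the preceding base character
        else:
            units.append(ch)
    return units


if __name__ == "__main__":
    base = "\u845b"                    # illustrative base ideograph
    sequence = base + "\U000E0100"     # base followed by VARIATION SELECTOR-17
    print(len(sequence))               # two code points, one displayed character
    print(unicodedata.name(sequence[1]))
    print(split_variation_sequences("A" + sequence))
```

Because the selector is a separate code point, software that compares or truncates strings code point by code point can accidentally separate a base character from its selector.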
Note: Most characters appear in multiple sources, so the sum of individual character counts (108,480) is far greater than the number of encoded characters (20,992). [10]
Country or region | Code | Source [11] | Character count | Total |
---|---|---|---|---|
China | G0 | GB 2312-80 | 6,763 | 20,933 |
G1 | GB 12345-90 | 2,202 | ||
G3 | GB 7589-87 traditional form | 4,834 | ||
G5 | GB 7590-87 traditional form | 2,841 | ||
G7 | Modern Chinese general character chart (Simplified Chinese: 现代汉语通用字表) | 42 | ||
G8 | GB 8565-88 | 203 | ||
GCE | National Academy for Educational Research | 4 | ||
GDM | Place name characters from the Public Order Administration, Ministry of Public Security of the People's Republic of China | 2 | ||
GE | GB16500-95 | 3,770 | ||
GFC | Modern Chinese Standard Dictionary (现代汉语规范词典第二版) | 2 | ||
GGFZ | Tongyong Guifan Hanzi Zidian (通用规范汉字字典) | 1 | ||
GH | GB/T 15564-1995 | 59 | ||
GHZ | Hanyu Da Zidian (漢語大字典) | 1 | ||
GHZR | Hanyu Da Zidian 2nd ed. (汉语大字典, 第二版) | 1 | ||
GK | GB 12052-89 | 89 | ||
GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 16 | ||
GKX | Kangxi Dictionary (康熙字典) | 5 | ||
GLK | Longkan Shoujian (龍龕手鑑) | 1 | ||
GT | Standard Telegraph Codebook (revised), 1983 | 8 | ||
GU | No source (the original source reference may have been moved) | 88 | ||
GZFY | Hanyu Fangyan Dacidian (汉语方言大词典) | 1 | ||
Hong Kong | H | Hong Kong Supplementary Character Set, 2008 | 2,292 | 15,376 |
HB0 | Computer Chinese Glyph and Character Code Mapping Table, Technical Report C-26 (電腦用中文字型與字碼對照表, 技術通報C-26) | 9 | ||
HB1 | Big-5, Level 1 | 5,401 | ||
HB2 | Big-5, Level 2 | 7,650 | ||
HD | Hong Kong Supplementary Character Set, 2016 | 24 | ||
Japan | J0 | JIS X 0208-1990 | 6,356 | 18,249 |
J1 | JIS X 0212-1990 | 3,058 | ||
J13 | JIS X 0213:2004 level-3 characters replacing J1 characters | 1,037 | ||
J13A | JIS X 0213:2004 level-3 character addendum from JIS X 0213:2000 level-3 replacing J1 character | 2 | ||
J14 | JIS X 0213:2004 level-4 characters replacing J1 characters | 1,704 | ||
J3 | JIS X 0213:2004 Level 3 | 95 | ||
J3A | JIS X 0213:2004 Level 3 addendum | 7 | ||
J4 | JIS X 0213:2004 Level 4 | 301 | ||
JARIB | ARIB STD-B24 | 3 | ||
JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 5,686 | ||
South Korea | K0 | KS C 5601-87 (now KS X 1001:2004) | 4,620 | 15,442 |
K1 | KS C 5657-91 (now KS X 1002:2001) | 2,855 | ||
K2 | PKS C 5700-1:1994 | 7,911 | ||
K3 | PKS C 5700-2:1994 | 1 | ||
K4 | PKS 5700-3:1998 | 4 | ||
K6 | KS X 1027-5:2014 | 49 | ||
KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 1 | ||
KU | No source (the original source reference may have been moved) | 1 | ||
North Korea | KP0 | KPS 9566-97 | 4,652 | 15,010 |
KP1 | KPS 10721-2000 | 10,358 | ||
Macau | MA | HKSCS-2008 | 29 | 200 |
MB1 | Big Five | 10 | ||
MB2 | Big Five | 7 | ||
MC | MCSCS Reference | 3 | ||
MD | MCSCS horizontal extensions | 127 | ||
MDH | MCSCS horizontal extensions | 24 | ||
Taiwan | T1 | CNS 11643-1992 plane 1 | 5,413 | 18,384 |
T2 | CNS 11643-1992 plane 2 | 7,651 | ||
T3 | CNS 11643-1992 plane 3 | 4,144 | ||
T4 | CNS 11643-1992 plane 4 | 894 | ||
T5 | CNS 11643-1992 plane 5 | 64 | ||
T6 | CNS 11643-1992 plane 6 | 31 | ||
T7 | CNS 11643-1992 plane 7 | 16 | ||
TB | CNS 11643-2007 plane 11 | 2 | ||
TC | CNS 11643-2007 plane 12 | 2 | ||
TE | CNS 11643-2007 plane 14 | 9 | ||
TF | CNS 11643-2007 plane 15 | 158 | ||
Vietnam | V0 | TCVN 5773:1993 | 599 | 4,808 |
V1 | TCVN 6056:1995 | 3,305 | ||
V2 | VHN 01-1998 | 759 | ||
V3 | VHN 02-1998 | 91 | ||
V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 19 | ||
VN | Vietnamese horizontal extensions | 35 | ||
n/a | UTC | UTC sources | 78 | 78 |
In Unicode 4.1, 14 HKSCS-2004 characters and 8 GB 18030 characters were assigned to the code points U+9FA6 through U+9FBB. Since then, further characters have been added to this block for various reasons, all summarized in the version history section below.
The block named CJK Unified Ideographs Extension A (3400–4DBF) contains 6,592 additional characters in the range U+3400 through U+4DBF.
Note: Most characters appear in more than one source, so the sum of individual character counts (23,954) is far greater than the number of encoded characters (6,592). [10]
Country or region | Code | Source [11] | Character count | Total |
---|---|---|---|---|
China | G3 | GB 7589-87 traditional form | 2,391 | 6,197 |
G5 | GB 7590-87 traditional form | 1,226 | ||
G7 | Modern Chinese general character chart | 120 | ||
GGFZ | Tongyong Guifan Hanzi Zidian (通用规范汉字字典) | 2 | ||
GHZ | Hanyu Da Zidian (漢語大字典) | 340 | ||
GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 3 | ||
GKX | Kangxi Dictionary (康熙字典) | 1,889 | ||
GS | Singapore Chinese characters [note 1] | 226 | ||
Hong Kong | H | Hong Kong Supplementary Character Set, 2008 | 572 | 572 |
Japan | J3 | JIS X 0213:2004 Level 3 | 2 | 5,856 |
J4 | JIS X 0213:2004 Level 4 | 78 | ||
JA | Japanese IT Vendors Contemporary Ideographs, 1993 | 574 | ||
JA3 | JIS X 0213:2004 level-3 characters replacing JA characters | 17 | ||
JA4 | JIS X 0213:2004 level-4 characters replacing JA characters | 67 | ||
JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 5,118 | ||
South Korea | K3 | PKS C 5700-2:1994 | 1,833 | 1,867 |
K4 | PKS 5700-3:1998 | 2 | ||
K6 | KS X 1027-5:2014 | 28 | ||
KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 3 | ||
KU | No source (the original source reference may have been moved) | 1 | ||
North Korea | KP0 | KPS 9566-97 | 1 | 3,191 |
KP1 | KPS 10721-2000 | 3,190 | ||
Macau | MA | HKSCS-2008 | 4 | 12 |
MD | MCSCS horizontal extensions | 8 | ||
Taiwan | T3 | CNS 11643-1992 plane 3 | 2,179 | 5,916 |
T4 | CNS 11643-1992 plane 4 | 2,919 | ||
T5 | CNS 11643-1992 plane 5 | 399 | ||
T6 | CNS 11643-1992 plane 6 | 200 | ||
T7 | CNS 11643-1992 plane 7 | 133 | ||
TE | CNS 11643-2007 plane 14 | 1 | ||
TF | CNS 11643-2007 plane 15 | 85 | ||
United Kingdom | UK | IRG N2107R2 | 3 | 3 |
Vietnam | V0 | TCVN 5773:1993 | 140 | 319 |
V2 | VHN 01-1998 | 149 | ||
V3 | VHN 02-1998 | 19 | ||
V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 5 | ||
VN | Vietnamese horizontal extensions | 6 | ||
n/a | UTC | UTC sources | 21 | 21 |
The block named CJK Unified Ideographs Extension B (20000–2A6DF) contains 42,720 characters in the range U+20000 through U+2A6DF. These include most of the characters used in the Kangxi Dictionary that are not in the basic CJK Unified Ideographs block, as well as many Hán-Nôm characters that were formerly used to write Vietnamese.
Note: Many characters appear in more than one source, so the sum of individual character counts (99,784) is far greater than the number of encoded characters (42,720). [10]
Country or region | Code | Source [11] | Character count | Total |
---|---|---|---|---|
China | G3 | GB 7589-87 traditional form | 1 | 30,550 |
G4K | Siku Quanshu (四庫全書) | 477 | ||
GBK | Encyclopedia of China (中國大百科全書) | 86 | ||
GCH | Cihai (辞海) | 247 | ||
GCY | Ciyuan (辭源) | 66 | ||
GFZ | Founder Press System | 65 | ||
GGFZ | Tongyong Guifan Hanzi Zidian (通用规范汉字字典) | 5 | ||
GHC | Hanyu Da Cidian (漢語大詞典) | 553 | ||
GHF | Hanwen fodian yinan suzi huishi yu yanjiu (漢文佛典疑難俗字彙釋與研究) | 1 | ||
GHZ | Hanyu Da Zidian (漢語大字典) | 10,507 | ||
GHZR | Hanyu Da Zidian 2nd ed. (汉语大字典, 第二版) | 1 | ||
GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 17 | ||
GKX | Kangxi Dictionary (康熙字典) | 18,469 | ||
GU | No source (the original source reference may have been moved) | 55 | ||
Hong Kong | H | Hong Kong Supplementary Character Set, 2008 | 1,703 | 1,703 |
Japan | J3 | JIS X 0213:2004 Level 3 | 25 | 25,745 |
J3A | JIS X 0213:2004 Level 3 addendum | 1 | ||
J4 | JIS X 0213:2004 Level 4 | 277 | ||
JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 25,442 | ||
South Korea | K1 | KS C 5657-91 (now KS X 1002:2001) | 1 | 395 |
K4 | PKS 5700-3:1998 | 166 | ||
K6 | KS X 1027-5:2014 | 214 | ||
KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 14 | ||
North Korea | KP1 | KPS 10721-2000 | 5,765 | 5,765 |
Macau | MA | HKSCS-2008 | 9 | 38 |
MC | MCSCS Reference | 2 | ||
MD | MCSCS horizontal extensions | 27 | ||
Taiwan | T3 | CNS 11643-1992 plane 3 | 25 | 30,193 |
T4 | CNS 11643-1992 plane 4 | 3,408 | ||
T5 | CNS 11643-1992 plane 5 | 8,111 | ||
T6 | CNS 11643-1992 plane 6 | 5,934 | ||
T7 | CNS 11643-1992 plane 7 | 6,299 | ||
TA | CNS 11643-2007 plane 10 | 8 | ||
TB | CNS 11643-2007 plane 11 | 6 | ||
TC | CNS 11643-2007 plane 12 | 1 | ||
TF | CNS 11643-2007 plane 15 | 6,401 | ||
United Kingdom | UK | IRG N2107R2 | 12 | 12 |
Vietnam | V0 | TCVN 5773:1993 | 1,570 | 5,299 |
V1 | TCVN 6056:1995 | 1 | ||
V2 | VHN 01-1998 | 2,286 | ||
V3 | VHN 02-1998 | 422 | ||
V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 33 | ||
VN | Vietnamese horizontal extensions | 987 | ||
n/a | SAT | SAT Daizōkyō Text Database | 1 | 84 |
UTC | UTC sources | 83 |
The block named CJK Unified Ideographs Extension C (2A700–2B73F) contains 4,154 characters in the range U+2A700 through U+2B739. It was initially added in Unicode 5.2 (2009).
Note: Some characters appear in more than one source, so the sum of individual character counts (4,634) is greater than the number of encoded characters (4,154). [10]
Country or region | Code | Source [11] | Character count | Total |
---|---|---|---|---|
China | GBK | Encyclopedia of China (中國大百科全書) | 74 | 1,130 |
GCH | Cihai (辞海) | 264 | ||
GCY | Ciyuan (辭源) | 1 | ||
GCYY | Chinese Academy of Surveying and Mapping ideographs | 55 | ||
GDM | Place name characters from the Public Order Administration, Ministry of Public Security of the People's Republic of China | 1 | ||
GFZ | Founder Press System | 1 | ||
GGFZ | Tongyong Guifan Hanzi Zidian (通用规范汉字字典) | 2 | ||
GGH | Gudai Hanyu Cidian (古代汉语词典) | 51 | ||
GHC | Hanyu Da Cidian (漢語大詞典) | 14 | ||
GHZ | Hanyu Da Zidian (漢語大字典) | 1 | ||
GHZR | Hanyu Da Zidian 2nd ed. (汉语大字典, 第二版) | 1 | ||
GJZ | Commercial Press ideographs | 61 | ||
GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 6 | ||
GKX | Kangxi Dictionary (康熙字典) | 6 | ||
GXC | Xiandai Hanyu Cidian (现代汉语词典) | 25 | ||
GZFY | Hanyu Fangyan Dacidian (汉语方言大词典) | 202 | ||
GZJW | Yin Zhou Jinwen Jicheng Yinde (殷周金文集成引得) | 365 | ||
Hong Kong | H | Hong Kong Supplementary Character Set, 2008 | 1 | 1 |
Japan | JK | Japanese Kokuji Collection | 367 | 431 |
JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 64 | ||
South Korea | K5 | Korean IRG Hanja Character Set | 404 | 406 |
K6 | KS X 1027-5:2014 | 1 | ||
KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 1 | ||
North Korea | KP1 | KPS 10721-2000 | 8 | 8 |
Macau | MC | MCSCS Reference | 17 | 21 |
MD | MCSCS horizontal extensions | 4 | ||
Taiwan | T5 | CNS 11643-1992 plane 5 | 1 | 1,752 |
TC | CNS 11643-2007 plane 12 | 634 | ||
TD | CNS 11643-2007 plane 13 | 766 | ||
TE | CNS 11643-2007 plane 14 | 350 | ||
TU | No source (the original source reference may have been moved) | 1 | ||
United Kingdom | UK | IRG N2107R2 | 1 | 1 |
Vietnam | V0 | TCVN 5773:1993 | 4 | 795 |
V1 | TCVN 6056:1995 | 2 | ||
V2 | VHN 01-1998 | 1 | ||
V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 782 | ||
VN | Vietnamese horizontal extensions | 6 | ||
n/a | UTC | UTC sources | 89 | 89 |
The block named CJK Unified Ideographs Extension D (2B740–2B81F) contains 222 characters in the range U+2B740 through U+2B81D that were added in Unicode 6.0 (2010).
Note: Some characters appear in more than one source, so the sum of individual character counts (239) is greater than the number of encoded characters (222). [10]
Country or region | Code | Source [11] | Character count | Total |
---|---|---|---|---|
China | GCH | Cihai (辞海) | 1 | 78 |
GDM | Place name characters from the Public Order Administration, Ministry of Public Security of the People's Republic of China | 1 | ||
GIDC | ID System of the Ministry of Public Security of China | 9 | ||
GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 2 | ||
GXC | Xiandai Hanyu Cidian (现代汉语词典) | 4 | ||
GXM | Characters for use in personal names in China from Public Order Administration, Ministry of Public Security of the People's Republic of China | 22 | ||
GZH | Zhonghua Zihai (中华字海) | 39 | ||
Japan | JH | Hanyo-Denshi Program (汎用電子情報交換環境整備プログラム) | 107 | 117 |
JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 10 | ||
Taiwan | TB | CNS 11643-2007 plane 11 | 24 | 24 |
n/a | UTC | UTC sources | 20 | 20 |
The block named CJK Unified Ideographs Extension E (2B820–2CEAF) contains 5,762 characters in the range U+2B820 through U+2CEA1 that were added in Unicode 8.0 (2015).
Note: Some characters appear in more than one source, so the sum of individual character counts (5,919) is greater than the number of encoded characters (5,762). [10]
Country or region | Code | Source [11] | Character count | Total |
---|---|---|---|---|
China | GBK | Encyclopedia of China (中國大百科全書) | 15 | 2,822 |
GCH | Cihai (辞海) | 112 | ||
GCY | Ciyuan (辭源) | 3 | ||
GCYY | Chinese Academy of Surveying and Mapping ideographs | 98 | ||
GDZ | Geology Press ideographs | 1 | ||
GGFZ | Tongyong Guifan Hanzi Zidian (通用规范汉字字典) | 4 | ||
GGH | Gudai Hanyu Cidian (古代汉语词典) | 175 | ||
GHC | Hanyu Da Cidian (漢語大詞典) | 7 | ||
GIDC | ID System of the Ministry of Public Security of China | 37 | ||
GJZ | Commercial Press ideographs | 147 | ||
GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 2 | ||
GKX | Kangxi Dictionary (康熙字典) | 22 | ||
GRM | People's Daily ideographs | 3 | ||
GU | No source (the original source reference may have been moved) | 1 | ||
GWZ | Hanyu Da Cidian Press ideographs | 12 | ||
GXC | Xiandai Hanyu Cidian (现代汉语词典) | 57 | ||
GXH | Xinhua Zidian (新华字典) | 4 | ||
GZFY | Hanyu Fangyan Dacidian (汉语方言大词典) | 712 | ||
GZJW | Yin Zhou Jinwen Jicheng Yinde (殷周金文集成引得) | 1,410 | ||
Hong Kong | HD | Hong Kong Supplementary Character Set, 2016 | 1 | 1 |
Japan | JK | Japanese Kokuji Collection | 415 | 503 |
JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 88 | ||
South Korea | KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 7 | 7 |
Macau | MC | MCSCS Reference | 48 | 51 |
MD | MCSCS horizontal extensions | 3 | ||
Taiwan | T3 | CNS 11643-1992 plane 3 | 2 | 1,261 |
TB | CNS 11643-2007 plane 11 | 2 | ||
TC | CNS 11643-2007 plane 12 | 323 | ||
TD | CNS 11643-2007 plane 13 | 595 | ||
TE | CNS 11643-2007 plane 14 | 339 | ||
United Kingdom | UK | IRG N2107R2 | 2 | 2 |
Vietnam | V0 | TCVN 5773:1993 | 6 | 1,036 |
V2 | VHN 01-1998 | 1 | ||
V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 1,023 | ||
VN | Vietnamese horizontal extensions | 6 | ||
n/a | UTC | UTC sources | 236 | 236 |
The block named CJK Unified Ideographs Extension F (2CEB0–2EBEF) contains 7,473 characters in the range U+2CEB0 through U+2EBE0 that were added in Unicode 10.0 (2017). It includes more than 1,000 Sawndip characters for Zhuang.
Note: Some characters appear in more than one source, so the sum of individual character counts (7,775) is greater than the number of encoded characters (7,473). [10]
Country or region | Code | Source [11] | Character count | Total |
---|---|---|---|---|
China | GCY | Ciyuan (辭源) | 122 | 1,309 |
GFC | Modern Chinese Standard Dictionary (现代汉语规范词典第二版) | 27 | ||
GIDC | ID System of the Ministry of Public Security of China | 1 | ||
GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 5 | ||
GLGYJ | Zhuang Liao Songs Research (壮族嘹歌研究) | 1 | ||
GOCD | Oxford English-Chinese Chinese-English Dictionary (牛津英汉汉英词典) | 2 | ||
GPGLG | Zhuang Folk Song Culture Series - Pingguo County Liao Songs (壮族民歌文化丛书•平果嘹歌) | 70 | ||
GXHZ | Xinhua Da Zidian (新华大字典) | 51 | ||
GZ | Ancient Zhuang Character Dictionary (古壮字字典) | 995 | ||
GZJW | Yin Zhou Jinwen Jicheng Yinde (殷周金文集成引得) | 33 | ||
GZYS | Chinese Ancient Ethnic Characters Research (中国民族古文字研究) | 2 | ||
Hong Kong | HD | Hong Kong Supplementary Character Set, 2016 | 1 | 1 |
Japan | JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 1,646 | 1,646 |
South Korea | KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 1,810 | 1,810 |
Macau | MC | MCSCS Reference | 22 | 22 |
Taiwan | T3 | CNS 11643-1992 plane 3 | 1 | 3 |
T6 | CNS 11643-1992 plane 6 | 1 | ||
TC | CNS 11643-2007 plane 12 | 1 | ||
United Kingdom | UK | IRG N2107R2 | 2 | 2 |
Vietnam | V0 | TCVN 5773:1993 | 1 | 17 |
V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 8 | ||
VN | Vietnamese horizontal extensions | 8 | ||
n/a | SAT | SAT Daizōkyō Text Database | 2,884 | 2,965 |
UTC | UTC sources | 81 |
A block named CJK Unified Ideographs Extension G was added as part of Unicode 13.0 to the Tertiary Ideographic Plane in the range U+30000 through U+3134F, containing 4,939 characters. [13]
Note: Some characters appear in more than one source, so the sum of individual character counts (5,081) is greater than the number of encoded characters (4,939). [10]
Country or region | Code | Source [11] | Character count | Total |
---|---|---|---|---|
China | GHZR | Hanyu Da Zidian 2nd ed. (汉语大字典, 第二版) | 878 | 2,082 |
GPGLG | Zhuang Folk Song Culture Series - Pingguo County Liao Songs (壮族民歌文化丛书•平果嘹歌) | 13 | ||
GZ | Ancient Zhuang Character Dictionary (古壮字字典) | 1,191 | ||
South Korea | KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 435 | 435 |
Taiwan | T13 | CNS 11643 (pending new version) plane 19 | 347 | 353 |
TB | CNS 11643-2007 plane 11 | 3 | ||
TC | CNS 11643-2007 plane 12 | 2 | ||
TD | CNS 11643-2007 plane 13 | 1 | ||
United Kingdom | UK | IRG N2107R2 | 1,566 | 1,566 |
Vietnam | V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 6 | 76 |
VN | Vietnamese horizontal extensions | 70 | ||
n/a | SAT | SAT Daizōkyō Text Database | 329 | 569 |
UTC | UTC sources | 240 |
A block named CJK Unified Ideographs Extension H was added as part of Unicode 15.0 to the Tertiary Ideographic Plane in the range U+31350 through U+323AF, containing 4,192 characters. [14]
Note: Some characters appear in more than one source, so the sum of individual character counts (4,309) is greater than the number of encoded characters (4,192). [10]
Country or region | Code | Source [11] | Character count | Total |
---|---|---|---|---|
China | GDM | Place name characters from the Public Order Administration, Ministry of Public Security of the People's Republic of China | 128 | 829 |
GHC | Hanyu Da Cidian (漢語大詞典) | 27 | ||
GKJ | Terms in Sciences and Technologies (科技用字) approved by the China National Committee for Terms in Sciences and Technologies (CNCTST) | 30 | ||
GLGYJ | Zhuang Liao Songs Research (壮族嘹歌研究) | 11 | ||
GPGLG | Zhuang Folk Song Culture Series - Pingguo County Liao Songs (壮族民歌文化丛书•平果嘹歌) | 14 | ||
GU | No source (the original source reference may have been moved) | 1 | ||
GXM | Characters for use in personal names in China from Public Order Administration, Ministry of Public Security of the People's Republic of China | 216 | ||
GZ | Ancient Zhuang Character Dictionary (古壮字字典) | 285 | ||
GZA-1 | A Vibrant and Unbroken Transmission—Filial Piety and Zhuang Funeral Songs (生生不息的传承•孝与壮族行孝歌之研究) | 6 | ||
GZA-2 | Annotated Long Zhuang Morality Songs (壮族伦理道德长诗传扬歌译注) | 38 | ||
GZA-3 | Compendium of Old Zhuang Folksong Texts—Wooing Songs vol. 1—Liao Songs (壮族民歌古籍集成•情歌(一)嘹歌) | 2 | ||
GZA-4 | Compendium of Old Zhuang Folksong Texts—Wooing Songs vol. 2—Fwen Nganx (壮族民歌古籍集成•情歌(二)欢𭪤) | 11 | ||
GZA-6 | Zhuang Proverbs from China (中国壮族谚语) | 59 | ||
GZA-7 | Ancient Remembrance—Zhuang Creation Myth Songs (远古的追忆•壮族创世神话古歌研究) | 1 | ||
South Korea | KC | Korean History On-Line (한국 역사 정보 통합 시스템) | 512 | 512 |
North Korea | KP1 | KPS 10721-2000 | 1 | 1 |
Taiwan | T12 | CNS 11643 (pending new version) plane 18 | 7 | 714 |
T13 | CNS 11643 (pending new version) plane 19 | 696 | ||
T4 | CNS 11643-1992 plane 4 | 1 | ||
T6 | CNS 11643-1992 plane 6 | 1 | ||
TB | CNS 11643-2007 plane 11 | 5 | ||
TC | CNS 11643-2007 plane 12 | 3 | ||
TE | CNS 11643-2007 plane 14 | 1 | ||
United Kingdom | UK | IRG N2232R | 917 | 917 |
Vietnam | V0 | TCVN 5773:1993 | 6 | 931 |
V4 | Kho Chữ Hán Nôm Mã Hoá (Hán Nôm Coded Character Repertoire) | 74 | ||
VN | Vietnamese horizontal extensions | 851 | ||
n/a | SAT | SAT Daizōkyō Text Database | 241 | 405 |
UTC | UTC sources | 164 |
A block named CJK Unified Ideographs Extension I was added as part of Unicode 15.1 to the Supplementary Ideographic Plane in the range U+2EBF0 through U+2EE5F, containing 622 characters. [15]
Note: Some characters appear in more than one source, making the sum of individual character counts (625) more than the number of encoded characters (622). [10]
Country or region | Code | Source [11] | Character count | Total |
---|---|---|---|---|
China | GIDC23 | ID system of the Ministry of Public Security of China, 2023 | 622 | 622 |
Japan | JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 1 | 1 |
n/a | UTC | UTC sources | 2 | 2 |
The block named CJK Compatibility Ideographs (F900–FAFF) was created to retain round-trip compatibility with other standards.
However, twelve characters in this block actually have the "Unified Ideograph" property: U+FA0E 﨎, U+FA0F 﨏, U+FA11 﨑, U+FA13 﨓, U+FA14 﨔, U+FA1F 﨟, U+FA21 﨡, U+FA23 﨣, U+FA24 﨤, U+FA27 﨧, U+FA28 﨨, and U+FA29 﨩. [1] None of the other characters in this and other "Compatibility" blocks relate to CJK unification.
While 龜 and 亀 are not considered unifiable, it is not clear why U+FA20 蘒 (CJK COMPATIBILITY IDEOGRAPH-FA20) is considered equivalent to U+8612 蘒 (CJK UNIFIED IDEOGRAPH-8612).
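One practical difference between the twelve characters listed above and the rest of the block can be observed with Python's standard `unicodedata` module. The sketch below uses the absence of a canonical decomposition as a proxy; this is an assumption of the example and not the definition of the "Unified Ideograph" property. The name `canonical_target` is a hypothetical helper.

```python
import unicodedata

# The twelve code points in the CJK Compatibility Ideographs block that the
# text lists as having the "Unified Ideograph" property.
UNIFIED_IN_COMPAT_BLOCK = [
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
]


def canonical_target(cp):
    """Return the single code point that cp canonically decomposes to, or None."""
    mapping = unicodedata.decomposition(chr(cp))
    if not mapping or mapping.startswith("<"):
        return None  # no decomposition, or only a compatibility one
    parts = mapping.split()
    return int(parts[0], 16) if len(parts) == 1 else None


if __name__ == "__main__":
    for cp in UNIFIED_IN_COMPAT_BLOCK:
        print(f"U+{cp:04X}", canonical_target(cp))
    # U+FA20, by contrast, maps to a character in the basic block.
    print(f"U+FA20 -> U+{canonical_target(0xFA20):04X}")
```

A character such as U+FA20 is replaced by its target under Unicode normalization, whereas the twelve listed characters survive normalization unchanged.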
Note: All characters appear in more than one source, so the sum of individual character counts (40) is greater than the number of encoded characters (12). [10]
Country or region | Code | Source [11] | Character count | Total |
---|---|---|---|---|
China | GU | No source (the original source reference may have been moved) | 12 | 12 |
Japan | J3 | JIS X 0213:2004 Level 3 | 3 | 12 |
J4 | JIS X 0213:2004 Level 4 | 3 | ||
JA | Japanese IT Vendors Contemporary Ideographs, 1993 | 1 | ||
JA3 | JIS X 0213:2004 level-3 characters replacing JA characters | 1 | ||
JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 4 | ||
Taiwan | TF | CNS 11643-2007 plane 15 | 1 | 1 |
Vietnam | V0 | TCVN 5773:1993 | 3 | 3 |
n/a | UTC | UTC sources | 12 | 12 |
The character U+4039 (䀹) was a unification of two different characters (one with jiā 夾 phonetic and one with shǎn 㚒 phonetic) until Unicode 5.0. However, they were lexically different characters that should not have been unified; they have different pronunciations and different meanings.
The proposal of disunification of U+4039 [16] was accepted for Unicode 5.1, encoding a new character at U+9FC3 (鿃) to represent shǎn.
In CJK Unified Ideographs Extension B, some characters are incorrectly unified with others. These characters include U+2017B (𠅻), U+204AF (𠒯) and U+24CB2 (𤲲). The first two wrongly unify glyphs from mainland Chinese and Vietnamese sources, while the last wrongly unifies glyphs from mainland Chinese and Taiwanese sources. [17]
Also in CJK Unified Ideographs Extension B, hundreds of glyph variants were encoded by mistake. [18] Additionally, an ISO/IEC JTC 1/SC 2 report has found that six exact duplicates (where the same character has inadvertently been encoded twice) and two semi-duplicates (where the CJK-B character represents a de facto disunification of two glyph forms unified in the corresponding BMP character) were encoded by mistake: [19]
Apart from the ten blocks of "Unified Ideographs", Unicode has about a dozen more blocks containing non-unified CJK characters. These are mainly CJK radicals, strokes, punctuation, marks, symbols and compatibility characters. Although some characters have (decomposable) counterparts in other blocks, their usage can differ. An example of a non-unified CJK character is U+3007 〇 (IDEOGRAPHIC NUMBER ZERO) in the CJK Symbols and Punctuation block. Although it is not covered under "CJK Unified Ideographs", it is treated as a CJK character for all other intents and purposes. [20]
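The distinction between the ten unified blocks and everything else can be expressed as a simple range lookup. The sketch below uses the block ranges given in this article; `UNIFIED_BLOCKS` and `unified_block_of` are hypothetical names. Note that a block range may contain unassigned code points, and that the twelve unified ideographs located in the CJK Compatibility Ideographs block are deliberately not covered here.

```python
# Block ranges for the ten "CJK Unified Ideographs" blocks, as given in the article.
UNIFIED_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
    ("CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F),
    ("CJK Unified Ideographs Extension D", 0x2B740, 0x2B81F),
    ("CJK Unified Ideographs Extension E", 0x2B820, 0x2CEAF),
    ("CJK Unified Ideographs Extension F", 0x2CEB0, 0x2EBEF),
    ("CJK Unified Ideographs Extension G", 0x30000, 0x3134F),
    ("CJK Unified Ideographs Extension H", 0x31350, 0x323AF),
    ("CJK Unified Ideographs Extension I", 0x2EBF0, 0x2EE5F),
]


def unified_block_of(ch):
    """Return the name of the unified block containing ch, or None."""
    cp = ord(ch)
    for name, start, end in UNIFIED_BLOCKS:
        if start <= cp <= end:
            return name
    return None


if __name__ == "__main__":
    print(unified_block_of("\u4e2d"))      # a character in the basic block
    print(unified_block_of("\U00020000"))  # first code point of Extension B
    print(unified_block_of("\u3007"))      # IDEOGRAPHIC NUMBER ZERO: None
```

A lookup like this reports U+3007 as lying outside every unified block, which matches its placement in CJK Symbols and Punctuation even though it behaves like a CJK character in practice.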
Four blocks of compatibility characters are included for compatibility with legacy text handling systems and older character sets:
They include forms of characters for vertical text layout and rich text characters that Unicode recommends handling through other means. Therefore, their use is discouraged.
The blocks CJK Unified Ideographs and CJK Unified Ideographs Extension A, being part of the Basic Multilingual Plane, are supported by the majority of CJK fonts. However, Japanese and Korean fonts usually have fewer characters (about 13,000 and 8,000, respectively) than Chinese fonts. Extensions B, C and D are supported by the additional fonts MingLiU-ExtB, MingLiU_HKSCS-ExtB, PMingLiU-ExtB and SimSun-ExtB, which have been included in Microsoft Windows since Vista. [21]
Unicode version | Addition | Plane | Characters added | Total characters |
---|---|---|---|---|
1.0 (1991) | CJK Unified Ideographs | Basic Multilingual Plane (BMP) | 20,902 | 20,914 |
1.0 (1991) | CJK Compatibility Ideographs | BMP | 12 | |
3.0 (1999) | CJK Unified Ideographs Extension A | BMP | 6,582 | 27,496 |
3.1 (2001) | CJK Unified Ideographs Extension B | Supplementary Ideographic Plane (SIP) | 42,711 | 70,207 |
4.1 (2005) | CJK Unified Ideographs: Ideographs from HKSCS-2004 and GB 18030-2000 not in ISO 10646 | BMP | 22 | 70,229 |
5.1 (2008) | CJK Unified Ideographs: Ideographs from Adobe Japan and disunification of U+4039 | BMP | 8 | 70,237 |
5.2 (2009) | CJK Unified Ideographs Extension C | SIP | 4,149 | 74,394 |
5.2 (2009) | 8 other characters from ARIB #47, #95, #93 and HKSCS | BMP | 8 | |
6.0 (2010) | CJK Unified Ideographs Extension D | SIP | 222 | 74,616 |
6.1 (2012) | 1 character corresponding to Adobe-Japan1-6 CID+20156 | BMP | 1 | 74,617 |
8.0 (2015) | CJK Unified Ideographs Extension E | SIP | 5,762 | 80,388 |
8.0 (2015) | 9 other characters | BMP | 9 | |
10.0 (2017) | CJK Unified Ideographs Extension F | SIP | 7,473 | 87,882 |
10.0 (2017) | 21 other characters | BMP | 21 | |
11.0 (2018) | CJK Unified Ideographs | BMP | 5 | 87,887 |
13.0 (2020) | CJK Unified Ideographs | BMP | 13 | 92,856 |
13.0 (2020) | CJK Unified Ideographs Extension A | BMP | 10 | |
13.0 (2020) | CJK Unified Ideographs Extension B | SIP | 7 | |
13.0 (2020) | CJK Unified Ideographs Extension G | Tertiary Ideographic Plane (TIP) | 4,939 | |
14.0 (2021) | CJK Unified Ideographs | BMP | 3 | 92,865 |
14.0 (2021) | CJK Unified Ideographs Extension B | SIP | 2 | |
14.0 (2021) | CJK Unified Ideographs Extension C | SIP | 4 | |
15.0 (2022) | CJK Unified Ideographs Extension C | SIP | 1 | 97,058 |
15.0 (2022) | CJK Unified Ideographs Extension H | TIP | 4,192 | |
15.1 (2023) | CJK Unified Ideographs Extension I | SIP | 622 | 97,680 |
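The "Total characters" column is a running sum of the "Characters added" column. A minimal sketch that reproduces it from the per-version additions in the table above follows; `ADDITIONS` and `running_totals` are hypothetical names used only here.

```python
# Characters added per Unicode version, copied from the version history table.
# Each inner list holds the individual additions made in that version.
ADDITIONS = [
    ("1.0", [20_902, 12]),
    ("3.0", [6_582]),
    ("3.1", [42_711]),
    ("4.1", [22]),
    ("5.1", [8]),
    ("5.2", [4_149, 8]),
    ("6.0", [222]),
    ("6.1", [1]),
    ("8.0", [5_762, 9]),
    ("10.0", [7_473, 21]),
    ("11.0", [5]),
    ("13.0", [13, 10, 7, 4_939]),
    ("14.0", [3, 2, 4]),
    ("15.0", [1, 4_192]),
    ("15.1", [622]),
]


def running_totals(additions):
    """Map each version to the cumulative number of unified ideographs."""
    totals = {}
    total = 0
    for version, counts in additions:
        total += sum(counts)
        totals[version] = total
    return totals


if __name__ == "__main__":
    for version, total in running_totals(ADDITIONS).items():
        print(f"{version:>5}  {total:,}")
```

The final value, for Unicode 15.1, equals the 97,680 ideographs quoted for Unicode 16.0, consistent with the table listing no further additions after 15.1.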
Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the Han characters of the so-called CJK languages into a single set of unified characters. Han characters are a feature shared in common by written Chinese (hanzi), Japanese (kanji), Korean (hanja) and Vietnamese.
GB 18030 is a Chinese government standard, described as Information Technology — Chinese coded character set and defines the required language and character support necessary for software in China. GB18030 is the registered Internet name for the official character set of the People's Republic of China (PRC) superseding GB2312. As a Unicode Transformation Format, GB18030 supports both simplified and traditional Chinese characters. It is also compatible with legacy encodings including GB/T 2312, CP936, and GBK 1.0.
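Python ships with `gb18030` and `gb2312` codecs, which makes it easy to observe both properties mentioned above: compatibility with GB 2312 for characters that standard already covers, and the ability to represent every Unicode code point. The sketch below is illustrative only; `encoded_length` and `round_trips` are hypothetical helper names.

```python
def encoded_length(ch):
    """Number of bytes GB 18030 uses for the single character ch."""
    return len(ch.encode("gb18030"))


def round_trips(text):
    """True if text survives encoding to GB 18030 and decoding back."""
    return text.encode("gb18030").decode("gb18030") == text


if __name__ == "__main__":
    common = "\u4e2d"             # a character also present in GB 2312
    extension_b = "\U00020000"    # first code point of Extension B

    # A GB 2312 character keeps its two-byte legacy encoding.
    print(common.encode("gb18030") == common.encode("gb2312"))

    # ASCII takes one byte; a supplementary-plane ideograph takes four.
    print(encoded_length("A"), encoded_length(common), encoded_length(extension_b))

    print(round_trips(common + extension_b + "A"))
```

The variable-width design (one, two or four bytes) is what lets the encoding stay backward compatible while still covering the supplementary planes.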
The CNS 11643 character set, also officially known as the Chinese Standard Interchange Code or CSIC, is officially the standard character set of Taiwan. In practice, variants of the related Big5 character set are de facto standard.
The Ideographic Research Group (IRG), formerly called the Ideographic Rapporteur Group, is a subgroup of Working Group 2 (WG2) of ISO/IEC JTC 1 Subcommittee 2 (SC2), which is the committee responsible for developing the Universal Coded Character Set. IRG is tasked with preparing and reviewing sets of CJK unified ideographs for eventual inclusion in both ISO/IEC 10646 and The Unicode Standard. The IRG is composed of representatives from national standards bodies from China, Japan, South Korea, Vietnam, and other regions that have historically used Chinese characters, as well as experts from liaison organizations such as the SAT Daizōkyō Text Database Committee (SAT), Taipei Computer Association (TCA), and the Unicode Technical Committee (UTC). The group holds two meetings every year, each lasting 4–5 days, and then reports its activities to its parent ISO/IEC JTC 1/SC 2 (SC2/WG2) committee.
Mojikyō, also known by its full name Konjaku Mojikyō, is a character encoding scheme created to provide a complete index of characters used in the Chinese, Japanese, Korean, Vietnamese Chữ Nôm and other historical Chinese logographic writing systems. The Mojikyō Institute, which published the character set, also published computer software and TrueType fonts to accompany it. The Mojikyō Institute, chaired by Tadahisa Ishikawa (石川忠久), originally had its character set and related software and data redistributed on CD-ROMs sold in Kinokuniya stores.
In Unicode, two glyphs are said to be Z-variants if they share the same etymology but have slightly different appearances and different Unicode code points. For example, the Unicode characters U+8AAA 說 and U+8AAC 説 are Z-variants. The notion of Z-variance is only applicable to the "CJKV scripts"—Chinese, Japanese, Korean and Vietnamese—and is a subtopic of Han unification.
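A small sketch of what "different Unicode code points" means in practice for the pair mentioned above: the two characters compare as unequal, and Unicode normalization does not fold one into the other. The helper name `same_after_normalization` is illustrative.

```python
import unicodedata

# The two Z-variants cited above.
a, b = "\u8AAA", "\u8AAC"


def same_after_normalization(x, y, form="NFKC"):
    """Return True if x and y become identical under the given normalization form."""
    return unicodedata.normalize(form, x) == unicodedata.normalize(form, y)


for ch in (a, b):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(a == b)                          # plain string comparison
print(same_after_normalization(a, b))  # comparison after NFKC normalization
```

Software that needs to treat such a pair as equivalent, for example in search, has to rely on variant data outside normalization.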
Biangbiang noodles, alternatively known as youpo chemian in Chinese, are a type of Chinese noodle originating from Shaanxi cuisine. The noodles, touted as one of the "eight curiosities" of Shaanxi (陕西八大怪), are described as being like a belt, owing to their thickness and length.
A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even those fonts which only include glyphs for a single writing system, or even only support the basic Latin alphabet. Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts", although as the maximum number of glyphs that can be defined in a TrueType font is restricted to 65,535, it is not possible for a single font to provide individual glyphs for all defined Unicode characters. This article lists some widely used Unicode fonts that support a comparatively large number and broad range of Unicode characters.
Chinese characters may have several variant forms—visually distinct glyphs that represent the same underlying meaning and pronunciation. Variants of a given character are allographs of one another, and many are directly analogous to allographs present in the English alphabet, such as the double-storey ⟨a⟩ and single-storey ⟨ɑ⟩ variants of the letter A, with the latter more commonly appearing in handwriting. Some contexts require usage of specific variants.
Ken Roger Lunde is an American specialist in information processing for East Asian languages.
In the Unicode standard, a plane is a contiguous group of 65,536 (2¹⁶) code points. There are 17 planes, identified by the numbers 0 to 16, which correspond to the possible values 00–10₁₆ of the first two positions in the six-position hexadecimal format (U+hhhhhh). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version 16.0, five of the planes have assigned code points (characters), and seven are named.
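Because each plane holds 65,536 code points, the plane number is simply the code point divided by 65,536, or equivalently the bits above the low 16. A minimal sketch, with `plane_of` as an illustrative name:

```python
def plane_of(code_point):
    """Return the Unicode plane number (0-16) that contains the given code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16  # same as code_point // 65536


# One code point each from the BMP, plane 2, plane 3, and the last plane.
for cp in (0x4E00, 0x20000, 0x30000, 0x10FFFF):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```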
KPS 9566 is a North Korean standard specifying a character encoding for the Chosŏn'gŭl (Hangul) writing system used for the Korean language. The edition of 1997 specified an ISO 2022-compliant 94×94 two-byte coded character set. Subsequent editions have added additional encoded characters outside of the 94×94 plane, in a manner comparable to UHC or GBK.
A variant form is an alternate glyph for a character, encoded in Unicode through the mechanism of variation sequences: sequences in Unicode that consist of a base character followed by a variation selector character.
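The "base character followed by a variation selector" structure can be detected by scanning a string for selector code points. The sketch below assumes the two general-purpose selector ranges, U+FE00–U+FE0F and U+E0100–U+E01EF, and ignores script-specific selectors. It does not check whether a sequence is actually registered, and the function names and sample string are illustrative.

```python
def is_variation_selector(ch):
    """True for VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def find_variation_sequences(text):
    """Return (base, selector) pairs for each selector that directly follows a base character."""
    pairs = []
    for prev, cur in zip(text, text[1:]):
        if is_variation_selector(cur) and not is_variation_selector(prev):
            pairs.append((prev, cur))
    return pairs


# An ideograph followed by VS17, used purely as sample input.
sample = "\u845B\U000E0100"
for base, selector in find_variation_sequences(sample):
    print(f"U+{ord(base):04X} + U+{ord(selector):04X}")
```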
CJK Unified Ideographs Extension B is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese submitted to the Ideographic Research Group between 1998 and 2000, plus seven gongche characters for kunqu added in Unicode 13.0, and two characters for the Macao Supplementary Character Set added in Unicode 14.0.
CJK Unified Ideographs Extension C is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese submitted to the Ideographic Research Group between 2002 and 2006, plus five "urgently needed" characters added in Unicode versions 14.0 and 15.0, some of which had previously been mistakenly unified with other characters.
CJK Unified Ideographs Extension D is a Unicode block containing uncommon CJK ideographs for Chinese, Japanese, Korean, and Vietnamese, some of which are in current use. Much smaller than most Unicode blocks for CJK unified ideographs, Extension D consists of characters which were submitted to the Ideographic Research Group as "urgently needed characters" between 2006 and 2009. Characters submitted during the same period which were needed less urgently were included in CJK Unified Ideographs Extension E instead.
CJK Compatibility Ideographs is a Unicode block created to contain mostly Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings. However, it also contains 12 unified ideographs sourced from Japanese character sets from IBM.
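One way to see the difference between the compatibility ideographs and the unified ideographs in this block is through decomposition mappings: a compatibility ideograph carries a canonical mapping to a unified ideograph, while the unified ideographs in the block carry none. The sketch below uses Python's `unicodedata` module and assumes the block spans U+F900–U+FAFF; the function names are illustrative, and results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def canonical_equivalent(ch):
    """Return the NFC form of ch; compatibility ideographs map to unified ideographs."""
    return unicodedata.normalize("NFC", ch)


def unified_in_compat_block():
    """List assigned code points in U+F900..U+FAFF that have no decomposition mapping."""
    result = []
    for cp in range(0xF900, 0xFB00):
        ch = chr(cp)
        # name(ch, "") is empty for unassigned code points, which are skipped.
        if unicodedata.name(ch, "") and not unicodedata.decomposition(ch):
            result.append(cp)
    return result


first = "\uF900"
print(f"U+{ord(first):04X} -> U+{ord(canonical_equivalent(first)):04X}")
print([f"U+{cp:04X}" for cp in unified_in_compat_block()])
```

Because normalization replaces compatibility ideographs with their unified counterparts, text that must round-trip to a legacy encoding should not be normalized beforehand.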
CJK Unified Ideographs Extension E is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese submitted to the Ideographic Research Group between 2006 and 2013, excluding the characters submitted as "urgently needed" between 2006 and 2009, which were included in CJK Unified Ideographs Extension D.
CJK Unified Ideographs Extension F is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese, as well as more than a thousand Sawndip characters for writing the Zhuang language, which were submitted to the Ideographic Research Group between 2012 and 2015.
CJK Unified Ideographs Extension I is a Unicode block comprising CJK Unified Ideographs included in drafts of an amendment to China's GB 18030 standard circulated in 2022 and 2023, which were fast-tracked into Unicode in 2023.