CJK Symbols and Punctuation | |
---|---|
Range | U+3000..U+303F (64 code points) |
Plane | BMP |
Scripts | Han (15 char.), Hangul (2 char.), Common (43 char.), Inherited (4 char.) |
Assigned | 64 code points |
Unused | 0 reserved code points |
Unicode version history | |
1.0.0 (1991) | 56 (+56) |
1.0.1 (1992) | 56 (+0) |
1.1 (1993) | 57 (+1) |
3.0 (1999) | 61 (+4) |
3.2 (2002) | 64 (+3) |
Unicode documentation | |
Code chart ∣ Web page | |
Note: [1] [2] In Unicode 1.0.1, during the process of unifying with ISO 10646, the "IDEOGRAPHIC DITTO MARK" (仝) was unified with the unified ideograph at U+4EDD, allowing the Japanese Industrial Standard symbol to be moved from U+32FF in the Enclosed CJK Letters and Months block to the vacated code point at U+3004. [3] |
CJK Symbols and Punctuation is a Unicode block containing symbols and punctuation used for writing the Chinese, Japanese and Korean languages. It also contains one Chinese character.
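As a minimal sketch of what the range means in practice, the following checks whether a character falls inside the block. The bounds U+3000 and U+303F come from the block range stated above; the helper name `in_cjk_symbols_and_punctuation` is hypothetical and chosen only for illustration.

```python
# Block bounds taken from the stated range U+3000..U+303F.
BLOCK_START = 0x3000
BLOCK_END = 0x303F

# Inclusive range, so the size is END - START + 1.
BLOCK_SIZE = BLOCK_END - BLOCK_START + 1


def in_cjk_symbols_and_punctuation(ch: str) -> bool:
    """Return True if the single character `ch` lies in U+3000..U+303F.

    Hypothetical helper for illustration; it tests the code point range only
    and says nothing about script or general category.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return BLOCK_START <= ord(ch) <= BLOCK_END


if __name__ == "__main__":
    for sample in ("\u3001", "\u3007", "A"):
        print(f"U+{ord(sample):04X}", in_cjk_symbols_and_punctuation(sample))
```

A range test like this identifies block membership only. The block mixes Han, Hangul, Common and Inherited characters, so it is not a substitute for a script lookup.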
CJK Symbols and Punctuation [1]
Official Unicode Consortium code chart (PDF)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+300x | ID SP | 、 | 。 | 〃 | 〄 | 々 | 〆 | 〇 | 〈 | 〉 | 《 | 》 | 「 | 」 | 『 | 』 |
| U+301x | 【 | 】 | 〒 | 〓 | 〔 | 〕 | 〖 | 〗 | 〘 | 〙 | 〚 | 〛 | 〜 | 〝 | 〞 | 〟 |
| U+302x | 〠 | 〡 | 〢 | 〣 | 〤 | 〥 | 〦 | 〧 | 〨 | 〩 | 〪 | 〫 | 〬 | 〭 | 〮 | 〯 |
| U+303x | 〰 | 〱 | 〲 | 〳 | 〴 | 〵 | 〶 | 〷 | 〸 | 〹 | 〺 | 〻 | 〼 | 〽 | 〾 | 〿 |
The block has variation sequences defined for East Asian punctuation positional variants. [4] [5] They use U+FE00 VARIATION SELECTOR-1 (VS01) and U+FE01 VARIATION SELECTOR-2 (VS02):
U+ | 3001 | 3002 | Description |
---|---|---|---|
base code point | 、 | 。 | |
base + VS01 | 、︀ | 。︀ | corner-justified form |
base + VS02 | 、︁ | 。︁ | centered form |
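The sequences in the table can be built by appending the selector to the base character. This is a small sketch restricted to the two base characters shown above; the function name `positional_variant` is hypothetical, and whether a given font actually renders the requested form is outside the scope of the code.

```python
# Variation selectors named in the text above.
VS1 = "\uFE00"  # VARIATION SELECTOR-1: corner-justified form
VS2 = "\uFE01"  # VARIATION SELECTOR-2: centered form

# Only the two base characters listed in the table are handled here.
BASES = ("\u3001", "\u3002")  # IDEOGRAPHIC COMMA, IDEOGRAPHIC FULL STOP


def positional_variant(base: str, selector: str) -> str:
    """Return the variation sequence `base + selector`.

    Hypothetical helper: it rejects anything outside the table above
    rather than guessing at sequences the table does not list.
    """
    if base not in BASES:
        raise ValueError("base is not U+3001 or U+3002")
    if selector not in (VS1, VS2):
        raise ValueError("selector is not VS1 or VS2")
    return base + selector


if __name__ == "__main__":
    for base in BASES:
        for selector in (VS1, VS2):
            seq = positional_variant(base, selector)
            print(" ".join(f"U+{ord(c):04X}" for c in seq))
```

Each result is two code points long. A renderer that does not support the sequence is expected to fall back to the plain base glyph.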
Quotation marks and other punctuation behave differently in vertical and horizontal text. In vertical text, the quotation marks 「...」, 『...』 and 〝...〟 rotate 90 degrees.
See also General Punctuation, for variation selectors and CJK behaviour of the Latin quotation marks ‘...’ and “...”.
The CJK Symbols and Punctuation block contains one Chinese character: U+3007 〇 IDEOGRAPHIC NUMBER ZERO. Although it is not covered under "Unified Ideographs", it is treated as a CJK character for all other intents and purposes. [6]
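As an illustration, Python's standard `unicodedata` module can be used to inspect U+3007. The sketch below assumes the interpreter's bundled Unicode Character Database; the values in the comments are those recorded for this character in UnicodeData.txt (general category `Nl`, numeric value 0).

```python
import unicodedata

ZERO = "\u3007"  # IDEOGRAPHIC NUMBER ZERO

# Character name as recorded in the Unicode Character Database.
name = unicodedata.name(ZERO)

# General category: "Nl" (letter number), not a punctuation category.
category = unicodedata.category(ZERO)

# Numeric value of the character, returned as a float.
value = unicodedata.numeric(ZERO)

if __name__ == "__main__":
    print(f"U+{ord(ZERO):04X} {name} category={category} numeric={value}")
```

The numeric property is what lets the character take part in number parsing alongside the ideographic digits encoded in the CJK Unified Ideographs block.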
The CJK Symbols and Punctuation block contains two emoji: U+3030 and U+303D. [7] [8]
The block has four standardized variants defined to specify emoji-style (U+FE0F VS16) or text presentation (U+FE0E VS15) for the two emoji, both of which default to a text presentation. [9]
U+ | 3030 | 303D |
---|---|---|
base code point | 〰 | 〽 |
base+VS15 (text) | 〰︎ | 〽︎ |
base+VS16 (emoji) | 〰️ | 〽️ |
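The four sequences in the table can be generated by appending VS15 or VS16 to each base character. This is a minimal sketch; the names `with_presentation` and `all_sequences` are hypothetical, and the final appearance still depends on the font and renderer.

```python
# Presentation selectors named in the text above.
VS15 = "\uFE0E"  # text presentation
VS16 = "\uFE0F"  # emoji presentation

# The two emoji in this block, per the text above.
EMOJI_IN_BLOCK = ("\u3030", "\u303D")  # WAVY DASH, PART ALTERNATION MARK


def with_presentation(base: str, emoji: bool) -> str:
    """Append VS16 (emoji style) or VS15 (text style) to `base`.

    Hypothetical helper limited to the two emoji listed for this block.
    """
    if base not in EMOJI_IN_BLOCK:
        raise ValueError("base is not U+3030 or U+303D")
    return base + (VS16 if emoji else VS15)


def all_sequences() -> list:
    """Return the four sequences: text and emoji style for each base."""
    return [
        with_presentation(base, emoji)
        for base in EMOJI_IN_BLOCK
        for emoji in (False, True)
    ]


if __name__ == "__main__":
    for seq in all_sequences():
        print(" ".join(f"U+{ord(c):04X}" for c in seq))
```

Because both characters default to text presentation, a bare U+3030 or U+303D and its VS15 sequence are expected to look the same in a conforming renderer; only the VS16 sequence requests the emoji glyph.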
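In Unicode 1.0.1, two changes were made to this block in order to make Unicode 1.0.1 a proper subset of ISO 10646: the IDEOGRAPHIC DITTO MARK (仝) at U+3004 was unified with the unified ideograph at U+4EDD, and the Japanese Industrial Standard symbol was moved from U+32FF in the Enclosed CJK Letters and Months block to the vacated code point at U+3004. [10] [11] [12]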
The following Unicode-related documents record the purpose and process of defining specific characters in the CJK Symbols and Punctuation block:
| Version | Final code points [a] | Count | L2 ID | WG2 ID | IRG ID | Document |
|---|---|---|---|---|---|---|
| 1.0.0 | U+3000..3003, 3005..3037, 303F | 56 | | | | (to be determined) |
| | | | L2/11-402 | | | Iancu, Laurențiu (2011-10-20), Proposal to change the General_Category of Hangul tone marks U+302E and U+302F |
| | | | L2/14-198 | N4606 | | Komatsu, Hiroyuki (2014-08-06), Proposal for the modification of the sample character layout of WAVE_DASH (U+301C) |
| | | | L2/14-177 | | | Moore, Lisa (2014-10-17), "WAVE_DASH (B.15.3)", UTC #140 Minutes |
| | | | L2/16-052 | N4603 (pdf, doc) | | Umamaheswaran, V. S. (2015-09-01), "M63.11v", Unconfirmed minutes of WG 2 meeting 63, Reverse the shape of current glyph for 301C WAVE DASH as requested in document N4606 |
| | | | L2/17-056 | | | Lunde, Ken (2017-02-13), Proposal to add standardized variation sequences |
| | | | L2/17-436 | | | Lunde, Ken (2018-01-21), Proposal to add standardized variation sequences for fullwidth East Asian punctuation |
| | | | L2/18-039 | | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai; Cook, Richard (2018-01-19), "24. Fullwidth East Asian Punctuation", Recommendations to UTC #154 January 2018 on Script Proposals |
| | | | L2/18-115 | | | Moore, Lisa (2018-05-09), "Consensus 154-C17", UTC #155 Minutes, Add 16 standardized variation sequences based on L2/17-436R, for Unicode 12.0. |
| | | | L2/23-167 | | | Koo, Night (2023-07-01), Proposal to update representative glyph of U+3029 SUZHOU NUMERAL NINE |
| | | | L2/23-227 | | | Chan, Eiso (2023-10-07), Feedback on L2/23-167 (Proposal to update representative glyph of U+3029 SUZHOU NUMERAL NINE) |
| | | | L2/23-237R | | | Lunde, Ken (2023-11-02), "14 [Affects U+3029]", CJK & Unihan Group Recommendations for UTC #177 Meeting |
| | | | L2/23-231 | | | Constable, Peter (2023-12-08), "Section 14 [Affects U+3029]", UTC #177 Minutes |
| | | | L2/23-281 | | | Koo, Night (2023-11-28), Update Suzhou numerals in CJK Symbols font (GitHub issue) [Affects U+3021-3029] |
| | | | L2/24-012 | | | Lunde, Ken (2024-01-11), "19 [Affects U+3021-3029]", CJK & Unihan Group Recommendations for UTC #178 Meeting |
| | | | L2/24-006 | | | Constable, Peter (2024-01-31), "Section 19 [Affects U+3021-3029]", UTC #178 Minutes |
| 1.0.1 | U+3004 | 1 | | | | (to be determined) |
| 3.0 | U+3038..303A | 3 | L2/97-017 | N1182 | N202 | Proposal to add 210 KangXi Radicals and 3 HANGZHOU Numbers in BMP for compatibility, 1995-03-23 |
| | | | | N1203 | | Umamaheswaran, V. S.; Ksar, Mike (1995-05-03), "6.1.11", Unconfirmed minutes of SC2/WG2 Meeting 27, Geneva |
| | | | | N1303 (html, doc) | | Umamaheswaran, V. S.; Ksar, Mike (1996-01-26), Minutes of Meeting 29, Tokyo |
| | | | L2/97-284 | N1629 | N486 | Zhang, Zhoucai (1997-07-07), Kangxi Radicals and Hangzhou Numerals |
| | | | L2/97-255R | | | Aliprand, Joan (1997-12-03), "4.B.1 Hangzhou Numerals", Approved Minutes – UTC #73 & L2 #170 joint meeting, Palo Alto, CA – August 4-5, 1997, Motion [#73-M9]: That the UTC concurs with SC2/WG2 Resolution M32.11, and accepts the 3 Hangzhou numeral characters. |
| | | | L2/98-112 | N1629R | | Zhang, Zhoucai (1998-03-19), Kangxi Radicals, Hangzhou Numerals |
| | | | L2/98-332 | N1923 | | Combined PDAM registration and consideration ballot on WD for ISO/IEC 10646-1/Amd. 15, AMENDMENT 15: Kang Xi radicals and CJK radicals supplement, 1998-10-28 |
| | | | L2/99-073 | N1968 (html, doc) | | Summary of Voting on SC 2 N 3213, PDAM ballot on WD for 10646-1/Amd. 15: Kang Xi radicals and CJK radicals supplement, 1999-02-08 |
| | | | L2/99-119 | | | Text for FPDAM ballot of ISO/IEC 10646, Amd. 15 - Kang Xi radicals and CJK radicals supplement, 1999-04-07 |
| | | | L2/99-232 | N2003 | | Umamaheswaran, V. S. (1999-08-03), "6.1.1 PDAM15 - Kang Xi & CJK Radicals", Minutes of WG 2 meeting 36, Fukuoka, Japan, 1999-03-09--15 |
| | | | L2/99-252 | N2065 | | Summary of Voting on SC 2 N 3311, ISO 10646-1/FPDAM 15 - Kang Xi radicals and CJK radicals supplement, 1999-08-19 |
| | | | L2/99-300 | N2122 | | Paterson, Bruce (1999-09-21), Revised Text for FDAM ballot of ISO/IEC 10646-1/FDAM 15, AMENDMENT 15: Kang Xi radicals and CJK radicals supplement |
| | | | L2/00-044 | | | Summary of FDAM voting: ISO 10646 Amendment 15: Kang Xi radicals and CJK radicals supplement, 2000-01-31 |
| | | | L2/23-281 | | | Koo, Night (2023-11-28), "19:none", Update Suzhou numerals in CJK Symbols font (GitHub issue) |
| | | | L2/24-012 | | | Lunde, Ken (2024-01-11), "19", CJK & Unihan Group Recommendations for UTC #178 Meeting |
| | | | L2/24-006 | | | Constable, Peter (2024-01-31), "Section 19", UTC #178 Minutes |
| | U+303E | 1 | | N1431 | N406, N406A | Ideographic Variation Mark, 1996-06-27 |
| | | | | N1453 | | Ksar, Mike; Umamaheswaran, V. S. (1996-12-06), "9.7 Ideographic Variation Mark", WG 2 Minutes - Quebec Meeting 31 |
| | | | L2/97-023 | N1486 | N437 | IRG #8 Resolutions, 1997-01-16 |
| | | | | N1489 | | Supplement to Ideographic Components and Composition Schemes, 1997-01-16 |
| | | | | N1490 | N436 | "Response related to N1431 (Ideographic Variation Mark)", Response to WG2 question on Ideographic Structure Characters, 1997-01-16 |
| | | | L2/97-024 | N1491 | | IRG proposal: Ideographic variant character, 1997-01-19 |
| | | | L2/97-030 | N1503 (pdf, doc) | | Umamaheswaran, V. S.; Ksar, Mike (1997-04-01), "9.5", Unconfirmed Minutes of WG 2 Meeting #32, Singapore; 1997-01-20--24 |
| | | | L2/97-114 | N1544 (html, doc) | N453 | Sato, T. K. (1997-04-08), Questions on the "Han structure method" described in WG2 N1490 (IRG N436) |
| | | | | N1678 (pdf, doc) | | Further explanation on Variation Mark, 1997-12-18 |
| | | | L2/98-100 | N1728 | | Ad-hoc report on ideographic variation indicator, 1998-03-18 |
| | | | L2/98-158 | | | Aliprand, Joan; Winkler, Arnold (1998-05-26), "Ideographic Variation Indicator", Draft Minutes – UTC #76 & NCITS Subgroup L2 #173 joint meeting, Tredyffrin, Pennsylvania, April 20-22, 1998 |
| | | | L2/98-286 | N1703 | | Umamaheswaran, V. S.; Ksar, Mike (1998-07-02), "9.3", Unconfirmed Meeting Minutes, WG 2 Meeting #34, Redmond, WA, USA; 1998-03-16--20 |
| | | | L2/98-321 | N1905 | | Revised text of 10646-1/FPDAM 23, AMENDMENT 23: Bopomofo Extended and other characters, 1998-10-22 |
| | | | L2/23-281 | | | Koo, Night (2023-11-28), Update Suzhou numerals in CJK Symbols font (GitHub issue) |
| | | | L2/24-012 | | | Lunde, Ken (2024-01-11), "19", CJK & Unihan Group Recommendations for UTC #178 Meeting |
| | | | L2/24-006 | | | Constable, Peter (2024-01-31), "Section 19", UTC #178 Minutes |
| 3.2 | U+303B..303D | 3 | L2/99-238 | | | Consolidated document containing 6 Japanese proposals, 1999-07-15 |
| | | | | N2092 | | Addition of forty eight characters, 1999-09-13 |
| | | | L2/00-024 | | | Shibano, Kohji (2000-01-31), JCS proposal revised |
| | | | L2/00-098, L2/00-098-page5 | N2195 | | Rationale for non-Kanji characters proposed by JCS committee, 2000-03-15 |
| | | | L2/00-234 | N2203 (rtf, txt) | | Umamaheswaran, V. S. (2000-07-21), "8.20", Minutes from the SC2/WG2 meeting in Beijing, 2000-03-21 -- 24 |
| | | | L2/00-298 | N2258 | | Sato, T. K. (2000-09-04), JIS X 0213 symbols part-2 |
| | | | L2/00-342 | N2278 | | Sato, T. K.; Everson, Michael; Whistler, Ken; Freytag, Asmus (2000-09-20), Ad hoc Report on Japan feedback N2257 and N2258 |
| | | | L2/01-050 | N2253 | | Umamaheswaran, V. S. (2001-01-21), "7.16 JIS X0213 Symbols", Minutes of the SC2/WG2 meeting in Athens, September 2000 |
| | | | L2/01-114 | N2328 | | Summary of Voting on SC 2 N 3503, ISO/IEC 10646-1: 2000/PDAM 1, 2001-03-09 |
| | | | L2/11-438 [b] [c] | N4182 | | Edberg, Peter (2011-12-22), Emoji Variation Sequences (Revision of L2/11-429) |