Bopomofo (Unicode block)

Bopomofo
Range	U+3100..U+312F (48 code points)
Plane	BMP
Scripts	Bopomofo
Major alphabets	Phonetic Chinese
Assigned	43 code points
Unused	5 reserved code points
Source standards	GB 2312
Unicode version history
1.0.0	40 (+40)
5.1	41 (+1)
10.0	42 (+1)
11.0	43 (+1)

Last updated September 26, 2019

Bopomofo is a Unicode block containing phonetic characters for Chinese. The original set of 40 Bopomofo characters is based on the Chinese standard GB 2312. Additional Bopomofo characters can be found in the Bopomofo Extended block.

A Unicode block is one of several contiguous ranges of numeric character codes of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.
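
Because a block is a contiguous range of code points, testing whether a character belongs to the Bopomofo block reduces to a range comparison. The following is a minimal sketch using the range U+3100..U+312F given above; the function and constant names are illustrative and not part of any standard library.

```python
# Block boundaries taken from the range stated for the Bopomofo block.
BLOCK_START = 0x3100
BLOCK_END = 0x312F

# Number of code points in the block, assigned or not.
BLOCK_SIZE = BLOCK_END - BLOCK_START + 1


def in_bopomofo_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+3100..U+312F.

    Membership in the block says nothing about whether the code point
    is assigned: reserved code points are still part of the block.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return BLOCK_START <= ord(ch) <= BLOCK_END


if __name__ == "__main__":
    print(BLOCK_SIZE)                    # 48
    print(in_bopomofo_block("\u3105"))   # True (an assigned letter)
    print(in_bopomofo_block("\u3100"))   # True (reserved, but inside the block)
    print(in_bopomofo_block("\u31A0"))   # False (Bopomofo Extended)
```

Note that a character in the separate Bopomofo Extended block fails this test, which is the intended behaviour for a per-block check.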

GB/T 2312-1980 is a key official character set of the People's Republic of China, used for simplified Chinese characters. GB2312 is the registered internet name for EUC-CN, which is its usual encoded form. GB abbreviates Guojia Biaozhun (国家标准), which means national standard in Chinese. GB2312 (1980) has been superseded by GBK and GB18030, which include additional characters, but GB2312 remains in widespread use as a subset of those encodings.
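
As a rough illustration of the relationship between the block and GB 2312, the sketch below uses Python's built-in "gb2312" codec to check whether a character can be encoded in that character set. The helper name is hypothetical, and the behaviour shown depends on the codec's mapping table rather than on anything stated in this article.

```python
def can_encode_gb2312(ch: str) -> bool:
    """Return True if ch can be encoded with Python's 'gb2312' codec."""
    try:
        ch.encode("gb2312")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    text = "\u3105\u3106\u3107\u3108"    # the first four Bopomofo letters
    encoded = text.encode("gb2312")      # EUC-CN uses two bytes per letter here
    assert encoded.decode("gb2312") == text

    # U+312D was added in Unicode 5.1, long after GB 2312 was published.
    print(can_encode_gb2312("\u3105"))
    print(can_encode_gb2312("\u312D"))
```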

Bopomofo Extended is a Unicode block containing additional Bopomofo characters for writing Min Nan, Hakka Chinese, Hmu, and Ge phonetically. The basic set of Bopomofo characters can be found in the Bopomofo block.

Block

Bopomofo ^[1]^[2] Official Unicode Consortium code chart (PDF)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+310x						ㄅ	ㄆ	ㄇ	ㄈ	ㄉ	ㄊ	ㄋ	ㄌ	ㄍ	ㄎ	ㄏ
U+311x	ㄐ	ㄑ	ㄒ	ㄓ	ㄔ	ㄕ	ㄖ	ㄗ	ㄘ	ㄙ	ㄚ	ㄛ	ㄜ	ㄝ	ㄞ	ㄟ
U+312x	ㄠ	ㄡ	ㄢ	ㄣ	ㄤ	ㄥ	ㄦ	ㄧ	ㄨ	ㄩ	ㄪ	ㄫ	ㄬ	ㄭ	ㄮ	ㄯ
Notes
1. ^ As of Unicode version 12.0
2. ^ Grey areas (shown here as empty cells) indicate non-assigned code points
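
The chart layout can be read as simple arithmetic: the row label supplies the high hexadecimal digits and the column header supplies the last digit. The sketch below, with illustrative names, maps a chart cell to its character and counts assigned and reserved code points. It assumes, as the chart shows, that the only empty cells are U+3100..U+3104.

```python
BLOCK_START = 0x3100
BLOCK_END = 0x312F
FIRST_ASSIGNED = 0x3105  # assumption read off the chart: columns 0-4 of row U+310x are empty


def chart_cell(row_prefix: int, column: int) -> str:
    """Return the character at a chart cell.

    row_prefix is the row label without its trailing 'x'
    (0x310 for row 'U+310x'); column is the header digit 0x0..0xF.
    """
    if not 0 <= column <= 0xF:
        raise ValueError("column must be a single hexadecimal digit")
    return chr(row_prefix * 16 + column)


assigned = range(FIRST_ASSIGNED, BLOCK_END + 1)
reserved = range(BLOCK_START, FIRST_ASSIGNED)

if __name__ == "__main__":
    print(f"U+{ord(chart_cell(0x310, 0x5)):04X}")  # U+3105
    print(len(assigned), len(reserved))            # 43 5
```

The two counts agree with the 43 assigned and 5 reserved code points listed for the block.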

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Bopomofo block:

Version	Final code points[a]	Count	L2 ID	WG2 ID	Document
1.0.0	U+3105..312C	40			(to be determined)
			L2/14-189		Lunde, Ken (2014-08-01), BOPOMOFO LETTER I
				N4609	Reply to your company's inquiry to usage of Bopomofo Letter I, 2014-08-20
			L2/14-177		Moore, Lisa (2014-10-17), "Bopomofo Letter I (B.15.1)", UTC #140 Minutes
			L2/16-052	N4603 (pdf, doc)	Umamaheswaran, V. S. (2015-09-01), "M63.11u", Unconfirmed minutes of WG 2 meeting 63, Rotate the glyph for 3127 BOPOMOFO LETTER I by 90 degrees, with an appropriate change in the annotation, per request in document N4609
5.1	U+312D	1	L2/06-338	N3179	Everson, Michael; Ho, H. W.; West, Andrew (2006-10-19), Proposal to encode one Bopomofo character in the UCS
			L2/06-324R2		Moore, Lisa (2006-11-29), "Consensus 109-C24", UTC #109 Minutes
			L2/07-215	N3246	Proposal to encode two Bopomofo characters in UCS, 2007-04-20
			L2/07-268	N3253 (pdf, doc)	Umamaheswaran, V. S. (2007-07-26), "M50.19", Unconfirmed minutes of WG 2 meeting 50, Frankfurt-am-Main, Germany; 2007-04-24/27
10.0	U+312E	1	L2/15-284	N4695	West, Andrew; Liang, Hai (2015-10-22), Discussion of 2CEF1
			L2/15-254		Moore, Lisa (2015-11-16), "Consensus 145-C20", UTC #145 Minutes, Approve U+312E BOPOMOFO LETTER O WITH DOT ABOVE for encoding in a future version of the standard. See document L2/15-270.
				N4739	"M64.05e", Unconfirmed minutes of WG 2 meeting 64, 2016-08-31
11.0	U+312F	1	L2/16-106	N4717	West, Andrew (2016-04-21), Proposal to encode one Bopomofo letter
			L2/16-121		Moore, Lisa (2016-05-20), "C.16", UTC #147 Minutes
				N4756	TCA Feedback on JTC1/SC2/WG2 N4717, 2016-09-22
				N4873R (pdf, doc)	"7.2.1 T1", Unconfirmed minutes of WG 2 meeting 65, 2018-03-16
a. ^ Proposed code points and character names may differ from final code points and names
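
The per-version counts in the history table can be cross-checked against the running totals in the version history. The following is a small sketch under the assumption that each row's code point range and count are as listed above; the variable names are illustrative.

```python
from itertools import accumulate

# (version, first code point, last code point) as listed in the history table.
ADDITIONS = [
    ("1.0.0", 0x3105, 0x312C),
    ("5.1", 0x312D, 0x312D),
    ("10.0", 0x312E, 0x312E),
    ("11.0", 0x312F, 0x312F),
]

# Number of code points added in each version (ranges are inclusive).
added = [last - first + 1 for _, first, last in ADDITIONS]

# Running total of assigned code points after each version.
totals = dict(zip((version for version, _, _ in ADDITIONS), accumulate(added)))

if __name__ == "__main__":
    for version, total in totals.items():
        print(version, total)
```

The final total matches the 43 assigned code points reported for the block.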

Related Research Articles

In typography, a dingbat is an ornament, character, or spacer used in typesetting, often employed for the creation of box frames. The term continues to be used in the computer industry to describe fonts that have symbols and shapes in the positions designated for alphabetical or numeric characters.

In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the Unicode Consortium. Three private use areas are defined: one in the Basic Multilingual Plane (U+E000–U+F8FF), and one each in, and nearly covering, planes 15 and 16. The code points in these areas cannot be considered as standardized characters in Unicode itself. They are intentionally left undefined so that third parties may define their own characters without conflicting with Unicode Consortium assignments. Under the Unicode Stability Policy, the Private Use Areas will remain allocated for that purpose in all future Unicode versions.

Geometric Shapes is a Unicode block of 96 symbols at code point range U+25A0-25FF.

Letterlike Symbols is a Unicode block containing 80 characters which are constructed mainly from the glyphs of one or more letters. In addition to this block, Unicode includes full styled mathematical alphabets, although Unicode does not explicitly categorise these characters as being "letterlike".

Miscellaneous Technical is a Unicode block ranging from U+2300 to U+23FF. It contains common symbols used in technical, programming, and academic fields.

Supplemental Arrows-B is a Unicode block containing miscellaneous arrows, arrow tails, crossing arrows used in knot descriptions, curved arrows, and harpoons.

Miscellaneous Symbols and Arrows is a Unicode block containing arrows and geometric shapes with various fills.

CJK Symbols and Punctuation is a Unicode block containing symbols and punctuation used for writing the Chinese, Japanese and Korean languages.

Unicode is a computing industry standard for encoding text and symbols. It includes a set of characters depicting playing cards, and another depicting the French card suits.

Gurmukhi is a Unicode block containing characters for the Punjabi language, as it is written in India. In its original incarnation, the code points U+0A02..U+0A4C were a direct copy of the Gurmukhi characters A2-EC from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Tamil is a Unicode block containing characters for the Tamil, Badaga, and Saurashtra languages of Tamil Nadu, India; Sri Lanka; Singapore; and Malaysia. In its original incarnation, the code points U+0B02..U+0BCD were a direct copy of the Tamil characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Malayalam is a Unicode block containing characters of the Malayalam script. In its original incarnation, the code points U+0D02..U+0D4D were a direct copy of the Malayalam characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, and Kannada blocks were similarly all based on their ISCII encodings.

Hiragana is a Unicode block containing hiragana characters for the Japanese language.

Ideographic Description Characters is a Unicode block containing graphic characters used for describing CJK ideographs. They are used in Ideographic Description Sequences (IDS) to describe an ideograph in terms of the other ideographs that make it up and how they are laid out relative to one another. An IDS provides the reader with a description of an ideograph that cannot be represented properly, usually because it is not encoded in Unicode; rendering systems are not intended to automatically compose the pieces into a complete ideograph, and the descriptions are not standardized.

Enclosed CJK Letters and Months is a Unicode block containing circled and parenthesized Katakana, Hangul, and CJK ideographs. During the unification with ISO 10646 for version 1.1, the Japanese Industrial Standard Symbol was reassigned from the code point U+32FF at the end of the block to U+3004. Also included in the block are miscellaneous glyphs that would more likely fit in CJK Compatibility or Enclosed Alphanumerics: a few unit abbreviations, circled numbers from 21 to 50, and circled multiples of 10 from 10 to 80 enclosed in black squares.

Mahjong Tiles is a Unicode block containing characters depicting the standard set of tiles used in the game of Mahjong.

Enclosed Ideographic Supplement is a Unicode block containing forms of characters and words from Chinese, Japanese and Korean enclosed within or stylised as squares, brackets, or circles. It contains three such characters containing one or more kana, and many containing CJK ideographs. Many of its characters were added for compatibility with the Japanese ARIB STD-B24 standard. Six symbols from Chinese folk religion were added in Unicode version 10.

Halfwidth and Fullwidth Forms is the Unicode block U+FF00–FFEF, provided so that older encodings containing both halfwidth and fullwidth characters can be converted to and from Unicode without loss. It is the last block of the Basic Multilingual Plane except for the short Specials block at U+FFF0–FFFF.

References

1. ^ "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
2. ^ "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
