Bopomofo (Unicode block)

Bopomofo
Range: U+3100..U+312F (48 code points)
Plane: BMP
Scripts: Bopomofo
Major alphabets: Phonetic Chinese
Assigned: 43 code points
Unused: 5 reserved code points
Source standards: GB 2312
Unicode version history:
1.0.0: 40 (+40)
5.1: 41 (+1)
10.0: 42 (+1)
11.0: 43 (+1)
Note: [1] [2]

Bopomofo is a Unicode block containing phonetic characters for Chinese. The original set of 40 Bopomofo characters is based on the Chinese standard GB 2312. Additional Bopomofo characters can be found in the Bopomofo Extended block.

A Unicode block is one of several contiguous ranges of numeric character codes of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.
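Because a block is simply a contiguous range, the figures in the infobox can be checked with plain arithmetic. The Python sketch below hard-codes the Bopomofo bounds (U+3100..U+312F) and derives the block size and the assigned and reserved counts. It also looks up a few character names with the standard `unicodedata` module. The constants and the helper `in_bopomofo_block` are illustrative names, not a library API. Python has no built-in lookup from code point to block, which is why the range is written out by hand. The name lookups assume a Unicode database of version 11.0 or later (Python 3.7+).

```python
import unicodedata

# Bopomofo block bounds taken from the infobox; hard-coded because Python's
# standard library offers no lookup from code point to block name.
BLOCK_START = 0x3100
BLOCK_END = 0x312F       # inclusive upper bound
FIRST_ASSIGNED = 0x3105  # U+3100..U+3104 are reserved


def in_bopomofo_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+3100..U+312F."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


block_size = BLOCK_END - BLOCK_START + 1   # total code points in the block
assigned = BLOCK_END - FIRST_ASSIGNED + 1  # U+3105..U+312F are all assigned
reserved = block_size - assigned

# The assigned count should equal the sum of the version-history additions.
assert assigned == 40 + 1 + 1 + 1

print(block_size, assigned, reserved)  # 48 43 5
for cp in (0x3105, 0x3127, 0x312E, 0x312F):
    print(f"U+{cp:04X}", chr(cp), unicodedata.name(chr(cp)))
```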

GB/T 2312-1980 is a key official character set of the People's Republic of China, used for simplified Chinese characters. GB2312 is the registered internet name for EUC-CN, which is its usual encoded form. GB abbreviates Guojia Biaozhun (国家标准), which means national standard in Chinese. GB2312 (1980) has been superseded by GBK and GB18030, which include additional characters, but GB2312 remains in widespread use as a subset of those encodings.
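CPython ships a `gb2312` codec that produces the EUC-CN form, so the link between GB 2312 and the Bopomofo block can be tried directly. The sketch below assumes CPython's built-in CJK codecs. It encodes the first few Bopomofo letters and checks that each one becomes a two-byte EUC-CN sequence with both bytes in 0xA1..0xFE, then confirms that decoding round-trips. The helper `euc_cn_bytes` is a hypothetical name used only for illustration.

```python
# Encode a few Bopomofo letters with CPython's built-in "gb2312" codec,
# which produces the EUC-CN form of GB 2312.
letters = "ㄅㄆㄇㄈ"  # U+3105..U+3108


def euc_cn_bytes(ch: str) -> bytes:
    """Return the EUC-CN encoding of ch (raises UnicodeEncodeError if absent)."""
    return ch.encode("gb2312")


for ch in letters:
    encoded = euc_cn_bytes(ch)
    # Each GB 2312 character is two bytes in EUC-CN, both in 0xA1..0xFE.
    assert len(encoded) == 2
    assert all(0xA1 <= b <= 0xFE for b in encoded)
    # Decoding restores the original character.
    assert encoded.decode("gb2312") == ch
    print(f"U+{ord(ch):04X} {ch} -> {encoded.hex(' ').upper()}")
```

ASCII characters pass through the same codec as single bytes, which is what makes EUC-CN a superset of ASCII.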

Bopomofo Extended is a Unicode block containing additional Bopomofo characters for writing phonetic Min Nan, Hakka Chinese, Hmu, and Ge. The basic set of Bopomofo characters can be found in the Bopomofo block.


Block

Bopomofo [1] [2]
Official Unicode Consortium code chart (PDF)
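       | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F
U+310x |   |   |   |   |   | ㄅ | ㄆ | ㄇ | ㄈ | ㄉ | ㄊ | ㄋ | ㄌ | ㄍ | ㄎ | ㄏ
U+311x | ㄐ | ㄑ | ㄒ | ㄓ | ㄔ | ㄕ | ㄖ | ㄗ | ㄘ | ㄙ | ㄚ | ㄛ | ㄜ | ㄝ | ㄞ | ㄟ
U+312x | ㄠ | ㄡ | ㄢ | ㄣ | ㄤ | ㄥ | ㄦ | ㄧ | ㄨ | ㄩ | ㄪ | ㄫ | ㄬ | ㄭ | ㄮ | ㄯ
Notes
1. As of Unicode version 12.0
2. Empty cells indicate non-assigned code points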

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Bopomofo block:

Version | Final code points[a] | Count | L2 ID | WG2 ID | Document
1.0.0 | U+3105..312C | 40 | | | (to be determined)
| | | L2/14-189 | | Lunde, Ken (2014-08-01), BOPOMOFO LETTER I
| | | | N4609 | Reply to your company's inquiry to usage of Bopomofo Letter I, 2014-08-20
| | | L2/14-177 | | Moore, Lisa (2014-10-17), "Bopomofo Letter I (B.15.1)", UTC #140 Minutes
| | | L2/16-052 | N4603 | Umamaheswaran, V. S. (2015-09-01), "M63.11u", Unconfirmed minutes of WG 2 meeting 63, Rotate the glyph for 3127 BOPOMOFO LETTER I by 90 degrees, with an appropriate change in the annotation, per request in document N4609
5.1 | U+312D | 1 | L2/06-338 | N3179 | Everson, Michael; Ho, H. W.; West, Andrew (2006-10-19), Proposal to encode one Bopomofo character in the UCS
| | | L2/06-324R2 | | Moore, Lisa (2006-11-29), "Consensus 109-C24", UTC #109 Minutes
| | | L2/07-215 | N3246 | Proposal to encode two Bopomofo characters in UCS, 2007-04-20
| | | L2/07-268 | N3253 | Umamaheswaran, V. S. (2007-07-26), "M50.19", Unconfirmed minutes of WG 2 meeting 50, Frankfurt-am-Main, Germany; 2007-04-24/27
10.0 | U+312E | 1 | L2/15-284 | N4695 | West, Andrew; Liang, Hai (2015-10-22), Discussion of 2CEF1
| | | L2/15-254 | | Moore, Lisa (2015-11-16), "Consensus 145-C20", UTC #145 Minutes, Approve U+312E BOPOMOFO LETTER O WITH DOT ABOVE for encoding in a future version of the standard. See document L2/15-270.
| | | | N4739 | "M64.05e", Unconfirmed minutes of WG 2 meeting 64, 2016-08-31
11.0 | U+312F | 1 | L2/16-106 | N4717 | West, Andrew (2016-04-21), Proposal to encode one Bopomofo letter
| | | L2/16-121 | | Moore, Lisa (2016-05-20), "C.16", UTC #147 Minutes
| | | | N4756 | TCA Feedback on JTC1/SC2/WG2 N4717, 2016-09-22
| | | | N4873R | "7.2.1 T1", Unconfirmed minutes of WG 2 meeting 65, 2018-03-16
  a. Proposed code points and character names may differ from final code points and names.

See also

Spacing Modifier Letters is a Unicode block containing characters for the IPA, UPA, and other phonetic transcriptions. Included are the IPA tone marks, and modifiers for aspiration and palatalization.

Related Research Articles

Dingbat typographic symbol

In typography, a dingbat is an ornament, character, or spacer used in typesetting, often employed for the creation of box frames. The term continues to be used in the computer industry to describe fonts that have symbols and shapes in the positions designated for alphabetical or numeric characters.

In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the Unicode Consortium. Three private use areas are defined: one in the Basic Multilingual Plane (U+E000–U+F8FF), and one each in, and nearly covering, planes 15 and 16. The code points in these areas cannot be considered as standardized characters in Unicode itself. They are intentionally left undefined so that third parties may define their own characters without conflicting with Unicode Consortium assignments. Under the Unicode Stability Policy, the Private Use Areas will remain allocated for that purpose in all future Unicode versions.
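The three areas can be expressed as a simple membership test. In the Python sketch below, the plane 15 and 16 areas are written as U+F0000..U+FFFFD and U+100000..U+10FFFD. The last two code points of each of those planes are noncharacters, which is why the areas only nearly cover their planes. The function name `is_private_use` is illustrative. As a cross-check, the sketch uses the standard `unicodedata.category` function, which reports the general category "Co" (Other, private use) for these code points.

```python
import unicodedata

# The three Private Use Areas (inclusive bounds).
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def is_private_use(cp: int) -> bool:
    """Return True if code point cp lies in one of the Private Use Areas."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


# Range endpoints should carry the private-use general category "Co".
for lo, hi in PUA_RANGES:
    for cp in (lo, hi):
        assert is_private_use(cp)
        assert unicodedata.category(chr(cp)) == "Co"

total = sum(hi - lo + 1 for lo, hi in PUA_RANGES)
print(total)  # 137468 (6400 + 65534 + 65534)
```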

Geometric Shapes is a Unicode block of 96 symbols at code point range U+25A0–U+25FF.

Letterlike Symbols is a Unicode block containing 80 characters which are constructed mainly from the glyphs of one or more letters. In addition to this block, Unicode includes full styled mathematical alphabets, although Unicode does not explicitly categorise these characters as being "letterlike".

Miscellaneous Technical is a Unicode block ranging from U+2300 to U+23FF that contains common symbols used in technical fields, programming languages, and academic disciplines.

Supplemental Arrows-B is a Unicode block containing miscellaneous arrows, arrow tails, crossing arrows used in knot descriptions, curved arrows, and harpoons.

Miscellaneous Symbols and Arrows is a Unicode block containing arrows and geometric shapes with various fills.

CJK Symbols and Punctuation is a Unicode block containing symbols and punctuation used for writing the Chinese, Japanese and Korean languages.

Unicode is a computing industry standard for encoding text and symbols. It includes a set of characters depicting playing cards and another depicting the French card suits.

Gurmukhi is a Unicode block containing characters for the Punjabi language, as it is written in India. In its original incarnation, the code points U+0A02..U+0A4C were a direct copy of the Gurmukhi characters A2-EC from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Tamil is a Unicode block containing characters for the Tamil, Badaga, and Saurashtra languages of Tamil Nadu, India, Sri Lanka, Singapore, and Malaysia. In its original incarnation, the code points U+0B02..U+0BCD were a direct copy of the Tamil characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Malayalam is a Unicode block containing characters of the Malayalam script. In its original incarnation, the code points U+0D02..U+0D4D were a direct copy of the Malayalam characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, and Kannada blocks were similarly all based on their ISCII encodings.

Hiragana is a Unicode block containing hiragana characters for the Japanese language.

Ideographic Description Characters is a Unicode block containing graphic characters used for describing CJK ideographs. They are used in Ideographic Description Sequences (IDS) to describe an ideograph in terms of the other ideographs that make it up and how those components are laid out relative to one another. An IDS gives the reader a description of an ideograph that cannot be represented properly, usually because it is not encoded in Unicode; rendering systems are not intended to automatically compose the pieces into a complete ideograph, and the descriptions are not standardized.
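Each description character takes a fixed number of components, so an IDS is effectively prefix (Polish) notation and can be checked for well-formedness with a short recursive parser. The Python sketch below is a hypothetical validator, not a standard API. It covers only the twelve original description characters U+2FF0..U+2FFB, of which ⿲ (U+2FF2) and ⿳ (U+2FF3) take three components and the rest take two. Later additions to the block are ignored, and any other character is treated as a leaf component. In keeping with the note above that renderers do not compose the pieces, it only validates structure and draws nothing.

```python
# Arity of the twelve original Ideographic Description Characters.
# Assumption: later additions to the block are not handled.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # left to middle and right
IDC_ARITY["\u2FF3"] = 3  # above to middle and below


def parse_ids(ids: str, pos: int = 0) -> int:
    """Parse one IDS starting at pos and return the index just past it.

    Raises ValueError if the text ends before all components are found.
    """
    if pos >= len(ids):
        raise ValueError("incomplete IDS")
    arity = IDC_ARITY.get(ids[pos], 0)  # leaf components have arity 0
    pos += 1
    for _ in range(arity):
        pos = parse_ids(ids, pos)
    return pos


def is_valid_ids(ids: str) -> bool:
    """True if ids is exactly one well-formed description sequence."""
    try:
        return parse_ids(ids) == len(ids)
    except ValueError:
        return False


print(is_valid_ids("⿰女子"))     # True: 好, woman beside child
print(is_valid_ids("⿱木⿰木木"))  # True: 森, one tree above two
print(is_valid_ids("⿰女"))       # False: second component missing
```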

Enclosed CJK Letters and Months is a Unicode block containing circled and parenthesized Katakana, Hangul, and CJK ideographs. During the unification with ISO 10646 for version 1.1, the Japanese Industrial Standard Symbol was reassigned from the code point U+32FF at the end of the block to U+3004. Also included in the block are miscellaneous glyphs that would more likely fit in CJK Compatibility or Enclosed Alphanumerics: a few unit abbreviations, circled numbers from 21 to 50, and circled multiples of 10 from 10 to 80 enclosed in black squares.

Mahjong Tiles is a Unicode block containing characters depicting the standard set of tiles used in the game of Mahjong.

Enclosed Ideographic Supplement is a Unicode block containing forms of characters and words from Chinese, Japanese and Korean enclosed within or stylised as squares, brackets, or circles. It contains three such characters containing one or more kana, and many containing CJK ideographs. Many of its characters were added for compatibility with the Japanese ARIB STD-B24 standard. Six symbols from Chinese folk religion were added in Unicode version 10.

Halfwidth and Fullwidth Forms is the name of a Unicode block U+FF00–FFEF, provided so that older encodings containing both halfwidth and fullwidth characters can have lossless translation to and from Unicode. It is the last block of the Basic Multilingual Plane, except for the short Specials block at U+FFF0–FFFF.
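The fullwidth forms of the printable ASCII characters (U+FF01..U+FF5E) follow the same order as ASCII U+0021..U+007E, so converting between them is a fixed offset of 0xFEE0. The Python sketch below applies that offset and cross-checks the result against `unicodedata.normalize("NFKC", ...)`, whose compatibility mappings send these forms back to ASCII. The helper name is hypothetical. The sketch deliberately leaves the block's halfwidth katakana and Hangul untouched, since those do not map to ASCII.

```python
import unicodedata

# Fullwidth ASCII variants U+FF01..U+FF5E mirror ASCII U+0021..U+007E.
FULLWIDTH_OFFSET = 0xFF01 - 0x21  # 0xFEE0


def fullwidth_ascii_to_ascii(text: str) -> str:
    """Replace fullwidth ASCII variants with their ASCII counterparts.

    Every other character, including halfwidth katakana and Hangul,
    is passed through unchanged.
    """
    return "".join(
        chr(ord(ch) - FULLWIDTH_OFFSET) if 0xFF01 <= ord(ch) <= 0xFF5E else ch
        for ch in text
    )


sample = "Ｕｎｉｃｏｄｅ１２！"
converted = fullwidth_ascii_to_ascii(sample)
# For these characters NFKC gives the same result as the fixed offset.
assert converted == unicodedata.normalize("NFKC", sample)
print(converted)  # Unicode12!
```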

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.