Cham (Unicode block)

Cham
Range: U+AA00..U+AA5F (96 code points)
Plane: BMP
Scripts: Cham
Major alphabets: Eastern Cham
Assigned: 83 code points
Unused: 13 reserved code points
Unicode version history
5.1 (2008): 83 (+83)
Chart: Code chart
Note: [1] [2]

Cham is a Unicode block containing characters of the Cham script, which is used to write the Cham language, primarily its Eastern dialect, in Cambodia and Vietnam.
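As a minimal sketch of how the block boundaries can be used in practice, the following Python snippet tests whether a character falls inside the range U+AA00..U+AA5F given for this block. The names `CHAM_START`, `CHAM_END`, and `in_cham_block` are illustrative assumptions, not part of any standard API.

```python
# Block boundaries as stated for the Cham block: U+AA00..U+AA5F.
CHAM_START = 0xAA00
CHAM_END = 0xAA5F


def in_cham_block(ch: str) -> bool:
    """Return True if the single character ch lies inside the Cham block.

    This only checks the block range; it does not check whether the
    code point is actually assigned to a character.
    """
    return CHAM_START <= ord(ch) <= CHAM_END


# The block size follows directly from the inclusive range.
block_size = CHAM_END - CHAM_START + 1

print(in_cham_block("\uaa00"))  # first code point of the block
print(in_cham_block("a"))       # a Latin letter, outside the block
print(block_size)
```

A range test like this identifies the block only. Some code points inside the range are reserved, so membership in the block does not by itself mean that a code point encodes a character.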

A separate block for Western Cham, used in Cambodia, was first proposed to Unicode in 2016. As of May 2022, it was still being finalized. [3]

Cham [1] [2]
Official Unicode Consortium code chart (PDF)
        0 1 2 3 4 5 6 7 8 9 A B C D E F
U+AA0x
U+AA1x
U+AA2x
U+AA3x
U+AA4x
U+AA5x
Notes
1. ^ As of Unicode version 15.0
2. ^ Grey areas indicate non-assigned code points

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Cham block:

| Version | Final code points [a] | Count | L2 ID | WG2 ID | Document |
| 5.1 | U+AA00..AA36, AA40..AA4D, AA50..AA59, AA5C..AA5F | 83 | | N1126 | Cham script [proposal summary form], 1994-10-14 |
| | | | | N1203 | Umamaheswaran, V. S.; Ksar, Mike (1995-05-03), "6.1.2.2", Unconfirmed minutes of SC2/WG2 Meeting 27, Geneva |
| | | | L2/97-143 | N1578 | Everson, Michael (1997-04-06), Cham encoding discussion |
| | | | L2/97-124 | N1559 | Everson, Michael (1997-05-01), Proposal for encoding the Cham script in ISO/IEC 10646 |
| | | | L2/97-288 | N1603 | Umamaheswaran, V. S. (1997-10-24), "8.22", Unconfirmed Meeting Minutes, WG 2 Meeting # 33, Heraklion, Crete, Greece, 20 June – 4 July 1997 |
| | | | L2/99-081 | N1960 | Everson, Michael (1999-02-01), Response to Ngo Trung Viet on feedback from Cham experts |
| | | | | N1997 | Nhan, Ngo Than (1999-02-26), Response to Michael Everson |
| | | | L2/06-257 | N3120 | Everson, Michael (2006-08-06), Proposal for encoding the Cham script in the BMP of the UCS |
| | | | L2/06-231 | | Moore, Lisa (2006-08-17), "C.14", UTC #108 Minutes |
| | | | | N3153 | Umamaheswaran, V. S. (2007-02-16), "M49.18", Unconfirmed minutes of WG 2 meeting 49 AIST, Akihabara, Tokyo, Japan; 2006-09-25/29 |
  a. Proposed code points and character names may differ from final code points and names
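
The final code point ranges recorded for version 5.1 can be checked against the stated count of 83 with simple arithmetic. The Python sketch below does this; `ASSIGNED_RANGES`, `count_code_points`, and `is_assigned` are hypothetical names introduced for illustration, and the ranges are copied from the history entry above.

```python
# Final code point ranges for version 5.1 (inclusive bounds).
ASSIGNED_RANGES = [
    (0xAA00, 0xAA36),
    (0xAA40, 0xAA4D),
    (0xAA50, 0xAA59),
    (0xAA5C, 0xAA5F),
]


def count_code_points(ranges):
    """Sum the sizes of inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


def is_assigned(cp: int) -> bool:
    """Return True if cp falls inside one of the listed ranges."""
    return any(start <= cp <= end for start, end in ASSIGNED_RANGES)


assigned = count_code_points(ASSIGNED_RANGES)  # 55 + 14 + 10 + 4
block_size = 0xAA5F - 0xAA00 + 1               # whole block, U+AA00..U+AA5F
reserved = block_size - assigned

print(assigned, block_size, reserved)  # 83 96 13
```

The gaps between the ranges (U+AA37..AA3F, U+AA4E..AA4F, and U+AA5A..AA5B) account for the 13 reserved code points.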

References