Cherokee Supplement

Last updated
Cherokee Supplement
RangeU+AB70..U+ABBF
(80 code points)
Plane BMP
Scripts Cherokee
Major alphabetsCherokee
Assigned80 code points
Unused0 reserved code points
Unicode version history
8.0 (2015)80 (+80)
Chart
Code chart
Note: [1] [2]

Cherokee Supplement is a Unicode block containing the syllabic characters for writing the Cherokee language. When Cherokee was first added to Unicode in version 3.0 it was treated as a unicameral alphabet, but in version 8.0 it was redefined as a bicameral script. The Cherokee Supplement block contains lowercase letters only, whereas the Cherokee block contains all the uppercase letters, together with six lowercase letters. For backwards compatibility, the Unicode case folding algorithm—which usually converts a string to lowercase characters—maps Cherokee characters to uppercase. [3]

Cherokee Supplement [1]
Official Unicode Consortium code chart (PDF)
 0123456789ABCDEF
U+AB7xꭿ
U+AB8x
U+AB9x
U+ABAx
U+ABBxꮿ
Notes
1. ^ As of Unicode version 14.0

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Cherokee Supplement block:

Version Final code points [lower-alpha 1] Count L2  ID WG2  IDDocument
8.0U+AB70..ABBF80 L2/13-200 Moore, Lisa (2013-11-18), "C.6.1", UTC #137 Minutes
L2/14-064R N4537R Everson, Michael (2014-02-25), Revised proposal for the addition of Cherokee characters
L2/14-100 Moore, Lisa (2014-05-13), "Consensus 139-C13", UTC #139 Minutes
N4553 (pdf, doc)Umamaheswaran, V. S. (2014-09-16), "M62.07b", Minutes of WG 2 meeting 62 Adobe, San Jose, CA, USA
  1. Proposed code points and characters names may differ from final code points and names

Related Research Articles

PETSCII Character encoding on Commodore computers

PETSCII, also known as CBM ASCII, is the character set used in Commodore Business Machines (CBM)'s 8-bit home computers, starting with the PET from 1977 and including the C16, C64, C116, C128, CBM-II, Plus/4, and VIC-20.

Cherokee syllabary Writing system invented by Sequoyah to write the Cherokee language

The Cherokee syllabary is a syllabary invented by Sequoyah in the late 1810s and early 1820s to write the Cherokee language. His creation of the syllabary is particularly noteworthy as he was illiterate until the creation of his syllabary. He first experimented with logograms, but his system later developed into a syllabary. In his system, each symbol represents a syllable rather than a single phoneme; the 85 characters provide a suitable method for writing Cherokee. Although some symbols resemble Latin, Greek, Cyrillic, and Glagolitic letters, they are not used to represent the same sounds.

A Unicode block is one of several contiguous ranges of numeric character codes of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.

In Unicode, a script is a collection of letters and other written signs used to represent textual information in one or more writing systems. Some scripts support one and only one writing system and language, for example, Armenian. Other scripts support many different writing systems; for example, the Latin script supports English, French, German, Italian, Vietnamese, Latin itself, and several other languages. Some languages make use of multiple alternate writing systems and thus also use several scripts; for example, in Turkish, the Arabic script was used before the 20th century but transitioned to Latin in the early part of the 20th century. For a list of languages supported by each script, see the list of languages by writing system. More or less complementary to scripts are symbols and Unicode control characters.

The Basic Latin or C0 Controls and Basic Latin Unicode block is the first block of the Unicode standard, and the only block which is encoded in one byte in UTF-8. The block contains all the letters and control codes of the ASCII encoding. It ranges from U+0000 to U+007F, contains 128 characters and includes the C0 controls, ASCII punctuation and symbols, ASCII digits, both the uppercase and lowercase of the English alphabet and a control character.

The Latin-1 Supplement is the second Unicode block in the Unicode standard. It encodes the upper range of ISO 8859-1: 80 (U+0080) - FF (U+00FF). Controls C1 (0080–009F) are not graphic. This block ranges from U+0080 to U+00FF, contains 128 characters and includes the C1 controls, Latin-1 punctuation and symbols, 30 pairs of majuscule and minuscule accented Latin characters and 2 mathematical operators.

Latin Extended-A is a Unicode block and is the third block of the Unicode standard. It encodes Latin letters from the Latin ISO character sets other than Latin-1 and also legacy characters from the ISO 6937 standard.

Latin Extended-B is the fourth block (0180-024F) of the Unicode Standard. It has been included since version 1.0, where it was only allocated to the code points 0180-01FF and contained 113 characters. During unification with ISO 10646 for version 1.1, the block range was extended by 80 code points and another 35 characters were assigned. In version 3.0 and later, the last 60 available code points in the block were assigned. Its block name in Unicode 1.0 was Extended Latin.

IPA Extensions is a block (0250–02AF) of the Unicode standard that contains full size letters used in the International Phonetic Alphabet (IPA). Both modern and historical characters are included, as well as former and proposed IPA signs and non-IPA phonetic letters. Additional characters employed for phonetics, like the palatalization sign, are encoded in the blocks Phonetic Extensions (1D00–1D7F) and Phonetic Extensions Supplement (1D80–1DBF). Diacritics are found in the Spacing Modifier Letters (02B0–02FF) and Combining Diacritical Marks (0300–036F) blocks. Its block name in Unicode 1.0 was Standard Phonetic.

The ISO basic Latin alphabet is a Latin-script alphabet and consists of two sets of 26 letters, codified in various national and international standards and used widely in international communication. They are the same letters that comprise the English alphabet.

Enclosed Alphanumeric Supplement is a Unicode block consisting of Latin alphabet characters and Arabic numerals enclosed in circles, ovals or boxes, used for a variety of purposes. It is encoded in the range U+1F100–U+1F1FF in the Supplementary Multilingual Plane.

Georgian is a Unicode block containing the Mkhedruli and Asomtavruli Georgian characters used to write Modern Georgian, Svan, and Mingrelian languages. Another lower case, Nuskhuri, is encoded in a separate Georgian Supplement block, which is used with the Asomtavruli to write the ecclesiastical Khutsuri Georgian script.

Cherokee is a Unicode block containing the syllabic characters for writing the Cherokee language. When Cherokee was first added to Unicode in version 3.0 it was treated as a unicameral alphabet, but in version 8.0 it was redefined as a bicameral script. The Cherokee block contains all the uppercase letters plus six lowercase letters. The Cherokee Supplement block, added in version 8.0, contains the rest of the lowercase letters. For backwards compatibility, the Unicode case folding algorithm—which usually converts a string to lowercase characters—maps Cherokee characters to uppercase.

Katakana is a Unicode block containing katakana characters for the Japanese and Ainu languages.

Deseret is a Unicode block containing characters in the Deseret alphabet, which were invented by The Church of Jesus Christ of Latter-day Saints to write English. The Deseret block was derived from an earlier private use encoding in the ConScript Unicode Registry, like the Shavian and Phaistos Disc encodings. The block was added in version 3.1 of the Unicode Standard; the letters Oi and Ew, both uppercase and lowercase, were added in version 4.0.

Enclosed Ideographic Supplement is a Unicode block containing forms of characters and words from Chinese, Japanese and Korean enclosed within or stylised as squares, brackets, or circles. It contains three such characters containing one or more kana, and many containing CJK ideographs. Many of its characters were added for compatibility with the Japanese ARIB STD-B24 standard. Six symbols from Chinese folk religion were added in Unicode version 10.

Lisu is a Unicode block containing characters of the Fraser alphabet, which is used to write the Lisu language. This alphabet consists of glyphs resembling capital letters in the basic Latin alphabet in their standard form and horizontally or vertically mirrored.

Glagolitic Supplement is a Unicode block containing supplementary characters used in the Glagolitic script. It currently contains 38 combining letters.

Georgian Extended is a Unicode block containing Georgian Mtavruli letters that function as uppercase versions of their Mkhedruli counterparts in the Georgian block. Unlike all other casing scripts in Unicode, there is no title casing between Mkhedruli and Mtavruli letters, because Mtavruli is typically used only in all-caps text, although there have been some historical attempts at capitalization.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
  3. "The Unicode Standard Version 13.0 – Core Specification" (PDF). The Unicode Consortium. Retrieved 20 May 2021.