Cherokee (Unicode block)

Cherokee
Range: U+13A0..U+13FF (96 code points)
Plane: BMP
Scripts: Cherokee
Major alphabets: Cherokee
Assigned: 92 code points
Unused: 4 reserved code points
Unicode version history:
3.0 (1999): 85 (+85)
8.0 (2015): 92 (+7)
Code chart
Note: [1] [2]

Cherokee is a Unicode block containing the syllabic characters for writing the Cherokee language. When Cherokee was first added to Unicode in version 3.0 it was treated as a unicameral script, but in version 8.0 it was redefined as a bicameral script. The Cherokee block (U+13A0 to U+13FF) contains all the uppercase letters plus six lowercase letters. The Cherokee Supplement block (U+AB70 to U+ABBF), added in version 8.0, contains the rest of the lowercase letters. For backwards compatibility, the Unicode case folding algorithm, which usually converts a string to lowercase characters, maps Cherokee characters to uppercase instead. [3]
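
The folding behaviour described above can be checked with a short sketch. It assumes a Python 3 interpreter whose bundled Unicode database is version 8.0 or later (older databases do not know the lowercase letters); the helper name `cherokee_fold_demo` is invented here for illustration.

```python
import unicodedata


def cherokee_fold_demo(ch: str) -> dict:
    """Return the lower, upper and case-folded forms of one character.

    Hypothetical helper, not part of any library.
    """
    return {
        "name": unicodedata.name(ch),
        "lower": ch.lower(),
        "upper": ch.upper(),
        "folded": ch.casefold(),
    }


capital_a = "\u13a0"  # CHEROKEE LETTER A (Cherokee block)
small_a = "\uab70"    # CHEROKEE SMALL LETTER A (Cherokee Supplement block)
small_ye = "\u13f8"   # CHEROKEE SMALL LETTER YE (lowercase, Cherokee block)

for ch in (capital_a, small_a, small_ye):
    info = cherokee_fold_demo(ch)
    # For Cherokee, the folded form is the uppercase letter, not the lowercase one.
    print(f"U+{ord(ch):04X} {info['name']}: folds to U+{ord(info['folded']):04X}")
```

Comparing `casefold()` results rather than `lower()` results is the usual way to do caseless matching, and for Cherokee the two give different strings.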

Cherokee [1] [2]
Official Unicode Consortium code chart (PDF)
[Code chart: rows U+13Ax through U+13Fx, columns 0 through F]
Notes
1. ^ As of Unicode version 14.0
2. ^ Grey areas indicate non-assigned code points
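
The split between assigned and reserved code points in the block can be listed programmatically. The following is a minimal sketch, assuming a Python 3 interpreter with a Unicode database of version 8.0 or later; the function names `split_block` and `chart_rows` are hypothetical, and unassigned cells are shown as a dot rather than a grey area.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x13A0, 0x13FF  # range of the Cherokee block


def split_block(start: int = BLOCK_START, end: int = BLOCK_END):
    """Split a code point range into assigned and unassigned code points."""
    assigned, unassigned = [], []
    for cp in range(start, end + 1):
        # General category "Cn" marks an unassigned (reserved) code point.
        if unicodedata.category(chr(cp)) == "Cn":
            unassigned.append(cp)
        else:
            assigned.append(cp)
    return assigned, unassigned


def chart_rows(start: int = BLOCK_START, end: int = BLOCK_END):
    """Build one text row per 16 code points, with '.' for unassigned cells."""
    rows = []
    for row_start in range(start, end + 1, 16):
        cells = []
        for cp in range(row_start, row_start + 16):
            ch = chr(cp)
            cells.append("." if unicodedata.category(ch) == "Cn" else ch)
        rows.append(f"U+{row_start >> 4:03X}x " + " ".join(cells))
    return rows


assigned, unassigned = split_block()
print(len(assigned), "assigned;", [f"U+{cp:04X}" for cp in unassigned], "reserved")
for row in chart_rows():
    print(row)
```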

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Cherokee block:

Version | Final code points [a] | Count | UTC ID | L2 ID | WG2 ID | Document
3.0 | U+13A0..13F4 | 85 | UTC/1991-102 | | | McGowan, Rick (1991-10-24), Cherokee block description and chart draft
 | | | UTC/1995-027 | | N1172 | Everson, Michael (1995-03-14), Proposal for encoding the Cherokee script
 | | | UTC/1995-xxx | | | "Cherokee Proposal", Unicode Technical Committee Meeting #65, Minutes, 1995-06-02
 | | | UTC/1996-016 | | | Gourd, Charles (1996-03-05), Cherokee Syllabary
 | | | UTC/1996-015 | | | Everson, Michael (1996-03-08), Re: Cherokee Nation's ordering
 | | | UTC/1996-017 | | | Everson, Michael (1996-03-14), Proposal for encoding the Cherokee script
 | | | | | N1362 | Initial comments on encoding Cherokee into ISO/IEC 10646, 1996-04-01
 | | | | X3L2/96-034 | N1356 | Suignard, Michel (1996-04-17), US position concerning the referenced proposal to encode the Cherokee script
 | | | | | N1353 | Umamaheswaran, V. S.; Ksar, Mike (1996-06-25), "8.11", Draft minutes of WG2 Copenhagen Meeting # 30
 | | | UTC/1996-027.2 | | | Greenfield, Steve (1996-07-01), "B. Cherokee", UTC #69 Minutes (PART 2)
 | | | | | N1453 | Ksar, Mike; Umamaheswaran, V. S. (1996-12-06), "8.12", WG 2 Minutes - Quebec Meeting 31
 | | | | | N1476 | Paterson, Bruce (1996-12-09), Draft pDAM 12 - Cherokee
 | | | | | N1596 | Summary of Voting on SC 2 N 2807, Combined PDAM Registration and FPDAM ballot: Amendment 12: Cherokee Script, 1997-06-17
 | | | | L2/97-288 | N1603 | Umamaheswaran, V. S. (1997-10-24), "6.4", Unconfirmed Meeting Minutes, WG 2 Meeting # 33, Heraklion, Crete, Greece, 20 June - 4 July 1997
 | | | | L2/98-130 | | Text for FDAM ballot ISO 10646 Amendment 12 - Cherokee, 1998-03-05
 | | | | L2/14-026 | | Moore, Lisa (2014-02-17), "Motion 138-M2", UTC #138 Minutes, Any proposal to make the Cherokee script bicameral, should make the existing Cherokee letters uppercase. The UTC deems that this choice would provide better backward compatibility with existing implementations.
8.0 | U+13F5, 13F8..13FD | 7 | | L2/13-190 | N4487 | Everson, Michael; Feeling, Durbin (2013-10-24), Proposal for the addition of Cherokee characters
 | | | | L2/13-210 | | Anderson, Deborah; Whistler, Ken; McGowan, Rick; Pournader, Roozbeh (2013-10-31), "3", Recommendations to UTC #137 November 2013 on Script Proposals
 | | | | L2/14-064R | N4537R | Everson, Michael (2014-02-25), Revised proposal for the addition of Cherokee characters
 | | | | L2/14-100 | | Moore, Lisa (2014-05-13), "Consensus 139-C13", UTC #139 Minutes
 | | | | L2/14-187 | | Whistler, Ken (2014-07-31), Cherokee casing decision may break identifier syntax
 | | | | | N4553 | Umamaheswaran, V. S. (2014-09-16), "M62.07a", Minutes of WG 2 meeting 62 Adobe, San Jose, CA, USA
 | | | | L2/15-214 | | Lunde, Ken (2015-07-30), Phoreus Cherokee type specimen sheet
  a. Proposed code points and character names may differ from final code points and names.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
  3. "The Unicode Standard Version 13.0 – Core Specification" (PDF). The Unicode Consortium. Retrieved 2021-05-20.