Lisu (Unicode block)

Lisu
Range: U+A4D0..U+A4FF (48 code points)
Plane: BMP
Scripts: Lisu
Major alphabets: Fraser Lisu
Assigned: 48 code points
Unused: 0 reserved code points
Unicode version history: 5.2 (2009): 48 (+48)
Note: [1] [2]

Lisu is a Unicode block containing characters of the Fraser alphabet, which is used to write the Lisu language. This alphabet (and by extension the block) consists of glyphs that resemble capital letters of the basic Latin alphabet, both in their standard form and mirrored horizontally or vertically.
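As a rough illustration of the range given above, the following minimal sketch tests whether a character falls in the block. The names `LISU_START`, `LISU_END`, `in_lisu_block` and `lisu_block_size` are illustrative only and do not come from any library.

```python
# Inclusive bounds of the Lisu block, as stated in the range above.
LISU_START = 0xA4D0
LISU_END = 0xA4FF


def in_lisu_block(ch: str) -> bool:
    """Return True if the single character ch lies in the Lisu block."""
    return LISU_START <= ord(ch) <= LISU_END


def lisu_block_size() -> int:
    """Number of code points in the inclusive range U+A4D0..U+A4FF."""
    return LISU_END - LISU_START + 1
```

Under these assumptions, `lisu_block_size()` gives 48, matching the count of assigned code points, and `in_lisu_block("B")` is false because the Latin letter lives in a different block.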


The addition of the block was subject to significant debate as to whether allocating a new block was necessary for the alphabet or if the turned letters not already in Unicode could instead be added to an existing block for the Latin script. However, since the Lisu letters only visually resemble their Latin counterparts and are semantically different, the former approach was ultimately taken.
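The distinction between visual resemblance and encoded identity can be seen in the character properties. The sketch below uses Python's standard `unicodedata` module and assumes an interpreter whose Unicode database is version 5.2 or later; the helper `describe` and the two constants are illustrative names, not part of any API.

```python
import unicodedata

LATIN_B = "B"        # U+0042
LISU_BA = "\ua4d0"   # U+A4D0, drawn like a Latin capital B


def describe(ch: str) -> tuple:
    """Return (code point, character name, general category) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


# The two characters look alike but are unrelated as far as Unicode is concerned.
latin_info = describe(LATIN_B)   # ('U+0042', 'LATIN CAPITAL LETTER B', 'Lu')
lisu_info = describe(LISU_BA)    # ('U+A4D0', 'LISU LETTER BA', 'Lo')

# The Latin letter has a lowercase form; the unicameral Lisu letter does not.
latin_lower = LATIN_B.lower()    # 'b'
lisu_lower = LISU_BA.lower()     # unchanged
```

The Latin letter is an uppercase letter (`Lu`) with a case mapping, while the Lisu letter is a caseless letter (`Lo`), which is one concrete way the two differ despite their similar shapes.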

This block is supported by several fonts, including Noto Sans Lisu, Lisu Unicode, DejaVu Sans, Horta, Montagel, Quivira, Segoe UI (since Windows 8), and Highway Gothic (Wide, version 2.0.3).

In Unicode 13.0, a further block, Lisu Supplement, was added for a single supplementary Lisu character used for the Naxi language.

Lisu [1]
Official Unicode Consortium code chart (PDF)
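        0 1 2 3 4 5 6 7 8 9 A B C D E F
U+A4Dx  ꓐ ꓑ ꓒ ꓓ ꓔ ꓕ ꓖ ꓗ ꓘ ꓙ ꓚ ꓛ ꓜ ꓝ ꓞ ꓟ
U+A4Ex  ꓠ ꓡ ꓢ ꓣ ꓤ ꓥ ꓦ ꓧ ꓨ ꓩ ꓪ ꓫ ꓬ ꓭ ꓮ ꓯ
U+A4Fx  ꓰ ꓱ ꓲ ꓳ ꓴ ꓵ ꓶ ꓷ ꓸ ꓹ ꓺ ꓻ ꓼ ꓽ ꓾ ꓿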
Notes
1. As of Unicode version 14.0
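The chart layout, with one row per sixteen code points and one column per final hexadecimal digit, can be regenerated from the code points themselves. This is only a sketch: the function name `lisu_chart_rows` and the spacing are arbitrary choices, and the glyphs display correctly only with a font that covers the block.

```python
def lisu_chart_rows() -> list:
    """Build the 3x16 Lisu code chart as a list of text rows."""
    # Header row: the sixteen possible low hex digits.
    header = "       " + " ".join(f"{col:X}" for col in range(16))
    rows = [header]
    # One row for each of U+A4Dx, U+A4Ex and U+A4Fx.
    for row_start in range(0xA4D0, 0xA500, 0x10):
        label = f"U+{row_start >> 4:03X}x"
        cells = " ".join(chr(row_start + col) for col in range(16))
        rows.append(f"{label} {cells}")
    return rows
```

Calling `print("\n".join(lisu_chart_rows()))` writes the header followed by the three rows of the chart.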

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Lisu block:

Version: 5.2
Final code points [a]: U+A4D0..A4FF
Count: 48

L2 ID | WG2 ID | Document
- | N3353 | Umamaheswaran, V. S. (2007-10-10), "M51.30", Unconfirmed minutes of WG 2 meeting 51 Hanzhou, China; 2007-04-24/27
L2/07-297 | N3323 | Everson, Michael (2007-09-11), Towards an encoding of the Fraser script in the UCS
L2/07-294 | N3326 | Cook, Richard (2007-09-15), Fraser's Lisu orthography
L2/07-357 | N3317R2 | Proposal for encoding the Old Lisu script in the BMP of the UCS, 2007-10-10
L2/07-423 | - | Documentation on legacy encodings of the Old Lisu script, 2007-12-29
L2/08-019 | N3424 | Cheuk, Adrian (2008-01-28), Proposal for encoding the Old Lisu script in the BMP of the UCS
L2/08-003 | - | Moore, Lisa (2008-02-14), "Lisu", UTC #114 Minutes
L2/08-318 | N3453 | Umamaheswaran, V. S. (2008-08-13), "M52.10", Unconfirmed minutes of WG 2 meeting 52
L2/09-247 | - | Hosken, Martin (2009-07-10), Discussion and proposal of default Lisu sort order

  a. Proposed code points and character names may differ from final code points and names.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.