Tibetan (Unicode block)

Tibetan
Range: U+0F00..U+0FFF (256 code points)
Plane: BMP
Scripts: Tibetan (207 char.), Common (4 char.)
Major alphabets: Tibetan, Dzongkha
Assigned: 211 code points
Unused: 45 reserved code points
2 deprecated
Unicode version history:
1.0.0: 71 (+71)
1.1: 0 (-71)
2.0: 168 (+168)
3.0: 193 (+25)
4.1: 195 (+2)
5.1: 201 (+6)
5.2: 205 (+4)
6.0: 211 (+6)
Note: When unifying with ISO 10646, the Tibetan block was removed with version 1.1, then reintroduced with a new encoding model for version 2.0. [1] [2]
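The assigned and reserved counts in the summary above can be checked against whatever Unicode database a runtime ships. The following is a minimal sketch using Python's standard `unicodedata` module. It assumes that general category `Cn` marks a reserved code point, and its result depends on the Unicode version bundled with the interpreter. The function name `tibetan_block_counts` is made up for illustration.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x0F00, 0x0FFF  # Tibetan block range, inclusive


def tibetan_block_counts():
    """Return (assigned, reserved) counts for U+0F00..U+0FFF.

    Assumption: a code point is treated as reserved when its general
    category is 'Cn' (unassigned) in the interpreter's Unicode database.
    """
    assigned = reserved = 0
    for cp in range(BLOCK_START, BLOCK_END + 1):
        if unicodedata.category(chr(cp)) == "Cn":
            reserved += 1
        else:
            assigned += 1
    return assigned, reserved


if __name__ == "__main__":
    assigned, reserved = tibetan_block_counts()
    print(f"Unicode {unicodedata.unidata_version}: "
          f"{assigned} assigned, {reserved} reserved")
```

With a database at Unicode 6.0 or later, the totals should match the 211 assigned and 45 reserved code points listed above; older databases would report fewer assigned characters.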

Tibetan is a Unicode block containing characters for Tibetan, Dzongkha, and other languages of China, Bhutan, Nepal, Mongolia, northern India, eastern Pakistan and Russia. The Tibetan Unicode block is unique in its history: in version 1.0 it was allocated with a flawed virama-based encoding that could not correctly distinguish a visible srog med from a conjunct consonant [note 1]; it was removed from the Unicode Standard in version 1.1 during unification with ISO 10646; and it was reintroduced in version 2.0 with an explicit root/subjoined encoding and a larger block size.
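To make the explicit root/subjoined model concrete, the sketch below spells one consonant stack as a head letter followed by separately encoded subjoined letters, then looks up each character's name with Python's `unicodedata` module. The sample stack (SA over GA over RA) and the helper name `describe` are chosen only for illustration, and testing for the word "SUBJOINED" in the character name is a convenience heuristic, not a property defined by the standard.

```python
import unicodedata


def describe(text):
    """Return (code point, name, is_subjoined) for each character.

    Heuristic assumption: a character counts as subjoined when its
    Unicode name contains the word 'SUBJOINED'.
    """
    rows = []
    for ch in text:
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{ord(ch):04X}", name, "SUBJOINED" in name))
    return rows


# Illustrative stack: letter SA with GA and RA stacked beneath it.
# Each stacked consonant has its own code point; no virama is involved.
STACK = "\u0F66\u0F92\u0FB2"

if __name__ == "__main__":
    for code, name, subjoined in describe(STACK):
        print(code, name, "(subjoined)" if subjoined else "")
```

Because the stacked forms are distinct characters, a renderer does not have to guess whether a conjunct or a visible vowel-killer mark was intended.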

Note 1: In most Unicode Indic encodings, a visible halanta can be forced by inserting a zero-width non-joiner (ZWNJ), but there is no method to force conjunct consonant rendering, which is crucial when writing Tibetan.

Block

Tibetan [1] [2] [3]
Official Unicode Consortium code chart (PDF)
[Code chart: rows U+0F0x to U+0FFx, columns 0 to F. Cells recoverable from this copy: U+0F0C (NB), U+0F3F ༿, U+0F7F ཿ, U+0FBF ྿.]
Notes
1. ^ As of Unicode version 13.0
2. ^ Grey areas indicate non-assigned code points
3. ^ Unicode code points U+0F77 and U+0F79 are deprecated in Unicode 5.2 and later
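For text that still contains the two deprecated code points, one possible cleanup is to replace them with their compatibility decompositions. The sketch below is only an illustration under that assumption; the source does not say which replacement is appropriate for a given application. Python's `unicodedata` module does not expose the Deprecated property, so the pair from note 3 is hard-coded, and the helper name `replacement_sequence` is hypothetical.

```python
import unicodedata

# Taken from note 3 above; hard-coded because unicodedata has no
# accessor for the Deprecated property.
DEPRECATED = ("\u0F77", "\u0F79")


def replacement_sequence(ch):
    """Return the NFKD form of ch as a list of 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKD", ch)]


if __name__ == "__main__":
    for ch in DEPRECATED:
        print(f"U+{ord(ch):04X}",
              unicodedata.name(ch),
              "| decomposition:", unicodedata.decomposition(ch),
              "| NFKD:", " ".join(replacement_sequence(ch)))
```

Running it prints each character's name, its raw decomposition field, and the fully decomposed sequence, which can then be compared against the official code chart before any replacement is applied.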

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Tibetan block:


References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.