Tibetan (Unicode block)

Tibetan
Range: U+0F00..U+0FFF (256 code points)
Plane: BMP
Scripts: Tibetan (207 char.), Common (4 char.)
Major alphabets: Tibetan, Dzongkha
Assigned: 211 code points
Unused: 45 reserved code points
Deprecated: 2 code points
Unicode version history:
2.0: 168 (+168)
3.0: 193 (+25)
4.1: 195 (+2)
5.1: 201 (+6)
5.2: 205 (+4)
6.0: 211 (+6)
Note: [1] [2] When unifying with ISO 10646, the original Tibetan block was removed in Unicode 1.0.1. [3] The current block (with a new encoding model and a different range) was introduced in version 2.0.

Tibetan is a Unicode block containing characters for the Tibetan, Dzongkha, and other languages of China, Bhutan, Nepal, Mongolia, northern India, eastern Pakistan and Russia.
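
The block's size and assignment count can be checked against the Unicode data bundled with a given runtime. The following is a minimal sketch in Python using the standard `unicodedata` module; the helper name `count_assigned` is illustrative only, and the result depends on the Unicode version the interpreter ships with (211 assigned code points would be expected for Unicode 6.0 through 13.0 data).

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x0F00, 0x0FFF  # Tibetan block range, inclusive


def count_assigned(start: int, end: int) -> int:
    """Count code points whose general category is not Cn (unassigned)."""
    return sum(
        1
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    )


block_size = BLOCK_END - BLOCK_START + 1
assigned = count_assigned(BLOCK_START, BLOCK_END)
print(f"Unicode data version: {unicodedata.unidata_version}")
print(f"{assigned} assigned, {block_size - assigned} reserved, {block_size} total")
```

The general category test is used because `unicodedata` does not expose block or script membership directly.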

Block

Tibetan [1] [2] [3]
Official Unicode Consortium code chart (PDF)
(Chart of rows U+0F0x through U+0FFx against columns 0 through F; glyphs not reproduced here. NB marks U+0F0C, a non-breaking character.)
Notes
1. ^ As of Unicode version 13.0
2. ^ Grey areas indicate non-assigned code points
3. ^ Unicode code points U+0F77 and U+0F79 are deprecated in Unicode 5.2 and later
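
The two deprecated code points named in note 3 can be inspected programmatically. This is a small sketch, assuming Python's standard `unicodedata` module; that module does not expose the Deprecated property itself, so the pair is hard-coded from the note, and the helper name `describe` is hypothetical.

```python
import unicodedata

DEPRECATED = (0x0F77, 0x0F79)  # taken from note 3; not queried from the database


def describe(cp: int) -> tuple:
    """Return the character name and its decomposition mapping (may be empty)."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.decomposition(ch)


for cp in DEPRECATED:
    name, decomposition = describe(cp)
    print(f"U+{cp:04X} {name}: {decomposition or 'no decomposition'}")
```

The decomposition mapping printed for each character shows the sequence that the character database treats as its compatibility equivalent.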

Former Tibetan block

Tibetan (Unicode 1.0.0)
Range: U+1000..U+104F (80 code points)
Plane: BMP
Scripts: Tibetan
Major alphabets: Tibetan, Dzongkha
Status: Deleted prior to the release of Unicode 2.0; now occupied by Myanmar
Unicode version history:
1.0.0: 71 (+71)
1.0.1: 0 (-71)
Note: When unifying with ISO 10646, the original Tibetan block was deleted in Unicode 1.0.1. [3] Tibetan was later reintroduced with a new encoding model for Unicode 2.0.

The Tibetan Unicode block is unique in having been allocated in version 1.0.0 with a virama-based encoding that could not correctly distinguish a visible srog med from a conjunct consonant. [note 1] This encoding was removed from the Unicode Standard in version 1.0.1 in the process of unifying with ISO 10646 for version 1.1, [3] then reintroduced as an explicit root/subjoined encoding, with a larger block size, in version 2.0. Moving or removing existing characters has been prohibited by the Unicode Stability Policy for all versions following Unicode 2.0, so the Tibetan characters encoded in Unicode 2.0 and all subsequent versions are immutable.

  1. In most Unicode Indic encodings, one can force the system to display a visible halanta by using the zero-width non-joiner (ZWNJ), but there is no method to force a conjunct consonant rendering, which is crucial when writing Tibetan.
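
The explicit root/subjoined model described above can be illustrated with a short sketch. It assumes Python's standard `unicodedata` module and relies on the naming pattern "TIBETAN LETTER X" / "TIBETAN SUBJOINED LETTER X" in the character database; the helper names `subjoined` and `stack` are hypothetical, and U+0F84 TIBETAN MARK HALANTA is assumed here to be the character used for a visible srog med.

```python
import unicodedata

HALANTA = "\u0f84"  # TIBETAN MARK HALANTA, assumed to represent the visible srog med


def subjoined(letter: str) -> str:
    """Map a Tibetan base letter to its subjoined counterpart via character names.

    Raises KeyError if the database has no subjoined form with the matching name.
    """
    name = unicodedata.name(letter)
    if not name.startswith("TIBETAN LETTER "):
        raise ValueError(f"not a Tibetan base letter: {name}")
    return unicodedata.lookup(
        name.replace("TIBETAN LETTER", "TIBETAN SUBJOINED LETTER", 1)
    )


def stack(root: str, *below: str) -> str:
    """Encode a consonant stack: a root letter followed by subjoined letters."""
    return root + "".join(subjoined(ch) for ch in below)


SA, KA = "\u0f66", "\u0f40"
conjunct = stack(SA, KA)     # SA with KA subjoined beneath it
visible = SA + HALANTA + KA  # SA carrying a visible halanta, then a separate KA

for label, text in (("stack", conjunct), ("visible srog med", visible)):
    print(label, " ".join(f"U+{ord(c):04X}" for c in text))
```

Because the stack and the visible mark are spelled with different characters, the two cases never depend on a rendering engine's handling of joiner characters.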

The range of the former Unicode 1.0.0 Tibetan block has been occupied by the Myanmar block since Unicode 3.0. In Microsoft Windows, collation data referring to the old Tibetan block was retained as late as Windows XP, and removed in Windows Server 2003. [4]
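
The reassignment of the old range can be observed in any current character database. The sketch below assumes Python's standard `unicodedata` module; the helper name `name_prefixes` is illustrative, and it simply reports which script names appear at the start of the character names in U+1000..U+104F.

```python
import unicodedata

OLD_TIBETAN = range(0x1000, 0x1050)  # U+1000..U+104F, the Unicode 1.0.0 range


def name_prefixes(code_points) -> set:
    """Collect the first word of each assigned character name in the range."""
    prefixes = set()
    for cp in code_points:
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if name:
            prefixes.add(name.split()[0])
    return prefixes


print(name_prefixes(OLD_TIBETAN))
print(unicodedata.name("\u1000"), "/", unicodedata.name("\u0f40"))
```

The second line compares the first code point of the old range with U+0F40, the two positions contrasted in the title of reference [4].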

Tibetan (Unicode 1.0.0) [1] [2]
Official Unicode Consortium code chart (PDF)
(Chart of rows U+100x through U+104x against columns 0 through F; glyphs not reproduced here.)
Notes
1. ^ As of Unicode version 1.0.0. Characters are shown by means of corresponding code points in Unicode 2.0 and all subsequent versions.
2. ^ Grey areas indicate non-assigned code points

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Tibetan block:

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
  3. "Unicode 1.0.1 Addendum" (PDF). The Unicode Standard. 1992-11-03. Retrieved 2016-07-09.
  4. Kaplan, Michael (2007-08-28). "Every character has a story #29: U+1000^H^H^H^H0f40, (TIBETAN or MYANMAR LETTER KA, depending on when you ask)". Sorting it all out.