Malayalam (Unicode block)

Last updated
Malayalam
RangeU+0D00..U+0D7F
(128 code points)
Plane BMP
Scripts Malayalam
Major alphabetsMalayalam
Assigned118 code points
Unused10 reserved code points
Source standards ISCII
Unicode version history
1.0.0 (1991)78 (+78)
5.1 (2008)95 (+17)
6.0 (2010)98 (+3)
7.0 (2014)99 (+1)
8.0 (2015)100 (+1)
9.0 (2016)114 (+14)
10.0 (2017)117 (+3)
13.0 (2020)118 (+1)
Code chart
Note: [1] [2]

Malayalam is a Unicode block containing characters of the Malayalam script. In its original incarnation, the code points U+0D02..U+0D4D were a direct copy of the Malayalam characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, and Kannada blocks were similarly all based on their ISCII encodings.

Contents

Block

Malayalam [1] [2]
Official Unicode Consortium code chart (PDF)
 0123456789ABCDEF
U+0D0x
U+0D1x
U+0D2x
U+0D3xി
U+0D4x    
U+0D5x
U+0D6x
U+0D7xൿ
Notes
1. ^ As of Unicode version 14.0
2. ^ Grey areas indicate non-assigned code points

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Malayalam block:

Related Research Articles

Malayalam script Brahmic script used commonly to write the Malayalam language

Malayalam script is a Brahmic script used commonly to write Malayalam, which is the principal language of Kerala, India, spoken by 45 million people in the world. It is a Dravidian language spoken in the Indian state of Kerala and the union territories of Lakshadweep and Puducherry by the Malayali people. It is one of 22 scheduled languages of India Malayalam script is also widely used for writing Sanskrit texts in Kerala.

Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.

In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the Unicode Consortium. Three private use areas are defined: one in the Basic Multilingual Plane, and one each in, and nearly covering, planes 15 and 16. The code points in these areas cannot be considered as standardized characters in Unicode itself. They are intentionally left undefined so that third parties may define their own characters without conflicting with Unicode Consortium assignments. Under the Unicode Stability Policy, the Private Use Areas will remain allocated for that purpose in all future Unicode versions.

Geometric Shapes is a Unicode block of 96 symbols at code point range U+25A0–25FF.

Specials is a short Unicode block of characters allocated at the very end of the Basic Multilingual Plane, at U+FFF0–FFFF. Of these 16 code points, five have been assigned since Unicode 3.0:

Enclosed Alphanumerics is a Unicode block of typographical symbols of an alphanumeric within a circle, a bracket or other not-closed enclosure, or ending in a full stop.

CJK Symbols and Punctuation is a Unicode block containing symbols and punctuation used for writing the Chinese, Japanese and Korean languages. It also contains one Chinese character.

Syriac is a Unicode block containing characters for all forms of the Syriac alphabet, including the Estrangela, Serto, Eastern Syriac, and the Christian Palestinian Aramaic variants. It is used in Literary Syriac, Neo-Aramaic, and Arabic among Syriac-speaking Christians. It was used historically to write Armenian, Persian, Ottoman Turkish, and Malayalam.

Devanagari is a Unicode block containing characters for writing languages such as Hindi, Marathi, Bodo, Maithili, Sindhi, Nepali, and Sanskrit, among others. In its original incarnation, the code points U+0900..U+0954 were a direct copy of the characters A0-F4 from the 1988 ISCII standard. The Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Bengali Unicode block contains characters for the Bengali, Assamese, Bishnupriya Manipuri, Daphla, Garo, Hallam, Khasi, Mizo, Munda, Naga, Riang, and Santali languages. In its original incarnation, the code points U+0981..U+09CD were a direct copy of the Bengali characters A1-ED from the 1988 ISCII standard, as well as several Assamese ISCII characters in the U+09F0 column. The Devanagari, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on ISCII encodings.

Gurmukhi is a Unicode block containing characters for the Punjabi language, as it is written in India. In its original incarnation, the code points U+0A02..U+0A4C were a direct copy of the Gurmukhi characters A2-EC from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Gujarati is a Unicode block containing characters for writing the Gujarati language. In its original incarnation, the code points U+0A81..U+0AD0 were a direct copy of the Gujarati characters A1-F0 from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Oriya is a Unicode block containing characters for the Oriya (Odia), Khondi, and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01..U+0B4D were a direct copy of the Oriya characters A1-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Tamil is a Unicode block containing characters for the Tamil, and Saurashtra languages of Tamil Nadu India, Sri Lanka, Singapore, and Malaysia. In its original incarnation, the code points U+0B02..U+0BCD were a direct copy of the Tamil characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Telugu is a Unicode block containing characters for the Telugu, Gondi, and Lambadi languages of Telangana and Andhra Pradesh, India. In its original incarnation, the code points U+0C01..U+0C4D were a direct copy of the Telugu characters A1-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Kannada is a Unicode block containing characters for the Kannada, Sanskrit, Konkani, Sankethi, Havyaka, Tulu and Kodava languages. In its original incarnation, the code points U+0C82..U+0CCD were a direct copy of the Kannada characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, and Malayalam blocks were similarly all based on their ISCII encodings.

Sinhala is a Unicode block containing characters for the Sinhala and Pali languages of Sri Lanka, and is also used for writing Sanskrit in Sri Lanka. The Sinhala allocation is loosely based on the ISCII standard, except that Sinhala contains extra prenasalized consonant letters, leading to inconsistencies with other ISCII-Unicode script allocations.

CJK Compatibility Ideographs is a Unicode block created to contain Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings. Such encodings include the South Korean KS X 1001:1998, Taiwanese Big5, Japanese IBM 32, South Korean KS X 1001:2004, Japanese JIS X 0213, Japanese ARIB STD-B24 and the North Korean KPS 10721-2000 source standards.

Enclosed CJK Letters and Months is a Unicode block containing circled and parenthesized Katakana, Hangul, and CJK ideographs. Also included in the block are miscellaneous glyphs that would more likely fit in CJK Compatibility or Enclosed Alphanumerics: a few unit abbreviations, circled numbers from 21 to 50, and circled multiples of 10 from 10 to 80 enclosed in black squares.

Enclosed Ideographic Supplement is a Unicode block containing forms of characters and words from Chinese, Japanese and Korean enclosed within or stylised as squares, brackets, or circles. It contains three such characters containing one or more kana, and many containing CJK ideographs. Many of its characters were added for compatibility with the Japanese ARIB STD-B24 standard. Six symbols from Chinese folk religion were added in Unicode version 10.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.