Telugu | |
---|---|
Range | U+0C00..U+0C7F (128 code points) |
Plane | BMP |
Scripts | Telugu |
Major alphabets | Telugu, Gondi, Lambadi |
Assigned | 100 code points |
Unused | 28 reserved code points |
Source standards | ISCII |
Unicode version history | |
1.0.0 (1991) | 80 (+80) |
5.1 (2008) | 93 (+13) |
7.0 (2014) | 95 (+2) |
8.0 (2015) | 96 (+1) |
11.0 (2018) | 97 (+1) |
12.0 (2019) | 98 (+1) |
14.0 (2021) | 100 (+2) |
Telugu is a Unicode block containing characters for the Telugu, Gondi, and Lambadi languages of the Indian states of Andhra Pradesh and Telangana. In its original incarnation, the code points U+0C01..U+0C4D were a direct copy of the Telugu characters A1-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.
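The positional correspondence described above amounts to a fixed offset between ISCII byte values and Telugu code points. The following is a minimal Python sketch of that arithmetic only, assuming byte A1 lines up with U+0C01 and byte ED with U+0C4D as the paragraph states. The function name `iscii_telugu_to_unicode` is hypothetical, and the sketch should not be taken as a converter for real ISCII data, which may follow a later revision with a different layout and also uses control bytes that are not modelled here.

```python
# Offset implied by the text: ISCII byte 0xA1 corresponds to U+0C01.
ISCII_FIRST, ISCII_LAST = 0xA1, 0xED
TELUGU_OFFSET = 0x0C01 - ISCII_FIRST  # 0x0B60

def iscii_telugu_to_unicode(byte_value: int) -> int:
    """Map one byte in A1..ED to the code point at the same position (hypothetical helper)."""
    if not ISCII_FIRST <= byte_value <= ISCII_LAST:
        raise ValueError(f"byte 0x{byte_value:02X} is outside A1..ED")
    return byte_value + TELUGU_OFFSET

print(hex(iscii_telugu_to_unicode(0xA1)))  # 0xc01
print(hex(iscii_telugu_to_unicode(0xED)))  # 0xc4d
```

The same subtraction with a different base would describe the other blocks listed above, for example U+0981 for Bengali, under the same assumption about the 1988 layout.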
Telugu: Official Unicode Consortium code chart (PDF)

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+0C0x | ఀ | ఁ | ం | ః | ఄ | అ | ఆ | ఇ | ఈ | ఉ | ఊ | ఋ | ఌ | | ఎ | ఏ |
| U+0C1x | ఐ | | ఒ | ఓ | ఔ | క | ఖ | గ | ఘ | ఙ | చ | ఛ | జ | ఝ | ఞ | ట |
| U+0C2x | ఠ | డ | ఢ | ణ | త | థ | ద | ధ | న | | ప | ఫ | బ | భ | మ | య |
| U+0C3x | ర | ఱ | ల | ళ | ఴ | వ | శ | ష | స | హ | | | ఼ | ఽ | ా | ి |
| U+0C4x | ీ | ు | ూ | ృ | ౄ | | ె | ే | ై | | ొ | ో | ౌ | ్ | | |
| U+0C5x | | | | | | ౕ | ౖ | | ౘ | ౙ | ౚ | | | ౝ | | |
| U+0C6x | ౠ | ౡ | ౢ | ౣ | | | ౦ | ౧ | ౨ | ౩ | ౪ | ౫ | ౬ | ౭ | ౮ | ౯ |
| U+0C7x | | | | | | | | ౷ | ౸ | ౹ | ౺ | ౻ | ౼ | ౽ | ౾ | ౿ |

Note: empty cells are unassigned (reserved) code points.
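The assigned and reserved positions in the chart can be checked against the Unicode data bundled with a Python interpreter. This is a hedged sketch: the helper name `is_assigned` is an assumption introduced for illustration, and the counts it prints depend on the Unicode version the interpreter ships with, so they may differ from the figures given for this block.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x0C00, 0x0C7F  # Telugu block range

def is_assigned(cp: int) -> bool:
    # General category "Cn" marks code points unassigned in this interpreter's Unicode data.
    return unicodedata.category(chr(cp)) != "Cn"

assigned = [cp for cp in range(BLOCK_START, BLOCK_END + 1) if is_assigned(cp)]
reserved = (BLOCK_END - BLOCK_START + 1) - len(assigned)

print(f"Unicode data version: {unicodedata.unidata_version}")
print(f"assigned={len(assigned)} reserved={reserved}")
print(unicodedata.name(chr(0x0C05)))
```

Comparing `unicodedata.unidata_version` with the version history helps explain any difference between the printed counts and the chart.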
The following Unicode-related documents record the purpose and process of defining specific characters in the Telugu block:
| Version | Final code points | Count | UTC ID | L2 ID | WG2 ID | Document |
|---|---|---|---|---|---|---|
| 1.0.0 | U+0C01..0C03, 0C05..0C0C, 0C0E..0C10, 0C12..0C28, 0C2A..0C33, 0C35..0C39, 0C3E..0C44, 0C46..0C48, 0C4A..0C4D, 0C55..0C56, 0C60..0C61, 0C66..0C6F | 80 | UTC/1991-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam |
| | | | UTC/1991-057 | | | Whistler, Ken, Indic names list |
| | | | UTC/1991-048B | | | Whistler, Ken (1991-03-27), "III. L. Walk In proposals", Draft Minutes from the UTC meeting #46 day 2, 3/27 at Apple |
| | | | | L2/01-303 | | Vikas, Om (2001-07-26), Letter from the Government from India on "Draft for Unicode Standard for Indian Scripts" |
| | | | | L2/01-304 | | Feedback on Unicode Standard 3.0, 2001-08-02 |
| | | | | L2/01-305 | | McGowan, Rick (2001-08-08), Draft UTC Response to L2/01-304, "Feedback on Unicode Standard 3.0" |
| | | | | L2/01-430R | | McGowan, Rick (2001-11-20), UTC Response to L2/01-304, "Feedback on Unicode Standard 3.0" |
| 5.1 | U+0C3D, 0C58..0C59, 0C62..0C63, 0C78..0C7F | 13 | | L2/03-102 | | Vikas, Om (2003-03-04), Unicode Standard for Indic Scripts |
| | | | | L2/03-101.9 | | Proposed Changes in Indic Scripts [Telugu document], 2003-03-04 |
| | | | | L2/06-250R | N3116R | Everson, Michael; Kolichala, Suresh; Venna, Nagarjuna (2006-08-02), Proposal to add eighteen characters for Telugu to the BMP |
| | | | | L2/06-267 | | Ganesan, Naga (2006-07-31), Comments on L2/06-250, Proposal to add eighteen characters for Telugu to the BMP of the UCS |
| | | | | L2/06-231 | | Moore, Lisa (2006-08-17), "C.12.1", UTC #108 Minutes |
| | | | | | N3153 | Umamaheswaran, V. S. (2007-02-16), "M49.2", Unconfirmed minutes of WG 2 meeting 49 AIST, Akihabara, Tokyo, Japan; 2006-09-25/29 |
| | | | | L2/06-324R2 | | Moore, Lisa (2006-11-29), "Consensus 109-C1, C.6.1", UTC #109 Minutes |
| 7.0 | U+0C00 | 1 | | L2/10-392R2 | N3964 | Sharma, Shriramana (2010-10-11), Request to encode South Indian CANDRABINDU-s |
| | | | | L2/10-440 | | Anderson, Deborah; McGowan, Rick; Whistler, Ken (2010-10-27), "5. South Indian Candrabindus", Review of Indic-related L2 documents and Recommendations to the UTC |
| | | | | L2/10-416R | | Moore, Lisa (2010-11-09), "South Indian candrabindu-s (D.8)", UTC #125 / L2 #222 Minutes |
| | | | | | N4103 | "11.2.4 South Indian CANDRABINDU-s", Unconfirmed minutes of WG 2 meeting 58, 2012-01-03 |
| 7.0 | U+0C34 | 1 | | L2/12-015 | N4214 | Sharma, Shriramana; Kolichala, Suresh; Venna, Nagarjuna; Rajan, Vinodh (2012-01-17), Proposal to encode 0C34 TELUGU LETTER LLLA |
| | | | | L2/12-031 | | Anderson, Deborah; McGowan, Rick; Whistler, Ken (2012-01-27), "3. TELUGU LETTER LLLA", Review of Indic-related L2 documents and Recommendations to the UTC |
| | | | | L2/12-076 | | Govt. of Andhra Pradesh's inputs on document no. L2/12-015, 2012-02-01 |
| | | | | L2/12-007 | | Moore, Lisa (2012-02-14), "D.2", UTC #130 / L2 #227 Minutes |
| | | | | | N4253 | "M59.16e", Unconfirmed minutes of WG 2 meeting 59, 2012-09-12 |
| 8.0 | U+0C5A | 1 | | L2/12-016 | N4215 | Sharma, Shriramana; Kolichala, Suresh; Venna, Nagarjuna; Rajan, Vinodh (2012-01-18), Proposal to encode 0C5A TELUGU LETTER RRRA |
| | | | | L2/12-031 | | Anderson, Deborah; McGowan, Rick; Whistler, Ken (2012-01-27), "4. TELUGU LETTER RRRA", Review of Indic-related L2 documents and Recommendations to the UTC |
| | | | | L2/13-028 | | Anderson, Deborah; McGowan, Rick; Whistler, Ken; Pournader, Roozbeh (2013-01-28), "20", Recommendations to UTC on Script Proposals |
| | | | | L2/13-121 | | Kolichala, Suresh (2013-05-09), Letter from Andhra Pradesh re Telugu RRRA |
| | | | | L2/13-058 | | Moore, Lisa (2013-06-12), "Consensus 135-C20", UTC #135 Minutes, Accept U+0C5A TELUGU LETTER RRRA for encoding in a future version of the standard, with properties as given in L2/12-016. |
| | | | | | N4403 | Umamaheswaran, V. S. (2014-01-28), "10.3.2 Telugu Letter RRRA", Unconfirmed minutes of WG 2 meeting 61, Holiday Inn, Vilnius, Lithuania; 2013-06-10/14 |
| 11.0 | U+0C04 | 1 | | L2/16-285 | N4798 | A, Srinidhi; A, Sridatta (2016-10-20), Proposal to encode the TELUGU SIGN COMBINING ANUSVARA ABOVE |
| | | | | L2/16-342 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Glass, Andrew; Iancu, Laurențiu (2016-11-07), "3", Recommendations to UTC #149 November 2016 on Script Proposals |
| | | | | L2/16-325 | | Moore, Lisa (2016-11-18), "D.3", UTC #149 Minutes |
| 12.0 | U+0C77 | 1 | | L2/17-218R | N4860 | A, Srinidhi; A, Sridatta (2017-07-17), Proposal to encode the TELUGU SIGN SIDDHAM |
| | | | | L2/17-255 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai (2017-07-28), "12. Telugu", Recommendations to UTC #152 July-August 2017 on Script Proposals |
| | | | | L2/17-222 | | Moore, Lisa (2017-08-11), "D.13", UTC #152 Minutes |
| | | | | | N4953 | "M66.16a", Unconfirmed minutes of WG 2 meeting 66, 2018-03-23 |
| 14.0 | U+0C3C | 1 | | L2/19-401 | | Rajan, Vinodh; Sharma, Shriramana; Kolichala, Suresh (2019-12-17), Proposal to Encode Telugu Sign Nukta |
| | | | | L2/19-405 | | A, Srinidhi; A, Sridatta (2019-12-29), Additional evidence for the use of Nukta sign in Telugu |
| | | | | L2/20-046 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai (2020-01-10), "8. Telugu", Recommendations to UTC #162 January 2020 on Script Proposals |
| | | | | L2/20-085 | | Rajan, Vinodh; Sharma, Shriramana; Kolichala, Suresh (2020-03-18), Revised Proposal to Encode Telugu Sign Nukta |
| | | | | L2/20-105 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-04-20), "9a. Telugu Sign Nukta", Recommendations to UTC #163 April 2020 on Script Proposals |
| | | | | L2/20-102 | | Moore, Lisa (2020-05-06), "Consensus 163-C16", UTC #163 Minutes |
| 14.0 | U+0C5D | 1 | | L2/20-084R | | Rajan, Vinodh; Sharma, Shriramana; Kolichala, Suresh (2020-04-18), Revised Proposal to Encode Telugu Letter Nakaara Pollu |
| | | | | L2/20-105 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-04-20), "9b. Telugu Letter Nakaara Pollu", Recommendations to UTC #163 April 2020 on Script Proposals |
| | | | | L2/20-102 | | Moore, Lisa (2020-05-06), "Consensus 163-C17", UTC #163 Minutes |
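The "Final code points" and "Count" columns can be cross-checked with a little arithmetic. The sketch below is illustrative only: it parses the range notation used in the table and accumulates a running total per version. The helper name `count_code_points` and the `additions` dictionary are assumptions introduced here, with the range strings copied from the table.

```python
def count_code_points(spec: str) -> int:
    """Count code points in a list such as 'U+0C3D, 0C58..0C59'."""
    total = 0
    for part in spec.replace("U+", "").split(","):
        first, _, last = part.strip().partition("..")
        # A single code point has no "..", so it counts as a range of length one.
        total += int(last or first, 16) - int(first, 16) + 1
    return total

# Code points added per version, as listed in the table above.
additions = {
    "1.0.0": "U+0C01..0C03, 0C05..0C0C, 0C0E..0C10, 0C12..0C28, 0C2A..0C33, "
             "0C35..0C39, 0C3E..0C44, 0C46..0C48, 0C4A..0C4D, 0C55..0C56, "
             "0C60..0C61, 0C66..0C6F",
    "5.1": "U+0C3D, 0C58..0C59, 0C62..0C63, 0C78..0C7F",
    "7.0": "U+0C00, U+0C34",
    "8.0": "U+0C5A",
    "11.0": "U+0C04",
    "12.0": "U+0C77",
    "14.0": "U+0C3C, U+0C5D",
}

running = 0
for version, spec in additions.items():
    added = count_code_points(spec)
    running += added
    print(f"{version}: +{added} -> {running}")
```

The running total after 14.0 is 100, which agrees with the assigned count given for the block.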