Telugu (Unicode block)

Telugu
Range: U+0C00..U+0C7F (128 code points)
Plane: BMP
Scripts: Telugu
Major alphabets: Telugu, Gondi, Lambadi
Assigned: 100 code points
Unused: 28 reserved code points
Source standards: ISCII
Unicode version history:
1.0.0 (1991): 80 (+80)
5.1 (2008): 93 (+13)
7.0 (2014): 95 (+2)
8.0 (2015): 96 (+1)
11.0 (2018): 97 (+1)
12.0 (2019): 98 (+1)
14.0 (2021): 100 (+2)
Code chart
Note: [1] [2]

Telugu is a Unicode block containing characters for the Telugu, Gondi, and Lambadi languages of Telangana and Andhra Pradesh, India. In its original incarnation, the code points U+0C01..U+0C4D were a direct copy of the Telugu characters A1-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.
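The "direct copy" relationship amounts to a constant offset between the two ranges: 0xA1 maps to U+0C01 and 0xED maps to U+0C4D, so the offset is 0x0B60. The following minimal sketch illustrates only that arithmetic. The function name is hypothetical, and the sketch is not a full ISCII converter: it ignores ISCII's script-switching codes and does not check whether a position in the range is actually assigned in Telugu.

```python
# Assumption: Telugu ISCII bytes 0xA1..0xED correspond to U+0C01..U+0C4D
# by a constant offset, as the "direct copy" described above implies.
ISCII_FIRST, ISCII_LAST = 0xA1, 0xED
TELUGU_FIRST = 0x0C01
OFFSET = TELUGU_FIRST - ISCII_FIRST  # 0x0B60


def iscii_telugu_to_code_point(byte: int) -> int:
    """Map one ISCII byte in A1..ED to the matching Telugu block position.

    Hypothetical helper for illustration; some positions in the resulting
    range are reserved (unassigned) in the Telugu block.
    """
    if not ISCII_FIRST <= byte <= ISCII_LAST:
        raise ValueError(f"byte 0x{byte:02X} is outside the range A1..ED")
    return byte + OFFSET


if __name__ == "__main__":
    for b in (0xA1, 0xED):
        print(f"ISCII 0x{b:02X} -> U+{iscii_telugu_to_code_point(b):04X}")
```

The same offset approach would apply to the other ISCII-derived blocks, each with its own block start.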

Block

Telugu [1] [2]
Official Unicode Consortium code chart (PDF)
Chart layout: columns 0 to F, rows U+0C0x to U+0C7x (128 cells; the glyphs are not reproduced here).
Notes
1. As of Unicode version 14.0
2. Grey areas indicate non-assigned code points
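The chart can be regenerated locally from the Unicode character database bundled with Python. The sketch below is one possible way to do it, with a hypothetical function name. Which cells count as assigned depends on the Unicode version shipped with the Python interpreter in use, so the output may not match version 14.0 exactly. Unassigned cells are shown as "." and combining marks are placed on a dotted circle (U+25CC) so they remain visible.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x0C00, 0x0C7F  # Telugu block range


def telugu_chart_rows():
    """Return the eight chart rows (U+0C0x .. U+0C7x) as strings."""
    rows = []
    for row_start in range(BLOCK_START, BLOCK_END + 1, 16):
        cells = []
        for cp in range(row_start, row_start + 16):
            ch = chr(cp)
            category = unicodedata.category(ch)
            if category == "Cn":
                cells.append(".")            # unassigned (reserved) code point
            elif category.startswith("M"):
                cells.append("\u25cc" + ch)  # show combining mark on a dotted circle
            else:
                cells.append(ch)
        rows.append(f"U+{row_start >> 4:03X}x " + " ".join(cells))
    return rows


if __name__ == "__main__":
    print("       " + " ".join("0123456789ABCDEF"))
    for row in telugu_chart_rows():
        print(row)
    print("Unicode database version:", unicodedata.unidata_version)
```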

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Telugu block:

Columns: Version, Final code points [lower-alpha 1], Count, UTC ID, L2 ID, WG2 ID, Document
1.0.0: U+0C01..0C03, 0C05..0C0C, 0C0E..0C10, 0C12..0C28, 0C2A..0C33, 0C35..0C39, 0C3E..0C44, 0C46..0C48, 0C4A..0C4D, 0C55..0C56, 0C60..0C61, 0C66..0C6F (80 code points)
UTC/1991-056 Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam
UTC/1991-057 Whistler, Ken, Indic names list
UTC/1991-048B Whistler, Ken (1991-03-27), "III. L. Walk In proposals", Draft Minutes from the UTC meeting #46 day 2, 3/27 at Apple
L2/01-303 Vikas, Om (2001-07-26), Letter from the Government from India on "Draft for Unicode Standard for Indian Scripts"
L2/01-304 Feedback on Unicode Standard 3.0, 2001-08-02
L2/01-305 McGowan, Rick (2001-08-08), Draft UTC Response to L2/01-304, "Feedback on Unicode Standard 3.0"
L2/01-430R McGowan, Rick (2001-11-20), UTC Response to L2/01-304, "Feedback on Unicode Standard 3.0"
5.1: U+0C3D, 0C58..0C59, 0C62..0C63, 0C78..0C7F (13 code points)
L2/03-102 Vikas, Om (2003-03-04), Unicode Standard for Indic Scripts
L2/03-101.9 Proposed Changes in Indic Scripts [Telugu document], 2003-03-04
L2/06-250R [lower-alpha 2] N3116R Everson, Michael; Kolichala, Suresh; Venna, Nagarjuna (2006-08-02), Proposal to add eighteen characters for Telugu to the BMP
L2/06-267 Ganesan, Naga (2006-07-31), Comments on L2/06-250, Proposal to add eighteen characters for Telugu to the BMP of the UCS
L2/06-231 Moore, Lisa (2006-08-17), "C.12.1", UTC #108 Minutes
N3153 Umamaheswaran, V. S. (2007-02-16), "M49.2", Unconfirmed minutes of WG 2 meeting 49 AIST, Akihabara, Tokyo, Japan; 2006-09-25/29
L2/06-324R2 Moore, Lisa (2006-11-29), "Consensus 109-C1, C.6.1", UTC #109 Minutes
7.0: U+0C00 (1 code point)
L2/10-392R2 N3964 Sharma, Shriramana (2010-10-11), Request to encode South Indian CANDRABINDU-s
L2/10-440 Anderson, Deborah; McGowan, Rick; Whistler, Ken (2010-10-27), "5. South Indian Candrabindus", Review of Indic-related L2 documents and Recommendations to the UTC
L2/10-416R Moore, Lisa (2010-11-09), "South Indian candrabindu-s (D.8)", UTC #125 / L2 #222 Minutes
N4103 "11.2.4 South Indian CANDRABINDU-s", Unconfirmed minutes of WG 2 meeting 58, 2012-01-03
7.0: U+0C34 (1 code point)
L2/12-015 N4214 Sharma, Shriramana; Kolichala, Suresh; Venna, Nagarjuna; Rajan, Vinodh (2012-01-17), Proposal to encode 0C34 TELUGU LETTER LLLA
L2/12-031 Anderson, Deborah; McGowan, Rick; Whistler, Ken (2012-01-27), "3. TELUGU LETTER LLLA", Review of Indic-related L2 documents and Recommendations to the UTC
L2/12-076 Govt. of Andhra Pradesh's inputs on document no. L2/12-015, 2012-02-01
L2/12-007 Moore, Lisa (2012-02-14), "D.2", UTC #130 / L2 #227 Minutes
N4253 "M59.16e", Unconfirmed minutes of WG 2 meeting 59, 2012-09-12
8.0: U+0C5A (1 code point)
L2/12-016 N4215 Sharma, Shriramana; Kolichala, Suresh; Venna, Nagarjuna; Rajan, Vinodh (2012-01-18), Proposal to encode 0C5A TELUGU LETTER RRRA
L2/12-031 Anderson, Deborah; McGowan, Rick; Whistler, Ken (2012-01-27), "4. TELUGU LETTER RRRA", Review of Indic-related L2 documents and Recommendations to the UTC
L2/13-028 Anderson, Deborah; McGowan, Rick; Whistler, Ken; Pournader, Roozbeh (2013-01-28), "20", Recommendations to UTC on Script Proposals
L2/13-121 Kolichala, Suresh (2013-05-09), Letter from Andhra Pradesh re Telugu RRRA
L2/13-058 Moore, Lisa (2013-06-12), "Consensus 135-C20", UTC #135 Minutes, Accept U+0C5A TELUGU LETTER RRRA for encoding in a future version of the standard, with properties as given in L2/12-016.
N4403 Umamaheswaran, V. S. (2014-01-28), "10.3.2 Telugu Letter RRRA", Unconfirmed minutes of WG 2 meeting 61, Holiday Inn, Vilnius, Lithuania; 2013-06-10/14
11.0: U+0C04 (1 code point)
L2/16-285 N4798 A, Srinidhi; A, Sridatta (2016-10-20), Proposal to encode the TELUGU SIGN COMBINING ANUSVARA ABOVE
L2/16-342 Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Glass, Andrew; Iancu, Laurențiu (2016-11-07), "3", Recommendations to UTC #149 November 2016 on Script Proposals
L2/16-325 Moore, Lisa (2016-11-18), "D.3", UTC #149 Minutes
12.0: U+0C77 (1 code point)
L2/17-218R N4860 A, Srinidhi; A, Sridatta (2017-07-17), Proposal to encode the TELUGU SIGN SIDDHAM
L2/17-255 Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai (2017-07-28), "12. Telugu", Recommendations to UTC #152 July-August 2017 on Script Proposals
L2/17-222 Moore, Lisa (2017-08-11), "D.13", UTC #152 Minutes
N4953 "M66.16a", Unconfirmed minutes of WG 2 meeting 66, 2018-03-23
14.0: U+0C3C (1 code point)
L2/19-401 Rajan, Vinodh; Sharma, Shriramana; Kolichala, Suresh (2019-12-17), Proposal to Encode Telugu Sign Nukta
L2/19-405 A, Srinidhi; A, Sridatta (2019-12-29), Additional evidence for the use of Nukta sign in Telugu
L2/20-046 Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai (2020-01-10), "8. Telugu", Recommendations to UTC #162 January 2020 on Script Proposals
L2/20-085 Rajan, Vinodh; Sharma, Shriramana; Kolichala, Suresh (2020-03-18), Revised Proposal to Encode Telugu Sign Nukta
L2/20-105 Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-04-20), "9a. Telugu Sign Nukta", Recommendations to UTC #163 April 2020 on Script Proposals
L2/20-102 Moore, Lisa (2020-05-06), "Consensus 163-C16", UTC #163 Minutes
14.0: U+0C5D (1 code point)
L2/20-084R Rajan, Vinodh; Sharma, Shriramana; Kolichala, Suresh (2020-04-18), Revised Proposal to Encode Telugu Letter Nakaara Pollu
L2/20-105 Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-04-20), "9b. Telugu Letter Nakaara Pollu", Recommendations to UTC #163 April 2020 on Script Proposals
L2/20-102 Moore, Lisa (2020-05-06), "Consensus 163-C17", UTC #163 Minutes
  1. Proposed code points and character names may differ from final code points and names
  2. See also L2/01-303, L2/01-304, L2/01-305, and L2/01-430R
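
The per-version counts in the table can be checked by expanding the listed code point ranges. The sketch below transcribes the "Final code points" column as strings and tallies them. The dictionary and function names are hypothetical, and the block size of 128 is taken from the range U+0C00..U+0C7F.

```python
# Final code points per version, transcribed from the history table above.
ADDITIONS = {
    "1.0.0": ("U+0C01..0C03, 0C05..0C0C, 0C0E..0C10, 0C12..0C28, 0C2A..0C33, "
              "0C35..0C39, 0C3E..0C44, 0C46..0C48, 0C4A..0C4D, 0C55..0C56, "
              "0C60..0C61, 0C66..0C6F"),
    "5.1": "U+0C3D, 0C58..0C59, 0C62..0C63, 0C78..0C7F",
    "7.0": "U+0C00, 0C34",
    "8.0": "U+0C5A",
    "11.0": "U+0C04",
    "12.0": "U+0C77",
    "14.0": "U+0C3C, 0C5D",
}

BLOCK_SIZE = 0x0C7F - 0x0C00 + 1  # 128 code points in the block


def count_code_points(spec: str) -> int:
    """Count code points in a list such as 'U+0C3D, 0C58..0C59'."""
    total = 0
    for part in spec.replace("U+", "").split(","):
        part = part.strip()
        if ".." in part:
            first, last = part.split("..")
            total += int(last, 16) - int(first, 16) + 1  # inclusive range
        else:
            total += 1  # single code point
    return total


if __name__ == "__main__":
    running = 0
    for version, spec in ADDITIONS.items():
        added = count_code_points(spec)
        running += added
        print(f"{version}: {running} (+{added})")
    print("reserved:", BLOCK_SIZE - running)
```

Run as a script, this prints a running total per version and the number of reserved code points, which can be compared against the version history in the infobox.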

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.