Tamil (Unicode block)

Tamil
Range	U+0B80..U+0BFF (128 code points)
Plane	BMP
Scripts	Tamil
Major alphabets	Tamil; Saurashtra
Assigned	72 code points
Unused	56 reserved code points
Source standards	ISCII
Unicode version history
1.0.0 (1991)	61 (+61)
4.0 (2003)	69 (+8)
4.1 (2005)	71 (+2)
5.1 (2008)	72 (+1)
Unicode documentation
	Code chart ∣ Web page
	Note: ^[1]^[2]

Last updated July 27, 2024

Tamil is a Unicode block containing characters for the Tamil and Saurashtra languages of Tamil Nadu (India), Sri Lanka, Singapore, and Malaysia. In its original incarnation, the code points U+0B82..U+0BCD were a direct copy of the Tamil characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.
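The relationship between the two ranges can be illustrated with a little arithmetic. The sketch below takes the "direct copy" statement at face value and assumes a constant offset between the ISCII bytes A2-ED and the code points U+0B82..U+0BCD. The helper name `iscii1988_tamil_to_code_point` is hypothetical, not every byte in the range necessarily corresponds to an assigned Tamil character, and later ISCII revisions may be laid out differently, so this is not a general ISCII converter.

```python
# Byte range and code point range quoted in the paragraph above.
ISCII_FIRST, ISCII_LAST = 0xA2, 0xED
TAMIL_FIRST, TAMIL_LAST = 0x0B82, 0x0BCD

# Assumption: a single constant offset links the two ranges.
OFFSET = TAMIL_FIRST - ISCII_FIRST  # 0x0AE0


def iscii1988_tamil_to_code_point(byte: int) -> int:
    """Hypothetical helper: shift an ISCII-1988 Tamil byte into the Tamil block."""
    if not ISCII_FIRST <= byte <= ISCII_LAST:
        raise ValueError(f"byte 0x{byte:02X} is outside A2..ED")
    return byte + OFFSET


# Both ranges span the same number of positions, so the end points line up.
assert iscii1988_tamil_to_code_point(ISCII_LAST) == TAMIL_LAST
print(f"U+{iscii1988_tamil_to_code_point(0xA2):04X}")  # prints U+0B82
```

The two ranges have the same length (0xED - 0xA2 = 0x0BCD - 0x0B82 = 0x4B), which is what makes a constant offset plausible in the first place.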

Block

Tamil ^[1]^[2] Official Unicode Consortium code chart (PDF)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+0B8x			ஂ	ஃ		அ	ஆ	இ	ஈ	உ	ஊ				எ	ஏ
U+0B9x	ஐ		ஒ	ஓ	ஔ	க				ங	ச		ஜ		ஞ	ட
U+0BAx				ண	த				ந	ன	ப				ம	ய
U+0BBx	ர	ற	ல	ள	ழ	வ	ஶ	ஷ	ஸ	ஹ					ா	ி
U+0BCx	ீ	ு	ூ				ெ	ே	ை		ொ	ோ	ௌ	்
U+0BDx	ௐ							ௗ
U+0BEx							௦	௧	௨	௩	௪	௫	௬	௭	௮	௯
U+0BFx	௰	௱	௲	௳	௴	௵	௶	௷	௸	௹	௺
Notes
1. ^ As of Unicode version 15.1
2. ^ Empty cells indicate non-assigned code points
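The assigned and reserved counts for the block can be cross-checked against the Unicode Character Database that ships with Python. This is a minimal sketch: it treats general category `Cn` as "unassigned", and the result depends on the Unicode version bundled with the interpreter. The function name `assigned_code_points` is illustrative only.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x0B80, 0x0BFF  # block range U+0B80..U+0BFF


def assigned_code_points(start: int = BLOCK_START, end: int = BLOCK_END) -> list[int]:
    """Return the code points in [start, end] whose general category is not Cn."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) != "Cn"]


assigned = assigned_code_points()
total = BLOCK_END - BLOCK_START + 1

print(f"block size: {total}")
print(f"assigned:   {len(assigned)}")
print(f"reserved:   {total - len(assigned)}")
```

On an interpreter whose Unicode data is at version 5.1 or later, this should report 72 assigned and 56 reserved code points out of 128, matching the figures given for the block.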

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Tamil block:

Version	Final code points^[a]	Count	UTC ID	L2 ID	WG2 ID	Document
1.0.0	U+0B82..0B83, 0B85..0B8A, 0B8E..0B90, 0B92..0B95, 0B99..0B9A, 0B9C, 0B9E..0B9F, 0BA3..0BA4, 0BA8..0BAA, 0BAE..0BB5, 0BB7..0BB9, 0BBE..0BC2, 0BC6..0BC8, 0BCA..0BCD, 0BD7, 0BE7..0BF2	61	UTC/1991-056			Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam
			UTC/1991-057			Whistler, Ken, Indic names list
			UTC/1991-048B			Whistler, Ken (1991-03-27), "III. L. Walk In proposals", Draft Minutes from the UTC meeting #46 day 2, 3/27 at Apple
				L2/01-303		Vikas, Om (2001-07-26), Letter from the Government from India on "Draft for Unicode Standard for Indian Scripts"
				L2/01-304		Feedback on Unicode Standard 3.0, 2001-08-02
				L2/01-305		McGowan, Rick (2001-08-08), Draft UTC Response to L2/01-304, "Feedback on Unicode Standard 3.0"
				L2/01-295R		Moore, Lisa (2001-11-06), "Motion 88-M9", Minutes from the UTC/L2 meeting #88, The UTC approves changing the general category type of U+0B83 TAMIL SIGN VISARGA from Mn to Lo.
				L2/01-430R		McGowan, Rick (2001-11-20), UTC Response to L2/01-304, "Feedback on Unicode Standard 3.0"
				L2/05-185		Documentation for KSSA as Non-conjunct Consonant and Conjunct Consonant in Tamil, 2005-07-25
				L2/05-180		Moore, Lisa (2005-08-17), "Tamil (C.12.1), KSSA in Tamil (C.12)", UTC #104 Minutes
				L2/10-108		Moore, Lisa (2010-05-19), "B.10.15 [U+0B83, U+0BC1, U+0BC2]", UTC #123 / L2 #220 Minutes
4.0	U+0BF3..0BFA	8		L2/01-375R	N2381R	Umamaheswaran, V. S. (2001-10-11), Proposal to add eight Tamil symbols
				L2/01-420		Whistler, Ken (2001-10-30), "d. Tamil sign additions", WG2 (Singapore) Resolution Consent Docket for UTC
				L2/01-405R		Moore, Lisa (2001-12-12), "Consensus 89-C23", Minutes from the UTC/L2 meeting in Mountain View, November 6-9, 2001
				L2/02-112	N2421	Umamaheswaran, V. S. (2002-03-15), Feedback on Tamil Symbols in PDAM2-10646-1 from the INFITT WG on Unicode
				L2/02-154	N2403	Umamaheswaran, V. S. (2002-04-22), "7.11", Draft minutes of WG 2 meeting 41, Hotel Phoenix, Singapore, 2001-10-15/19
				L2/12-106		Sharma, Shriramana (2012-03-17), "2. Tamil", Request for editorial updates to various Indic scripts
				L2/12-147		Anderson, Deborah; McGowan, Rick; Whistler, Ken (2012-04-25), "II. TAMIL", Review of Indic-related L2 documents and Recommendations to the UTC
				L2/12-150		Ganesan, Naga (2012-05-01), Tamil credit sign (U+0BF7) glyph shape from Printed Books
				L2/12-180		Manivannan, Mani (2012-05-05), Review of Indic-related L2 documents and Recommendations
				L2/12-384		Ganesan, Naga (2012-06-11), Comments on Tamil fractions and Tamil credit sign
				L2/13-028		Anderson, Deborah; McGowan, Rick; Whistler, Ken; Pournader, Roozbeh (2013-01-28), "19.2", Recommendations to UTC on Script Proposals
					N4480	Sharma, Shriramana (2013-09-06), Request to change two glyphs of existing Tamil symbols
				L2/17-424		A, Srinidhi; A, Sridatta (2017-12-08), Changes to ScriptExtensions.txt for Indic characters for Unicode 11.0
				L2/18-039		Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai; Cook, Richard (2018-01-19), "ScriptExtensions.txt changes for Indic", Recommendations to UTC #154 January 2018 on Script Proposals
				L2/18-007		Moore, Lisa (2018-03-19), "Action item 154-A120", UTC #154 Minutes, Make script extension changes in version 11.0 as documented in section 6B, pages 6-9 of L2/18-039.
4.1	U+0BB6	1		L2/03-273		Proposal to add Tamil grantha character SHA, 2003-07-29
				L2/03-278		Bhaskararao, Peri (2003-07-29), Review of a Proposal placed before UTC bearing No. L2/03-273, Proposal to encode Tamil SHA
					N2618	Bhaskararao, Peri (2003-09-14), Review of a Proposal placed before Unicode Technical Committee entitled 'Proposal to add Tamil Letter SHA' (L2/03-273)
	U+0BE6	1		L2/04-073	N2741	Kaplan, Michael (2004-02-01), Proposal to add Tamil Digit Zero
5.1	U+0BD0	1		L2/06-184	N3119	Proposal to add Tamil Om, 2006-04-28
				L2/06-108		Moore, Lisa (2006-05-25), "C.23", UTC #107 Minutes
					N3153 (pdf, doc)	Umamaheswaran, V. S. (2007-02-16), "M49.5c", Unconfirmed minutes of WG 2 meeting 49 AIST, Akihabara, Tokyo, Japan; 2006-09-25/29
a. ^ Proposed code points and character names may differ from final code points and names.
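A few of the outcomes recorded in the table can be inspected with Python's `unicodedata` module. The sketch below lists the characters added after version 1.0.0, grouped by the version given in the table, and prints the current general category of U+0B83. The `ADDED` mapping is transcribed from the table for illustration. Python does not expose the version in which a character was introduced, so the grouping is an input here, not something the code verifies.

```python
import unicodedata

# Code points added after 1.0.0, keyed by the version listed in the history table.
ADDED = {
    "4.0": list(range(0x0BF3, 0x0BFA + 1)),  # eight Tamil symbols
    "4.1": [0x0BB6, 0x0BE6],                 # letter SHA and digit zero
    "5.1": [0x0BD0],                         # Tamil Om
}

for version, code_points in ADDED.items():
    for cp in code_points:
        print(version, f"U+{cp:04X}", unicodedata.name(chr(cp)))

# Motion 88-M9 changed the general category of U+0B83 from Mn to Lo.
print("U+0B83 category:", unicodedata.category("\u0B83"))
```

The per-version counts in `ADDED` (8, 2 and 1) add up to the 11 characters that bring the block from 61 to 72 assigned code points.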

Related Research Articles

The Tamil script is an abugida script that is used by Tamils and Tamil speakers in India, Sri Lanka, Malaysia, Singapore, Indonesia and elsewhere to write the Tamil language. It is one of the official scripts of the Indian Republic. Certain minority languages such as Saurashtra, Badaga, Irula and Paniya are also written in the Tamil script.

Indian Standard Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.

Tamil Script Code for Information Interchange (TSCII) is a coding scheme for representing the Tamil script. The lower 128 codepoints are plain ASCII, the upper 128 codepoints are TSCII-specific. After long years of being used on the Internet by private agreement only, it was successfully registered with the IANA in 2007.

In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the Unicode Consortium. Three private use areas are defined: one in the Basic Multilingual Plane, and one each in, and nearly covering, planes 15 and 16. The code points in these areas cannot be considered as standardized characters in Unicode itself. They are intentionally left undefined so that third parties may define their own characters without conflicting with Unicode Consortium assignments. Under the Unicode Stability Policy, the Private Use Areas will remain allocated for that purpose in all future Unicode versions.

Geometric Shapes is a Unicode block of 96 symbols at code point range U+25A0–25FF.

Specials is a short Unicode block of characters allocated at the very end of the Basic Multilingual Plane, at U+FFF0–FFFF. Of these 16 code points, five have been assigned since Unicode 3.0.

Enclosed Alphanumerics is a Unicode block of typographical symbols of an alphanumeric within a circle, a bracket or other not-closed enclosure, or ending in a full stop.

The rupee sign "₨" is a currency sign used to represent the monetary unit of account in Pakistan, Sri Lanka, Nepal, Mauritius, Seychelles, and formerly in India. It resembles, and is often written as, the Latin character sequence "Rs", of which it is an orthographic ligature.

CJK Symbols and Punctuation is a Unicode block containing symbols and punctuation used for writing the Chinese, Japanese and Korean languages. It also contains one Chinese character.

Devanagari is a Unicode block containing characters for writing languages such as Hindi, Marathi, Bodo, Maithili, Sindhi, Nepali, and Sanskrit, among others. In its original incarnation, the code points U+0900..U+0954 were a direct copy of the characters A0-F4 from the 1988 ISCII standard. The Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Bengali Unicode block contains characters for the Bengali, Assamese, Bishnupriya Manipuri, Daphla, Garo, Hallam, Khasi, Mizo, Munda, Naga, Riang, and Santali languages. In its original incarnation, the code points U+0981..U+09CD were a direct copy of the Bengali characters A1-ED from the 1988 ISCII standard, as well as several Assamese ISCII characters in the U+09F0 column. The Devanagari, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on ISCII encodings.

Gurmukhi is a Unicode block containing characters for the Punjabi language, in the Gurmukhi script. In its original incarnation, the code points U+0A02..U+0A4C were a direct copy of the Gurmukhi characters A2-EC from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Gujarati is a Unicode block containing characters for writing the Gujarati language. In its original incarnation, the code points U+0A81..U+0AD0 were a direct copy of the Gujarati characters A1-F0 from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Oriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01..U+0B4D were a direct copy of the Odia characters A1-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Telugu is a Unicode block containing characters for the Telugu, Gondi, and Lambadi languages of Indian states of Andhra Pradesh and Telangana. In its original incarnation, the code points U+0C01..U+0C4D were a direct copy of the Telugu characters A1-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Kannada is a Unicode block containing characters for the Kannada, Sanskrit, Konkani, Sankethi, Havyaka, Tulu and Kodava languages. In its original incarnation, the code points U+0C82..U+0CCD were a direct copy of the Kannada characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, and Malayalam blocks were similarly all based on their ISCII encodings.

Malayalam is a Unicode block containing characters of the Malayalam script. In its original incarnation, the code points U+0D02..U+0D4D were a direct copy of the Malayalam characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, and Kannada blocks were similarly all based on their ISCII encodings.

Sinhala is a Unicode block containing characters for the Sinhala and Pali languages of Sri Lanka, and is also used for writing Sanskrit in Sri Lanka. The Sinhala allocation is loosely based on the ISCII standard, except that Sinhala contains extra prenasalized consonant letters, leading to inconsistencies with other ISCII-Unicode script allocations.

Enclosed Ideographic Supplement is a Unicode block containing forms of characters and words from Chinese, Japanese and Korean enclosed within or stylised as squares, brackets, or circles. It contains three such characters containing one or more kana, and many containing CJK ideographs. Many of its characters were added for compatibility with the Japanese ARIB STD-B24 standard. Six symbols from Chinese folk religion were added in Unicode version 10.

Tamil All Character Encoding (TACE16) is a scheme for encoding the Tamil script in the Private Use Area of Unicode, implementing a syllabary-based character model differing from the modified-ISCII model used by Unicode's existing Tamil implementation.

References

[1] "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
[2] "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

