Sinhala (Unicode block)

Sinhala
Range	U+0D80..U+0DFF (128 code points)
Plane	BMP
Scripts	Sinhala
Major alphabets	Sinhala; Pali; Sanskrit
Assigned	91 code points
Unused	37 reserved code points
Unicode version history
3.0 (1999)	80 (+80)
7.0 (2014)	90 (+10)
13.0 (2020)	91 (+1)
Unicode documentation
	Code chart ∣ Web page
	Note: [1][2]

Last updated July 27, 2024

Sinhala is a Unicode block (U+0D80..U+0DFF) containing characters for writing the Sinhala and Pali languages of Sri Lanka; it is also used for writing Sanskrit there. The allocation is loosely based on the ISCII standard, but Sinhala has additional prenasalized consonant letters, so its layout does not line up exactly with the other ISCII-derived Unicode script blocks.
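
As a minimal sketch, membership in the block can be tested with a plain range comparison against the bounds U+0D80..U+0DFF. The helper name `in_sinhala_block` is introduced here for illustration and is not part of any library; the check covers the block range only and says nothing about whether a code point is actually assigned.

```python
# Bounds of the Sinhala block as given in this article.
SINHALA_START = 0x0D80
SINHALA_END = 0x0DFF


def in_sinhala_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+0D80..U+0DFF.

    Hypothetical helper: it tests the block range only, so reserved
    (unassigned) code points inside the block also return True.
    """
    return SINHALA_START <= ord(ch) <= SINHALA_END


# Size of the block, counting both assigned and reserved positions.
block_size = SINHALA_END - SINHALA_START + 1
```

For example, `in_sinhala_block("\u0d85")` (SINHALA LETTER AYANNA) is true, while `in_sinhala_block("a")` is false.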

Block

Sinhala[1][2]
Official Unicode Consortium code chart (PDF)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+0D8x		ඁ	ං	ඃ		අ	ආ	ඇ	ඈ	ඉ	ඊ	උ	ඌ	ඍ	ඎ	ඏ
U+0D9x	ඐ	එ	ඒ	ඓ	ඔ	ඕ	ඖ				ක	ඛ	ග	ඝ	ඞ	ඟ
U+0DAx	ච	ඡ	ජ	ඣ	ඤ	ඥ	ඦ	ට	ඨ	ඩ	ඪ	ණ	ඬ	ත	ථ	ද
U+0DBx	ධ	න		ඳ	ප	ඵ	බ	භ	ම	ඹ	ය	ර		ල
U+0DCx	ව	ශ	ෂ	ස	හ	ළ	ෆ				්					ා
U+0DDx	ැ	ෑ	ි	ී	ු		ූ		ෘ	ෙ	ේ	ෛ	ො	ෝ	ෞ	ෟ
U+0DEx							෦	෧	෨	෩	෪	෫	෬	෭	෮	෯
U+0DFx			ෲ	ෳ	෴
Notes
1. As of Unicode version 15.1
2. Empty cells (grey areas in the original chart) indicate unassigned code points
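
The chart is laid out so that each row holds 16 consecutive code points and each column corresponds to the last hexadecimal digit. A small sketch of that mapping follows; the function name `chart_cell` is an assumption made for illustration.

```python
def chart_cell(code_point: int) -> tuple:
    """Return the (row label, column label) of a code point in the chart.

    Hypothetical helper: the row label is every hex digit but the last,
    followed by "x"; the column label is the last hex digit.
    """
    row = f"U+{code_point >> 4:03X}x"
    column = f"{code_point & 0xF:X}"
    return row, column
```

With this mapping, U+0D85 falls in row `U+0D8x`, column `5`, and U+0DCA falls in row `U+0DCx`, column `A`.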

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Sinhala block:

Version	Final code points[a]	Count	L2 ID	WG2 ID	Document
3.0	U+0D82..0D83, 0D85..0D96, 0D9A..0DB1, 0DB3..0DBB, 0DBD, 0DC0..0DC6, 0DCA, 0DCF..0DD4, 0DD6, 0DD8..0DDF, 0DF2..0DF4	80	L2/97-019	N1480	Ginige, S. L. (1996-11-15), Request to add Sinhalese to 10646 based on recent Sinhalese Standard (SLS 1134-1996)
			L2/97-019.1		Sri Lanka standard Sinhala character code for information interchange
			L2/97-018	N1473R	Everson, Michael (1997-03-01), Proposal for encoding the Sinhala script in ISO/IEC 10646 (revision 1)
				N1532	Ross, Hugh McGregor (1997-03-07), Comment on Sri Lanka Proposal for Sinhala Script
			L2/97-030	N1503 (pdf, doc)	Umamaheswaran, V. S.; Ksar, Mike (1997-04-01), "8.6", Unconfirmed Minutes of WG 2 Meeting #32, Singapore; 1997-01-20--24
			L2/97-145	N1589	Everson, Michael (1997-06-13), Mapping of Sinhala between ISO/IEC 10646 and SLS 1134
				N1585	Disanayaka, J. B. (1997-06-25), Towards Standard Sinhala Character Code
				N1584	Adams, Glenn (1997-06-30), Feedback on Sinhala Script Proposals (N1480, N 1473R, etc.)
			L2/97-158	N1613	Report of ad-hoc group on Sinhala encoding, 1997-07-02
			L2/97-288	N1603	Umamaheswaran, V. S. (1997-10-24), "8.7", Unconfirmed Meeting Minutes, WG 2 Meeting # 33, Heraklion, Crete, Greece, 20 June – 4 July 1997
			L2/98-319	N1896	Revised text of 10646-1/FPDAM 21, AMENDMENT 21: Sinhala, 1998-10-22
			L2/99-010	N1903 (pdf, html, doc)	Umamaheswaran, V. S. (1998-12-30), Minutes of WG 2 meeting 35, London, U.K.; 1998-09-21--25
			L2/10-164		Dias, Gihan (2010-05-05), Sinhala Named Sequences
			L2/10-108		Moore, Lisa (2010-05-19), "Consensus 123-C32", UTC #123 / L2 #220 Minutes, Accept three Sinhala named sequences as provisional in Unicode 6.0...
				N3903 (pdf, doc)	"M57.09 (Named USIs for Sinhala)", Unconfirmed minutes of WG2 meeting 57, 2011-03-31
7.0	U+0DE6..0DEF	10	L2/07-268	N3253 (pdf, doc)	Umamaheswaran, V. S. (2007-07-26), "M50.28", Unconfirmed minutes of WG 2 meeting 50, Frankfurt-am-Main, Germany; 2007-04-24/27
			L2/08-007		Inclusion of archaic Sinhala numerals in the Sinhala character code range, 2008-01-07
			L2/08-068		Dias, Gihan (2008-01-28), Archaic Sinhala Numerals
			L2/08-105		Observations on the Encoding of Archaic Sinhala Numerals in Unicode/UCS, 2008-02-05
			L2/08-003		Moore, Lisa (2008-02-14), "Archaic Sinhala Numerals", UTC #114 Minutes
			L2/10-165		Dias, Gihan (2010-05-03), Preliminary Proposal to Encode Sinhala Digits and Numerals
			L2/10-312		Dias, Gihan (2010-08-10), Proposal to Encode Sinhala Archaic Numerals and Numbers
			L2/10-337	N3888	Proposal to include Sinhala Numerals to the BMP and SMP of the UCS, 2010-08-19
				N3888-A	Senaweera, L. N. (2010-09-10), Sri Lanka's proposal on Sinhala Numerals for inclusion in Information Technology - Universal Multiple Octet Coded Character Set, ISO/IEC 10646 : 2003
				N3888-B	Unicode Character Properties of Sinhala Lith Illakkam (Sinhala Astrological Digits) and Sinhala Illakkam or Sinhala Archaic Numbers
			L2/10-433		Wijayawardhana, Harsha; et al. (2010-10-23), RE: Background information on the use of Sinhala Numerals (L2/10-337)
			L2/10-416R		Moore, Lisa (2010-11-09), "Sinhala Numerals", UTC #125 / L2 #222 Minutes
				N3903 (pdf, doc)	"M57.14", Unconfirmed minutes of WG2 meeting 57, 2011-03-31
13.0	U+0D81	1	L2/18-060	N4964	A, Srinidhi; A, Sridatta (2018-02-05), Proposal to encode the CANDRABINDU for Sinhala
			L2/18-079		Anderson, Deborah (2018-03-21), Feedback on Sinhala candrabindu (L2/18-060)
			L2/18-168		Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai; Chapman, Chris; Cook, Richard (2018-04-28), "10. Sinhala", Recommendations to UTC #155 April-May 2018 on Script Proposals
			L2/18-115		Moore, Lisa (2018-05-09), "D.4.1", UTC #155 Minutes
			L2/18-183		Moore, Lisa (2018-11-20), "B.1.3.1.1.1", UTC #156 Minutes
				N5020 (pdf, doc)	Umamaheswaran, V. S. (2019-01-11), "10.3.14", Unconfirmed minutes of WG 2 meeting 67
			L2/20-052		Pournader, Roozbeh (2020-01-15), Changes to Identifier_Type of some Unicode 13.0 characters
			L2/20-015R		Moore, Lisa (2020-05-14), "B.13.4 Changes to Identifier_Type of some Unicode 13.0 characters", Draft Minutes of UTC Meeting 162
[a] Proposed code points and character names may differ from the final code points and names.
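
The counts in the table can be cross-checked against the listed ranges. The sketch below is illustrative only: `ADDED` and `count_code_points` are names introduced here, and the range strings are copied from the "Final code points" column above.

```python
# Ranges copied from the "Final code points" column, keyed by Unicode version.
ADDED = {
    "3.0": "0D82..0D83, 0D85..0D96, 0D9A..0DB1, 0DB3..0DBB, 0DBD, "
           "0DC0..0DC6, 0DCA, 0DCF..0DD4, 0DD6, 0DD8..0DDF, 0DF2..0DF4",
    "7.0": "0DE6..0DEF",
    "13.0": "0D81",
}


def count_code_points(ranges: str) -> int:
    """Count the code points in a comma-separated list of hex ranges."""
    total = 0
    for part in ranges.split(","):
        first, _, last = part.strip().partition("..")
        # A lone code point such as "0DBD" is a range of length one.
        total += int(last or first, 16) - int(first, 16) + 1
    return total


per_version = {version: count_code_points(r) for version, r in ADDED.items()}
assigned = sum(per_version.values())
reserved = (0x0DFF - 0x0D80 + 1) - assigned

print(per_version)         # {'3.0': 80, '7.0': 10, '13.0': 1}
print(assigned, reserved)  # 91 37
```

The totals agree with the infobox: 80 + 10 + 1 = 91 assigned code points, leaving 37 of the block's 128 positions reserved.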

Related Research Articles

The Sinhala script, also known as Sinhalese script, is a writing system used by the Sinhalese people and most Sri Lankans in Sri Lanka and elsewhere to write the Sinhala language as well as the liturgical languages Pali and Sanskrit. The Sinhalese Akṣara Mālāva, one of the Brahmic scripts, is a descendant of the Ancient Indian Brahmi script. It ultimately descended from the Grantha script.

The Tamil script is an abugida script that is used by Tamils and Tamil speakers in India, Sri Lanka, Malaysia, Singapore, Indonesia and elsewhere to write the Tamil language. It is one of the official scripts of the Indian Republic. Certain minority languages such as Saurashtra, Badaga, Irula and Paniya are also written in the Tamil script.

Indian Standard Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.

Sinhala language software for computers has been available since the late 1980s, but no standard character representation system was in place, which resulted in proprietary character representation systems and fonts. In the wake of this, CINTEC introduced Sinhala within the Unicode standard. ICTA concluded the work started by CINTEC on approving and standardizing Sinhala Unicode in Sri Lanka.

The rupee sign "₨" is a currency sign used to represent the monetary unit of account in Pakistan, Sri Lanka, Nepal, Mauritius, Seychelles, and formerly in India. It resembles, and is often written as, the Latin character sequence "Rs", of which it is an orthographic ligature.

Sinhala input methods are ways of writing the Sinhala language, spoken primarily in Sri Lanka, using a computer. Sinhala input methods can be broadly classified into two main groups: ones based on typewriter keyboard layouts, and ones that are meant to be typed on QWERTY keyboards using an input method, known as "Singlish".

Devanagari is a Unicode block containing characters for writing languages such as Hindi, Marathi, Bodo, Maithili, Sindhi, Nepali, and Sanskrit, among others. In its original incarnation, the code points U+0900..U+0954 were a direct copy of the characters A0-F4 from the 1988 ISCII standard. The Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Bengali Unicode block contains characters for the Bengali, Assamese, Bishnupriya Manipuri, Daphla, Garo, Hallam, Khasi, Mizo, Munda, Naga, Riang, and Santali languages. In its original incarnation, the code points U+0981..U+09CD were a direct copy of the Bengali characters A1-ED from the 1988 ISCII standard, as well as several Assamese ISCII characters in the U+09F0 column. The Devanagari, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on ISCII encodings.

Gurmukhi is a Unicode block containing characters for the Punjabi language, in the Gurmukhi script. In its original incarnation, the code points U+0A02..U+0A4C were a direct copy of the Gurmukhi characters A2-EC from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Gujarati is a Unicode block containing characters for writing the Gujarati language. In its original incarnation, the code points U+0A81..U+0AD0 were a direct copy of the Gujarati characters A1-F0 from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Oriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01..U+0B4D were a direct copy of the Odia characters A1-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Tamil is a Unicode block containing characters for the Tamil and Saurashtra languages of Tamil Nadu (India), Sri Lanka, Singapore, and Malaysia. In its original incarnation, the code points U+0B82..U+0BCD were a direct copy of the Tamil characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Telugu is a Unicode block containing characters for the Telugu, Gondi, and Lambadi languages of the Indian states of Andhra Pradesh and Telangana. In its original incarnation, the code points U+0C01..U+0C4D were a direct copy of the Telugu characters A1-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Kannada is a Unicode block containing characters for the Kannada, Sanskrit, Konkani, Sankethi, Havyaka, Tulu and Kodava languages. In its original incarnation, the code points U+0C82..U+0CCD were a direct copy of the Kannada characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, and Malayalam blocks were similarly all based on their ISCII encodings.

Malayalam is a Unicode block containing characters of the Malayalam script. In its original incarnation, the code points U+0D02..U+0D4D were a direct copy of the Malayalam characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, and Kannada blocks were similarly all based on their ISCII encodings.

Tibetan is a Unicode block containing characters for the Tibetan, Dzongkha, and other languages of China, Bhutan, Nepal, Mongolia, northern India, eastern Pakistan and Russia.

Tamil All Character Encoding (TACE16) is a scheme for encoding the Tamil script in the Private Use Area of Unicode, implementing a syllabary-based character model differing from the modified-ISCII model used by Unicode's existing Tamil implementation.

Sinhala Archaic Numbers is a Unicode block containing Sinhala Illakkam number characters.

Cherokee Supplement is a Unicode block containing the syllabic characters for writing the Cherokee language. When Cherokee was first added to Unicode in version 3.0 it was treated as a unicameral alphabet, but in version 8.0 it was redefined as a bicameral script. The Cherokee Supplement block contains lowercase letters only, whereas the Cherokee block contains all the uppercase letters, together with six lowercase letters. For backwards compatibility, the Unicode case folding algorithm—which usually converts a string to lowercase characters—maps Cherokee characters to uppercase.

References

[1] "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
[2] "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
