Sinhala | |
---|---|
Range | U+0D80..U+0DFF (128 code points) |
Plane | BMP |
Scripts | Sinhala |
Major alphabets | Sinhala, Pali, Sanskrit |
Assigned | 91 code points |
Unused | 37 reserved code points |
Unicode version history | |
3.0 (1999) | 80 (+80) |
7.0 (2014) | 90 (+10) |
13.0 (2020) | 91 (+1) |
Sinhala is a Unicode block containing characters for the Sinhala and Pali languages of Sri Lanka, and is also used for writing Sanskrit in Sri Lanka. The Sinhala allocation is loosely based on the ISCII standard, except that Sinhala contains extra prenasalized consonant letters, leading to inconsistencies with other ISCII-Unicode script allocations.
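The block size and the assigned/reserved split given in the summary table can be checked against the Unicode data that ships with a Python interpreter. The following is a minimal sketch, not an official tool: the function name `sinhala_block_counts` is made up for illustration, and the assigned count it reports depends on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), so 91 is only expected with Unicode 13.0 data or later.

```python
import unicodedata

BLOCK_START = 0x0D80  # first code point of the Sinhala block
BLOCK_END = 0x0DFF    # last code point of the Sinhala block (inclusive)


def sinhala_block_counts():
    """Return (total, assigned, reserved) for the Sinhala block.

    A code point is treated as assigned when its general category is not
    'Cn' (unassigned) in the Unicode data bundled with this Python build.
    """
    total = BLOCK_END - BLOCK_START + 1
    assigned = sum(
        1
        for cp in range(BLOCK_START, BLOCK_END + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    )
    return total, assigned, total - assigned


if __name__ == "__main__":
    total, assigned, reserved = sinhala_block_counts()
    print(f"Unicode data version: {unicodedata.unidata_version}")
    print(f"total={total} assigned={assigned} reserved={reserved}")
```

With older Unicode data the assigned count is lower, in line with the version history listed in the summary table.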
Sinhala: Official Unicode Consortium code chart (PDF)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+0D8x | | ඁ | ං | ඃ | | අ | ආ | ඇ | ඈ | ඉ | ඊ | උ | ඌ | ඍ | ඎ | ඏ |
| U+0D9x | ඐ | එ | ඒ | ඓ | ඔ | ඕ | ඖ | | | | ක | ඛ | ග | ඝ | ඞ | ඟ |
| U+0DAx | ච | ඡ | ජ | ඣ | ඤ | ඥ | ඦ | ට | ඨ | ඩ | ඪ | ණ | ඬ | ත | ථ | ද |
| U+0DBx | ධ | න | | ඳ | ප | ඵ | බ | භ | ම | ඹ | ය | ර | | ල | | |
| U+0DCx | ව | ශ | ෂ | ස | හ | ළ | ෆ | | | | ් | | | | | ා |
| U+0DDx | ැ | ෑ | ි | ී | ු | | ූ | | ෘ | ෙ | ේ | ෛ | ො | ෝ | ෞ | ෟ |
| U+0DEx | | | | | | | ෦ | ෧ | ෨ | ෩ | ෪ | ෫ | ෬ | ෭ | ෮ | ෯ |
| U+0DFx | | | ෲ | ෳ | ෴ | | | | | | | | | | | |
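The chart is addressed by combining a row label with a column digit: the row gives the leading hexadecimal digits of the code point and the column supplies the last one. The sketch below illustrates that arithmetic. Both function names are hypothetical helpers introduced for this example and are not part of any Unicode library.

```python
from typing import Tuple

HEX_DIGITS = "0123456789ABCDEF"


def chart_cell_to_code_point(row_label: str, column: str) -> int:
    """Combine a row label such as 'U+0DAx' with a column digit '0'..'F'."""
    if not (row_label.startswith("U+") and row_label.endswith("x")):
        raise ValueError(f"unexpected row label: {row_label!r}")
    if len(column) != 1 or column.upper() not in HEX_DIGITS:
        raise ValueError(f"unexpected column: {column!r}")
    # Drop the 'U+' prefix and the trailing 'x', then append the column digit.
    return int(row_label[2:-1] + column, 16)


def code_point_to_chart_cell(cp: int) -> Tuple[str, str]:
    """Inverse mapping: split a code point into (row label, column digit)."""
    return f"U+{cp >> 4:03X}x", f"{cp & 0xF:X}"


if __name__ == "__main__":
    cp = chart_cell_to_code_point("U+0DAx", "7")
    print(f"U+{cp:04X}", chr(cp))
    print(code_point_to_chart_cell(0x0D85))
```

For example, row U+0DAx and column 7 give U+0DA7, the letter ට.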
The following Unicode-related documents record the purpose and process of defining specific characters in the Sinhala block:
| Version | Final code points | Count | L2 ID | WG2 ID | Document |
|---|---|---|---|---|---|
| 3.0 | U+0D82..0D83, 0D85..0D96, 0D9A..0DB1, 0DB3..0DBB, 0DBD, 0DC0..0DC6, 0DCA, 0DCF..0DD4, 0DD6, 0DD8..0DDF, 0DF2..0DF4 | 80 | L2/97-019 | N1480 | Ginige, S. L. (1996-11-15), Request to add Sinhalese to 10646 based on recent Sinhalese Standard (SLS 1134-1996) |
| | | | L2/97-019.1 | | Sri Lanka standard Sinhala character code for information interchange |
| | | | L2/97-018 | N1473R | Everson, Michael (1997-03-01), Proposal for encoding the Sinhala script in ISO/IEC 10646 (revision 1) |
| | | | | N1532 | Ross, Hugh McGregor (1997-03-07), Comment on Sri Lanka Proposal for Sinhala Script |
| | | | L2/97-030 | N1503 (pdf, doc) | Umamaheswaran, V. S.; Ksar, Mike (1997-04-01), "8.6", Unconfirmed Minutes of WG 2 Meeting #32, Singapore; 1997-01-20--24 |
| | | | L2/97-145 | N1589 | Everson, Michael (1997-06-13), Mapping of Sinhala between ISO/IEC 10646 and SLS 1134 |
| | | | | N1585 | Disanayaka, J. B. (1997-06-25), Towards Standard Sinhala Character Code |
| | | | | N1584 | Adams, Glenn (1997-06-30), Feedback on Sinhala Script Proposals (N1480, N 1473R, etc.) |
| | | | L2/97-158 | N1613 | Report of ad-hoc group on Sinhala encoding, 1997-07-02 |
| | | | L2/97-288 | N1603 | Umamaheswaran, V. S. (1997-10-24), "8.7", Unconfirmed Meeting Minutes, WG 2 Meeting # 33, Heraklion, Crete, Greece, 20 June – 4 July 1997 |
| | | | L2/98-319 | N1896 | Revised text of 10646-1/FPDAM 21, AMENDMENT 21: Sinhala, 1998-10-22 |
| | | | L2/99-010 | N1903 (pdf, html, doc) | Umamaheswaran, V. S. (1998-12-30), Minutes of WG 2 meeting 35, London, U.K.; 1998-09-21--25 |
| | | | L2/10-164 | | Dias, Gihan (2010-05-05), Sinhala Named Sequences |
| | | | L2/10-108 | | Moore, Lisa (2010-05-19), "Consensus 123-C32", UTC #123 / L2 #220 Minutes, Accept three Sinhala named sequences as provisional in Unicode 6.0... |
| | | | | N3903 (pdf, doc) | "M57.09 (Named USIs for Sinhala)", Unconfirmed minutes of WG2 meeting 57, 2011-03-31 |
| 7.0 | U+0DE6..0DEF | 10 | L2/07-268 | N3253 (pdf, doc) | Umamaheswaran, V. S. (2007-07-26), "M50.28", Unconfirmed minutes of WG 2 meeting 50, Frankfurt-am-Main, Germany; 2007-04-24/27 |
| | | | L2/08-007 | | Inclusion of archaic Sinhala numerals in the Sinhala character code range, 2008-01-07 |
| | | | L2/08-068 | | Dias, Gihan (2008-01-28), Archaic Sinhala Numerals |
| | | | L2/08-105 | | Observations on the Encoding of Archaic Sinhala Numerals in Unicode/UCS, 2008-02-05 |
| | | | L2/08-003 | | Moore, Lisa (2008-02-14), "Archaic Sinhala Numerals", UTC #114 Minutes |
| | | | L2/10-165 | | Dias, Gihan (2010-05-03), Preliminary Proposal to Encode Sinhala Digits and Numerals |
| | | | L2/10-312 | | Dias, Gihan (2010-08-10), Proposal to Encode Sinhala Archaic Numerals and Numbers |
| | | | L2/10-337 | N3888 | Proposal to include Sinhala Numerals to the BMP and SMP of the UCS, 2010-08-19 |
| | | | | N3888-A | Senaweera, L. N. (2010-09-10), Sri Lanka's proposal on Sinhala Numerals for inclusion in Information Technology - Universal Multiple Octet Coded Character Set, ISO/IEC 10646 : 2003 |
| | | | | N3888-B | Unicode Character Properties of Sinhala Lith Illakkam (Sinhala Astrological Digits) and Sinhala Illakkam or Sinhala Archaic Numbers |
| | | | L2/10-433 | | Wijayawardhana, Harsha; et al. (2010-10-23), RE: Background information on the use of Sinhala Numerals (L2/10-337) |
| | | | L2/10-416R | | Moore, Lisa (2010-11-09), "Sinhala Numerals", UTC #125 / L2 #222 Minutes |
| | | | | N3903 (pdf, doc) | "M57.14", Unconfirmed minutes of WG2 meeting 57, 2011-03-31 |
| 13.0 | U+0D81 | 1 | L2/18-060 | N4964 | A, Srinidhi; A, Sridatta (2018-02-05), Proposal to encode the CANDRABINDU for Sinhala |
| | | | L2/18-079 | | Anderson, Deborah (2018-03-21), Feedback on Sinhala candrabindu (L2/18-060) |
| | | | L2/18-168 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai; Chapman, Chris; Cook, Richard (2018-04-28), "10. Sinhala", Recommendations to UTC #155 April-May 2018 on Script Proposals |
| | | | L2/18-115 | | Moore, Lisa (2018-05-09), "D.4.1", UTC #155 Minutes |
| | | | L2/18-183 | | Moore, Lisa (2018-11-20), "B.1.3.1.1.1", UTC #156 Minutes |
| | | | | N5020 (pdf, doc) | Umamaheswaran, V. S. (2019-01-11), "10.3.14", Unconfirmed minutes of WG 2 meeting 67 |
| | | | L2/20-052 | | Pournader, Roozbeh (2020-01-15), Changes to Identifier_Type of some Unicode 13.0 characters |
| | | | L2/20-015R | | Moore, Lisa (2020-05-14), "B.13.4 Changes to Identifier_Type of some Unicode 13.0 characters", Draft Minutes of UTC Meeting 162 |
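The "Final code points" cells in the table above use a compact notation: comma-separated hexadecimal values, with `START..END` marking an inclusive range and the `U+` prefix written only once. As a rough sketch, the notation can be expanded and the per-version counts recomputed as follows. The names `expand_ranges` and `HISTORY` are illustrative assumptions. The range strings are copied from the table.

```python
def _parse_hex(text):
    """Parse one hexadecimal code point, with or without a 'U+' prefix."""
    return int(text.strip().replace("U+", ""), 16)


def expand_ranges(spec):
    """Expand a 'Final code points' cell into a list of code points.

    Items are comma-separated. Each item is either a single hexadecimal
    value or an inclusive range written as 'START..END'.
    """
    code_points = []
    for item in spec.split(","):
        if not item.strip():
            continue
        start, sep, end = item.partition("..")
        first = _parse_hex(start)
        last = _parse_hex(end) if sep else first
        code_points.extend(range(first, last + 1))
    return code_points


# Range strings copied from the history table, keyed by Unicode version.
HISTORY = {
    "3.0": "U+0D82..0D83, 0D85..0D96, 0D9A..0DB1, 0DB3..0DBB, 0DBD, "
           "0DC0..0DC6, 0DCA, 0DCF..0DD4, 0DD6, 0DD8..0DDF, 0DF2..0DF4",
    "7.0": "U+0DE6..0DEF",
    "13.0": "U+0D81",
}

if __name__ == "__main__":
    running = 0
    for version, spec in HISTORY.items():
        added = len(expand_ranges(spec))
        running += added
        print(f"{version}: {running} (+{added})")
```

The running totals reproduce the 80, 90, and 91 code points listed in the version history.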