Sinhala (Unicode block)

Sinhala
Range: U+0D80..U+0DFF (128 code points)
Plane: BMP
Scripts: Sinhala
Major alphabets: Sinhala, Pali, Sanskrit
Assigned: 91 code points
Unused: 37 reserved code points
Unicode version history:
3.0 (1999): 80 (+80)
7.0 (2014): 90 (+10)
13.0 (2020): 91 (+1)
Unicode documentation: Code chart, Web page
Note: [1] [2]

Sinhala is a Unicode block containing characters for the Sinhala and Pali languages of Sri Lanka, and is also used for writing Sanskrit in Sri Lanka. The Sinhala allocation is loosely based on the ISCII standard, except that Sinhala contains extra prenasalized consonant letters, leading to inconsistencies with other ISCII-Unicode script allocations.
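As a minimal sketch of what the block range means in practice, the following Python snippet tests whether a character falls inside U+0D80..U+0DFF. The names `SINHALA_START`, `SINHALA_END`, and `in_sinhala_block` are illustrative and not part of any library. Membership in the block does not imply that a code point is assigned, since part of the range is reserved.

```python
# Block boundaries as given in the summary above (inclusive on both ends).
SINHALA_START = 0x0D80
SINHALA_END = 0x0DFF


def in_sinhala_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+0D80..U+0DFF.

    This checks block membership only; reserved (unassigned) code points
    inside the range also return True.
    """
    return SINHALA_START <= ord(ch) <= SINHALA_END


def count_in_block(text: str) -> int:
    """Count how many characters of text fall inside the Sinhala block."""
    return sum(1 for ch in text if in_sinhala_block(ch))
```

For example, `in_sinhala_block("\u0d85")` is true, while `in_sinhala_block("A")` is false.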

Block

Sinhala [1] [2]
Official Unicode Consortium code chart (PDF)
[Code chart: rows U+0D8x to U+0DFx, columns 0 to F; glyphs not reproduced here]
Notes
1. ^ As of Unicode version 15.1
2. ^ Grey areas indicate non-assigned code points
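The chart layout (eight rows of sixteen cells) can be approximated with Python's standard `unicodedata` module. This is only a sketch: the helper names `sinhala_chart_rows` and `format_chart` are made up for illustration, and which cells count as assigned depends on the Unicode version bundled with the Python interpreter, which may differ from the version cited in the notes above.

```python
import unicodedata


def sinhala_chart_rows():
    """Build the 8 chart rows (U+0D8x..U+0DFx), each with 16 cells.

    A cell holds the character if the interpreter's Unicode database
    has a name for it, or None for a non-assigned code point.
    """
    rows = []
    for row_start in range(0x0D80, 0x0E00, 0x10):
        cells = []
        for cp in range(row_start, row_start + 0x10):
            # unicodedata.name returns the default for unnamed code points
            assigned = unicodedata.name(chr(cp), None) is not None
            cells.append(chr(cp) if assigned else None)
        label = f"U+{row_start >> 4:03X}x"
        rows.append((label, cells))
    return rows


def format_chart(rows):
    """Render the rows as text, using '.' for non-assigned cells."""
    lines = ["        " + " ".join("0123456789ABCDEF")]
    for label, cells in rows:
        lines.append(label + "  " + " ".join(c if c else "." for c in cells))
    return "\n".join(lines)
```

Calling `print(format_chart(sinhala_chart_rows()))` prints the grid; combining signs may render oddly in a terminal because they have no base character to attach to.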

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Sinhala block:

Each version line gives the final code points [a] and their count; each document line gives the L2 ID and/or WG2 ID, followed by the document.

Version 3.0: U+0D82..0D83, 0D85..0D96, 0D9A..0DB1, 0DB3..0DBB, 0DBD, 0DC0..0DC6, 0DCA, 0DCF..0DD4, 0DD6, 0DD8..0DDF, 0DF2..0DF4 (count: 80)
  L2/97-019, N1480: Ginige, S. L. (1996-11-15), Request to add Sinhalese to 10646 based on recent Sinhalese Standard (SLS 1134-1996)
  L2/97-019.1: Sri Lanka standard Sinhala character code for information interchange
  L2/97-018, N1473R: Everson, Michael (1997-03-01), Proposal for encoding the Sinhala script in ISO/IEC 10646 (revision 1)
  N1532: Ross, Hugh McGregor (1997-03-07), Comment on Sri Lanka Proposal for Sinhala Script
  L2/97-030, N1503 (pdf, doc): Umamaheswaran, V. S.; Ksar, Mike (1997-04-01), "8.6", Unconfirmed Minutes of WG 2 Meeting #32, Singapore; 1997-01-20--24
  L2/97-145, N1589: Everson, Michael (1997-06-13), Mapping of Sinhala between ISO/IEC 10646 and SLS 1134
  N1585: Disanayaka, J. B. (1997-06-25), Towards Standard Sinhala Character Code
  N1584: Adams, Glenn (1997-06-30), Feedback on Sinhala Script Proposals (N1480, N 1473R, etc.)
  L2/97-158, N1613: Report of ad-hoc group on Sinhala encoding, 1997-07-02
  L2/97-288, N1603: Umamaheswaran, V. S. (1997-10-24), "8.7", Unconfirmed Meeting Minutes, WG 2 Meeting # 33, Heraklion, Crete, Greece, 20 June – 4 July 1997
  L2/98-319, N1896: Revised text of 10646-1/FPDAM 21, AMENDMENT 21: Sinhala, 1998-10-22
  L2/99-010, N1903 (pdf, html, doc): Umamaheswaran, V. S. (1998-12-30), Minutes of WG 2 meeting 35, London, U.K.; 1998-09-21--25
  L2/10-164: Dias, Gihan (2010-05-05), Sinhala Named Sequences
  L2/10-108: Moore, Lisa (2010-05-19), "Consensus 123-C32", UTC #123 / L2 #220 Minutes, Accept three Sinhala named sequences as provisional in Unicode 6.0...
  N3903 (pdf, doc): "M57.09 (Named USIs for Sinhala)", Unconfirmed minutes of WG2 meeting 57, 2011-03-31

Version 7.0: U+0DE6..0DEF (count: 10)
  L2/07-268, N3253 (pdf, doc): Umamaheswaran, V. S. (2007-07-26), "M50.28", Unconfirmed minutes of WG 2 meeting 50, Frankfurt-am-Main, Germany; 2007-04-24/27
  L2/08-007: Inclusion of archaic Sinhala numerals in the Sinhala character code range, 2008-01-07
  L2/08-068: Dias, Gihan (2008-01-28), Archaic Sinhala Numerals
  L2/08-105: Observations on the Encoding of Archaic Sinhala Numerals in Unicode/UCS, 2008-02-05
  L2/08-003: Moore, Lisa (2008-02-14), "Archaic Sinhala Numerals", UTC #114 Minutes
  L2/10-165: Dias, Gihan (2010-05-03), Preliminary Proposal to Encode Sinhala Digits and Numerals
  L2/10-312: Dias, Gihan (2010-08-10), Proposal to Encode Sinhala Archaic Numerals and Numbers
  L2/10-337, N3888: Proposal to include Sinhala Numerals to the BMP and SMP of the UCS, 2010-08-19
  N3888-A: Senaweera, L. N. (2010-09-10), Sri Lanka's proposal on Sinhala Numerals for inclusion in Information Technology - Universal Multiple Octet Coded Character Set, ISO/IEC 10646 : 2003
  N3888-B: Unicode Character Properties of Sinhala Lith Illakkam (Sinhala Astrological Digits) and Sinhala Illakkam or Sinhala Archaic Numbers
  L2/10-433: Wijayawardhana, Harsha; et al. (2010-10-23), RE: Background information on the use of Sinhala Numerals (L2/10-337)
  L2/10-416R: Moore, Lisa (2010-11-09), "Sinhala Numerals", UTC #125 / L2 #222 Minutes
  N3903 (pdf, doc): "M57.14", Unconfirmed minutes of WG2 meeting 57, 2011-03-31

Version 13.0: U+0D81 (count: 1)
  L2/18-060, N4964: A, Srinidhi; A, Sridatta (2018-02-05), Proposal to encode the CANDRABINDU for Sinhala
  L2/18-079: Anderson, Deborah (2018-03-21), Feedback on Sinhala candrabindu (L2/18-060)
  L2/18-168: Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai; Chapman, Chris; Cook, Richard (2018-04-28), "10. Sinhala", Recommendations to UTC #155 April-May 2018 on Script Proposals
  L2/18-115: Moore, Lisa (2018-05-09), "D.4.1", UTC #155 Minutes
  L2/18-183: Moore, Lisa (2018-11-20), "B.1.3.1.1.1", UTC #156 Minutes
  N5020 (pdf, doc): Umamaheswaran, V. S. (2019-01-11), "10.3.14", Unconfirmed minutes of WG 2 meeting 67
  L2/20-052: Pournader, Roozbeh (2020-01-15), Changes to Identifier_Type of some Unicode 13.0 characters
  L2/20-015R: Moore, Lisa (2020-05-14), "B.13.4 Changes to Identifier_Type of some Unicode 13.0 characters", Draft Minutes of UTC Meeting 162

  a. Proposed code points and character names may differ from final code points and names
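
The per-version counts in the history above can be cross-checked against the listed code point ranges. The sketch below copies the range strings from the history entries; the names `RANGES_BY_VERSION` and `count_code_points` are illustrative only.

```python
# Final code point ranges per Unicode version, copied from the history above.
RANGES_BY_VERSION = {
    "3.0": (
        "0D82..0D83, 0D85..0D96, 0D9A..0DB1, 0DB3..0DBB, 0DBD, "
        "0DC0..0DC6, 0DCA, 0DCF..0DD4, 0DD6, 0DD8..0DDF, 0DF2..0DF4"
    ),
    "7.0": "0DE6..0DEF",
    "13.0": "0D81",
}


def count_code_points(spec: str) -> int:
    """Count code points in a comma-separated list of hex ranges."""
    total = 0
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            first, last = part.split("..")
        else:
            first = last = part  # a single code point
        total += int(last, 16) - int(first, 16) + 1
    return total


counts = {ver: count_code_points(spec) for ver, spec in RANGES_BY_VERSION.items()}
assigned = sum(counts.values())
unused = (0x0DFF - 0x0D80 + 1) - assigned

print(counts)    # {'3.0': 80, '7.0': 10, '13.0': 1}
print(assigned)  # 91
print(unused)    # 37
```

The totals agree with the summary figures: 91 assigned code points and 37 reserved ones in a block of 128.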

Related Research Articles

Sinhala script: Abugida writing system

The Sinhala script, also known as Sinhalese script, is a writing system used by the Sinhalese people and most Sri Lankans in Sri Lanka and elsewhere to write the Sinhala language as well as the liturgical languages Pali and Sanskrit. The Sinhalese Akṣara Mālāva, one of the Brahmic scripts, is a descendant of the Ancient Indian Brahmi script. It is also related to the Grantha script.

Tamil script: Brahmic script

The Tamil script is an abugida script that is used by Tamils and Tamil speakers in India, Sri Lanka, Malaysia, Singapore, Indonesia and elsewhere to write the Tamil language. It is one of the official scripts of the Indian Republic. Certain minority languages such as Saurashtra, Badaga, Irula and Paniya are also written in the Tamil script.

Indian Standard Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.

Sinhala language software for computers has been available since the late 1980s, but no standard character representation system was put in place, which resulted in proprietary character representation systems and fonts. In the wake of this, CINTEC introduced Sinhala within the Unicode standard. ICTA concluded the work started by CINTEC for approving and standardizing Sinhala Unicode in Sri Lanka.

The rupee sign "₨" is a currency sign used to represent the monetary unit of account in Pakistan, Sri Lanka, Nepal, Mauritius, Seychelles, and formerly in India. It resembles, and is often written as, the Latin character sequence "Rs", of which it is an orthographic ligature.

Sinhala input methods

Sinhala input methods are ways of writing the Sinhala language, spoken primarily in Sri Lanka, using a computer. Sinhala input methods can be broadly classified into two main groups: ones based on typewriter keyboard layouts, and ones that are meant to be typed on QWERTY keyboards using an input method, known as "Singlish".

Devanagari is a Unicode block containing characters for writing languages such as Hindi, Marathi, Bodo, Maithili, Sindhi, Nepali, and Sanskrit, among others. In its original incarnation, the code points U+0900..U+0954 were a direct copy of the characters A0-F4 from the 1988 ISCII standard. The Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Bengali is a Unicode block containing characters for the Bengali, Assamese, Bishnupriya Manipuri, Daphla, Garo, Hallam, Khasi, Mizo, Munda, Naga, Riang, and Santali languages. In its original incarnation, the code points U+0981..U+09CD were a direct copy of the Bengali characters A1-ED from the 1988 ISCII standard, as well as several Assamese ISCII characters in the U+09F0 column. The Devanagari, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Gurmukhi is a Unicode block containing characters for the Punjabi language, in the Gurmukhi script. In its original incarnation, the code points U+0A02..U+0A4C were a direct copy of the Gurmukhi characters A2-EC from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Gujarati is a Unicode block containing characters for writing the Gujarati language. In its original incarnation, the code points U+0A81..U+0AD0 were a direct copy of the Gujarati characters A1-F0 from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Oriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01..U+0B4D were a direct copy of the Odia characters A1-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Tamil is a Unicode block containing characters for the Tamil and Saurashtra languages of Tamil Nadu (India), Sri Lanka, Singapore, and Malaysia. In its original incarnation, the code points U+0B82..U+0BCD were a direct copy of the Tamil characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Telugu is a Unicode block containing characters for the Telugu, Gondi, and Lambadi languages of the Indian states of Andhra Pradesh and Telangana. In its original incarnation, the code points U+0C01..U+0C4D were a direct copy of the Telugu characters A1-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.

Kannada is a Unicode block containing characters for the Kannada, Sanskrit, Konkani, Sankethi, Havyaka, Tulu and Kodava languages. In its original incarnation, the code points U+0C82..U+0CCD were a direct copy of the Kannada characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, and Malayalam blocks were similarly all based on their ISCII encodings.

Malayalam is a Unicode block containing characters of the Malayalam script. In its original incarnation, the code points U+0D02..U+0D4D were a direct copy of the Malayalam characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, and Kannada blocks were similarly all based on their ISCII encodings.
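
Taking the quoted ranges at face value, each of these nine blocks places ISCII byte A0 + k at offset k from the start of the block. The sketch below encodes that reading. It assumes the "direct copy" is a constant offset across the whole quoted range, which is an inference from the ranges above and not a statement about any real converter; the names `BLOCK_BASE` and `iscii_1988_to_code_point` are hypothetical. The Sinhala block is deliberately absent, since its allocation is described as only loosely based on ISCII.

```python
# Start of each Unicode block (standard block boundaries).
BLOCK_BASE = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}


def iscii_1988_to_code_point(script: str, byte: int) -> int:
    """Map an ISCII byte (A0..FF) to a code point by constant offset.

    Assumption: ISCII byte 0xA0 + k corresponds to block start + k.
    This illustrates the quoted ranges only; it is not a full converter.
    """
    if not 0xA0 <= byte <= 0xFF:
        raise ValueError("ISCII script bytes are expected in A0..FF")
    return BLOCK_BASE[script] + (byte - 0xA0)
```

For instance, `hex(iscii_1988_to_code_point("Devanagari", 0xF4))` gives `0x954`, matching the upper end of the Devanagari range quoted above.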

Tibetan is a Unicode block containing characters for the Tibetan, Dzongkha, and other languages of China, Bhutan, Nepal, Mongolia, northern India, eastern Pakistan and Russia.

Enclosed Ideographic Supplement: Unicode character block

Enclosed Ideographic Supplement is a Unicode block containing forms of characters and words from Chinese, Japanese and Korean enclosed within or stylised as squares, brackets, or circles. It contains three such characters containing one or more kana, and many containing CJK ideographs. Many of its characters were added for compatibility with the Japanese ARIB STD-B24 standard. Six symbols from Chinese folk religion were added in Unicode version 10.

Sinhala Archaic Numbers is a Unicode block containing Sinhala Illakkam number characters.

Cherokee Supplement is a Unicode block containing the syllabic characters for writing the Cherokee language. When Cherokee was first added to Unicode in version 3.0 it was treated as a unicameral alphabet, but in version 8.0 it was redefined as a bicameral script. The Cherokee Supplement block contains lowercase letters only, whereas the Cherokee block contains all the uppercase letters, together with six lowercase letters. For backwards compatibility, the Unicode case folding algorithm—which usually converts a string to lowercase characters—maps Cherokee characters to uppercase.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.