Gurmukhi (Unicode block)

Gurmukhi
Range: U+0A00..U+0A7F (128 code points)
Plane: BMP
Scripts: Gurmukhi
Major alphabets: Punjabi
Assigned: 80 code points
Unused: 48 reserved code points
Source standards: ISCII
Unicode version history:
1.0.0 (1991): 74 (+74)
1.1 (1993): 75 (+1)
4.0 (2003): 77 (+2)
5.1 (2008): 79 (+2)
11.0 (2018): 80 (+1)
Unicode documentation: Code chart ∣ Web page
Note: [1] [2] [3]

Gurmukhi is a Unicode block containing characters for the Punjabi language, in the Gurmukhi script. In its original incarnation, the code points U+0A02..U+0A4C were a direct copy of the Gurmukhi characters A2-EC from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.
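
The correspondence described above amounts to a constant shift: ISCII byte A2 lands on U+0A02 and EC lands on U+0A4C, so every byte in that range moves by 0x0A02 - 0xA2 = 0x0960. The following Python sketch illustrates only that arithmetic. The function name is hypothetical, and it assumes a plain one-to-one shift; it is not a full ISCII converter and ignores everything outside the A2..EC range cited in the text.

```python
# Range endpoints taken from the paragraph above; the helper name is illustrative.
ISCII_FIRST, ISCII_LAST = 0xA2, 0xEC
BLOCK_OFFSET = 0x0A02 - 0xA2  # 0x0960


def iscii_byte_to_gurmukhi(byte: int) -> int:
    """Shift one ISCII byte in A2..EC to the U+0A02..U+0A4C range."""
    if not ISCII_FIRST <= byte <= ISCII_LAST:
        raise ValueError(f"byte {byte:#04x} is outside A2..EC")
    return byte + BLOCK_OFFSET


print(f"U+{iscii_byte_to_gurmukhi(0xA2):04X}")  # U+0A02
print(f"U+{iscii_byte_to_gurmukhi(0xEC):04X}")  # U+0A4C
```

Positions that are unassigned in the Gurmukhi block still shift to the matching reserved code point, so a caller would need to check assignment separately.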

Block

Gurmukhi [1] [2]
Official Unicode Consortium code chart (PDF)
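Row: assigned code points
U+0A0x: 0A01..0A03, 0A05..0A0A, 0A0F
U+0A1x: 0A10, 0A13..0A1F
U+0A2x: 0A20..0A28, 0A2A..0A2F
U+0A3x: 0A30, 0A32..0A33, 0A35..0A36, 0A38..0A39, 0A3C, 0A3E..0A3F
U+0A4x: 0A40..0A42, 0A47..0A48, 0A4B..0A4D
U+0A5x: 0A51, 0A59..0A5C, 0A5E
U+0A6x: 0A66..0A6F
U+0A7x: 0A70..0A76
Notes
1. ^ As of Unicode version 15.0
2. ^ Code points not listed are non-assigned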
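
A chart of this kind can be regenerated from a Unicode character database. The sketch below uses Python's standard `unicodedata` module and assumes that a code point counts as assigned when it has a character name. The helper name is hypothetical, and the result depends on the Unicode version bundled with the interpreter, which may differ from the version cited in the notes.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x0A00, 0x0A7F  # block range from the infobox


def assigned_in_block(start: int = BLOCK_START, end: int = BLOCK_END) -> list[int]:
    """Return the code points in [start, end] that have a character name."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.name(chr(cp), None) is not None]


assigned = assigned_in_block()
print(f"Unicode data version: {unicodedata.unidata_version}")
print(f"{len(assigned)} of {BLOCK_END - BLOCK_START + 1} code points assigned")

# One line per chart row: the column digit if assigned, "." if not.
for row in range(BLOCK_START, BLOCK_END + 1, 16):
    cells = [f"{cp & 0xF:X}" if cp in assigned else "."
             for cp in range(row, row + 16)]
    print(f"U+{row >> 4:03X}x {' '.join(cells)}")
```

Printing column digits rather than the characters themselves avoids rendering problems with combining signs, which have no base letter to attach to in a bare grid.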

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Gurmukhi block:

Each entry gives the version, the final code points [a], and the count, followed by the related documents, each prefixed by its UTC, L2, or WG2 document ID.

Version 1.0.0: U+0A02, 0A05..0A0A, 0A0F..0A10, 0A13..0A28, 0A2A..0A30, 0A32..0A33, 0A35..0A36, 0A38..0A39, 0A3C, 0A3E..0A42, 0A47..0A48, 0A4B..0A4C, 0A59..0A5C, 0A5E, 0A66..0A74 (count: 74)
  UTC/1991-056: Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam
  UTC/1991-057: Whistler, Ken, Indic names list
  UTC/1991-048B: Whistler, Ken (1991-03-27), "III. L. Walk In proposals", Draft Minutes from the UTC meeting #46 day 2, 3/27 at Apple
  L2/01-303: Vikas, Om (2001-07-26), Letter from the Government from India on "Draft for Unicode Standard for Indian Scripts"
  L2/01-304: Feedback on Unicode Standard 3.0, 2001-08-02
  L2/01-305: McGowan, Rick (2001-08-08), Draft UTC Response to L2/01-304, "Feedback on Unicode Standard 3.0"
  L2/01-430R: McGowan, Rick (2001-11-20), UTC Response to L2/01-304, "Feedback on Unicode Standard 3.0"
  L2/05-371R: Sidhu, Sukhjinder (2005-11-30), Gurmukhi annotations
  L2/06-008R2: Moore, Lisa (2006-02-13), "C.8", UTC #106 Minutes
  L2/20-055: Pournader, Roozbeh (2020-01-16), Proposed sequences for composition exclusions
  L2/20-015: Moore, Lisa (2020-01-23), "B.13.1.1 Proposed sequences for composition exclusions", Draft Minutes of UTC Meeting 162

Version 1.1: U+0A4D (count: 1)
  (to be determined)

Version 4.0: U+0A01, 0A03 (count: 2)
  L2/01-431R [b]: McGowan, Rick (2001-11-08), Actions for UTC and Editorial Committee in response to L2/01-430R
  L2/01-405R: Moore, Lisa (2001-12-12), "Consensus 89-C19", Minutes from the UTC/L2 meeting in Mountain View, November 6-9, 2001, Accept the twelve Indic characters with names and coding positions as documented in L2/01-431R
  L2/02-117, N2425: McGowan, Rick (2002-03-21), Additional Characters for Indic Scripts
  L2/03-102: Vikas, Om (2003-03-04), Unicode Standard for Indic Scripts
  L2/03-101.4: Proposed Changes in Indic Scripts [Gurmukhi document], 2003-03-04

Version 5.1: U+0A51 (count: 1)
  L2/05-088R: Sidhu, Sukhjinder (2005-04-21), Proposed Changes to Gurmukhi
  L2/05-167: Sidhu, Sukhjinder (2005-08-01), Proposed Changes to Gurmukhi 2
  L2/05-180: Moore, Lisa (2005-08-17), "Gurmukhi (C.6)", UTC #104 Minutes
  L2/05-344: Sidhu, Sukhjinder (2005-10-27), Proposed changes to Gurmukhi 3
  L2/05-279: Moore, Lisa (2005-11-10), "C.14", UTC #105 Minutes
  L2/05-384, N3021: Sidhu, Sukhjinder (2005-12-18), Proposal to encode Gurmukhi 3
  L2/06-020: McGowan, Rick (2006-01-25), Public Review Issue #82: Representation of Gurmukhi Double Vowels
  L2/06-030: Sidhu, Sukhjinder (2006-01-27), "E", Proposed Changes to Gurmukhi 4
  L2/06-008R2: Moore, Lisa (2006-02-13), "B.11.5, C.8", UTC #106 Minutes
  N3103: Umamaheswaran, V. S. (2006-08-25), "M48.25a", Unconfirmed minutes of WG 2 meeting 48, Mountain View, CA, USA; 2006-04-24/27

Version 5.1: U+0A75 (count: 1)
  L2/06-008R2: Moore, Lisa (2006-02-13), "C.8", UTC #106 Minutes
  L2/06-037R, N3073: Sidhu, Sukhjinder (2006-04-07), Proposal to encode Gurmukhi Sign Yakash
  N3103: Umamaheswaran, V. S. (2006-08-25), "M48.25b", Unconfirmed minutes of WG 2 meeting 48, Mountain View, CA, USA; 2006-04-24/27
  L2/16-294: Singh, Sarabveer (2016-10-27), Changes to Gurmukhi 10
  L2/16-302: Sharma, Shriramana (2016-10-28), Feedback on L2/16-294 on Gurmukhi
  L2/16-327: McGowan, Rick (2016-11-07), "Feedback on L2/16-294 (Gurmukhi)", Comments on Public Review Issues (July 27 - Nov 7, 2016)
  L2/16-342: Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Glass, Andrew; Iancu, Laurențiu (2016-11-07), "4", Recommendations to UTC #149 November 2016 on Script Proposals
  L2/16-325: Moore, Lisa (2016-11-18), "D.4 (later rescinded)", UTC #149 Minutes
  L2/16-380: Singh, Manvir (2016-12-09), Feedback on L2/16-294
  L2/16-384: Singh, Sarabveer (2016-12-13), Feedback on L2/16-380
  L2/17-037: Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Glass, Andrew; Iancu, Laurențiu; Moore, Lisa; Liang, Hai; Ishida, Richard; Misra, Karan; McGowan, Rick (2017-01-21), "5. Gurmukhi", Recommendations to UTC #150 January 2017 on Script Proposals
  L2/17-016: Moore, Lisa (2017-02-08), "Consensus 150-C13", UTC #150 Minutes, Retain the current glyph in the code charts for Yakash, U+0A75, rescinding the decision documented under D.4.1 in the UTC #149 minutes.

Version 11.0: U+0A76 (count: 1)
  L2/16-209R: A, Srinidhi; A, Sridatta (2016-07-25), Proposal to Encode an Abbreviation Sign for Gurmukhi
  L2/16-203: Moore, Lisa (2016-08-18), "D.9", UTC #148 Minutes
  N4873R: "M65.08f", Unconfirmed minutes of WG 2 meeting 65, 2018-03-16

  a. Proposed code points and characters names may differ from final code points and names
  b. See also L2/01-303, L2/01-304, L2/01-305, and L2/01-430R
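
The range notation in the "Final code points" column (a `U+` prefix on the first item, `..` between the ends of an inclusive range) can be expanded mechanically to check the counts. A minimal Python sketch follows; the function and constant names are hypothetical, and the two strings are copied from the table entries above.

```python
def expand_ranges(spec: str) -> list[int]:
    """Expand text such as 'U+0A02, 0A05..0A0A' into a list of code points."""
    points = []
    for part in spec.replace("U+", "").split(","):
        first, _, last = part.strip().partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single item: one code point
        points.extend(range(start, end + 1))
    return points


# Version 1.0.0 entry from the history table.
V1_0_0 = ("U+0A02, 0A05..0A0A, 0A0F..0A10, 0A13..0A28, 0A2A..0A30, "
          "0A32..0A33, 0A35..0A36, 0A38..0A39, 0A3C, 0A3E..0A42, "
          "0A47..0A48, 0A4B..0A4C, 0A59..0A5C, 0A5E, 0A66..0A74")
# Code points added in versions 1.1, 4.0, 5.1, and 11.0.
LATER = "U+0A4D, 0A01, 0A03, 0A51, 0A75, 0A76"

print(len(expand_ranges(V1_0_0)))                         # 74
print(len(expand_ranges(V1_0_0) + expand_ranges(LATER)))  # 80
```

The two totals agree with the 74 code points listed for version 1.0.0 and the 80 assigned code points given for the block as a whole.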

References

  1. "Unicode 1.0.1 Addendum" (PDF). The Unicode Standard. 1992-11-03. Retrieved 2016-07-09.
  2. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  3. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.