Bengali (Unicode block)

Bengali
Range	U+0980..U+09FF (128 code points)
Plane	BMP
Scripts	Bengali
Major alphabets	Bengali, Assamese
Assigned	96 code points
Unused	32 reserved code points
Source standards	ISCII
Unicode version history
1.0.0 (1991)	89 (+89)
4.0 (2003)	90 (+1)
4.1 (2005)	91 (+1)
5.2 (2009)	92 (+1)
7.0 (2014)	93 (+1)
10.0 (2017)	95 (+2)
11.0 (2018)	96 (+1)
Unicode documentation
	Code chart ∣ Web page
	Note: This article contains Indic text. Without proper rendering support, you may see question marks or boxes, misplaced vowels or missing conjuncts instead of Indic text.

Last updated May 26, 2024

Bengali is a Unicode block containing characters for the Bengali, Assamese, Bishnupriya Manipuri, Daphla, Garo, Hallam, Khasi, Mizo, Munda, Naga, Riang, and Santali languages. In its original incarnation, the code points U+0981..U+09CD were a direct copy of the Bengali characters A1-ED from the 1988 ISCII standard. The block also included several Assamese ISCII characters in the U+09F0 column. The Devanagari, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.
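As a rough illustration of the position-for-position correspondence described above, the following Python sketch converts an ISCII byte to a Bengali code point by adding a fixed offset. The offset is inferred only from the two ranges quoted in the text (A1-ED and U+0981..U+09CD). The helper name `iscii_1988_to_bengali` is hypothetical, and real ISCII data, especially from later revisions of the standard, may not follow this layout exactly.

```python
from typing import Optional

# Assumption: these ranges are the ones quoted in the text above,
# not values read from the ISCII standard itself.
ISCII_FIRST, ISCII_LAST = 0xA1, 0xED
UNICODE_FIRST, UNICODE_LAST = 0x0981, 0x09CD

# Both ranges span the same number of positions, so a single offset covers them.
assert ISCII_LAST - ISCII_FIRST == UNICODE_LAST - UNICODE_FIRST
OFFSET = UNICODE_FIRST - ISCII_FIRST  # 0x08E0


def iscii_1988_to_bengali(byte: int) -> Optional[int]:
    """Hypothetical helper: map an ISCII byte to the code point at the same position.

    Returns None for bytes outside A1-ED. Positions that are unassigned in
    Unicode are not filtered out here.
    """
    if ISCII_FIRST <= byte <= ISCII_LAST:
        return byte + OFFSET
    return None


if __name__ == "__main__":
    for b in (0xA1, 0xED, 0x41):
        cp = iscii_1988_to_bengali(b)
        print(f"0x{b:02X} ->", "out of range" if cp is None else f"U+{cp:04X}")
```

The sketch only demonstrates the arithmetic; a real converter would need a full mapping table rather than a single offset.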

Block

Bengali ^[1]^[2] Official Unicode Consortium code chart (PDF)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+098x	ঀ	ঁ	ং	ঃ		অ	আ	ই	ঈ	উ	ঊ	ঋ	ঌ			এ
U+099x	ঐ			ও	ঔ	ক	খ	গ	ঘ	ঙ	চ	ছ	জ	ঝ	ঞ	ট
U+09Ax	ঠ	ড	ঢ	ণ	ত	থ	দ	ধ	ন		প	ফ	ব	ভ	ম	য
U+09Bx	র		ল				শ	ষ	স	হ			়	ঽ	া	ি
U+09Cx	ী	ু	ূ	ৃ	ৄ			ে	ৈ			ো	ৌ	্	ৎ
U+09Dx								ৗ					ড়	ঢ়		য়
U+09Ex	ৠ	ৡ	ৢ	ৣ			০	১	২	৩	৪	৫	৬	৭	৮	৯
U+09Fx	ৰ	ৱ	৲	৳	৴	৵	৶	৷	৸	৹	৺	৻	ৼ	৽	৾
Notes
1. ^ As of Unicode version 15.1
2. ^ Blank cells indicate non-assigned code points
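The assigned and reserved counts for the block can be cross-checked against the Unicode data bundled with a Python interpreter. The sketch below is only a sanity check, not an authoritative source: it treats a code point as assigned when its general category is not `Cn`, and the result depends on the Unicode version of the Python build in use. The function name `bengali_block_counts` is made up for this example.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x0980, 0x09FF  # range given for the Bengali block


def bengali_block_counts():
    """Return (total, assigned, reserved) for U+0980..U+09FF.

    A code point counts as assigned when its general category is not "Cn"
    (unassigned) in the Unicode data shipped with this Python build.
    """
    total = BLOCK_END - BLOCK_START + 1
    assigned = sum(
        1
        for cp in range(BLOCK_START, BLOCK_END + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    )
    return total, assigned, total - assigned


if __name__ == "__main__":
    total, assigned, reserved = bengali_block_counts()
    print(f"Unicode {unicodedata.unidata_version}: "
          f"{assigned} assigned + {reserved} reserved = {total}")
```

With Unicode 11.0 or later data, the output should agree with the 96 assigned and 32 reserved code points listed for the block; older data will report fewer assigned code points.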

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Bengali block:

Version	Final code points^[a]	Count	UTC ID	L2 ID	WG2 ID	Document
1.0.0	U+0981..0983, 0985..098C, 098F..0990, 0993..09A8, 09AA..09B0, 09B2, 09B6..09B9, 09BC, 09BE..09C4, 09C7..09C8, 09CB..09CD, 09D7, 09DC..09DD, 09DF..09E3, 09E6..09FA	89	UTC/1991-056			Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam
			UTC/1991-057			Whistler, Ken, Indic names list
			UTC/1991-048B			Whistler, Ken (1991-03-27), "III. L. Walk In proposals", Draft Minutes from the UTC meeting #46 day 2, 3/27 at Apple
				L2/01-303		Vikas, Om (2001-07-26), Letter from the Government from India on "Draft for Unicode Standard for Indian Scripts"
				L2/01-304		Feedback on Unicode Standard 3.0, 2001-08-02
				L2/01-305		McGowan, Rick (2001-08-08), Draft UTC Response to L2/01-304, "Feedback on Unicode Standard 3.0"
				L2/01-430R		McGowan, Rick (2001-11-20), UTC Response to L2/01-304, "Feedback on Unicode Standard 3.0"
				L2/03-113		Everson, Michael (2003-03-05), Conjuncts: making sure we are right
				L2/08-288		Whistler, Ken (2008-08-04), Public Review Issue #123: Bengali Currency Numerator Values
				L2/08-361		Moore, Lisa (2008-12-02), "Bengali Currency Numerator Values (B.11.1)", UTC #117 Minutes
				L2/09-225R		Moore, Lisa (2009-08-17), "E.1.2", UTC #120 / L2 #217 Minutes
				L2/20-055		Pournader, Roozbeh (2020-01-16), Proposed sequences for composition exclusions
				L2/20-015		Moore, Lisa (2020-01-23), "B.13.1.1 Proposed sequences for composition exclusions", Draft Minutes of UTC Meeting 162
4.0	U+09BD	1		L2/01-431R^[b]		McGowan, Rick (2001-11-08), Actions for UTC and Editorial Committee in response to L2/01-430R
				L2/01-405R		Moore, Lisa (2001-12-12), "Consensus 89-C19", Minutes from the UTC/L2 meeting in Mountain View, November 6-9, 2001, Accept the twelve Indic characters with names and coding positions as documented in L2/01-431R
				L2/02-117	N2425	McGowan, Rick (2002-03-21), Additional Characters for Indic Scripts
				L2/03-084		Jain, Manoj (2003-03-03), Proposed changes in the Unicode Standards for Indic Scripts - Bengali
				L2/03-102		Vikas, Om (2003-03-04), Unicode Standard for Indic Scripts
				L2/03-101.1		Proposed Changes in Indic Scripts [Bengali document], 2003-03-04
				L2/03-104		Jain, Manoj (2003-03-04), Sample Text for Bengali Sign Avagraha
				L2/04-102		Pavanaja, U. B. (2004-02-10), Bug in Kannada collation
				L2/04-432		Wissink, Cathy (2004-12-31), Indic collation: action items 99-20 and 99-29
4.1	U+09CE	1		L2/00-303	N2261	Incorporation of Bangla (Bengali) Coded Character in ISO/IEC 10646-1, 2000-08-23
				L2/00-304	N2261-1	Proposal Summary Form for character U+09BA, KHANDATA, 2000-08-23
				L2/01-050	N2253	Umamaheswaran, V. S. (2001-01-21), "7.12 Proposal to synchronize Bengali standard with 10646", Minutes of the SC2/WG2 meeting in Athens, September 2000
				L2/03-084		Jain, Manoj (2003-03-03), Proposed changes in the Unicode Standards for Indic Scripts - Bengali
				L2/03-102		Vikas, Om (2003-03-04), Unicode Standard for Indic Scripts
				L2/03-101.1		Proposed Changes in Indic Scripts [Bengali document], 2003-03-04
				L2/04-060		Sengupta, Gautam (2004-02-01), Encoding Bangla Khanda-Ta With Ta+Virama
				L2/04-062		Constable, Peter (2004-02-01), Encoding Bangla Khanda-Ta With Ta+Virama
				L2/04-102		Pavanaja, U. B. (2004-02-10), Bug in Kannada collation
				L2/04-262	N2810	Constable, Peter (2004-02-17), Encoding of Bengali Khanda Ta in Unicode (PRI #30 document)
				L2/04-192	N2811	Sengupta, Gautam (2004-06-07), Feedback on PR-30: Encoding of Bangla Khanda Ta in Unicode
				L2/04-233	N2812	Vikas, Om (2004-06-10), Letter to Mark Davis re Bengali Khanda Ta
				L2/04-252	N2813	Constable, Peter (2004-06-15), Review of Bengali Khanda Ta and PRI-30 Feedback
				L2/04-264	N2809	Constable, Peter (2004-06-17), Proposal to encode Bengali Khanda Ta in the UCS
				L2/04-432		Wissink, Cathy (2004-12-31), Indic collation: action items 99-20 and 99-29
5.2	U+09FB	1			N3353 (pdf, doc)	Umamaheswaran, V. S. (2007-10-10), "M51.18", Unconfirmed minutes of WG 2 meeting 51 Hanzhou, China; 2007-04-24/27
				L2/07-192	N3311	Pandey, Anshuman (2007-05-21), Proposal to Encode the Ganda Currency Mark for Bengali in the BMP of the UCS
				L2/07-225		Moore, Lisa (2007-08-21), "Bengali", UTC #112 Minutes
7.0	U+0980	1		L2/11-359		Pandey, Anshuman (2011-10-21), Proposal to Encode the Sign Anji for Bengali
				L2/11-403		Anderson, Deborah; McGowan, Rick; Whistler, Ken (2011-10-26), "IV. BENGALI", Review of Indic-related L2 documents and Recommendations to the UTC
				L2/11-408		Lata, Swaran (2011-10-27), Letter from Swaran Lata, Gov't of India, re proposals
				L2/12-079		Lata, Swaran (2012-02-07), Inputs of Govt. of India on various documents
				L2/12-121	N4157	Pandey, Anshuman (2012-04-23), Proposal to Encode the Sign ANJI for Bengali
				L2/12-147		Anderson, Deborah; McGowan, Rick; Whistler, Ken (2012-04-25), "VII. BENGALI", Review of Indic-related L2 documents and Recommendations to the UTC
				L2/12-184		Lata, Swaran (2012-05-07), GOI Feedback on the various Indic related documents
				L2/12-277		Lata, Swaran (2012-07-26), GOI Feedback on the various Indic related document submitted to UTC
10.0	U+09FC	1		L2/15-204		Anderson, Deborah; et al. (2015-07-25), "3. Bengali", Recommendations to UTC #144 July 2015 on Script Proposals
				L2/15-161		Sharma, Shriramana (2015-07-31), Proposal to encode 09CF BENGALI LETTER VEDIC ANUSVARA
				L2/15-187		Moore, Lisa (2015-08-11), "D.6.1", UTC #144 Minutes
					N4739	"M64.06", Unconfirmed minutes of WG 2 meeting 64, 2016-08-31
	U+09FD	1		L2/15-172R		A, Srinidhi; A, Sridatta (2015-07-09), Proposal to Encode an Abbreviation Sign for Bengali
				L2/15-204		Anderson, Deborah; et al. (2015-07-25), "3. Bengali", Recommendations to UTC #144 July 2015 on Script Proposals
				L2/15-187		Moore, Lisa (2015-08-11), "D.6.2", UTC #144 Minutes
					N4739	"M64.06", Unconfirmed minutes of WG 2 meeting 64, 2016-08-31
11.0	U+09FE	1		L2/16-322	N4808	A, Srinidhi; A, Sridatta (2016-11-01), Proposal to encode the SANDHI MARK for Bengali
				L2/17-037		Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Glass, Andrew; Iancu, Laurențiu; Moore, Lisa; Liang, Hai; Ishida, Richard; Misra, Karan; McGowan, Rick (2017-01-21), "4. Bengali", Recommendations to UTC #150 January 2017 on Script Proposals
				L2/17-016		Moore, Lisa (2017-02-08), "D.3.2", UTC #150 Minutes
				L2/17-130		Anderson, Deborah (2017-04-19), Comments on L2/16-322 and L2/16-383, Sandhi marks for Bengali and Newa
				L2/17-153		Anderson, Deborah (2017-05-17), "4. Bengali and Newa", Recommendations to UTC #151 May 2017 on Script Proposals
				L2/17-103		Moore, Lisa (2017-05-18), "D.3 Sandhi Mark", UTC #151 Minutes
a. ^ Proposed code points and character names may differ from final code points and names
b. ^ See also L2/01-303, L2/01-304, L2/01-305, and L2/01-430R
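The per-version counts in the table above can be summed to reproduce the cumulative totals in the version history. The short Python sketch below does that arithmetic. The dictionary is transcribed by hand from the Count column, with the two Unicode 10.0 rows combined, and the name `running_totals` is illustrative only.

```python
from itertools import accumulate

# Per-version additions, copied from the Count column of the table above
# (Unicode 10.0 has two rows of one character each).
ADDED = {"1.0.0": 89, "4.0": 1, "4.1": 1, "5.2": 1, "7.0": 1, "10.0": 2, "11.0": 1}


def running_totals(added):
    """Map each version to the cumulative number of assigned code points."""
    return dict(zip(added, accumulate(added.values())))


if __name__ == "__main__":
    for version, total in running_totals(ADDED).items():
        print(f"{version}\t{total} (+{ADDED[version]})")
```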

References

1. ^ "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
2. ^ "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.


