Arabic Supplement

Range: U+0750..U+077F (48 code points)
Plane: BMP
Scripts: Arabic
Major alphabets: Khowar, Torwali, Burushaski, Shahmukhi, Arwi, Jawi script, Ajami script, Early Persian
Assigned: 48 code points
Unused: 0 reserved code points
Unicode version history:
  4.1 (2005): 30 (+30)
  5.1 (2008): 48 (+18)
Unicode documentation: Code chart, Web page
Note: [1] [2]

Arabic Supplement is a Unicode block that encodes Arabic letter variants used for writing non-Arabic languages, including languages of Pakistan and Africa, and early Persian.
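
As a rough illustration of what belonging to this block means in practice, the following Python sketch tests whether a character falls in the range U+0750..U+077F and counts how many code points in that range the local Unicode database has names for. The function and constant names are hypothetical and chosen only for this example; the result of the count depends on the Unicode version shipped with the Python interpreter.

```python
import unicodedata

# Range taken from the block description: U+0750..U+077F.
BLOCK_START = 0x0750
BLOCK_END = 0x077F


def in_arabic_supplement(ch):
    """Return True if the single character ch lies in the Arabic Supplement block."""
    return len(ch) == 1 and BLOCK_START <= ord(ch) <= BLOCK_END


def assigned_code_points():
    """List code points in the block that the local Unicode database has a name for."""
    return [
        cp
        for cp in range(BLOCK_START, BLOCK_END + 1)
        if unicodedata.name(chr(cp), None) is not None
    ]


if __name__ == "__main__":
    print(in_arabic_supplement("\u076A"))   # a code point inside the block
    print(in_arabic_supplement("\u0627"))   # ARABIC LETTER ALEF, in the main Arabic block
    print(len(assigned_code_points()))
```

A plain range comparison is enough here because a Unicode block is a contiguous range of code points.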

Block

Arabic Supplement [1]
Official Unicode Consortium code chart (PDF)
       0 1 2 3 4 5 6 7 8 9 A B C D E F
U+075x ݐ ݑ ݒ ݓ ݔ ݕ ݖ ݗ ݘ ݙ ݚ ݛ ݜ ݝ ݞ ݟ
U+076x ݠ ݡ ݢ ݣ ݤ ݥ ݦ ݧ ݨ ݩ ݪ ݫ ݬ ݭ ݮ ݯ
U+077x ݰ ݱ ݲ ݳ ݴ ݵ ݶ ݷ ݸ ݹ ݺ ݻ ݼ ݽ ݾ ݿ
Notes
1. As of Unicode version 15.1
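
A chart of this shape can be regenerated from the block range alone. The sketch below is one possible way to do it in Python, assuming the range U+0750..U+077F given for this block; `chart_rows` is a hypothetical helper name, not part of any library. Characters are separated by spaces so that each letter appears in its isolated form rather than joining to its neighbours.

```python
def chart_rows(start=0x0750, end=0x077F):
    """Build one text row per 16 code points, labelled like 'U+075x'."""
    rows = []
    for base in range(start, end + 1, 16):
        # Drop the low nibble to get the row label, e.g. 0x0750 -> '075'.
        label = "U+{:03X}x".format(base >> 4)
        cells = " ".join(chr(base + offset) for offset in range(16))
        rows.append(label + " " + cells)
    return rows


if __name__ == "__main__":
    # Column header: the low hex digit of each code point.
    print(" " * 7 + " ".join("0123456789ABCDEF"))
    for row in chart_rows():
        print(row)
```

How the rows display depends on the terminal's font coverage and its handling of right-to-left text.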

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Arabic Supplement block:

Entries are grouped by Unicode version and final code points [a]. Each line gives the L2 ID and/or WG2 ID, followed by the document.

Version 4.1, U+0750..0769 (26 code points)
  L2/02-274: Kew, Jonathan (2002-07-16), Proposal for extensions to the Arabic block
  L2/03-168: Kew, Jonathan (2003-06-02), Proposal to encode Arabic-script letters for African languages
  L2/03-176: Kew, Jonathan (2003-06-03), Proposal to encode Jawi and Moroccan Arabic GAF characters
  L2/03-210: Kew, Jonathan (2003-06-12), Draft chart showing UTC #95 additions to Arabic blocks
  L2/03-223, N2598: Kew, Jonathan (2003-07-10), Proposal to encode additional Arabic-script characters

Version 4.1, U+076A (1 code point)
  L2/03-228R2, N2627: Kew, Jonathan (2003-09-29), Proposal to encode Marwari LAM WITH BAR Character
  L2/03-240R3: Moore, Lisa (2003-10-21), "Marwari Lam with Bar (B.14.6)", UTC #96 Minutes

Version 4.1, U+076B..076D (3 code points)
  L2/04-025R, N2723: Kew, Jonathan (2004-03-15), Proposal to encode Additional Arabic script characters

Version 5.1, U+076E..077D (16 code points)
  N3117: Bashir, Elena; Hussain, Sarmad; Anderson, Deborah (2006-07-27), Proposal to add characters needed for Khowar, Torwali, and Burushaski
  L2/06-150: Bashir, Elena (2006-05-05), Letters of support for characters needed for Khowar, Torwali, and Burushaski
  L2/06-149: Bashir, Elena; Hussain, Sarmad; Anderson, Deborah (2006-05-09), Proposal to add characters needed for Khowar, Torwali, and Burushaski
  L2/06-108: Moore, Lisa (2006-05-25), "C.18", UTC #107 Minutes
  N3153: Umamaheswaran, V. S. (2007-02-16), "M49.8", Unconfirmed minutes of WG 2 meeting 49 AIST, Akihabara, Tokyo, Japan; 2006-09-25/29
  L2/06-328: Pournader, Roozbeh (2006-10-11), Proposal to change the previously decided name of some Arabic characters
  L2/06-324R2: Moore, Lisa (2006-11-29), "Consensus 109-C27", UTC #109 Minutes
  L2/07-268, N3253: Umamaheswaran, V. S. (2007-07-26), "M50.4e", Unconfirmed minutes of WG 2 meeting 50, Frankfurt-am-Main, Germany; 2007-04-24/27, Names of characters in the range 0773 to 077D are changed by replacing the word 'EASTERN' with 'EXTENDED' in them.
  L2/07-264: Anderson, Deborah (2007-08-06), Shaping behavior of Burushaski characters and other Arabic additions in L2/06-149
  L2/07-225: Moore, Lisa (2007-08-21), "Burushaski Shaping Behavior", UTC #112 Minutes
  L2/10-158: Mansour, Kamal (2010-05-04), Shaping Behavior of U+0777
  L2/10-108: Moore, Lisa (2010-05-19), "Action item 123-A50", UTC #123 / L2 #220 Minutes, Suggest clarifying text in section 8.2 of TUS 5.2 pp 248-249 regarding Yeh and Farsi Yeh joining groups.

Version 5.1, U+077E..077F (2 code points)
  L2/06-345R, N3180R: Everson, Michael; Pournader, Roozbeh; Sarbar, Elnaz (2006-10-24), Proposal to encode eight Arabic characters for Persian and Azerbaijani in the UCS
  L2/06-324R2: Moore, Lisa (2006-11-29), "C.12", UTC #109 Minutes
  L2/07-268, N3253: Umamaheswaran, V. S. (2007-07-26), "M50.15", Unconfirmed minutes of WG 2 meeting 50, Frankfurt-am-Main, Germany; 2007-04-24/27

  a. Proposed code points and character names may differ from final code points and names

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.