Arabic Extended-B | |
---|---|
Range | U+0870..U+089F (48 code points) |
Plane | BMP |
Scripts | Arabic |
Major alphabets | Bosnian, Javanese, Sorabe, Sundanese |
Assigned | 42 code points |
Unused | 6 reserved code points |
Unicode version history | |
14.0 (2021) | 41 (+41) |
16.0 (2024) | 42 (+1) |
Unicode documentation | |
Code chart ∣ Web page | |
Note: [1] [2] |
Arabic Extended-B is a Unicode block encoding Qur'anic annotations and letter variants used for various non-Arabic languages. The block also includes currency symbols and an abbreviation mark. [3]
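The range in the summary table can be turned into a simple membership test. The following is a minimal sketch, not an official API: the names `BLOCK_START`, `BLOCK_END` and `in_arabic_extended_b` are hypothetical, and the bounds are copied from the "Range" row above. Block membership does not imply that a code point is assigned, since six code points in the block are reserved.

```python
# Bounds copied from the "Range" row of the summary table (U+0870..U+089F).
BLOCK_START = 0x0870
BLOCK_END = 0x089F  # inclusive


def in_arabic_extended_b(ch):
    """Return True if the single character ch lies inside the block range.

    This checks the range only; it does not say whether the code point
    is assigned or reserved.
    """
    return BLOCK_START <= ord(ch) <= BLOCK_END


# Total number of code points in the block, assigned or not.
block_size = BLOCK_END - BLOCK_START + 1
```

With these bounds, `block_size` matches the 48 code points stated in the table, and a letter from the main Arabic block such as U+0627 falls outside the range.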
Arabic Extended-B [1] [2] Official Unicode Consortium code chart (PDF)

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+087x | ࡰ | ࡱ | ࡲ | ࡳ | ࡴ | ࡵ | ࡶ | ࡷ | ࡸ | ࡹ | ࡺ | ࡻ | ࡼ | ࡽ | ࡾ | ࡿ |
U+088x | ࢀ | ࢁ | ࢂ | ࢃ | ࢄ | ࢅ | ࢆ | ࢇ | ࢈ | ࢉ | ࢊ | ࢋ | ࢌ | ࢍ | ࢎ | |
U+089x | ࢐ | ࢑ | | | | | | ࢗ | ࢘ | ࢙ | ࢚ | ࢛ | ࢜ | ࢝ | ࢞ | ࢟ |
Notes |
The following Unicode-related documents record the purpose and process of defining specific characters in the Arabic Extended-B block:
Version | Final code points | Count | L2 ID | WG2 ID | Document |
---|---|---|---|---|---|
14.0 | U+0870..0888, 089D..089F | 28 | L2/19-306 | N5142 | Pournader, Roozbeh; Anderson, Deborah (2019-09-29), Arabic additions for Quranic orthographies |
| | | | L2/19-343 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai (2019-10-06), "a. Additions for Quranic orthographies", Recommendations to UTC #161 October 2019 on Script Proposals |
| | | | L2/19-323 | | Moore, Lisa (2019-10-01), "Consensus 161-C4", UTC #161 Minutes |
| | | | L2/20-105 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-04-20), "3f. Comments on L2/19-306", Recommendations to UTC #163 April 2020 on Script Proposals |
| | U+0889..088A | 2 | L2/19-339 | | Jacquerye, Denis Moyogo (2019-10-03), Proposal to encode Bosnian Arabic characters |
| | | | L2/19-343 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai (2019-10-06), "d. Bosnian Arabic characters", Recommendations to UTC #161 October 2019 on Script Proposals |
| | | | L2/19-323 | | Moore, Lisa (2019-10-01), "C.6.5", UTC #161 Minutes |
| | U+088B..088D | 3 | L2/19-340 | | Jacquerye, Denis Moyogo (2019-10-03), Proposal to encode Javanese and Sundanese Arabic characters |
| | | | L2/19-323 | | Moore, Lisa (2019-10-01), "C.6.6", UTC #161 Minutes |
| | U+088E | 1 | L2/20-071R | | Pournader, Roozbeh; Izadpanah, Borna (2020-05-01), Proposal to encode an Arabic tail character used for abbreviation |
| | | | L2/20-105 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-04-20), "3b. Arabic Tail Character", Recommendations to UTC #163 April 2020 on Script Proposals |
| | | | L2/20-102 | | Moore, Lisa (2020-05-06), "Consensus 163-C26", UTC #163 Minutes |
| | U+0890..0891 | 2 | L2/20-245 | | Hosny, Khaled; Pournader, Roozbeh (2020-09-09), Proposal to encode three Arabic symbols |
| | | | L2/20-250 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-10-01), "5a. Three Symbols", Recommendations to UTC #165 October 2020 on Script Proposals |
| | | | L2/20-237 | | Moore, Lisa (2020-10-27), "Consensus 165-C15", UTC #165 Minutes |
| | U+0898..089C | 5 | L2/20-089 | | Syarifuddin, M. Mahali (2020-02-28), Proposal to Encode Characters from Indonesian Orthography of Quran |
| | | | L2/20-105 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-04-20), "3c. Indonesian Orthography of Quran", Recommendations to UTC #163 April 2020 on Script Proposals |
| | | | L2/20-102 | | Moore, Lisa (2020-05-06), "Consensus 163-C14", UTC #163 Minutes |
16.0 | U+0897 | 1 | L2/22-116 | | Sh., Rikza F. (2022-05-22), Proposal to Encode Four Pegon Characters |
| | | | L2/22-128 | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Constable, Peter (2022-07-20), "4b Pegon", Recommendations to UTC #172 July 2022 on Script Proposals |
| | | | L2/22-121 | | Constable, Peter (2022-08-01), "D.1.4b Four Arabic Pegon Characters", Draft Minutes of UTC Meeting 172 |
| | | | L2/23-157 | | Constable, Peter (2023-07-31), "Consensus 176-C17", UTC #176 Minutes, The UTC approves the change of canonical combining class for U+0897 ARABIC PEPET to ccc=230, from ccc=0 |
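The counts in the history table can be cross-checked against the summary figures (41 code points in version 14.0, 42 in 16.0, 6 reserved). The sketch below is illustrative only: the ranges are transcribed by hand from the "Final code points" column, and the names `ADDITIONS` and `assigned_as_of` are hypothetical.

```python
# Full block range, U+0870..U+089F inclusive.
BLOCK = range(0x0870, 0x089F + 1)

# (version, first, last) with inclusive bounds, transcribed from the
# "Final code points" column of the history table.
ADDITIONS = [
    ((14, 0), 0x0870, 0x0888),
    ((14, 0), 0x089D, 0x089F),
    ((14, 0), 0x0889, 0x088A),
    ((14, 0), 0x088B, 0x088D),
    ((14, 0), 0x088E, 0x088E),
    ((14, 0), 0x0890, 0x0891),
    ((14, 0), 0x0898, 0x089C),
    ((16, 0), 0x0897, 0x0897),
]


def assigned_as_of(version):
    """Return the set of code points assigned up to and including version."""
    return {
        cp
        for added_in, first, last in ADDITIONS
        if added_in <= version
        for cp in range(first, last + 1)
    }


assigned_14 = assigned_as_of((14, 0))
assigned_16 = assigned_as_of((16, 0))

# Code points in the block that no table row covers.
reserved = sorted(set(BLOCK) - assigned_16)
```

Under these assumptions the reserved code points come out as U+088F and U+0892..U+0896, which agrees with the six reserved code points listed in the summary table.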
A Unicode block is one of several contiguous ranges of numeric character codes of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.
In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the Unicode Consortium. Three private use areas are defined: one in the Basic Multilingual Plane, and one each in, and nearly covering, planes 15 and 16. The code points in these areas cannot be considered as standardized characters in Unicode itself. They are intentionally left undefined so that third parties may define their own characters without conflicting with Unicode Consortium assignments. Under the Unicode Stability Policy, the Private Use Areas will remain allocated for that purpose in all future Unicode versions.
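As a rough illustration of the three areas described above, the following sketch tests whether a code point is private use. The range bounds are not given in the text; they are the ones defined by the Unicode Standard (U+E000..U+F8FF, U+F0000..U+FFFFD, U+100000..U+10FFFD). The names `PRIVATE_USE_AREAS` and `is_private_use` are hypothetical.

```python
# Inclusive (first, last) bounds of the three Private Use Areas.
# The last two code points of planes 15 and 16 are noncharacters,
# which is why those areas only "nearly" cover their planes.
PRIVATE_USE_AREAS = [
    (0xE000, 0xF8FF),      # Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # plane 15
    (0x100000, 0x10FFFD),  # plane 16
]


def is_private_use(cp):
    """Return True if the integer code point cp lies in a Private Use Area."""
    return any(first <= cp <= last for first, last in PRIVATE_USE_AREAS)


# Number of code points in each area.
area_sizes = [last - first + 1 for first, last in PRIVATE_USE_AREAS]
```

A code point from Arabic Extended-B, such as U+0870, is not private use under this check.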
Geometric Shapes is a Unicode block of 96 symbols at code point range U+25A0–25FF.
Many scripts in Unicode, such as Arabic, have special orthographic rules that require certain combinations of letterforms to be combined into special ligature forms. In English, the common ampersand (&) developed from a ligature in which the handwritten Latin letters e and t were combined. The rules governing ligature formation in Arabic can be quite complex, requiring special script-shaping technologies such as the Arabic Calligraphic Engine by Thomas Milo's DecoType.
Latin Extended-A is a Unicode block and is the third block of the Unicode standard. It encodes Latin letters from the Latin ISO character sets other than Latin-1 and also legacy characters from the ISO 6937 standard.
Alphabetic Presentation Forms is a Unicode block containing standard ligatures for the Latin, Armenian, and Hebrew scripts.
Latin Extended-B is the fourth block (0180-024F) of the Unicode Standard. It has been included since version 1.0, where it was allocated only the code points 0180-01FF and contained 113 characters. During unification with ISO 10646 for version 1.1, the block range was extended by 80 code points and another 35 characters were assigned. In version 3.0 and later, the last 60 available code points in the block were assigned. Its block name in Unicode 1.0 was Extended Latin.
Enclosed Alphanumerics is a Unicode block of typographical symbols, each consisting of an alphanumeric character within a circle, within brackets or another non-closed enclosure, or followed by a full stop.
Arabic is a Unicode block, containing the standard letters and the most common diacritics of the Arabic script, and the Arabic-Indic digits.
Arabic Presentation Forms-A is a Unicode block encoding contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages. This block also allocates 32 noncharacters in Unicode, designed specifically for internal use.
Arabic Extended-A is a Unicode block encoding Qur'anic annotations and letter variants used for various non-Arabic languages.
Arabic Presentation Forms-B is a Unicode block encoding spacing forms of Arabic diacritics, and contextual letter forms. The special code point ZWNBSP, which is meant only for use as a byte order mark, is also located here. The block name in Unicode 1.0 was Basic Glyphs for Arabic Language; its characters were re-ordered in the process of merging with ISO 10646 in Unicode 1.0.1 and 1.1.
Georgian is a Unicode block containing the Mkhedruli and Asomtavruli Georgian characters used to write Modern Georgian, Svan, and Mingrelian languages. Another lower case, Nuskhuri, is encoded in a separate Georgian Supplement block, which is used with the Asomtavruli to write the ecclesiastical Khutsuri Georgian script.
Myanmar is a Unicode block containing characters for the Burmese, Mon, Shan, Palaung, and the Karen languages of Myanmar, as well as the Aiton and Phake languages of Northeast India. It is also used to write Pali and Sanskrit in Myanmar.
Myanmar Extended-A is a Unicode block containing Myanmar characters for writing the Khamti Shan and Aiton languages.
CJK Unified Ideographs Extension B is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese submitted to the Ideographic Research Group between 1998 and 2000, plus seven gongche characters for kunqu added in Unicode 13.0, and two characters for the Macao Supplementary Character Set added in Unicode 14.0.
Dingbats is a Unicode block containing dingbats. Most of its characters were taken from Zapf Dingbats, making it a Unicode block that imported characters from a specific typeface. Unicode later adopted a policy that excluded symbols with "no demonstrated need or strong desire to exchange in plain text", and thus no further dingbat typefaces were encoded until Webdings and Wingdings were encoded in Version 7.0. Some ornaments are also emoji, having optional presentation variants.
Georgian Extended is a Unicode block containing Georgian Mtavruli letters that function as uppercase versions of their Mkhedruli counterparts in the Georgian block. Unlike all other casing scripts in Unicode, there is no title casing between Mkhedruli and Mtavruli letters, because Mtavruli is typically used only in all-caps text, although there have been some historical attempts at capitalization.
Arabic Extended-C is a Unicode block encoding Qur'anic marks used in Turkey.
Devanagari Extended-A is a Unicode block containing characters for auspicious signs from Indian inscriptions and manuscripts from the 11th century onward.