Arabic Extended-B

Range: U+0870..U+089F (48 code points)
Plane: BMP
Scripts: Arabic
Major alphabets: Bosnian, Javanese, Sorabe, Sundanese
Assigned: 41 code points
Unused: 7 reserved code points
Unicode version history: 14.0 (2021): 41 (+41)
Note: [1] [2]

Arabic Extended-B is a Unicode block encoding Qur'anic annotations and letter variants used for various non-Arabic languages. The block also includes currency symbols and an abbreviation mark. [3]
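As an illustration of the range given above, membership in the block can be tested with a simple comparison of code point values. The sketch below is only an example: the names `BLOCK_START`, `BLOCK_END`, `BLOCK_SIZE` and `in_arabic_extended_b` are hypothetical and not part of any Unicode library.

```python
# Block boundaries as stated in this article: U+0870..U+089F.
BLOCK_START = 0x0870
BLOCK_END = 0x089F

# Number of code points in the block (assigned and reserved together).
BLOCK_SIZE = BLOCK_END - BLOCK_START + 1


def in_arabic_extended_b(ch: str) -> bool:
    """Return True if the single character ch lies in Arabic Extended-B."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    return BLOCK_START <= ord(ch) <= BLOCK_END


print(BLOCK_SIZE)                        # 48
print(in_arabic_extended_b("\u0870"))    # True: first code point of the block
print(in_arabic_extended_b("\u0627"))    # False: U+0627 is in the Arabic block
```

The check covers all 48 code points of the range, so it also returns True for the reserved (unassigned) positions.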

Block

Arabic Extended-B [1] [2]
Official Unicode Consortium code chart (PDF)
[Code chart: rows U+087x to U+089x, columns 0 to F; glyphs not reproduced]
Notes
1. ^ As of Unicode version 15.1
2. ^ Grey areas indicate non-assigned code points
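The layout of the chart (three rows of sixteen cells) can be sketched in code. The example below prints code point values rather than glyphs, since glyph display depends on font support. The `RESERVED` set is an assumption derived from the code point ranges listed in the History section of this article, and `chart_rows` is a hypothetical helper name.

```python
BLOCK_START, BLOCK_END = 0x0870, 0x089F

# Assumed reserved positions: the code points of the block that are not
# covered by any range listed in the History section.
RESERVED = {0x088F, *range(0x0892, 0x0898)}


def chart_rows():
    """Build one text line per chart row; reserved cells are shown as '....'."""
    rows = []
    for row_start in range(BLOCK_START, BLOCK_END + 1, 16):
        cells = []
        for cp in range(row_start, row_start + 16):
            cells.append("...." if cp in RESERVED else f"{cp:04X}")
        # Row label in the chart's own style, e.g. U+087x.
        rows.append(f"U+{row_start >> 4:03X}x  " + " ".join(cells))
    return rows


for line in chart_rows():
    print(line)
```

The cells printed as `....` correspond to the grey (non-assigned) areas mentioned in the notes.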

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Arabic Extended-B block:

| Version | Final code points [a] | Count | L2 ID | WG2 ID | Document |
| 14.0 | U+0870..0888, 089D..089F | 28 | L2/19-306 | N5142 | Pournader, Roozbeh; Anderson, Deborah (2019-09-29), Arabic additions for Quranic orthographies |
|  |  |  | L2/19-343 |  | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai (2019-10-06), "a. Additions for Quranic orthographies", Recommendations to UTC #161 October 2019 on Script Proposals |
|  |  |  | L2/19-323 |  | Moore, Lisa (2019-10-01), "Consensus 161-C4", UTC #161 Minutes |
|  |  |  | L2/20-105 |  | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-04-20), "3f. Comments on L2/19-306", Recommendations to UTC #163 April 2020 on Script Proposals |
|  | U+0889..088A | 2 | L2/19-339 |  | Jacquerye, Denis Moyogo (2019-10-03), Proposal to encode Bosnian Arabic characters |
|  |  |  | L2/19-343 |  | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai (2019-10-06), "d. Bosnian Arabic characters", Recommendations to UTC #161 October 2019 on Script Proposals |
|  |  |  | L2/19-323 |  | Moore, Lisa (2019-10-01), "C.6.5", UTC #161 Minutes |
|  | U+088B..088D | 3 | L2/19-340 |  | Jacquerye, Denis Moyogo (2019-10-03), Proposal to encode Javanese and Sundanese Arabic characters |
|  |  |  | L2/19-323 |  | Moore, Lisa (2019-10-01), "C.6.6", UTC #161 Minutes |
|  | U+088E | 1 | L2/20-071R |  | Pournader, Roozbeh; Izadpanah, Borna (2020-05-01), Proposal to encode an Arabic tail character used for abbreviation |
|  |  |  | L2/20-105 |  | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-04-20), "3b. Arabic Tail Character", Recommendations to UTC #163 April 2020 on Script Proposals |
|  |  |  | L2/20-102 |  | Moore, Lisa (2020-05-06), "Consensus 163-C26", UTC #163 Minutes |
|  | U+0890..0891 | 2 | L2/20-245 |  | Hosny, Khaled; Pournader, Roozbeh (2020-09-09), Proposal to encode three Arabic symbols |
|  |  |  | L2/20-250 |  | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-10-01), "5a. Three Symbols", Recommendations to UTC #165 October 2020 on Script Proposals |
|  |  |  | L2/20-237 |  | Moore, Lisa (2020-10-27), "Consensus 165-C15", UTC #165 Minutes |
|  | U+0898..089C | 5 | L2/20-089 |  | Syarifuddin, M. Mahali (2020-02-28), Proposal to Encode Characters from Indonesian Orthography of Quran |
|  |  |  | L2/20-105 |  | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-04-20), "3c. Indonesian Orthography of Quran", Recommendations to UTC #163 April 2020 on Script Proposals |
|  |  |  | L2/20-102 |  | Moore, Lisa (2020-05-06), "Consensus 163-C14", UTC #163 Minutes |
  a. Proposed code points and character names may differ from final code points and names
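The per-proposal counts in the table can be cross-checked against the block totals with a little arithmetic. The sketch below transcribes the final code point ranges from the table (the variable and function names are hypothetical) and derives both the number of assigned code points and the positions left reserved.

```python
# Final code point ranges transcribed from the History table (inclusive).
HISTORY_RANGES = [
    (0x0870, 0x0888), (0x089D, 0x089F),  # Quranic orthographies: 28
    (0x0889, 0x088A),                    # Bosnian Arabic: 2
    (0x088B, 0x088D),                    # Javanese and Sundanese Arabic: 3
    (0x088E, 0x088E),                    # Arabic tail character: 1
    (0x0890, 0x0891),                    # Arabic symbols: 2
    (0x0898, 0x089C),                    # Indonesian orthography of Quran: 5
]


def assigned_code_points(ranges):
    """Expand inclusive (first, last) ranges into a set of code points."""
    result = set()
    for first, last in ranges:
        result.update(range(first, last + 1))
    return result


assigned = assigned_code_points(HISTORY_RANGES)
# Everything in U+0870..U+089F that no listed range covers.
reserved = sorted(set(range(0x0870, 0x08A0)) - assigned)

print(len(assigned))                        # 41
print([f"U+{cp:04X}" for cp in reserved])
# ['U+088F', 'U+0892', 'U+0893', 'U+0894', 'U+0895', 'U+0896', 'U+0897']
```

The result agrees with the figures stated for the block: 41 assigned code points and 7 reserved ones out of 48.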

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. The Unicode Consortium. The Unicode Standard, Version 14.0.0 (Mountain View, CA: The Unicode Consortium, 2021. ISBN 978-1-936213-29-0), Chapter 9