Arabic Presentation Forms-A

Range: U+FB50..U+FDFF (688 code points)
Plane: BMP
Scripts: Arabic (629 characters), Common (2 characters)
Major alphabets: Central Asian languages, Pashto, Persian, Kurdish, Sindhi, Urdu
Symbol sets: contextual forms, multi-letter and word ligatures
Assigned: 631 code points
Unused: 25 reserved code points, 32 noncharacters
Unicode version history:
1.1 (1993): 593 (+593)
3.2 (2002): 594 (+1)
4.0 (2003): 595 (+1)
6.0 (2010): 611 (+16)
14.0 (2021): 631 (+20)
Unicode documentation: Code chart, Web page
Note: [1] [2]
This range was initially part of the Private Use Area in Unicode 1.0.0, [3] and removed from it in Unicode 1.0.1. [4]

Arabic Presentation Forms-A is a Unicode block encoding contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages. The block also contains 32 noncharacters, code points set aside specifically for internal use.
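The block boundaries and the noncharacter range can be checked with plain integer arithmetic. The following is a minimal sketch, not a library API: the function names are invented for this example, and the noncharacter range U+FDD0..U+FDEF is the one listed for version 3.1 in the History section.

```python
# Illustrative helpers; the names are hypothetical, not part of any library.
BLOCK_START, BLOCK_END = 0xFB50, 0xFDFF        # Arabic Presentation Forms-A
NONCHAR_START, NONCHAR_END = 0xFDD0, 0xFDEF    # noncharacters inside the block


def in_presentation_forms_a(ch: str) -> bool:
    """Return True if the single character ch lies in U+FB50..U+FDFF."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


def is_block_noncharacter(ch: str) -> bool:
    """Return True if ch is one of the noncharacters located in this block."""
    return NONCHAR_START <= ord(ch) <= NONCHAR_END


# Sizes follow directly from the inclusive ranges.
block_size = BLOCK_END - BLOCK_START + 1
nonchar_count = NONCHAR_END - NONCHAR_START + 1
print(block_size, nonchar_count)
```

The two sizes computed at the end correspond to the 688 code points and 32 noncharacters stated for the block.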

The presentation forms are present only for compatibility with older standards, such as code page 864 used in DOS, and are typically used in visual rather than logical order. [5] It has been agreed that no further presentation forms will be encoded, although the block has continued to receive other additions, including a contiguous range of 32 noncharacters. [6]
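Because these characters exist for compatibility, software often folds them back to ordinary Arabic letters before searching or comparing text. One way to do this, which is an assumption of this example rather than something the article prescribes, is Unicode compatibility normalization (NFKC) as exposed by Python's standard `unicodedata` module. The helper name `fold_presentation_forms` is invented for illustration. Note that normalization does not reorder text, so data stored in visual order stays in visual order.

```python
import unicodedata

# Two sample characters from the block.
samples = {
    "U+FB50": "\uFB50",  # ARABIC LETTER ALEF WASLA ISOLATED FORM
    "U+FDFC": "\uFDFC",  # RIAL SIGN
}


def fold_presentation_forms(text: str) -> str:
    """Apply NFKC, which replaces compatibility characters by their decompositions."""
    return unicodedata.normalize("NFKC", text)


for label, ch in samples.items():
    folded = fold_presentation_forms(ch)
    # Show the resulting code points rather than glyphs, to avoid font issues.
    print(label, "->", " ".join(f"U+{ord(c):04X}" for c in folded))
```

Canonical normalization (NFC or NFD) leaves these characters untouched; only the compatibility forms (NFKC or NFKD) replace them.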

Block

Arabic Presentation Forms-A [1] [2] [3]
Official Unicode Consortium code chart (PDF)
[Code chart grid: 43 rows, U+FB5x through U+FDFx, by 16 columns, 0 through F. The individual glyphs are not reproduced here.]
Notes
1. As of Unicode version 15.1
2. Grey areas indicate non-assigned code points
3. Black areas indicate noncharacters (code points that are guaranteed never to be assigned as encoded characters in the Unicode Standard)
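The chart layout (rows of 16 code points, with non-assigned and noncharacter cells marked) can be approximated in code. The sketch below is an illustration only. It relies on Python's bundled `unicodedata` module, whose Unicode version depends on the Python release, so the cells it reports as assigned may differ from the Unicode 15.1 chart. The function names and marker characters are made up for this example.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0xFB50, 0xFDFF
NONCHARS = range(0xFDD0, 0xFDF0)  # U+FDD0..U+FDEF inclusive


def classify(cp: int) -> str:
    """Return 'noncharacter', 'unassigned' or 'assigned' for a code point."""
    if cp in NONCHARS:
        return "noncharacter"
    # Category 'Cn' means "not assigned" in the Unicode data bundled with Python.
    if unicodedata.category(chr(cp)) == "Cn":
        return "unassigned"
    return "assigned"


def chart_rows():
    """Yield (row label, list of 16 cell classes) for each row of the chart."""
    for base in range(BLOCK_START, BLOCK_END + 1, 16):
        label = f"U+{base >> 4:03X}x"  # e.g. 0xFB50 -> "U+FB5x"
        yield label, [classify(base + col) for col in range(16)]


# Hypothetical text markers standing in for the chart's cell shading.
MARKS = {"assigned": "#", "unassigned": ".", "noncharacter": "x"}
for label, cells in chart_rows():
    print(label, "".join(MARKS[c] for c in cells))
```

Each printed line corresponds to one chart row, with one marker per column 0 through F.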

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Arabic Presentation Forms-A block:

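Each group below gives the version, the final code points [note 1] and their count, followed by the related documents (L2 ID, WG2 ID where one exists, and document details).
Version 1.1: U+FB50..FBB1, FBD3..FD3F, FD50..FD8F, FD92..FDC7, FDF0..FDFB (593 code points)
(to be determined)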
L2/06-008R2 Moore, Lisa (2006-02-13), "Motion 106-M3", UTC #106 Minutes, Drop U+FD3E ORNATE LEFT PARENTHESIS and U+FD3F ORNATE RIGHT PARENTHESIS from the list of characters with Bidi Mirrored property proposed in Public Review Issue 80.
L2/14-026 Moore, Lisa (2014-02-17), "Consensus 138-C21", UTC #138 Minutes, Change the General Category and linebreak properties of U+FD3E LEFT ORNATE PARENTHESIS to gc=Pe and lb=CL; and change General Category and linebreak properties of U+FD3F RIGHT ORNATE PARENTHESIS to gc=Ps and lb=OP, in Unicode 7.0.
L2/20-289 N5155 Evans, Lorna Priest (2020-12-07), Request for glyph changes and annotations for Kazakh, Kyrgyz, and Uyghur [Affects U+FBD7-FBD8, U+FBDD, and U+FBE0-FBE1]
L2/21-016R Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai (2021-01-14), "11a. Glyph changes and annotations for Kazakh, Kyrgyz, and Uyghur", Recommendations to UTC #166 January 2021 on Script Proposals
L2/21-009 Moore, Lisa (2021-01-27), "B.1 — 11a", UTC #166 Minutes
Version 3.1: U+FDD0..FDEF (32 code points)
L2/00-187 Moore, Lisa (2000-08-23), "Not a Character", UTC minutes -- Boston, August 8-11, 2000
L2/00-341 N2277 Addition of reserved characters for internal processing uses, 2000-09-19
L2/01-050 N2253 Umamaheswaran, V. S. (2001-01-21), "7.20 Proposal for Reserved Positions for Processing Purposes", Minutes of the SC2/WG2 meeting in Athens, September 2000
Version 3.2: U+FDFC (1 code point)
L2/01-148R Pournader, Roozbeh (2001-04-07), Proposal: Arabic Ligature Rial
L2/01-184R Moore, Lisa (2001-06-18), "Motion 87-M6", Minutes from the UTC/L2 meeting
L2/01-354 N2373 Pournader, Roozbeh (2001-09-20), Proposal: Arabic Currency Sign Rial
L2/02-154 N2403 Umamaheswaran, V. S. (2002-04-22), "7.8", Draft minutes of WG 2 meeting 41, Hotel Phoenix, Singapore, 2001-10-15/19
Version 4.0: U+FDFD (1 code point)
L2/02-005 Hussain, Sarmad; Afzal, Muhammad (2001-12-18), Urdu Computing Standards (Charts and Exhibits)
L2/02-006 (pdf, doc) N2413-1 Zia, Khaver (2002-01-10), Towards Unicode Standard for Urdu
L2/02-003 N2413-2 Afzal, Muhammad; Hussain, Sarmad (2001-12-28), Urdu Computing Standards: Development of Urdu Zabta Takhti (UZT) 1.01
L2/02-004 N2413-3 Hussain, Sarmad; Afzal, Muhammad (2001-12-28), Urdu Computing Standards: Urdu Zabta Takhti (UZT) 1.01
L2/02-163 N2413-4 (pdf, doc) Proposal to add Marks and Digits in Arabic Code Block (for Urdu), 2002-04-30
L2/02-011R Kew, Jonathan (2002-01-12), Comments on L2/02-006: Towards Unicode Standard for Urdu
L2/02-197 Freytag, Asmus (2002-05-01), Urdu Feedback from Bidi Committee
L2/02-166R2 Moore, Lisa (2002-08-09), "Motion 91-M3", UTC #91 Minutes
L2/02-372 N2453 (pdf, doc) Umamaheswaran, V. S. (2002-10-30), "7.9 Urdu contribution", Unconfirmed minutes of WG 2 meeting 42
L2/02-466 N2567 Everson, Michael; Pournader, Roozbeh (2002-12-09), Towards resolution on the name of U+FDFD
L2/02-467 N2568 Everson, Michael; Pournader, Roozbeh; Hussain, Sarmad; Afzal, Muhammad (2002-12-10), Consensus on the name of U+FDFD
L2/04-196 N2653 (pdf, doc) Umamaheswaran, V. S. (2004-06-04), "a-3", Unconfirmed minutes of WG 2 meeting 44
Version 6.0: U+FBB2..FBC1 (16 code points)
L2/98-274 Davis, Mark; Mansour, Kamal (1998-07-28), Proposed Arabic Script Additions for Minority Languages
L2/98-409 Davis, Mark; Mansour, Kamal (1998-12-01), Proposal to add 25 Arabic characters to the BMP
L2/98-419 (pdf, doc) Aliprand, Joan (1999-02-05), "Additional Arabic characters", Approved Minutes -- UTC #78 & NCITS Subgroup L2 # 175 Joint Meeting, San Jose, CA -- December 1-4, 1998
L2/02-021 Davis, Mark; Mansour, Kamal (2002-01-17), Proposal To Amend Arabic repertoire
L2/03-154 Kew, Jonathan; Mansour, Kamal; Davis, Mark (2003-05-16), Proposal to encode productive Arabic-script modifier marks
L2/06-039 N3460-A Durrani, Attash (2006-01-29), Preliminary Proposal to add Nuqta Characters to Arabic Block
L2/06-240 Kew, Jonathan (2006-07-19), Letter to Dr. Durrani
L2/06-322 Durrani, Attash (2006-10-04), Letter to Jonathan Kew re Nuqtas
L2/07-094 Durrani, Attash (2007-04-03), Regarding Nuqta Characters
L2/07-174 Durrani, Attash (2007-05-14), The Case Folding Solution for the Arabic Script
L2/08-159 Durrani, Attash; Mansour, Kamal; McGowan, Rick (2008-04-18), Proposal to Encode 22 Characters for Arabic Pedagogical Use
L2/08-230 Anderson, Deborah (2008-05-23), Comments on Proposal to Encode 22 Characters for Arabic Pedagogical Use
L2/08-159R N3460R Durrani, Attash; Mansour, Kamal; McGowan, Rick (2008-06-24), Proposal to Encode 16 Characters for Arabic Pedagogical Use
L2/08-161R2 Moore, Lisa (2008-11-05), "Motion 115-M3", UTC #115 Minutes
L2/08-412 N3553 (pdf, doc) Umamaheswaran, V. S. (2008-11-05), "M53.19", Unconfirmed minutes of WG 2 meeting 53
L2/08-361 Moore, Lisa (2008-12-02), "Consensus 117-C26", UTC #117 Minutes
L2/09-011 Pournader, Roozbeh (2009-01-13), Consistent naming and better properties for Arabic Pedagogical Symbols
L2/09-110 N3606 Pandey, Anshuman (2009-03-30), Proposal to Advance the Renaming of Arabic Pedagogical Symbols
L2/09-234 N3603 (pdf, doc) Umamaheswaran, V. S. (2009-07-08), "M54.06a", Unconfirmed minutes of WG 2 meeting 54
L2/09-104 Moore, Lisa (2009-05-20), "Consensus 119-C25", UTC #119 / L2 #216 Minutes
Version 14.0: U+FBC2 (1 code point)
L2/19-306 N5142 Pournader, Roozbeh; Anderson, Deborah (2019-09-29), Arabic additions for Quranic orthographies
L2/19-343 Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai (2019-10-06), "a. Additions for Quranic orthographies"; FD4C: "c. Arabic honorifics", Recommendations to UTC #161 October 2019 on Script Proposals
L2/19-323 Moore, Lisa (2019-10-01), "Consensus 161-C4", UTC #161 Minutes
L2/20-105 Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-04-20), "3f. Comments on L2/19-306", Recommendations to UTC #163 April 2020 on Script Proposals
Version 14.0: U+FD40..FD4B, FDFE..FDFF (14 code points)
L2/14-147 Pournader, Roozbeh (2014-07-27), Proposal to encode seventeen Arabic honorifics
L2/14-170 Anderson, Deborah; Whistler, Ken; McGowan, Rick; Pournader, Roozbeh; Iancu, Laurențiu (2014-07-28), "5. L2/14‐147", Recommendations to UTC #140 August 2014 on Script Proposals
L2/19-289R Pournader, Roozbeh; Jibaly, Mustafa (2019-07-26), Proposal to encode fourteen Arabic honorifics
L2/19-270 Moore, Lisa (2019-10-07), "Consensus 160-C25", UTC #160 Minutes
Version 14.0: U+FD4C..FD4D (2 code points)
L2/19-319 Pournader, Roozbeh; Jibaly, Mustafa (2019-09-29), Proposal to encode two more Arabic honorifics
L2/19-323 Moore, Lisa (2019-10-01), "Consensus 161-C3", UTC #161 Minutes
Version 14.0: U+FD4E..FD4F (2 code points)
L2/20-042 Pournader, Roozbeh; Hooshdaran, Soheil; Jibaly, Mustafa (2020-01-15), Proposal to encode yet two more Arabic honorifics
L2/20-015R Moore, Lisa (2020-05-14), "C.5.3", Draft Minutes of UTC Meeting 162
Version 14.0: U+FDCF (1 code point)
L2/20-081 Pournader, Roozbeh; Evans, Lorna (2020-03-10), Proposal to encode an Arabic honorific used in Christian texts
L2/20-105 Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Constable, Peter; Liang, Hai (2020-04-20), "3a. Arabic Honorific", Recommendations to UTC #163 April 2020 on Script Proposals
L2/20-102 Moore, Lisa (2020-05-06), "Consensus 163-C13", UTC #163 Minutes
  1. Proposed code points and character names may differ from the final code points and names.
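The per-version counts in the table can be cross-checked against the code point ranges it lists. The following is a small sketch under stated assumptions: the `HISTORY` mapping is transcribed by hand from the table above, the helper names are invented for this example, and the version 3.1 noncharacters are left out because they are not assigned characters.

```python
# Final code point ranges per Unicode version, transcribed from the table above.
# Version 3.1 (U+FDD0..FDEF) is omitted: those 32 code points are noncharacters.
HISTORY = {
    "1.1": ["FB50..FBB1", "FBD3..FD3F", "FD50..FD8F", "FD92..FDC7", "FDF0..FDFB"],
    "3.2": ["FDFC"],
    "4.0": ["FDFD"],
    "6.0": ["FBB2..FBC1"],
    "14.0": ["FBC2", "FD40..FD4B", "FDFE..FDFF", "FD4C..FD4D", "FD4E..FD4F", "FDCF"],
}


def range_size(spec: str) -> int:
    """Size of an inclusive hex range 'XXXX..YYYY', or 1 for a single 'XXXX'."""
    start, _, end = spec.partition("..")
    return int(end or start, 16) - int(start, 16) + 1


def cumulative_counts(history):
    """Map each version to (running total, code points added in that version)."""
    total = 0
    result = {}
    for version, specs in history.items():
        added = sum(range_size(s) for s in specs)
        total += added
        result[version] = (total, added)
    return result


for version, (total, added) in cumulative_counts(HISTORY).items():
    print(f"{version}: {total} (+{added})")
```

The running totals come out as 593, 594, 595, 611 and 631, matching the version history figures given at the top of the article. Subtracting the 631 assigned code points and the 32 noncharacters from the block size of 688 leaves the 25 reserved code points.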

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. "3.5: Private Use Area" (PDF). The Unicode Standard, Version 1.0, Volume 1. Unicode Consortium. 1991. pp. 118–119. ISBN 0-201-56788-1.
  4. "Version 1.0.1 Notice Page" (PDF). Unicode Consortium.
  5. "Chapter 8". The Unicode Standard, Version 6.0.0. Mountain View, CA: Unicode Consortium. 2011. ISBN 978-1-936213-01-6.
  6. "Private-Use Characters, Noncharacters & Sentinels FAQ". www.unicode.org. Retrieved 2023-07-24.