Arabic Presentation Forms-A

Last updated
Arabic Presentation Forms-A
RangeU+FB50..U+FDFF
(688 code points)
Plane BMP
Scripts Arabic (629 char.)
Common (2 char.)
Major alphabetsCentral Asian languages
Pashto
Persian
Kurdish
Sindhi
Urdu
Symbol setscontextual forms
multi-letter and
word ligatures
Assigned631 code points
Unused25 reserved code points
32 non-characters
Unicode version history
1.1 (1993)593 (+593)
3.2 (2002)594 (+1)
4.0 (2003)595 (+1)
6.0 (2010)611 (+16)
14.0 (2021)631 (+20)
Unicode documentation
Code chart ∣ Web page
Note: [1] [2]
This range was initially part of the Private Use Area in Unicode 1.0.0, [3] and removed from it in Unicode 1.0.1. [4]

Arabic Presentation Forms-A is a Unicode block encoding contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages. This block also allocates 32 noncharacters in Unicode, designed specifically for internal use.

Contents

The presentation forms are present only for compatibility with older standards such as codepage 864 used in DOS, and are typically used in visual and not logical order. [5] It has been agreed no further presentation forms will be encoded; though the block still sees further encodings including a contiguous range of 32 noncharacters. [6]

Block

Arabic Presentation Forms-A [1] [2] [3]
Official Unicode Consortium code chart (PDF)
 0123456789ABCDEF
U+FB5x
U+FB6x
U+FB7xﭿ
U+FB8x
U+FB9x
U+FBAx
U+FBBx﮿
U+FBCx
U+FBDx
U+FBEx
U+FBFxﯿ
U+FC0x
U+FC1x
U+FC2x
U+FC3xﰿ
U+FC4x
U+FC5x
U+FC6x
U+FC7xﱿ
U+FC8x
U+FC9x
U+FCAx
U+FCBxﲿ
U+FCCx
U+FCDx
U+FCEx
U+FCFxﳿ
U+FD0x
U+FD1x
U+FD2x
U+FD3x﴿
U+FD4x
U+FD5x
U+FD6x
U+FD7xﵿ
U+FD8x
U+FD9x
U+FDAx
U+FDBxﶿ
U+FDCx
U+FDDx
U+FDEx
U+FDFx﷿
Notes
1. ^ As of Unicode version 15.1
2. ^ Grey areas indicate non-assigned code points
3. ^ Black areas indicate noncharacters (code points that are guaranteed never to be assigned as encoded characters in the Unicode Standard)

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Arabic Presentation Forms-A block:

Related Research Articles

<span class="mw-page-title-main">Unicode</span> Character encoding standard

Unicode, formally The Unicode Standard, is a text encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 15.1 of the standard defines 149813 characters and 161 scripts used in various ordinary, literary, academic, and technical contexts.

The byte-order mark (BOM) is a particular usage of the special Unicode character code, U+FEFFZERO WIDTH NO-BREAK SPACE, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text:

A Unicode block is one of several contiguous ranges of numeric character codes of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.

In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the Unicode Consortium. Three private use areas are defined: one in the Basic Multilingual Plane, and one each in, and nearly covering, planes 15 and 16. The code points in these areas cannot be considered as standardized characters in Unicode itself. They are intentionally left undefined so that third parties may define their own characters without conflicting with Unicode Consortium assignments. Under the Unicode Stability Policy, the Private Use Areas will remain allocated for that purpose in all future Unicode versions.

<span class="mw-page-title-main">Universal Character Set characters</span> Complete list of the characters available on most computers

The Unicode Consortium and the ISO/IEC JTC 1/SC 2/WG 2 jointly collaborate on the list of the characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set, is an international standard to map characters, discrete symbols used in natural language, mathematics, music, and other domains, to unique machine-readable data values. By creating this mapping, the UCS enables computer software vendors to interoperate, and transmit—interchange—UCS-encoded text strings from one to another. Because it is a universal map, it can be used to represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, which can result in the same sequence of codes having multiple interpretations depending on the character encoding in use, resulting in mojibake if the wrong one is chosen.

Specials is a short Unicode block of characters allocated at the very end of the Basic Multilingual Plane, at U+FFF0–FFFF. Of these 16 code points, five have been assigned since Unicode 3.0:

Many scripts in Unicode, such as Arabic, have special orthographic rules that require certain combinations of letterforms to be combined into special ligature forms. In English, the common ampersand (&) developed from a ligature in which the handwritten Latin letters e and t were combined. The rules governing ligature formation in Arabic can be quite complex, requiring special script-shaping technologies such as the Arabic Calligraphic Engine by Thomas Milo's DecoType.

The Basic Latin Unicode block, sometimes informally called C0 Controls and Basic Latin, is the first block of the Unicode standard, and the only block which is encoded in one byte in UTF-8. The block contains all the letters and control codes of the ASCII encoding. It ranges from U+0000 to U+007F, contains 128 characters and includes the C0 controls, ASCII punctuation and symbols, ASCII digits, both the uppercase and lowercase of the English alphabet and a control character.

The Latin-1 Supplement is the second Unicode block in the Unicode standard. It encodes the upper range of ISO 8859-1: 80 (U+0080) - FF (U+00FF). C1 Controls (0080–009F) are not graphic. This block ranges from U+0080 to U+00FF, contains 128 characters and includes the C1 controls, Latin-1 punctuation and symbols, 30 pairs of majuscule and minuscule accented Latin characters and 2 mathematical operators.

Alphabetic Presentation Forms is a Unicode block containing standard ligatures for the Latin, Armenian, and Hebrew scripts.

IPA Extensions is a block (U+0250–U+02AF) of the Unicode standard that contains full size letters used in the International Phonetic Alphabet (IPA). Both modern and historical characters are included, as well as former and proposed IPA signs and non-IPA phonetic letters. Additional characters employed for phonetics, like the palatalization sign, are encoded in the blocks Phonetic Extensions (1D00–1D7F) and Phonetic Extensions Supplement (1D80–1DBF). Diacritics are found in the Spacing Modifier Letters (02B0–02FF) and Combining Diacritical Marks (0300–036F) blocks. Its block name in Unicode 1.0 was Standard Phonetic.

KPS 9566 is a North Korean standard specifying a character encoding for the Chosŏn'gŭl (Hangul) writing system used for the Korean language. The edition of 1997 specified an ISO 2022-compliant 94×94 two-byte coded character set. Subsequent editions have added additional encoded characters outside of the 94×94 plane, in a manner comparable to UHC or GBK.

Enclosed Alphanumerics is a Unicode block of typographical symbols of an alphanumeric within a circle, a bracket or other not-closed enclosure, or ending in a full stop.

The Unicode Standard assigns various properties to each Unicode character and code point.

Enclosed Alphanumeric Supplement is a Unicode block consisting of Latin alphabet characters and Arabic numerals enclosed in circles, ovals or boxes, used for a variety of purposes. It is encoded in the range U+1F100–U+1F1FF in the Supplementary Multilingual Plane.

Arabic Extended-A is a Unicode block encoding Qur'anic annotations and letter variants used for various non-Arabic languages.

Arabic Presentation Forms-B is a Unicode block encoding spacing forms of Arabic diacritics, and contextual letter forms. The special codepoint ZWNBSP is also here, which is only meant for a byte order mark. The block name in Unicode 1.0 was Basic Glyphs for Arabic Language; its characters were re-ordered in the process of merging with ISO 10646 in Unicode 1.0.1 and 1.1.

CJK Compatibility Ideographs is a Unicode block created to contain mostly Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings. However, it also contains 12 unified ideographs sourced from Japanese character sets from IBM.

Dingbats is a Unicode block containing dingbats. Most of its characters were taken from Zapf Dingbats; it was the Unicode block to have imported characters from a specific typeface; Unicode later adopted a policy that excluded symbols with "no demonstrated need or strong desire to exchange in plain text", and thus no further dingbat typefaces were encoded until Webdings and Wingdings were encoded in Version 7.0. Some ornaments are also an emoji, having optional presentation variants.

Arabic Extended-C is a Unicode block encoding Qur'anic marks used in Turkey.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. "3.5: Private Use Area" (PDF). The Unicode Standard, Version 1.0, Volume 1. Unicode Consortium. 1991. pp. 118–119. ISBN   0-201-56788-1.
  4. "Version 1.0.1 Notice Page" (PDF). Unicode Consortium.
  5. The Unicode Consortium. The Unicode Standard, Version 6.0.0, (Mountain View, CA: The Unicode Consortium, 2011. ISBN   978-1-936213-01-6), Chapter 8
  6. "Private-Use Characters, Noncharacters & Sentinels FAQ". www.unicode.org. Retrieved 2023-07-24.