Arabic Presentation Forms-B

Arabic Presentation Forms-B
Range: U+FE70..U+FEFF (144 code points)
Plane: BMP
Scripts: Arabic (140 char.), Common (1 char.)
Symbol sets: contextual and isolate forms of Arabic letters and points
Assigned: 141 code points
Unused: 3 reserved code points
Unicode version history:
1.0.0 (1991): 140 (+140)
3.2 (2002): 141 (+1)
Unicode documentation: Code chart ∣ Web page
Note: [1] [2]
The characters in this block were re-ordered in Unicode 1.0.1, in the process of merging with ISO/IEC 10646. [3]

Arabic Presentation Forms-B is a Unicode block encoding spacing forms of Arabic diacritics and contextual letter forms. The block also contains the special code point U+FEFF ZWNBSP (zero width no-break space), which is now meant only for use as a byte order mark; a byte order mark may precede any text, Arabic or not, or be absent. [note 1] The block name in Unicode 1.0 was Basic Glyphs for Arabic Language; [5] its characters were re-ordered in the process of merging with ISO 10646 in Unicode 1.0.1 and 1.1. [3]
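
As an illustration of the byte order mark role of U+FEFF, the following is a minimal Python sketch, not a definitive recipe. It uses only the standard `codecs` module; the helper name `strip_bom` and the sample text are assumptions made for this example.

```python
import codecs

BOM = "\ufeff"  # U+FEFF, the last code point of the block

# Encoded as UTF-8, U+FEFF becomes the three-byte signature EF BB BF.
assert BOM.encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"


def strip_bom(text: str) -> str:
    """Drop one leading U+FEFF, if present (hypothetical helper)."""
    return text[1:] if text.startswith(BOM) else text


# Sample bytes: a UTF-8 signature followed by a short Arabic word.
sample = "\u0633\u0644\u0627\u0645"
raw = codecs.BOM_UTF8 + sample.encode("utf-8")

decoded_plain = raw.decode("utf-8")    # keeps U+FEFF as the first character
decoded_sig = raw.decode("utf-8-sig")  # this codec removes the signature
```

The plain `utf-8` codec leaves the mark in the decoded string, so callers that do not use `utf-8-sig` usually have to remove it themselves, as `strip_bom` does here.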

The presentation forms are present only for compatibility with older standards, and are not currently needed for coding text. [6]
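
Because the presentation forms carry compatibility decompositions, text that contains them can be folded back to the ordinary Arabic letters with compatibility normalization. The sketch below uses Python's standard `unicodedata` module; the function name `to_base_letters` is an assumption made for this example, and NFKC also changes other compatibility characters, so it may not suit every text.

```python
import unicodedata


def to_base_letters(text: str) -> str:
    """Fold presentation forms to base letters via NFKC (hypothetical helper)."""
    return unicodedata.normalize("NFKC", text)


isolated_alef = "\ufe8d"  # ARABIC LETTER ALEF ISOLATED FORM
final_alef = "\ufe8e"     # ARABIC LETTER ALEF FINAL FORM
lam_alef = "\ufefb"       # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM

# Both positional forms of alef map to the single base letter U+0627.
folded_isolated = to_base_letters(isolated_alef)
folded_final = to_base_letters(final_alef)

# The ligature expands to the two base letters lam (U+0644) and alef (U+0627).
folded_ligature = to_base_letters(lam_alef)
```

After folding, choosing the contextual shape is left to the rendering engine, which is why the presentation forms are not needed in stored text.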

Block

Arabic Presentation Forms-B [1] [2]
Official Unicode Consortium code chart (PDF)
[Code chart: rows U+FE7x to U+FEFx, columns 0 to F. U+FE75, U+FEFD and U+FEFE are unassigned; U+FEFF is ZWNBSP (zero width no-break space).]
Notes
1. As of Unicode version 15.1
2. Grey areas indicate unassigned code points
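
The assigned and unassigned counts for the block can be checked against the character database that ships with Python. This is a rough sketch: the constant and function names are assumptions made for this example, and the result reflects whichever Unicode version the interpreter bundles rather than version 15.1 specifically.

```python
import unicodedata

# Block boundaries as given in the article.
BLOCK_START, BLOCK_END = 0xFE70, 0xFEFF


def unassigned_in_block() -> list[int]:
    """List code points in the block with General_Category Cn (hypothetical helper)."""
    return [
        cp
        for cp in range(BLOCK_START, BLOCK_END + 1)
        if unicodedata.category(chr(cp)) == "Cn"
    ]


total = BLOCK_END - BLOCK_START + 1  # size of the block
unassigned = unassigned_in_block()
assigned = total - len(unassigned)

print(f"{total} code points, {assigned} assigned")
print("unassigned:", ", ".join(f"U+{cp:04X}" for cp in unassigned))
```

The category `Cn` marks code points with no assigned character, which corresponds to the grey cells of the chart.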

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Arabic Presentation Forms-B block:

Each entry gives the version, the final code points [a] with their count, and the related documents with their UTC, L2, or WG2 IDs.

Version 1.0.0
  U+FE70..FE72, FE74, FE76..FE7F (count: 14)
    UTC/1991-048B: Whistler, Ken (1991-03-27), "14 addtional [sic] Arabic spacing diacritics", Draft Minutes from the UTC meeting #46 day 2, 3/27 at Apple
  U+FE80..FEFC (count: 125)
    (to be determined)
  U+FEFF (count: 1)
    UTC/1991-054: Whistler, Ken, FF Proposal
    UTC/1991-048B: Whistler, Ken (1991-03-27), "III.I.4", Draft Minutes from the UTC meeting #46 day 2, 3/27 at Apple
    L2/05-137: Freytag, Asmus (2005-05-10), Handling "defective" names
    L2/05-108R: Moore, Lisa (2005-08-26), "Consensus 103-C7", UTC #103 Minutes, Create a "Normative Name Alias" property and file in the UCD. Populate the property with names from the sections "Typos" and "Bad or misleading names" from document L2/05-137.

Version 3.2
  U+FE73 (count: 1)
    L2/01-069: Davis, Mark (2001-01-29), Proposal Summary Form for Arabic character tail for final Seen family (Seen, Sheen, Saad, Daad)
    L2/01-095, WG2 N2322: Umamaheswaran, V. S. (2001-02-05), Proposal to add "Arabic Tail Fragment" character
    L2/01-012R: Moore, Lisa (2001-05-21), "Motion 86-M30", Minutes UTC #86 in Mountain View, Jan 2001, Accept the addition of the character ARABIC TAIL FRAGMENT at U+FE73, with the properties of an extender.
    L2/01-344, WG2 N2353: Umamaheswaran, V. S. (2001-09-09), "7.9", Minutes from SC2/WG2 meeting #40 -- Mountain View, April 2001

  a. Proposed code points and character names may differ from final code points and names.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. "Unicode 1.0.1 Addendum" (PDF). The Unicode Standard. 1992-11-03. Retrieved 2016-07-09.
  4. "Layout Controls" (PDF). The Unicode Standard, Version 12.0.0. The Unicode Consortium. p. 871.
  5. "3.8: Block-by-Block Charts" (PDF). The Unicode Standard. version 1.0. Unicode Consortium.
  6. The Unicode Consortium. The Unicode Standard, Version 6.0.0 (Mountain View, CA: The Unicode Consortium, 2011. ISBN 978-1-936213-01-6), Chapter 8.

Notes

  1. As the name suggests, it was also used to prohibit line breaks at its position, but this usage was deprecated in Unicode 3.2. [4]