Arabic Extended-C

Last updated
Arabic Extended-C
RangeU+10EC0..U+10EFF
(64 code points)
Plane SMP
Scripts Arabic
Assigned3 code points
Unused61 reserved code points
Unicode version history
15.0 (2022)3 (+3)
Unicode documentation
Code chart ∣ Web page
Note: [1] [2]

Arabic Extended-C is a Unicode block encoding Qur'anic marks used in Turkey. [3] [4]

Contents

Block

Arabic Extended-C [1] [2]
Official Unicode Consortium code chart (PDF)
 0123456789ABCDEF
U+10ECx
U+10EDx
U+10EEx
U+10EFx𐻽𐻾𐻿
Notes
1. ^ As of Unicode version 15.0
2. ^ Grey areas indicate non-assigned code points

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Arabic Extended-C block:

Version Final code points [lower-alpha 1] Count L2  IDDocument
15.0U+10EFD..10EFF3 L2/21-133 Shaikh, Lateef Sagar (2021-06-24), Proposal to encode Quranic marks used in Turkey
L2/21-130 Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Liang, Hai (2021-07-26), "6a. Quranic Marks used in Turkey", Recommendations to UTC #168 July 2021 on Script Proposals
L2/21-123 Cummings, Craig (2021-08-03), "Consensus 168-C22", Draft Minutes of UTC Meeting 168
L2/21-181 Pournader, Roozbeh (2021-08-25), Allocating Arabic Extended-C in SMP and Arabic code point changes
L2/21-174 Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Liang, Hai (2021-10-01), "4a Arabic Allocation in SMP", Recommendations to UTC #169 October 2021 on Script Proposals
L2/21-167 Cummings, Craig (2022-01-27), "Consensus 169-C2", Approved Minutes of UTC Meeting 169, Move the following characters -- U+0895 ARABIC SMALL LOW WORD SAKTA to U+10EFD U+0896 ARABIC SMALL LOW WORD QASR to U+10EFE U+0897 ARABIC SMALL LOW WORD MADDA to U+10EFF
  1. Proposed code points and characters names may differ from final code points and names

Related Research Articles

<span class="mw-page-title-main">Unicode</span> Character encoding standard

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines as of the current version (15.0) 149,186 characters covering 161 modern and historic scripts, as well as symbols, thousands of emoji, and non-visual control and formatting codes.

The byte order mark (BOM) is a particular usage of the special Unicode character, U+FEFFZERO WIDTH NO-BREAK SPACE, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text:

In computing, a code page is a character encoding and as such it is a specific association of a set of printable characters and control characters with unique numbers. Typically each number represents the binary value in a single byte.

A Unicode block is one of several contiguous ranges of numeric character codes of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.

Unicode has subscripted and superscripted versions of a number of characters including a full set of Arabic numerals. These characters allow any polynomial, chemical and certain other equations to be represented in plain text without using any form of markup like HTML or TeX.

In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the Unicode Consortium. Three private use areas are defined: one in the Basic Multilingual Plane, and one each in, and nearly covering, planes 15 and 16. The code points in these areas cannot be considered as standardized characters in Unicode itself. They are intentionally left undefined so that third parties may define their own characters without conflicting with Unicode Consortium assignments. Under the Unicode Stability Policy, the Private Use Areas will remain allocated for that purpose in all future Unicode versions.

As of Unicode version 15.0, Cyrillic script is encoded across several blocks:

Over a thousand characters from the Latin script are encoded in the Unicode Standard, grouped in several basic and extended Latin blocks. The extended ranges contain mainly precomposed letters plus diacritics that are equivalently encoded with combining diacritics, as well as some ligatures and distinct letters, used for example in the orthographies of various African languages and the Vietnamese alphabet. Latin Extended-C contains additions for Uighur and the Claudian letters. Latin Extended-D comprises characters that are mostly of interest to medievalists. Latin Extended-E mostly comprises characters used for German dialectology (Teuthonista). Latin Extended-F and -G contain characters for phonetic transcription.

In computing, a Unicode symbol is a Unicode character which is not part of a script used to write a natural language, but is nonetheless available for use as part of a text.

<span class="mw-page-title-main">Universal Character Set characters</span> Complete list of the characters available on most computers

The Unicode Consortium and the ISO/IEC JTC 1/SC 2/WG 2 jointly collaborate on the list of the characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set, is an international standard to map characters, discrete symbols used in natural language, mathematics, music, and other domains, to unique machine-readable data values. By creating this mapping, the UCS enables computer software vendors to interoperate, and transmit—interchange—UCS-encoded text strings from one to another. Because it is a universal map, it can be used to represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, which can result in the same sequence of codes having multiple interpretations depending on the character encoding in use, resulting in mojibake if the wrong one is chosen.

Many Unicode characters are used to control the interpretation or display of text, but these characters themselves have no visual or spatial representation. For example, the null character is used in C-programming application environments to indicate the end of a string of characters. In this way, these programs only require a single starting memory address for a string, since the string ends once the program reads the null character.

Specials is a short Unicode block of characters allocated at the very end of the Basic Multilingual Plane, at U+FFF0–FFFF. Of these 16 code points, five have been assigned since Unicode 3.0:

Many scripts in Unicode, such as Arabic, have special orthographic rules that require certain combinations of letterforms to be combined into special ligature forms. In English, the common ampersand (&) developed from a ligature in which the handwritten Latin letters e and t were combined. The rules governing ligature formation in Arabic can be quite complex, requiring special script-shaping technologies such as the Arabic Calligraphic Engine by Thomas Milo's DecoType.

In the Unicode standard, a plane is a continuous group of 65,536 (216) code points. There are 17 planes, identified by the numbers 0 to 16, which corresponds with the possible values 00–1016 of the first two positions in six position hexadecimal format (U+hhhhhh). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version 15.0, five of the planes have assigned code points (characters), and seven are named.

The Universal Coded Character Set is a standard set of characters defined by the international standard ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS), which is the basis of many character encodings, improving as characters from previously unrepresented typing systems are added.

The programming language APL uses a number of symbols, rather than words from natural language, to identify operations, similarly to mathematical symbols. Prior to the wide adoption of Unicode, a number of special-purpose EBCDIC and non-EBCDIC code pages were used to represent the symbols required for writing APL.

Unicode contains a number of characters that represent various cultural, political, and religious symbols. Most, but not all, of these symbols are in the Miscellaneous Symbols block.

The Unicode Standard assigns various properties to each Unicode character and code point.

Arabic Presentation Forms-B is a Unicode block encoding spacing forms of Arabic diacritics, and contextual letter forms. The special codepoint ZWNBSP is also here, which is only meant for a byte order mark. The block name in Unicode 1.0 was Basic Glyphs for Arabic Language; its characters were re-ordered in the process of merging with ISO 10646 in Unicode 1.0.1 and 1.1.

Arabic Extended-B is a Unicode block encoding Qur'anic annotations and letter variants used for various non-Arabic languages. The block also includes currency symbols and an abbreviation mark.

References

  1. "Unicode character database". The Unicode Standard.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2022-09-14.
  3. "L2/21-133: Proposal to encode Quranic marks used in Turkey" (PDF). 2021-06-24.
  4. The Unicode Standard (PDF). 15.0.0. The Unicode Consortium. 2022. ISBN   978-1-936213-32-0.