Arabic (Unicode block)

Arabic
Range: U+0600..U+06FF (256 code points)
Plane: BMP
Scripts: Arabic (238 char.), Common (6 char.), Inherited (12 char.)
Major alphabets: Arabic, Kurdish, Pashto, Persian, Urdu, Sindhi
Assigned: 256 code points
Unused: 0 reserved code points
Deprecated: 1 code point
Source standards: ISO 8859-6
Unicode version history:
1.0.0 (1991): 169 (+169)
1.1 (1993): 194 (+25)
3.0 (1999): 206 (+12)
3.2 (2002): 208 (+2)
4.0 (2003): 227 (+19)
4.1 (2005): 235 (+8)
5.1 (2008): 250 (+15)
6.0 (2010): 252 (+2)
6.1 (2012): 253 (+1)
6.3 (2013): 254 (+1)
7.0 (2014): 255 (+1)
14.0 (2021): 256 (+1)
Unicode documentation: Code chart ∣ Web page
Note: [1] [2]

Arabic is a Unicode block containing the standard letters and the most common diacritics of the Arabic script, as well as the Arabic-Indic digits. [3]
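As a minimal sketch of what the block's range means in practice, the following Python snippet tests whether a character falls inside U+0600..U+06FF. The constants and the function name `in_arabic_block` are illustrative and not part of any standard API. Block membership is not the same as script membership: the block also holds Common and Inherited characters, and Arabic-script letters exist in other blocks.

```python
# Boundaries of the Arabic block as given in the infobox (U+0600..U+06FF).
ARABIC_BLOCK_FIRST = 0x0600
ARABIC_BLOCK_LAST = 0x06FF


def in_arabic_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+0600..U+06FF."""
    return ARABIC_BLOCK_FIRST <= ord(ch) <= ARABIC_BLOCK_LAST


# A letter and a digit from the block, a Basic Latin letter, and a
# contextual form from Arabic Presentation Forms-B (outside the block).
for ch in ["\u0628", "\u0664", "A", "\uFE8F"]:
    print(f"U+{ord(ch):04X}", in_arabic_block(ch))
```

A range check like this only answers "is this code point in the block"; deciding whether text is written in the Arabic script would need the Script property instead.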

Block

Arabic [1] [2]
Official Unicode Consortium code chart (PDF)
 0123456789ABCDEF
U+060x ؀  ؁  ؂  ؃  ؄  ؅ ؆؇؈؉؊؋،؍؎؏
U+061xؘؙؚؐؑؒؓؔؕؖؗ؛ ALM ؝؞؟
U+062xؠءآأؤإئابةتثجحخد
U+063xذرزسشصضطظعغػؼؽؾؿ
U+064xـفقكلمنهوىيًٌٍَُ
U+065xِّْٕٖٜٟٓٔٗ٘ٙٚٛٝٞ
U+066x٠١٢٣٤٥٦٧٨٩٪٫٬٭ٮٯ
U+067xٰٱٲٳٴٵٶٷٸٹٺٻټٽپٿ
U+068xڀځڂڃڄڅچڇڈډڊڋڌڍڎڏ
U+069xڐڑڒړڔڕږڗژڙښڛڜڝڞڟ
U+06Axڠڡڢڣڤڥڦڧڨکڪګڬڭڮگ
U+06Bxڰڱڲڳڴڵڶڷڸڹںڻڼڽھڿ
U+06Cxۀہۂۃۄۅۆۇۈۉۊۋیۍێۏ
U+06Dxېۑےۓ۔ەۖۗۘۙۚۛۜ ۝ ۞۟
U+06Exۣ۠ۡۢۤۥۦۧۨ۩۪ۭ۫۬ۮۯ
U+06Fx۰۱۲۳۴۵۶۷۸۹ۺۻۼ۽۾ۿ
Notes
1. ^ As of Unicode version 15.1
2. ^ Unicode code point U+0673 is deprecated as of Unicode version 6.0
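The chart layout above (16 rows labelled U+060x to U+06Fx, each with 16 columns 0 to F) can be regenerated programmatically. The sketch below is one way to do it in Python; the helper name `chart_rows` is hypothetical, and the character names printed depend on the Unicode version bundled with the interpreter, so unnamed or newer code points fall back to a placeholder.

```python
import unicodedata


def chart_rows(first: int = 0x0600, last: int = 0x06FF) -> dict:
    """Map each row label (e.g. 'U+060x') to its 16 code points."""
    rows = {}
    for base in range(first, last + 1, 16):
        # Drop the low nibble to form the row label used in the chart.
        label = f"U+{base >> 4:03X}x"
        rows[label] = list(range(base, base + 16))
    return rows


rows = chart_rows()
for label, code_points in rows.items():
    # Show the names of the first two cells of each row as a spot check.
    names = [unicodedata.name(chr(cp), "<no name>") for cp in code_points[:2]]
    print(label, names)
```

Column C of row U+061x is U+061C, the cell shown as ALM in the chart.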

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Arabic block:

Related Research Articles

Unicode: Character encoding standard

Unicode, formally The Unicode Standard, is a text encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 of the standard defines 154998 characters and 168 scripts used in various ordinary, literary, academic, and technical contexts.

A Unicode block is one of several contiguous ranges of numeric character codes of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.

Combining Diacritical Marks is a Unicode block containing the most common combining characters. It also contains the character "Combining Grapheme Joiner", which prevents canonical reordering of combining characters, and despite the name, actually separates characters that would otherwise be considered a single grapheme in a given context. Its block name in Unicode 1.0 was Generic Diacritical Marks.

Control Pictures is a Unicode block containing characters for graphically representing the C0 control codes, and other control characters. Its block name in Unicode 1.0 was Pictures for Control Codes.

The Basic Latin Unicode block, sometimes informally called C0 Controls and Basic Latin, is the first block of the Unicode standard, and the only block which is encoded in one byte in UTF-8. The block contains all the letters and control codes of the ASCII encoding. It ranges from U+0000 to U+007F, contains 128 characters and includes the C0 controls, ASCII punctuation and symbols, ASCII digits, both the uppercase and lowercase of the English alphabet and a control character.
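The one-byte claim can be checked directly. The following Python sketch, with the hypothetical helper `utf8_length`, encodes every code point in U+0000..U+007F and compares the result with the first code point of the next block and with an Arabic letter.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes ch occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# Every Basic Latin code point (U+0000..U+007F) should need one byte.
basic_latin = [chr(cp) for cp in range(0x0000, 0x0080)]
all_one_byte = all(utf8_length(ch) == 1 for ch in basic_latin)
print("Basic Latin all one byte:", all_one_byte)

# U+0080 (first code point of Latin-1 Supplement) and U+0628 (an Arabic
# letter) both fall outside the one-byte range.
print(utf8_length("\u0080"), utf8_length("\u0628"))
```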

The Latin-1 Supplement is the second Unicode block in the Unicode standard. It encodes the upper range of ISO 8859-1: 80 (U+0080) - FF (U+00FF). C1 Controls (0080–009F) are not graphic. This block ranges from U+0080 to U+00FF, contains 128 characters and includes the C1 controls, Latin-1 punctuation and symbols, 30 pairs of majuscule and minuscule accented Latin characters and 2 mathematical operators.

Latin Extended-A is a Unicode block and is the third block of the Unicode standard. It encodes Latin letters from the Latin ISO character sets other than Latin-1 and also legacy characters from the ISO 6937 standard.

Alphabetic Presentation Forms is a Unicode block containing standard ligatures for the Latin, Armenian, and Hebrew scripts.

Enclosed Alphanumerics is a Unicode block of typographical symbols, each consisting of an alphanumeric character within a circle, a bracket or other not-closed enclosure, or followed by a full stop.

CJK Unified Ideographs Extension-A is a Unicode block containing rare Han ideographs submitted to the Ideographic Research Group between 1992 and 1998, plus ten ideographs added in Unicode 13.0 which had previously been mistakenly unified with others.

Enclosed Alphanumeric Supplement is a Unicode block consisting of Latin alphabet characters and Arabic numerals enclosed in circles, ovals or boxes, used for a variety of purposes. It is encoded in the range U+1F100–U+1F1FF in the Supplementary Multilingual Plane.

Arabic Presentation Forms-A is a Unicode block encoding contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages. This block also allocates 32 noncharacters in Unicode, designed specifically for internal use.

Arabic Extended-A is a Unicode block encoding Qur'anic annotations and letter variants used for various non-Arabic languages.

Arabic Presentation Forms-B is a Unicode block encoding spacing forms of Arabic diacritics, and contextual letter forms. The special codepoint ZWNBSP is also here, which is only meant for a byte order mark. The block name in Unicode 1.0 was Basic Glyphs for Arabic Language; its characters were re-ordered in the process of merging with ISO 10646 in Unicode 1.0.1 and 1.1.

Cherokee is a Unicode block containing the syllabic characters for writing the Cherokee language. When Cherokee was first added to Unicode in version 3.0 it was treated as a unicameral alphabet, but in version 8.0 it was redefined as a bicameral script. The Cherokee block contains all the uppercase letters plus six lowercase letters. The Cherokee Supplement block, added in version 8.0, contains the rest of the lowercase letters. For backwards compatibility, the Unicode case folding algorithm—which usually converts a string to lowercase characters—maps Cherokee characters to uppercase.
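The case folding behaviour described above can be observed with Python's built-in `str.casefold()`. This is a small sketch that assumes an interpreter whose Unicode data is version 8.0 or later (CPython 3.5 and newer); the variable names are illustrative.

```python
# CHEROKEE LETTER A (uppercase, Cherokee block) and
# CHEROKEE SMALL LETTER A (lowercase, Cherokee Supplement block).
upper_a = "\u13A0"
lower_a = "\uAB70"

# One of the six lowercase letters that live in the Cherokee block itself,
# together with its uppercase counterpart.
lower_ye = "\u13F8"
upper_ye = "\u13F0"

# Ordinary lowercasing goes from the uppercase to the lowercase letter...
print(upper_a.lower() == lower_a)

# ...but case folding goes the other way and leaves uppercase unchanged.
print(lower_a.casefold() == upper_a)
print(upper_a.casefold() == upper_a)
print(lower_ye.casefold() == upper_ye)
```

Comparing folded strings therefore still matches Cherokee text regardless of case; the folded form simply happens to be uppercase rather than lowercase.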

CJK Unified Ideographs Extension C is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese submitted to the Ideographic Research Group between 2002 and 2006, plus five "urgently needed" characters added in Unicode versions 14.0 and 15.0, some of which had previously been mistakenly unified with other characters.

Dingbats is a Unicode block containing dingbats. Most of its characters were taken from Zapf Dingbats; it was the Unicode block to have imported characters from a specific typeface; Unicode later adopted a policy that excluded symbols with "no demonstrated need or strong desire to exchange in plain text", and thus no further dingbat typefaces were encoded until Webdings and Wingdings were encoded in Version 7.0. Some ornaments are also an emoji, having optional presentation variants.

Cherokee Supplement is a Unicode block containing the syllabic characters for writing the Cherokee language. When Cherokee was first added to Unicode in version 3.0 it was treated as a unicameral alphabet, but in version 8.0 it was redefined as a bicameral script. The Cherokee Supplement block contains lowercase letters only, whereas the Cherokee block contains all the uppercase letters, together with six lowercase letters. For backwards compatibility, the Unicode case folding algorithm—which usually converts a string to lowercase characters—maps Cherokee characters to uppercase.

Arabic Extended-B is a Unicode block encoding Qur'anic annotations and letter variants used for various non-Arabic languages. The block also includes currency symbols and an abbreviation mark.

Arabic Extended-C is a Unicode block encoding Qur'anic marks used in Turkey.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. The Unicode Consortium. The Unicode Standard, Version 6.0.0 (Mountain View, CA: The Unicode Consortium, 2011. ISBN 978-1-936213-01-6), Chapter 8.