Arabic (Unicode block)

Arabic
Range: U+0600..U+06FF (256 code points)
Plane: BMP
Scripts: Arabic (238 char.), Common (6 char.), Inherited (12 char.)
Major alphabets: Arabic, Kurdish, Pashto, Persian, Urdu, Sindhi
Assigned: 256 code points
Unused: 0 reserved code points
Deprecated: 1 code point
Source standards: ISO 8859-6
Unicode version history:
1.0.0 (1991): 169 (+169)
1.1 (1993): 194 (+25)
3.0 (1999): 206 (+12)
3.2 (2002): 208 (+2)
4.0 (2003): 227 (+19)
4.1 (2005): 235 (+8)
5.1 (2008): 250 (+15)
6.0 (2010): 252 (+2)
6.1 (2012): 253 (+1)
6.3 (2013): 254 (+1)
7.0 (2014): 255 (+1)
14.0 (2021): 256 (+1)
Unicode documentation: Code chart ∣ Web page
Note: [1] [2]

Arabic is a Unicode block containing the standard letters and the most common diacritics of the Arabic script, as well as the Arabic-Indic digits. [3]
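
As an illustration of what block membership means in practice, the following minimal Python sketch tests whether a character falls in the range U+0600..U+06FF given in this article. The names `ARABIC_BLOCK_START`, `ARABIC_BLOCK_END`, `ARABIC_BLOCK_SIZE` and `in_arabic_block` are hypothetical and chosen only for this example; `unicodedata` is Python's standard library module.

```python
import unicodedata

# Block boundaries as given in this article (U+0600..U+06FF).
# These constant names are illustrative, not part of any library.
ARABIC_BLOCK_START = 0x0600
ARABIC_BLOCK_END = 0x06FF
ARABIC_BLOCK_SIZE = ARABIC_BLOCK_END - ARABIC_BLOCK_START + 1


def in_arabic_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+0600..U+06FF."""
    return ARABIC_BLOCK_START <= ord(ch) <= ARABIC_BLOCK_END


print(ARABIC_BLOCK_SIZE)              # 256
print(in_arabic_block("\u0628"))      # True  (U+0628, an Arabic letter)
print(in_arabic_block("A"))           # False (U+0041, Basic Latin)
print(unicodedata.decimal("\u0663"))  # 3     (U+0663, an Arabic-Indic digit)
```

Note that this checks the block, not the script: the block also holds a few code points whose script is Common or Inherited, so a range test alone does not establish that a character belongs to the Arabic script.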

Unicode chart Arabic

Arabic [1] [2]
Official Unicode Consortium code chart (PDF)
 0123456789ABCDEF
U+060x ؀  ؁  ؂  ؃  ؄  ؅ ؆؇؈؉؊؋،؍؎؏
U+061xؘؙؚؐؑؒؓؔؕؖؗ؛ ALM ؝؞؟
U+062xؠءآأؤإئابةتثجحخد
U+063xذرزسشصضطظعغػؼؽؾؿ
U+064xـفقكلمنهوىيًٌٍَُ
U+065xِّْٕٖٜٟٓٔٗ٘ٙٚٛٝٞ
U+066x٠١٢٣٤٥٦٧٨٩٪٫٬٭ٮٯ
U+067xٰٱٲٳٴٵٶٷٸٹٺٻټٽپٿ
U+068xڀځڂڃڄڅچڇڈډڊڋڌڍڎڏ
U+069xڐڑڒړڔڕږڗژڙښڛڜڝڞڟ
U+06Axڠڡڢڣڤڥڦڧڨکڪګڬڭڮگ
U+06Bxڰڱڲڳڴڵڶڷڸڹںڻڼڽھڿ
U+06Cxۀہۂۃۄۅۆۇۈۉۊۋیۍێۏ
U+06Dxېۑےۓ۔ەۖۗۘۙۚۛۜ ۝ ۞۟
U+06Exۣ۠ۡۢۤۥۦۧۨ۩۪ۭ۫۬ۮۯ
U+06Fx۰۱۲۳۴۵۶۷۸۹ۺۻۼ۽۾ۿ
Notes
1. As of Unicode version 16.0
2. Unicode code point U+0673 is deprecated as of Unicode version 6.0
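
The chart is read by combining a row label with a column digit: the row label supplies the leading hexadecimal digits of the code point and the column header supplies the last one. The sketch below shows that lookup in Python. The helper name `chart_cell` is hypothetical and introduced only for illustration, and the character names it prints depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def chart_cell(row_label: str, column: str) -> str:
    """Map a chart row label such as 'U+062x' and a hex column digit
    such as '8' to the character in that cell."""
    if not (row_label.startswith("U+") and row_label.endswith("x")):
        raise ValueError(f"unexpected row label: {row_label!r}")
    if len(column) != 1 or column.upper() not in "0123456789ABCDEF":
        raise ValueError(f"unexpected column digit: {column!r}")
    prefix = row_label[2:-1]  # e.g. '062' from 'U+062x'
    return chr(int(prefix + column, 16))


# Row U+062x, column 8 is code point U+0628.
beh = chart_cell("U+062x", "8")
print(f"U+{ord(beh):04X}", unicodedata.name(beh))

# Row U+067x, column 3 is U+0673, the deprecated code point from note 2.
deprecated = chart_cell("U+067x", "3")
print(f"U+{ord(deprecated):04X}", unicodedata.name(deprecated, "<no name>"))
```

Python's `unicodedata` module does not expose the Deprecated property, so the deprecation status of U+0673 has to come from the Unicode Character Database itself rather than from this lookup.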

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Arabic block:

References

  1. "Unicode character database". The Unicode Standard. Archived from the original on 2021-05-07. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Archived from the original on 2016-06-29. Retrieved 2023-07-26.
  3. The Unicode Consortium. The Unicode Standard, Version 6.0.0 (archived 2022-03-06 at the Wayback Machine). Mountain View, CA: The Unicode Consortium, 2011. ISBN 978-1-936213-01-6. Chapter 8 (archived 2018-05-22 at the Wayback Machine).