List of Arabic letter components

This is a list of the letter components used in the Arabic script.

Table of Letter Components

A = The letter is used for most languages and dialects with writing systems based on Arabic.

MSA = Letters used in Modern Standard Arabic.

CA = Letters used in Classical Arabic.

AD = Letters used in some regional Arabic Dialects.

"Arabic" = Letters used in Classical Arabic, Modern Standard Arabic, and most regional dialects.

"Farsi" = Letters used in modern Persian.

FW = Foreign words: the letter is sometimes used to spell foreign words.

SV = Stylistic variant: the letter is used interchangeably with at least one other letter depending on the calligraphic style.

AW = Arabic words: the letter is used in additional languages to spell Arabic words.


Table

No additions

Letter Line Shapes (ii) ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ڪ ك ک گ ل م د ر و ھ ہ ه لا
positional forms isolated ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ڪ ك ک گ ل م د ر و ھ ہ ه لا
start ء ا none ىـ (YNB) ٮـ حـسـصـطـعـڡـ (QF) ٯـڪـكـکـگـلـمـدروھـہـهـلا
middle ء ـا none ـىـ (YNB) ـٮـ ـحــســصــطــعــڡـ (QF) ـٯــڪــكــکــگــلــمــدـرـوـھــہــهــلا
end ء ـاـےـىـںـٮـحـسـصـطـعـڡـٯـڪـكـکـگـلـمـدـرـوـھـہـهـلا
image of forms isolated
start
middle
end
ءاےىںٮحسصطعڡٯڪكکگلمدروھہهلا
Unicode for above U+0621 U+0627 U+06D2 U+0649 U+06BA U+066E U+062D U+0633 U+0635 U+0637 U+0639 U+06A1 U+066F U+06AA U+0643 U+06A9 U+06AF U+0644 U+0645 U+062F U+0631 U+0648 U+06BE U+06C1 U+0647 U+0644 + U+0627
Languages that use the letter shape with or without i'jam Arabic Arabic Urdu Arabic Arabic Arabic Arabic Arabic Arabic Arabic Arabic Arabic Arabic Sindhi, Arabic SV Arabic Farsi, Urdu, Arabic SV Farsi, Urdu Arabic Arabic Arabic Arabic Arabic Urdu, Arabic SV Urdu, Arabic SV Arabic, Farsi Arabic
diacritics (i) isolated ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ڪ ك ک گ ل م د ر و ھ ہ ه لا
none start
mid
end
ءـا ا ـــے ے ىـ
ـىـ
ـى
ـں ںٮـ
ـٮـ
ـٮ
حـ
ـحـ
ـح
سـ
ـسـ
ـس
صـ
ـصـ
ـص
طـ
ـطـ
ـط
عـ
ـعـ
ـع
ڡـ
ـڡـ
ـڡ
ٯـ
ـٯـ
ـٯ
ڪــ
ـڪــ
ـڪ
كـ
ـكـ
ـك
کـ
ـکـ
ـک
گـ
ـگـ
ـگ
لـ
ـلـ
ـل
مـ
ـمـ
ـم
ـد دـر رـو وھـ
ـھـ
ـھ
ہـ
ـہـ
ـہ
هـ
ـهـ
ـه
ـلا لا
Languages using the bare shape with no additions Arabic Arabic, Farsi, Urdu Urdu MSA Urdu, Arabic SVArabic SV Rasm Arabic, Farsi, Urdu Arabic, Farsi, Urdu Arabic, Farsi, Urdu Arabic, Farsi, Urdu Arabic, Farsi, Urdu Arabic SV in Rasm Arabic SV in Rasm Sindhi, Arabic SV Arabic Farsi, Urdu, Arabic SV Farsi, Urdu Arabic, Farsi, Urdu Arabic, Farsi, Urdu Arabic, Farsi, Urdu Arabic, Farsi, Urdu Arabic, Farsi, Urdu Urdu, Arabic SV Urdu, Arabic SV Arabic, Farsi MSA
Unicode for above U+0621 U+0627 U+06D2 U+0649 U+06BA U+066E U+062D U+0633 U+0635 U+0637 U+0639 U+06A1 U+066F U+06AA U+0643 U+06A9 U+06AF U+0644 U+0645 U+062F U+0631 U+0648 U+06BE U+06C1 U+0647 U+0644 + U+0627
Bare line in isolated and end forms only isolated ءاے ی ٮحسصطع ڪكکگلمدروھہهلا
start
mid
end
یـ
ـیـ
ـی
ࢽـ
ـࢽـ
ـࢽ
ࢻـ
ـࢻـ
ـࢻ
ࢼـ
ـࢼـ
ـࢼ
image
Languages
Unicode U+08BD U+08BB U+08BC

dots

1 dot
diacritics (i) Letter Shapes (ii) ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ڪ ك ک گ ل م د ر و ھ ہ ه لا
1 dot below  ◌࣭   ◌ٜ   ــٜـ     isolatedءاےىں ب ج سصطعڡٯڪكکگلمدروھہهلا
connectedبـ ـبـ ـبجـ ـجـ ـج
image
Languages
U+FBB3 U+065C Unicode U+0628 U+062C
1 dot above + 1 dot below ﮲﮳ isolatedءاےى ڹ ٮح ښ ۻ ط ۼ ڣ ٯڪكکگلمد ږ وھہهلا
connected
image
Languages
Unicode
1 dot above ◌࣪    isolatedءاےىنٮخسضظغ ف ڧ ڪكکگلمذزوھہهلا
connected
image
Languages
U+FBB2 Unicode U+0646 U+062E U+0636 U+0638 U+063A U+0641 U+06A7 U+0630 U+0632
2 dots
diacritics (i) Letter Shapes (ii) ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ڪ ك ک گ ل م د ر و ھ ہ ه لا
2 dots below, start and mid (iii) isolated form ءاے یـ ـیـ ـی ی حسصطعڡٯڪكکگلمدروھہهلا
image
Languages Farsi, Urdu, AD
U+FBB5 U+FBB5 Unicode U+06CC
2 dots below, all positions isolated form ءاے يـ ـيـ ـي ي حسصطعڡٯڪكکگلمدروھہهلا
image
Languages Arabic (iv)
U+FBB5 U+FBB5 Unicode U+064A
2 vertical dots below    isolated formءاےىںٮحسصطعڡٯڪكکگلمدروھہهلا
image
Languages
U+FBBE Unicode
2 vertical dots above    isolated formءاےىںٮحسصطعڡٯڪكکگلمدروھہهلا
image
Languages
U+FBBD Unicode
2 horizontal dots above isolated form ءاےىںتحسصطعڡ ق ڪكکگلمدروھـۃـةلا
image
Languages
U+FBB4 Unicode U+062A U+0642 U+06C3 U+0629
diacritics (i) Letter Shapes (ii) ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ڪ ك ک گ ل م د ر و ھ ہ ه لا
above: 2 dots
below: 2 dots
above: 2 dots
(vertically)
below: 2 dots
(vertically)
U+08EB (2283) ◌࣫ ARABIC TONE TWO DOTS ABOVE
U+08EE (2286) ◌࣮ ARABIC TONE TWO DOTS BELOW
3 dots
ث پ چ ژ ش Arabic & Persian
ݑ ڥ ڤ ڨ ڠ ڟ ڞ ۺ ڜ ڛ څ ڿ ۑ ۋ ڮ ڴ ڷ ڸ other pointing out
ݤ ڏ ݓ ݒ ݡ ݘ ݞ inverted
3 dots below (horizontal)

    

characterءاےىں ݐ حسصطعڡٯڪكکگلمدروھہهلا
image
Languages Fula
Unicode U+0750
3 dots below (inverted)   characterءاےىںٮحسصطعڡٯڪكکگلمدروھہهلا
image
Languages
U+FBB7 Unicode
3 dots below   isolated formءاےىںپچسصطعڡٯڪكکگلمدروھہهلا
image
Languages
U+FBB9 Unicode U+067E U+0686
3 dots above
+ 3 dots below
  isolated formءاےىںٮح ڜ صطعڡٯڪكکگلمدروھہهلا
image
Languages
U+FBB6 + U+FBB9 Unicode
3 dots above   isolated formثشژ
image
Languages
U+FBB6 Unicode U+062B U+0634 U+0698
3 dots above (inverted)   characterءاےىںٮحسصطعڡٯڪكکگلمدروھہهلا
image
U+FBB8 Languages
Unicode
4 dots
4 dots below    isolated formءاےىں ڀ ڇ سصطعڡٯڪكکگلمدروھہهلا
image
Languages
U+FBBB Unicode
4 dots above    isolated formءاےىں ٿ ح ݜ صطع ڦ ٯڪكکگلم ڐ ڙ وھہهلا
image
Languages Shina, Torwali Sindhi, Shina, Torwali
U+FBBA Unicode U+067F U+075C U+0690 U+0699
different dots above and below
mixed dots above + below isolated form ءاےىںٮ ڿ ۺ صطعڡٯڪكکگلمدروھہهلا
image
Languages
Unicode
diacritics (i) Letter Shapes (ii) ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ڪ ك ک گ ل م د ر و ھ ہ ه لا

tōē

small tōē below isolated form ءاےىںٮ ݮ سصطعڡٯڪكکگلمدروھہهلا
connected form(s) ݮـ ـݮـ ـݮ
image (U+076E, Naskh style)
Unicode U+076E
Languages Khowar
small tōē
above
ـــؕــ     ◌ؕ isolated formءاےىڻٹݲسصطعڡٯڪكکگ  لؕ مڈڑوھہهلا
connected form(s)
image
Urdu Punjabi Torwali Languages Punjabi Urdu Torwali Punjabi [1] Urdu Urdu
U+0615 U+FBC0 Unicode U+06BB U+0679 U+0772 U+0688 U+0691
small tōē + dot(s) isolated form ءاےى ݨ ٮحݰصطعڡٯڪكکگلمدݱوھہهلا
connected form(s)
image
Languages Punjabi, Seraiki, Shina Torwali Torwali
Unicode
diacritics (i) Letter Shapes (ii) ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ڪ ك ک گ ل م د ر و ھ ہ ه لا

ring

ring    isolated formءاے ؠ ںٮحسصطعڡٯڪك ګ گلم ډ ړ ۄ ھہهلا
image
Languages
U+FBBF Unicode
ring and dots isolated form ءاےى ڼ ټ حسصطعڡٯڪكکگلمدروھہهلا
image
Languages
Unicode
diacritics (i) Letter Shapes (ii) ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ڪ ك ک گ ل م د ر و ھ ہ ه لا

line

horizontal line isolated form ءاے ۍ ـۍ ںٮحسصطعڡ ؈ ڪك گ ݪ مد ݛ ۅ ھہهلا
image
Languages Punjabi, Marwari, Kalami Kirghiz
Unicode U+075B
multiple lines     isolated form ۽ اےىںٮحسصطعڡٯڪكکگ ۾ دروھہهلا
image
Languages
U+FBBC Unicode U+08A6
vertical line isolated form ءاےىںٮحسصطعڡٯڪكکگلمدر ۈ ھہهلا
image
Languages
Unicode

numeral

diacritics (i) Letter Shapes (ii) ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ك ڪ ک گ ل م د ر و ھ ہ ه لا
Persian numeral 2 above ٢ ٢ isolated formءا ݺ ىںٮحسصطعڡٯڪكکگلمدر ݸ ھہهلا
image
Burushaski Languages Burushaski Burushaski
Unicode U+077A
Persian numeral 3 above ٣ ٣ isolated formءا ݻ ݶ ںٮحسصطعڡٯڪكکگلمدروھہهلا
image
Burushaski Languages Burushaski
Unicode U+077B
Persian numeral 4 above ۴ ۴ isolated formءاےىںٮح ݽ صطعڡٯڪكکگلمدروھہهلا
image
Burushaski Languages Burushaski
Unicode U+077D
Persian numeral 4 below ۴ ۴ isolated formءاے ݷ ںٮ ݼ سصطعڡٯڪكکگلمدروھہهلا
image
Burushaski Languages Burushaski
Unicode U+077C
diacritics (i) Letter Shapes (ii) ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ڪ ك ک گ ل م د ر و ھ ہ ه لا

arrows

V below   ٚ     ٛ  isolated formءاےىں ݕ حسصطعڡٯڪكکگلمد ڕ وھہهلا
image
Languages Wolof
U+065B U+065A Unicode
small V above ــٚـ ◌ٚ isolated formءاے ێ ں ݖ حسصطعڡٯڪكکگ ڵ مد ڒ ۆ ھہهلا
image
Languages Wolof
U+065A Unicode U+0756
inverted V above ــٛـ ◌ٛ isolated formءاے ؽ ںٮحسصطعڡٯڪكکگلم ۮ ۯ ۉ ۿ ہهلا
image
U+065B Unicode
Languages
arrow and dots isolated form ءاےىںٮحسصطعڡٯڪكکگلمدروھہهلا
image
Unicode
Languages

Hamza

diacritics (i) Letter Shapes (ii) ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ڪ ك ک گ ل م د ر و ھ ہ ه لا
Hamza below ــٕـ ◌ٕ isolated form ءإےىںٮحسصطعڡٯڪكکگلمدروھہهلإ
image
Languages (diacritic sometimes omitted in contexts where short vowel diacritics are omitted)
U+0655 Unicode U+0625 U+0644 + U+0625
Hamza above ــٔـ ◌ٔ isolated form ءأۓئںٮحسصطعڡٯڪكکگلمدرؤھۂۀلأ
image
Languages (diacritic sometimes omitted in contexts where short vowel diacritics are omitted)
U+0674 U+0654 Unicode U+0623 U+06D3 U+0626 U+0624 U+06C2 U+06C0 U+0644 + U+0623
Hamza and dots isolated form ءاےىںٮحسصطعڡٯڪكکگلمدروھہهلا
image
Unicode
Languages

other semi-optional vowels

maddah above ــۤـ ــٓـ ◌ٓ ◌ۤ isolated form ءآےىںٮحسصطعڡٯڪكکگلمدروھہهلآ
image
Languages Arabic, Urdu (the maddah does not strictly count as i'jam, but it is often written where other diacritics are left out)
U+06E4 U+0653 Unicode U+0622
Wasala above isolated form ءٱےىںٮحسصطعڡٯڪكکگلمدروھہهلا
image
Languages CA
none (v) Unicode U+0671
diacritics (i) Letter Shapes (ii) ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ك ڪ ک گ ل م د ر و ھ ہ ه لا
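
The code points in the "Unicode" rows of the table can be cross-checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the names `BASE_SHAPES`, `MARKED_LETTERS`, `describe` and `decompose` are illustrative and do not come from the table. It prints the official name of each bare letter shape, then shows how the letters in the Hamza, maddah and Wasala rows canonically decompose into a base letter plus a combining mark. The decomposition base is whatever Unicode defines, which need not match the column a letter occupies in the table.

```python
import unicodedata

# Code points copied from the "Unicode for above" row of the bare letter shapes.
BASE_SHAPES = [
    0x0621, 0x0627, 0x06D2, 0x0649, 0x06BA, 0x066E, 0x062D, 0x0633, 0x0635,
    0x0637, 0x0639, 0x06A1, 0x066F, 0x06AA, 0x0643, 0x06A9, 0x06AF, 0x0644,
    0x0645, 0x062F, 0x0631, 0x0648, 0x06BE, 0x06C1, 0x0647,
]

# Letters copied from the Hamza, maddah and Wasala rows of the table.
MARKED_LETTERS = [
    0x0622, 0x0623, 0x0624, 0x0625, 0x0626, 0x06C0, 0x06C2, 0x06D3, 0x0671,
]


def describe(cp):
    """Return 'U+XXXX NAME' for one code point."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}"


def decompose(cp):
    """Return the canonical (NFD) decomposition as a list of code points."""
    return [ord(c) for c in unicodedata.normalize("NFD", chr(cp))]


if __name__ == "__main__":
    for cp in BASE_SHAPES:
        print(describe(cp))
    for cp in MARKED_LETTERS:
        parts = " + ".join(f"U+{p:04X}" for p in decompose(cp))
        print(f"{describe(cp)} = {parts}")
```

In this sketch the alef with Wasala (U+0671) comes back unchanged, because Unicode defines no canonical decomposition for it; this is consistent with footnote v, which notes that no separate Wasala diacritic character has been released.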

Footnotes

^i. The i'jam diacritic characters are illustrative only; in most typesetting, the combined characters in the middle of the table are used. The characters used to illustrate the consonant diacritics are from the Unicode set "Arabic pedagogical symbols". [2] The Arabic tatweel character (U+0640) used to show the positional forms does not work in some Nastaliq fonts.

^ii. For most letters only the isolated form is shown; for select letters all forms (isolated, start, middle, and end) are shown.

^iii. Urdu Choti Yē has 2 dots below in the initial and middle positions only. The standard Arabic version ي يـ ـيـ ـي always has 2 dots below.

^iv. These characters are used by most languages that use writing systems based on Arabic, though sometimes only in foreign words.

^v. A Wasala diacritic Unicode character has been proposed but not yet released.
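
The positional forms mentioned in footnotes i and iii can be reproduced by joining a letter to the tatweel character (U+0640), which is how the table displays them. The sketch below is illustrative only: `positional_strings` and `YEH_VARIANTS` are hypothetical names, and the code merely builds the strings. Whether the dots actually appear or disappear in each position depends on the font and text-shaping engine, and letters that do not join to the following letter (such as د) have no distinct start or middle shape.

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL, the joining stroke used in the table


def positional_strings(letter):
    """Build the tatweel-joined strings used to display each positional form."""
    return {
        "isolated": letter,
        "start": letter + TATWEEL,
        "middle": TATWEEL + letter + TATWEEL,
        "end": TATWEEL + letter,
    }


# The two yeh letters contrasted in footnote iii, plus the dotless shape.
YEH_VARIANTS = {
    "Arabic yeh": "\u064A",
    "Farsi/Urdu yeh": "\u06CC",
    "dotless yeh shape": "\u0649",
}

if __name__ == "__main__":
    for label, letter in YEH_VARIANTS.items():
        print(label, f"U+{ord(letter):04X}", unicodedata.name(letter))
        for position, text in positional_strings(letter).items():
            print("   ", position, text)
```

Rendering the output in a Naskh font should show U+06CC with two dots below only in the start and middle strings, while U+064A keeps its dots in all four.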

References

  1. Lorna Priest Evans; M. G. Abbas Malik. "Proposal to encode ARABIC LETTER LAM WITH SMALL ARABIC LETTER TAH ABOVE in the UCS" (PDF). www.unicode.org. Retrieved 10 May 2020.
  2. "Unicode Utilities: UnicodeSet Arabic pedagogical symbols". unicode.org. Retrieved 20 March 2020.
  3. "Extended Arabic Letter". unicode.org. Retrieved 2 October 2021.
  4. "Based on ISO 8859-6". unicode.org. Retrieved 2 October 2021.