Romanization of Arabic


The romanization of Arabic is the systematic rendering of written and spoken Arabic in the Latin script. Romanized Arabic is used for a number of purposes, among them transcription of names and titles, cataloging Arabic-language works, language education when used instead of or alongside the Arabic script, and representation of the language in scientific publications by linguists. These formal systems, which often make use of diacritics and non-standard Latin characters and are used in academic settings or for the benefit of non-speakers, contrast with informal means of written communication used by speakers, such as the Latin-based Arabic chat alphabet.


Different systems and strategies have been developed to address the inherent problems of rendering various Arabic varieties in the Latin script. Examples of such problems are the symbols for Arabic phonemes that do not exist in English or other European languages; the means of representing the Arabic definite article, which is always spelled the same way in written Arabic but has numerous pronunciations in the spoken language depending on context; and the representation of short vowels (usually i u or e o, accounting for variations such as Muslim/Moslem or Mohammed/Muhammad/Mohamed).
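The definite-article problem can be illustrated with a minimal sketch. It assumes the input is an already romanized, lower-case noun in a digraph-based scheme, and it models only one contextual pronunciation: the assimilation of the article's l to a following "sun" consonant. The function name `with_article` and the `assimilate` flag are hypothetical and do not correspond to any particular standard.

```python
# Romanized "sun" consonants that absorb the l of the article in speech.
# Digraphs are listed first so that "sh" is matched before "s".
SUN_CONSONANTS = ("th", "dh", "sh", "t", "d", "r", "z", "s", "ṣ", "ḍ", "ṭ", "ẓ", "l", "n")


def with_article(word: str, assimilate: bool = False) -> str:
    """Prefix a romanized noun with the definite article.

    assimilate=False mirrors the written form, which is always "al-".
    assimilate=True mirrors the pronunciation before sun consonants.
    """
    if assimilate:
        for consonant in SUN_CONSONANTS:
            if word.startswith(consonant):
                return f"a{consonant}-{word}"
    return f"al-{word}"


print(with_article("qamar"))                   # al-qamar
print(with_article("shams"))                   # al-shams
print(with_article("shams", assimilate=True))  # ash-shams
```

A system that follows the spelling would use the default, while one that follows the pronunciation would set `assimilate=True`. The sketch ignores other contextual changes, such as dropping the article's vowel after a preceding vowel.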


Romanization is often termed "transliteration", but this is not technically correct. [citation needed] Transliteration is the direct representation of foreign letters using Latin symbols, while most systems for romanizing Arabic are actually transcription systems, which represent the sound of the language. As an example, the rendering munāẓaratu l-ḥurūfi l-ʻarabīyah of the Arabic مناظرة الحروف العربية is a transcription, indicating the pronunciation; an example transliteration would be mnaẓrḧ alḥrwf alʻrbyḧ.
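The letter-for-letter nature of a transliteration can be sketched in a few lines. The table below covers only the letters needed for the example phrase and for the word قطر; its values are read off the example transliteration above, and the names `LETTER_VALUES` and `transliterate` are hypothetical rather than part of any standard.

```python
# Letter-for-letter values for a handful of letters only.
# Assumed for illustration; a real standard defines the full alphabet.
LETTER_VALUES = {
    "م": "m", "ن": "n", "ا": "a", "ظ": "ẓ", "ر": "r", "ة": "ḧ",
    "ل": "l", "ح": "ḥ", "و": "w", "ف": "f", "ع": "ʻ", "ب": "b",
    "ي": "y", "ق": "q", "ط": "ṭ",
}


def transliterate(text: str) -> str:
    """Replace each Arabic letter by its Latin value; leave anything else as is."""
    return "".join(LETTER_VALUES.get(char, char) for char in text)


print(transliterate("مناظرة الحروف العربية"))  # mnaẓrḧ alḥrwf alʻrbyḧ
print(transliterate("قطر"))  # qṭr
```

No short vowels appear in the output because none are written in the input. Producing a transcription such as munāẓaratu would require knowledge of the language that a lookup table of this kind does not contain.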

Romanization standards and systems

Principal standards and systems are:

Mixed digraphic and diacritical

Fully diacritical


Comparison table

Letter Unicode Name IPA BGN/
ء 3 0621 hamzah ʔ ʼ  4 ʾʼ  4 ʾʼ  4 ʾˈ, ˌ'2
ب 0628 bāʼ b b
ت 062A tāʼ t t
ث 062B thāʼ θ th (t͟h) 5 _ts/th
ج 062C jīm d͡ʒ ~ ɡ ~ ʒ jdj (d͟j) 5 j  6 ǧ^gj/g/dj
ح 062D ḥāʼ ħ   7 .h7
خ 062E khāʼ x kh (k͟h) 5   6 x_hkh/7'/5
د 062F dāl d d
ذ 0630 dhāl ð dh (d͟h) 5 _dz/dh/th
ر 0631 rāʼ r r
ز 0632 zayn/zāy z z
س 0633 sīn s s
ش 0634 shīn ʃ sh (s͟h) 5 š^ssh/ch
ص 0635 ṣād sˤ ş  7 .ss/9
ض 0636 ḍād dˤ   7 .dd/9'
ط 0637 ṭāʼ tˤ ţ  7 .tt/6
ظ 0638 ẓāʼ ðˤ ~ zˤ   7 ḏ̣/ẓ 11 .zz/dh/6'
ع 0639 ʻayn ʕ ʻ  4 ʿʽ  4 ʿ`3
غ 063A ghayn ɣ gh (g͟h) 5   6 ġġ.ggh/3'
ف 8 0641 fāʼ f f
ق 8 0642 qāf q q2/g/q/8
ك 0643 kāf k k
ل 0644 lām l l
م 0645 mīm m m
ن 0646 nūn n n
ه 0647 hāʼ h h
و 0648 wāw w , w; ūw; Uw/ou/oo/u/o
ي 9 064A yāʼ j , y; īy; Iy/i/ee/ei/ai
آ 0622 alif maddah ʔaː ā, ʼāʾāʾâ'A2a/aa
ة 0629 tāʼ marbūṭah a, at h; t—; th; tTa/e(h); et/at
ال 0627 0644 alif lām (var.) al-  10 ʾalal-el/al
ى 9 0649 alif maqṣūrah áā_Aa
ـَ 064E fatḥah a aa/e/é
ـِ 0650 kasrah i ii/e/é
ـُ 064F ḍammah u uou/o/u
ـَا 064E 0627 fatḥah alif āa’A/aaa
ـِي 0650 064A kasrah yāʼ īiyI/iyi/ee
ـُو 064F 0648 ḍammah wāw ūuwU/uwou/oo/u
ـَي 064E 064A fatḥah yāʼ ajayay/ai/ey/ei
ـَو 064E 0648 fatḥah wāw awawaw/aou
ـً 064B fatḥatān anananáaNan
ـٍ 064D kasratān inininíiNin/en
ـٌ 064C ḍammatān unununúuNoun/on/oon/un
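The Unicode column of the table lists each character's code point in hexadecimal. As a minimal sketch, the values can be reproduced with Python's built-in `ord` and the standard `unicodedata` module; the helper name `code_points` is hypothetical.

```python
import unicodedata


def code_points(text: str) -> str:
    """Return the code points of text as space-separated 4-digit hex values."""
    return " ".join(f"{ord(char):04X}" for char in text)


# Print the letter, its code point, and its official Unicode character name.
for letter in ["ب", "ت", "ة"]:
    print(letter, code_points(letter), unicodedata.name(letter))
```

For a sequence of more than one character, the helper returns one value per character in logical order, which is how the multi-character rows of the table are listed.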

Romanization issues

Any romanization system has to make a number of decisions which are dependent on its intended field of application.


One basic problem is that written Arabic is normally unvocalized; i.e., many of the vowels are not written out and must be supplied by a reader familiar with the language. Hence unvocalized Arabic writing does not give a reader unfamiliar with the language sufficient information for accurate pronunciation. As a result, a pure transliteration, e.g., rendering قطر as qṭr, is meaningless to an untrained reader. For this reason, transcriptions are generally used that add vowels, e.g. qaṭar. However, unvocalized systems match written Arabic exactly, unlike vocalized systems such as the Arabic chat alphabet, which some claim detracts from one's ability to spell. [15]

Transliteration vs. transcription

Most uses of romanization call for transcription rather than transliteration: instead of transliterating each written letter, they try to reproduce the sound of the words according to the orthography rules of the target language: Qaṭar. This applies equally to scientific and popular applications. A pure transliteration would need to omit vowels (e.g. qṭr), making the result difficult to interpret except for a subset of trained readers fluent in Arabic. Even if vowels are added, a transliteration system would still need to distinguish between multiple ways of spelling the same sound in the Arabic script, e.g. alif (ا) vs. alif maqṣūrah (ى) for the sound /aː/ (ā), and the six different ways (ء إ أ آ ؤ ئ) of writing the glottal stop (hamza, usually transcribed ʼ). This sort of detail is needlessly confusing, except in a very few situations (e.g., typesetting text in the Arabic script).
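The loss of spelling information described above can be checked mechanically. The sketch below compares two small, hypothetical tables: one assigns Latin symbols by sound, so the different spellings of the glottal stop share a symbol, and one assigns a distinct symbol to each letter. The ASCII placeholders in the second table resemble those of the Buckwalter scheme, but the exact values should be treated as an assumption. Only single-character values are considered.

```python
from collections import defaultdict

# Sound-based values (transcription style): spellings of one sound share a symbol.
# Illustrative values, not taken from a specific standard.
SOUND_BASED = {
    "ء": "ʼ", "أ": "ʼ", "إ": "ʼ", "ؤ": "ʼ", "ئ": "ʼ",
    "ا": "ā", "ى": "ā",
}

# Letter-based values (transliteration style): one distinct symbol per letter.
# Placeholder symbols, assumed for illustration.
LETTER_BASED = {
    "ء": "'", "أ": ">", "إ": "<", "ؤ": "&", "ئ": "}",
    "ا": "A", "ى": "Y",
}


def collisions(mapping):
    """Group Arabic letters by Latin value; keep values used more than once."""
    by_latin = defaultdict(list)
    for arabic, latin in mapping.items():
        by_latin[latin].append(arabic)
    return {latin: letters for latin, letters in by_latin.items() if len(letters) > 1}


def is_reversible(mapping):
    """A table can be inverted letter by letter only if no Latin value repeats."""
    return not collisions(mapping)


print(is_reversible(SOUND_BASED))   # False
print(is_reversible(LETTER_BASED))  # True
```

Calling `collisions(SOUND_BASED)` lists the letters that can no longer be told apart once the text has been romanized by sound.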

Most issues related to the romanization of Arabic are about transliterating vs. transcribing; others, about what should be romanized:

A transcription may reflect the language as spoken, typically when rendering names, for example by the people of Baghdad (Baghdad Arabic), or the official standard (Literary Arabic) as spoken by a preacher in the mosque or a TV newsreader. A transcription is free to add phonological (such as vowels) or morphological (such as word boundaries) information. Transcriptions will also vary depending on the writing conventions of the target language; compare English Omar Khayyam with German Omar Chajjam, both for عمر خيام /ʕumar xajjaːm/, [ˈʕomɑr xæjˈjæːm] (unvocalized ʿmr ḫyām, vocalized ʻUmar Khayyām).

A transliteration is ideally fully reversible: a machine should be able to transliterate it back into Arabic. A transliteration can be considered as flawed for any one of the following reasons:

A fully accurate transcription may not be necessary for native Arabic speakers, as they would be able to pronounce names and sentences correctly anyway, but it can be very useful for those not fully familiar with spoken Arabic and who are familiar with the Roman alphabet. An accurate transliteration serves as a valuable stepping stone for learning, pronouncing correctly, and distinguishing phonemes. It is a useful tool for anyone who is familiar with the sounds of Arabic but not fully conversant in the language.

One criticism is that a fully accurate system would require special training that most people lack in order to pronounce names correctly, and that, without a universal romanization system, non-native speakers will not pronounce them correctly anyway. The precision is lost if special characters are not reproduced and if the reader is not familiar with Arabic pronunciation.


Examples in Literary Arabic:

Arabic أمجد كان له قصر | إلى المملكة المغربية
Arabic with diacritics (normally omitted) أَمْجَدُ كَانَ لَهُ قَصْر | إِلَى الْمَمْلَكَةِ الْمَغْرِبِيَّة
IPA /ʔamdʒadu kaːna lahu qasˤr/ | /ʔila‿l.mamlakati‿l.maɣribij.ja/
ALA-LC Amjad kāna lahu qaṣr | Ilá al-mamlakah al-Maghribīyah
Hans Wehr amjad kāna lahū qaṣr | ilā l-mamlaka al-maḡribīya
DIN 31635 ʾAmǧad kāna lahu qaṣr | ʾIlā l-mamlakah al-Maġribiyyah
UNGEGN Amjad kāna lahu qaşr | Ilá al-mamlakah al-maghribiyyah
ISO 233 ʾˈamǧad kāna lahu qaṣr | ʾˈilaỳ ʾˈalmamlakaẗ ʾˈalmaġribiȳaẗ
ArabTeX am^gad kAna lahu qa.sr | il_A almamlakaT alma.gribiyyaT
English Amjad had a palace | To the Moroccan Kingdom

Arabic alphabet and nationalism

There have been many instances of national movements to convert Arabic script into Latin script or to romanize the language.


The Beirut newspaper La Syrie pushed for the change from Arabic script to Latin script in 1922. The leading figure of this movement was Louis Massignon, a French Orientalist, who brought his concern before the Arabic Language Academy in Damascus in 1928. Massignon's attempt at romanization failed, as the Academy and the population viewed the proposal as an attempt by the Western world to take over their country. Sa'id Afghani, a member of the Academy, asserted that the movement to romanize the script was a Zionist plan to dominate Lebanon. [17] [18]


After the period of colonialism in Egypt, Egyptians were looking for a way to reclaim and reemphasize Egyptian culture. As a result, some Egyptians pushed for an Egyptianization of the Arabic language in which formal Arabic and colloquial Arabic would be combined into one language and the Latin alphabet would be used. [17] [18] There was also the idea of finding a way to use hieroglyphics instead of the Latin alphabet. [17] [18] A scholar, Salama Musa, agreed with the idea of applying a Latin alphabet to Egyptian Arabic, as he believed that it would allow Egypt to have a closer relationship with the West. He also believed that the Latin script was key to the success of Egypt, as it would allow for more advances in science and technology. This change in script, he believed, would solve the problems inherent in Arabic, such as a lack of written vowels and difficulties in writing foreign words. [17] [18] [19] Ahmad Lutfi As Sayid and Muhammad Azmi, two Egyptian intellectuals, agreed with Musa and supported the push for romanization. [17] [18] The idea that romanization was necessary for modernization and growth in Egypt continued in 1944 with Abd Al Aziz Fahmi, the chairman of the Writing and Grammar Committee of the Arabic Language Academy of Cairo. [17] [18] He wanted to implement romanization in a way that allowed words and spellings to remain somewhat familiar to the Egyptian people. However, this effort failed, as the Egyptian people, particularly the older generation, felt a strong cultural tie to the Arabic alphabet. [17] [18]



  1. "Romanization system for Arabic. BGN/PCGN 1956 System" (PDF).
  2. "Arabic" (PDF). UNGEGN.
  3. Technical reference manual for the standardization of geographical names (PDF). UNGEGN. 2007. p. 12 [22].
  4. "Systèmes français de romanisation" (PDF). UNGEGN. 2009.
  5. "Arabic romanization table" (PDF). The Library of Congress.
  6. "IJMES Translation & Transliteration Guide". International Journal of Middle East Studies.
  7. "Encyclopaedia of Islam Romanization vs ALA Romanization for Arabic". University of Washington Libraries.
  8. Brockelmann, Carl; Ronkel, Philippus Samuel van (1935). Die Transliteration der arabischen Schrift... (PDF). Leipzig.
  9. Reichmuth, Philipp (2009). "Transcription". In Versteegh, Kees (ed.). Encyclopedia of Arabic Language and Linguistics. 4. Brill. pp. 515–20.
  10. Millar, M. Angélica; Salgado, Rosa; Zedán, Marcela (2005). Gramatica de la lengua arabe para hispanohablantes. Santiago de Chile: Editorial Universitaria. pp. 53–54. ISBN   978-956-11-1799-0.
  11. "Standards, Training, Testing, Assessment and Certification". BSI Group. Archived from the original on October 7, 2008. Retrieved 2014-05-18.
  12. ArabTex User Manual Section 4.1 : ASCII Transliteration Encoding.
  13. "Buckwalter Arabic Transliteration". QAMUS LLC.
  14. "Arabic Morphological Analyzer/The Buckwalter Transliteration". Xerox. Retrieved 2017-04-30.
  15. "Arabizi sparks concern among educators". 2013-05-09. Retrieved 2014-05-18.
  16. "Arabic" (PDF). ALA-LC Romanization Tables. Library of Congress. p. 9. Retrieved 2013-06-14. 21. The prime (ʹ) is used: (a) To separate two letters representing two distinct consonantal sounds, when the combination might otherwise be read as a digraph.
  17. Shrivtiel, Shraybom (1998). The Question of Romanisation of the Script and The Emergence of Nationalism in the Middle East. Mediterranean Language Review. pp. 179–196.
  18. History of Arabic Writing
  19. Shrivtiel, p. 188