Romanization of Persian

Romanization of Persian or Latinization of Persian is the representation of the Persian language (Farsi, Dari and Tajik) with the Latin script. Several different romanization schemes exist, each with its own set of rules driven by its own set of ideological goals.


Romanization paradigms

Because the Perso-Arabic script is an abjad writing system (with a consonant-heavy inventory of letters), many distinct words in standard Persian can have identical spellings, with widely varying pronunciations that differ in their (unwritten) vowel sounds. Thus a romanization paradigm can follow either transliteration (which mirrors spelling and orthography) or transcription (which mirrors pronunciation and phonology).

The Latin script serves as a second script in Iran, as city and street signs and Internet addresses show. At the same time, efforts to teach the millions of young Iranians living abroad to read and write Persian have mostly proved unsuccessful, because they lack daily contact with the Persian script. Using the Latin script in parallel with the Persian script appears to offer a way out of this dilemma.


Transliteration (in the strict sense) attempts to be a complete representation of the original writing, so that an informed reader should be able to reconstruct the original spelling of unknown transliterated words. Transliterations of Persian are used to represent individual Persian words or short quotations, in scholarly texts in English or other languages that do not use the Arabic alphabet.

A transliteration will still have separate representations for different consonants of the Persian alphabet that are pronounced identically in Persian. Therefore, transliterations of Persian are often based on transliterations of Arabic. [1] The representation of the vowels of the Perso-Arabic alphabet is also complex, and transliterations are based on the written form.
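The difference between the two paradigms can be illustrated with a small sketch. It is only an illustration under stated assumptions: the six-letter tables below are a hypothetical subset (the dot-below forms follow the variants listed in the comparison table, the plain letters are assumed for the example), and `romanize` and `invert` are names invented here, not part of any published scheme.

```python
# Strict transliteration: every Persian letter gets its own Latin symbol.
TRANSLITERATION = {
    "\u062a": "t",       # teh
    "\u0637": "\u1e6d",  # tah  -> t with dot below
    "\u0633": "s",       # seen
    "\u0635": "\u1e63",  # sad  -> s with dot below
    "\u0647": "h",       # heh
    "\u062d": "\u1e25",  # hah  -> h with dot below
}

# Transcription: letters that sound alike in Persian share one Latin symbol.
TRANSCRIPTION = {
    "\u062a": "t", "\u0637": "t",
    "\u0633": "s", "\u0635": "s",
    "\u0647": "h", "\u062d": "h",
}


def romanize(text, table):
    """Replace each character found in the table; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


def invert(table):
    """Build the reverse table, failing if two letters share a Latin symbol."""
    inverse = {}
    for source, latin in table.items():
        if latin in inverse:
            raise ValueError(f"{latin!r} is ambiguous; the table is not reversible")
        inverse[latin] = source
    return inverse


letters = "\u0635\u0637"  # the letter sequence sad + tah
strict = romanize(letters, TRANSLITERATION)

# The strict form can be mapped back to the original spelling.
assert romanize(strict, invert(TRANSLITERATION)) == letters

# The transcription is plain "st" and cannot be inverted unambiguously.
print(romanize(letters, TRANSCRIPTION))
```

Because the transcription table maps several letters to the same symbol, `invert(TRANSCRIPTION)` raises `ValueError`, which is the sense in which only a strict transliteration lets a reader reconstruct the original spelling.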

Transliterations commonly used in the English-speaking world include BGN/PCGN romanization and ALA-LC romanization.

Non-academic English-language quotation of Persian words usually uses a simplification of one of the strict transliteration schemes (typically omitting diacritical marks) and/or unsystematic choices of spellings meant to guide English speakers using English spelling rules towards an approximation of the Persian sounds.


Transcriptions of Persian attempt to straightforwardly represent Persian phonology in the Latin script, without requiring a close or reversible correspondence with the Perso-Arabic script, and also without requiring a close correspondence to English phonetic values of Roman letters.

Main romanization schemes

Comparison table

Unicode | Letter | IPA | DMG (1969) | ALA-LC (1997) | BGN/PCGN (1958) | EI (1960) | EI (2012) | UN (1967) | UN (2012)
U+0627 | ا | ʔ, ∅ [note 1] | ʾ, — [note 2] | ’, — [note 2] | ʾ
U+062D | ح | h | ḩ/ḥ [note 3] | h
U+0635 | ص | s | ş/ṣ [note 3] | ş | s
U+0637 | ط | t | ţ/ṭ [note 3] | ţ | t
U+0638 | ظ | z | z̧/ẓ [note 3] | z
U+0639 | ع | ʿ [note 2] | ʿ | ʿ
U+0648 | و | v~w [note 1] [note 4] | v | v, w [note 5] | v
U+0647 | ه | h [note 1] | h | h | h [note 6] | h | h | h [note 6] | h [note 6]
U+0629 | ة | ∅, t | h [note 7] | t [note 8] | h [note 7]
U+06CC | ی | j [note 1] | y
U+0621 | ء | ʔ, ∅ | ʾ | ʾ
U+0624 | ؤ | ʔ, ∅ | ʾ | ʾ
U+0626 | ئ | ʔ, ∅ | ʾ | ʾ
Vowels [note 9]
Unicode | Final | Medial | Initial | Isolated | IPA | DMG (1969) | ALA-LC (1997) | BGN/PCGN (1958) | EI (2012) | UN (1967) | UN (2012)
U+0648 U+064F | ◌ﻮَ | ◌ﻮَ | | ◌وَ | o [note 10] | o | o | o | u | o | o
U+064E U+0627 | ◌َا | ◌َا | أ | ◌َا | ɑː~ɒː | ā | ā | ā | ā | ā | ā
U+0622 | ◌ﺂ | ◌ﺂ | آ | ◌آ | ɑː~ɒː | ā, ʾā [note 11] | ā, ’ā [note 11] | ā | ā | ā | ā
U+064E U+06CC | ◌َﯽ | | | ◌َی | ɑː~ɒː | ā | á | á | ā | á | ā
U+06CC U+0670 | ◌ﯽٰ | | | ◌یٰ | ɑː~ɒː | ā | á | á | ā | ā | ā
U+064F U+0648 | ◌ُﻮ | ◌ُﻮ | اُو | ◌ُو | uː, oː [note 5] | ū | ū | ū | u, ō [note 5] | ū | u
U+0650 U+06CC | ◌ِﯽ | ◌ِﯿ | اِﯾ | ◌ِی | iː, eː [note 5] | ī | ī | ī | i, ē [note 5] | ī | i
U+064E U+0648 | ◌َﻮ | ◌َﻮ | اَو | ◌َو | ow~aw [note 5] | au | aw | ow | ow, aw [note 5] | ow | ow
U+064E U+06CC | ◌َﯽ | ◌َﯿ | اَﯾ | ◌َی | ej~aj [note 5] | ai | ay | ey | ey, ay [note 5] | ey | ey
U+064E U+06CC | ◌ﯽ | | | ◌ی | –e, –je | –e, –ye | –i, –yi | –e, –ye | –e, –ye | –e, –ye | –e, –ye


  1. Used as a vowel as well.
  2. Hamza and ayn are not transliterated at the beginning of words.
  3. The dot below may be used instead of the cedilla.
  4. At the beginning of words the combination خو was pronounced /xw/ or /xʷ/ in Classical Persian. In modern varieties the glide /ʷ/ has been lost, though the spelling has not changed. It may still be heard in Dari as a relict pronunciation. The combination /xʷa/ changed to /xo/ (see below).
  5. In Dari.
  6. Not transliterated at the end of words.
  7. In the combination یة at the end of words.
  8. When used instead of ت at the end of words.
  9. Diacritical signs (harakat) are rarely written.
  10. After خ, from the earlier /xʷa/. Often transliterated as xwa or xva. For example, خور /xor/ "sun" was /xʷar/ in Classical Persian.
  11. After vowels.
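The "U+" values in the first column of the tables are Unicode code points, and a row such as "U+064E U+0627" lists a sequence of them. As a minimal sketch, they can be resolved to characters and official character names with Python's standard `unicodedata` module; the helper name `describe` is an assumption made for this example.

```python
import unicodedata


def describe(notation):
    """Return (character, Unicode name) pairs for a string like 'U+064E U+0627'."""
    pairs = []
    for token in notation.split():
        char = chr(int(token[2:], 16))  # drop the "U+" prefix and parse the hex digits
        pairs.append((char, unicodedata.name(char)))
    return pairs


for code in ("U+0627", "U+06CC", "U+064E U+0627"):
    print(code, [name for _, name in describe(code)])
```

This makes it easy to check, for instance, that U+06CC is the Farsi form of yeh rather than the Arabic one.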

Pre-Islamic period

In the pre-Islamic period Old and Middle Persian employed various scripts including Old Persian cuneiform, Pahlavi and Avestan scripts. For each period there are established transcriptions and transliterations by prominent linguists. [5] [9] [10] [11] [12]

IPA | Old Persian [note i] [note ii] | Middle Persian (Pahlavi) [note i] | Avestan [note i]
t | t | t | t, t̰
ʃ | š | š | š, š́, ṣ̌
x | x | x | x, x́
g | g | g | g, ġ
m | m | m | m, m̨
ŋ | | | ŋ, ŋʷ
n | n | n | n, ń, ṇ
j | y | y | y, ẏ
ã | | | ą, ą̇


  i. Slash signifies equal variants.
  ii. There exist some differences in transcription of Old Persian preferred by different scholars:
    • ā = â
    • ī, ū = i, u
    • x = kh, ḵ, ḥ, ḫ
    • c/č = ǩ
    • j/ǰ = ǧ
    • θ = ϑ, þ, th, ṯ, ṭ
    • ç = tr, θʳ, ϑʳ, ṙ, s͜s, s̀
    • f = p̱
    • y, v = j, w.

Other romanization schemes

Bahá'í Persian romanization

Bahá'ís use a system standardized by Shoghi Effendi, which he initiated in a general letter on March 12, 1923. [13] The Bahá'í transliteration scheme was based on a standard adopted by the Tenth International Congress of Orientalists which took place in Geneva in September 1894. Shoghi Effendi changed some details of the Congress's system, most notably in the use of digraphs in certain cases (e.g. sh instead of š), and in incorporating the solar letters when writing the definite article al- (Arabic: ال) according to pronunciation (e.g. ar-Rahim, as-Saddiq, instead of al-Rahim, al-Saddiq).
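The treatment of the definite article before solar letters can be sketched in a few lines. This is an illustration of the assimilation rule only, not the Bahá'í standard itself: the function name, the hyphenated input format and the onset list (the Arabic solar letters in a simplified romanization, with the digraphs sh, th and dh treated as single onsets) are assumptions made for the example.

```python
import unicodedata

# Solar-letter onsets; digraphs come first so that "sh" is matched before "s".
SUN_ONSETS = (
    "sh", "th", "dh",
    "t", "d", "r", "z", "s", "n", "l",
    "\u1e63", "\u1e0d", "\u1e6d", "\u1e93",  # s, d, t, z with dot below
)


def assimilate_article(name):
    """Rewrite 'al-X' as e.g. 'ar-X' when X begins with a solar letter."""
    name = unicodedata.normalize("NFC", name)
    if not name.lower().startswith("al-"):
        return name
    rest = name[3:]
    for onset in SUN_ONSETS:
        if rest.lower().startswith(onset):
            # Keep the original 'a'/'A' and replace the 'l' of the article.
            return name[0] + onset + "-" + rest
    return name  # lunar letter: the article stays 'al-'


print(assimilate_article("al-Rahim"))   # ar-Rahim
print(assimilate_article("al-Saddiq"))  # as-Saddiq
print(assimilate_article("al-Qamar"))   # al-Qamar
```

Names that begin with any other letter keep the unassimilated al-, which is the form the Congress's system uses throughout.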

A detailed introduction to the Bahá'í Persian romanization can usually be found at the back of a Bahá'í scripture.

ASCII Internet romanizations

It is common to write Persian using only the Latin alphabet (rather than the Persian alphabet), especially in online chat, social networks, email and SMS. The practice developed and spread because software once lacked support for the Persian alphabet, or because users did not know about the software that was available. Although recent operating systems support Persian writing, there are still many cases where the Persian alphabet is unavailable and an alternative way of writing Persian with the basic Latin alphabet is needed. This way of writing is sometimes called Fingilish or Pingilish (a portmanteau of Farsi or Persian and English). In most cases it is an ad hoc simplification of the scientific systems listed above (such as ALA-LC or BGN/PCGN) that ignores any special letters or diacritical signs. ع may be written using the numeral "3", as in the Arabic chat alphabet.
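A rough sketch of this kind of simplification, reducing a scholarly romanization to plain ASCII, might look as follows. The function name and its behaviour are assumptions for illustration: actual chat spellings are ad hoc and vary from writer to writer, and here the romanized ayn sign ʿ stands in for ع when it is replaced by "3".

```python
import unicodedata

AYN = "\u02bf"    # modifier letter left half ring, used for ayn in the tables above
HAMZA = "\u02be"  # modifier letter right half ring, used for hamza


def to_chat_ascii(text, ayn_as_digit=True):
    """Strip diacritics and special letters from a romanized Persian string."""
    text = text.replace(AYN, "3" if ayn_as_digit else "")
    text = text.replace(HAMZA, "")
    # Decompose letters such as a-macron into base letter + combining mark,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Discard anything that is still outside ASCII.
    return stripped.encode("ascii", "ignore").decode("ascii")


print(to_chat_ascii("F\u0101rs\u012b"))      # Farsi
print(to_chat_ascii("\u02bfAl\u012b"))       # 3Ali
```

Passing `ayn_as_digit=False` drops the ayn sign entirely instead of writing "3".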

Tajik Latin alphabet

The Tajik language, or Tajik Persian, is a variety of the Persian language. In the Tajik SSR it was written in a standardized Latin script from 1926 until the late 1930s, when the script was officially changed to Cyrillic. In addition, Tajik phonology differs slightly from that of Persian in Iran. As a result of these two factors, romanization schemes for the Tajik Cyrillic script follow rather different principles. [14]

The Tajik alphabet in Latin (1928-1940) [15]
A a, B ʙ, C c, Ç ç, D d, E e, F f, G g, Ƣ ƣ, H h, I i, Ī ī
J j, K k, L l, M m, N n, O o, P p, Q q, R r, S s, Ş ş, T t
U u, Ū ū, V v, X x, Z z, Ƶ ƶ, ʼ

References

  1. Joachim, Martin D. (1993). Languages of the world: cataloging issues and problems. New York: Haworth Press. p. 137. ISBN 1560245204.
  2. Pedersen, Thomas T. "Persian (Farsi)" (PDF). Transliteration of Non-Roman Scripts.
  3. "Persian" (PDF). The Library of Congress.
  4. "Romanization system for Persian (Dari and Farsi). BGN/PCGN 1958 System" (PDF).
  5. "Transliteration". Encyclopædia Iranica.
  6. "Persian" (PDF). UNGEGN.
  7. Toponymic Guidelines for map and other editors – Revised edition 1998. Working Paper No. 41. Submitted by the Islamic Republic of Iran. UNGEGN, 20th session. New York, 17–28 January 2000.
  8. New Persian Romanization System. E/CONF.101/118/Rev.1*. Tenth United Nations Conference on the Standardization of Geographical Names. New York, 31 July – 9 August 2012.
  9. Bartholomae, Christian (1904). Altiranisches Wörterbuch. Strassburg. p. XXIII.
  10. Kent, Roland G. (1950). Old Persian. New Haven, Connecticut. pp. 12–13.
  11. MacKenzie, D. N. (1971). "Transcription". A Concise Pahlavi Dictionary. London.
  12. Hoffmann, Karl; Forssman, Bernhard (1996). Avestische Laut- und Flexionslehre. Innsbruck. pp. 41–44. ISBN 3-85124-652-7.
  13. Effendi, Shoghi (1974). Bahá'í Administration. Wilmette, Illinois, USA: Bahá'í Publishing Trust. p. 43. ISBN 0-87743-166-3.
  14. Pedersen, Thomas T. "Tajik" (PDF). Transliteration of Non-Roman Scripts.
  15. Perry, John R. (2005). A Tajik Persian Reference Grammar. Brill. pp. 34–35.