Romanization of Persian or Latinization of Persian is the representation of the Persian language (Farsi, Dari and Tajik) with the Latin script. Several different romanization schemes exist, each with its own set of rules driven by its own set of ideological goals.

Persian, also known by its endonym Farsi, is one of the Western Iranian languages within the Indo-Iranian branch of the Indo-European language family. It is primarily spoken in Iran, Afghanistan, Tajikistan, Uzbekistan and some other regions that were historically Persianate societies and are considered part of Greater Iran. It is written right to left in the Persian alphabet, a modified variant of the Arabic script, which itself evolved from the Aramaic alphabet.

Dari (Darī), also called Dari Persian or, synonymously, Farsi, is the variety of the Persian language spoken in Afghanistan. Dari is the term officially recognized and promoted since 1964 by the Afghan government for the Persian language; hence it is also known as Afghan Persian in many Western sources. This has resulted in a naming dispute. Many Persian speakers in Afghanistan prefer and use the name "Farsi" and say the term Dari has been forced on them by the dominant Pashtun ethnic group as an attempt to distance Afghans from their cultural, linguistic, and historical ties to the Persian-speaking world, which includes Iran, Tajikistan and Uzbekistan.


Tajik or Tajiki, also called Tajiki Persian, is the variety of Persian spoken in Tajikistan and Uzbekistan. It is closely related to Dari Persian. Since the beginning of the twentieth century and the collapse of the Soviet Union, a number of writers and researchers have considered Tajik to be a variety of Persian. This view was so popular that, while Tajik intellectuals were trying to establish Tajik as a language separate from Persian, the prominent intellectual and educator Sadriddin Ayni had to state that Tajik was not a bastardized dialect of Persian. Whether Tajik and Persian are two dialects of a single language or two discrete languages is partly a political question.


Romanization paradigms

Because the Perso-Arabic script is an abjad writing system (with a consonant-heavy inventory of letters), many distinct words in standard Persian can have identical spellings, with widely varying pronunciations that differ in their (unwritten) vowel sounds. Thus a romanization paradigm can follow either transliteration (which mirrors spelling and orthography) or transcription (which mirrors pronunciation and phonology).

The Persian alphabet, or Perso-Arabic alphabet, is a writing system used for the Persian language.


An abjad is a type of writing system where each symbol or glyph stands for a consonant, leaving the reader to supply the appropriate vowel. So-called impure abjads do represent vowels, either with optional diacritics, a limited number of distinct vowel glyphs, or both. The name abjad is based on the old Arabic alphabet's first four letters—a, b, j, d—to replace the common terms "consonantary" or "consonantal alphabet" to refer to the family of scripts called West Semitic.


The Latin script serves as a second script in Iran, as city and street signs and Internet addresses show. Experience has also shown that efforts to teach millions of young Iranians abroad to read and write Persian mostly fail, because they lack daily contact with the Persian script. Using the Latin script in parallel with the Persian script appears to offer a way out of this dilemma.


Transliteration (in the strict sense) attempts to be a complete representation of the original writing, so that an informed reader should be able to reconstruct the original spelling of unknown transliterated words. Transliterations of Persian are used to represent individual Persian words or short quotations, in scholarly texts in English or other languages that do not use the Arabic alphabet.

Transliteration is a type of conversion of a text from one script to another that involves swapping letters in predictable ways.

A transliteration will still have separate representations for different consonants of the Persian alphabet that are pronounced identically in Persian. Therefore, transliterations of Persian are often based on transliterations of Arabic. [1] The representation of the vowels of the Perso-Arabic alphabet is also complex, and transliterations are based on the written form.
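The difference can be sketched in Python. This is a minimal illustration under stated assumptions, not a complete scheme: it covers only eight letters, the dotted values follow the dot-below variants in the comparison table, and the names `TRANSLITERATION`, `PRONUNCIATION`, `transliterate`, `restore` and `transcribe` are hypothetical. Transliteration keeps identically pronounced letters apart, so the original spelling can be recovered; a pronunciation-based rendering merges them and cannot be reversed.

```python
# Illustrative subset only, not a complete romanization scheme.
# Persian letters are written as escapes to avoid bidirectional-text
# confusion; the comments give their Unicode names.
TRANSLITERATION = {
    "\u062d": "\u1e25",  # ARABIC LETTER HAH  -> h with dot below
    "\u0647": "h",       # ARABIC LETTER HEH  -> h
    "\u0635": "\u1e63",  # ARABIC LETTER SAD  -> s with dot below
    "\u0633": "s",       # ARABIC LETTER SEEN -> s
    "\u0637": "\u1e6d",  # ARABIC LETTER TAH  -> t with dot below
    "\u062a": "t",       # ARABIC LETTER TEH  -> t
    "\u0638": "\u1e93",  # ARABIC LETTER ZAH  -> z with dot below
    "\u0632": "z",       # ARABIC LETTER ZAIN -> z
}

# Inverse mapping: possible only because every Latin value is distinct.
REVERSE = {latin: persian for persian, latin in TRANSLITERATION.items()}

# Pronunciation-based rendering: each dotted letter sounds like its
# plain counterpart in Persian, so the distinction is dropped.
PRONUNCIATION = {"\u1e25": "h", "\u1e63": "s", "\u1e6d": "t", "\u1e93": "z"}


def transliterate(text: str) -> str:
    """Map each known Persian letter to its distinct Latin value."""
    return "".join(TRANSLITERATION.get(ch, ch) for ch in text)


def restore(text: str) -> str:
    """Recover the Persian letters from a transliteration."""
    return "".join(REVERSE.get(ch, ch) for ch in text)


def transcribe(text: str) -> str:
    """Render by sound only; letters that sound alike become identical."""
    return "".join(PRONUNCIATION.get(ch, ch) for ch in transliterate(text))


letters = "\u0635\u0633"  # SAD followed by SEEN
print(restore(transliterate(letters)) == letters)  # True: spelling is recoverable
print(transcribe(letters))                         # ss: spelling is lost
```

Because `transcribe` maps two letters to the same output, no inverse function can exist for it, which is the practical meaning of "reversible" in the paragraph above.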

The romanization of Arabic renders written and spoken Arabic in the Latin script in one of various systematic ways. Romanized Arabic is used for a number of different purposes, among them transcription of names and titles, cataloging Arabic language works, language education when used in lieu of or alongside the Arabic script, and representation of the language in scientific publications by linguists. These formal systems, which often make use of diacritics and non-standard Latin characters and are used in academic settings or for the benefit of non-speakers, contrast with informal means of written communication used by speakers such as the Latin-based Arabic chat alphabet.

Transliterations commonly used in the English-speaking world include BGN/PCGN romanization and ALA-LC Romanization.

BGN/PCGN romanization refers to the systems for romanization and Roman-script spelling conventions adopted by the United States Board on Geographic Names (BGN) and the Permanent Committee on Geographical Names for British Official Use (PCGN).

Non-academic English-language quotation of Persian words usually uses a simplification of one of the strict transliteration schemes (typically omitting diacritical marks) and/or unsystematic choices of spellings meant to guide English speakers using English spelling rules towards an approximation of the Persian sounds.
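Omitting diacritical marks can be sketched with Python's standard `unicodedata` module. This is one possible approach, not a documented procedure of any scheme; the function name `strip_diacritics` and the sample word are illustrative assumptions.

```python
import unicodedata


def strip_diacritics(romanized: str) -> str:
    """Remove combining marks (macrons, dots below, cedillas) from Latin text."""
    # NFD splits a letter such as "a with macron" into "a" plus a combining mark.
    decomposed = unicodedata.normalize("NFD", romanized)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


print(strip_diacritics("Tehr\u0101n"))  # Tehran
```

Modifier letters such as ʾ and ʿ are not combining marks, so this sketch leaves them in place; a simplified spelling would have to drop or replace them separately.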


Transcriptions of Persian attempt to straightforwardly represent Persian phonology in the Latin script, without requiring a close or reversible correspondence with the Perso-Arabic script, and also without requiring a close correspondence to English phonetic values of Roman letters.

Main romanization schemes

Comparison table

Unicode | Persian letter | IPA | DMG (1969) | ALA-LC (1997) | BGN/PCGN (1958) | EI (1960) | EI (2012) | UN (1967) | UN (2012)
(Where a row lists fewer values than there are schemes, one value is shared by several adjacent schemes.)
U+0627 | ا | ʔ, ∅ [note 1] | ʾ, — [note 2] | ’, — [note 2] | ʾ
U+062D | ح | h | ḩ/ḥ [note 3] | h
U+0635 | ص | s | ş/ṣ [note 3] | ş | s
U+0637 | ط | t | ţ/ṭ [note 3] | ţ | t
U+0638 | ظ | z | z̧/ẓ [note 3] | z
U+0639 | ع | ʿ [note 2] | ʿ | ʿ
U+0648 | و | v~w [note 1] [note 4] | v | v, w [note 5] | v
U+0647 | ه | h [note 1] | h | h | h [note 6] | h | h | h [note 6] | h [note 6]
U+0629 | ة | ∅, t | h [note 7] | t [note 8] | h [note 7]
U+06CC | ی | j [note 1] | y
U+0621 | ء | ʔ, ∅ | ʾ | ʾ
U+0624 | ؤ | ʔ, ∅ | ʾ | ʾ
U+0626 | ئ | ʔ, ∅ | ʾ | ʾ
Vowels [note 9]
Unicode | Forms (final, medial, initial, isolated) | IPA | DMG (1969) | ALA-LC (1997) | BGN/PCGN (1958) | EI (2012) | UN (1967) | UN (2012)
U+0648 U+064F | ◌ﻮَ ◌ﻮَ ◌وَ | o [note 10] | o | o | o | u | o | o
U+064E U+0627 | ◌َا ◌َا أ ◌َا | ɑː~ɒː | ā | ā | ā | ā | ā | ā
U+0622 | ◌ﺂ ◌ﺂ آ ◌آ | ɑː~ɒː | ā, ʾā [note 11] | ā, ’ā [note 11] | ā | ā | ā | ā
U+064E U+06CC | ◌َﯽ ◌َی | ɑː~ɒː | ā | á | á | ā | á | ā
U+06CC U+0670 | ◌ﯽٰ ◌یٰ | ɑː~ɒː | ā | á | á | ā | ā | ā
U+064F U+0648 | ◌ُﻮ ◌ُﻮ اُو ◌ُو | uː, oː [note 5] | ū | ū | ū | u, ō [note 5] | ū | u
U+0650 U+06CC | ◌ِﯽ ◌ِﯿ اِﯾ ◌ِی | iː, eː [note 5] | ī | ī | ī | i, ē [note 5] | ī | i
U+064E U+0648 | ◌َﻮ ◌َﻮ اَو ◌َو | ow~aw [note 5] | au | aw | ow | ow, aw [note 5] | ow | ow
U+064E U+06CC | ◌َﯽ ◌َﯿ اَﯾ ◌َی | ej~aj [note 5] | ai | ay | ey | ey, ay [note 5] | ey | ey
U+064E U+06CC | ◌ﯽ ◌ی | –e, –je | –e, –ye | –i, –yi | –e, –ye | –e, –ye | –e, –ye | –e, –ye


  1. Used as a vowel as well.
  2. Hamza and ayn are not transliterated at the beginning of words.
  3. The dot below may be used instead of the cedilla.
  4. At the beginning of words the combination خو was pronounced /xw/ or /xʷ/ in Classical Persian. In modern varieties the glide /ʷ/ has been lost, though the spelling has not changed. It may still be heard in Dari as a relict pronunciation. The combination /xʷa/ was changed to /xo/ (see note 10).
  5. In Dari.
  6. Not transliterated at the end of words.
  7. In the combination یة at the end of words.
  8. When used instead of ت at the end of words.
  9. Diacritical signs (harakat) are rarely written.
  10. After خ from the earlier /xʷa/. Often transliterated as xwa or xva. For example, خور /xor/ "sun" was /xʷar/ in Classical Persian.
  11. After vowels.
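The long-vowel and diphthong rows of the comparison table can be read as a per-scheme lookup. The following Python sketch is an illustration under assumptions: the data structure `VOWEL_ROMANIZATION`, the function `romanize`, the restriction to four schemes and four vowels, and the phoneme list used as input are all hypothetical, and consonants are simply passed through unchanged.

```python
# Values copied from the vowel rows of the comparison table
# (Iranian Persian readings only; Dari variants are omitted).
VOWEL_ROMANIZATION = {
    "DMG (1969)":      {"uː": "ū", "iː": "ī", "ow": "au", "ej": "ai"},
    "ALA-LC (1997)":   {"uː": "ū", "iː": "ī", "ow": "aw", "ej": "ay"},
    "BGN/PCGN (1958)": {"uː": "ū", "iː": "ī", "ow": "ow", "ej": "ey"},
    "UN (2012)":       {"uː": "u", "iː": "i", "ow": "ow", "ej": "ey"},
}


def romanize(phonemes: list[str], scheme: str) -> str:
    """Join phonemes, replacing tabulated vowels with the scheme's spelling."""
    table = VOWEL_ROMANIZATION[scheme]
    # Anything not in the table (here: simple consonants) is kept as written.
    return "".join(table.get(p, p) for p in phonemes)


# Assumed phoneme sequence for an example word; consonants n, r, z pass through.
word = ["n", "ow", "r", "uː", "z"]
for scheme in VOWEL_ROMANIZATION:
    print(f"{scheme}: {romanize(word, scheme)}")
```

Keeping the schemes as data rather than as separate functions makes it easy to compare their output side by side, which is all this sketch is meant to show.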

Pre-Islamic period

In the pre-Islamic period Old and Middle Persian employed various scripts including Old Persian cuneiform, Pahlavi and Avestan scripts. For each period there are established transcriptions and transliterations by prominent linguists. [5] [9] [10] [11] [12]

IPA | Old Persian [note 1] [note 2] | Middle Persian (Pahlavi) [note 1] | Avestan [note 1]
t | t | t | t, t̰
ʃ | š | š | š, š́, ṣ̌
x | x | x | x, x́
g | g | g | g, ġ
m | m | m | m, m̨
ŋ |  |  | ŋ, ŋʷ
n | n | n | n, ń, ṇ
j | y | y | y, ẏ
ã |  |  | ą, ą̇


  1. Slash signifies equal variants.
  2. There exist some differences in transcription of Old Persian preferred by different scholars:
    • ā = â
    • ī, ū = i, u
    • x = kh, ḵ, ḥ, ḫ
    • c/č = ǩ
    • j/ǰ = ǧ
    • θ = ϑ, þ, th, ṯ, ṭ
    • ç = tr, θʳ, ϑʳ, ṙ, s͜s, s̀
    • f = p̱
    • y, v = j, w.

Other romanization schemes

Bahá'í Persian romanization

Bahá'ís use a system standardized by Shoghi Effendi, which he initiated in a general letter on March 12, 1923. [13] The Bahá'í transliteration scheme was based on a standard adopted by the Tenth International Congress of Orientalists which took place in Geneva in September 1894. Shoghi Effendi changed some details of the Congress's system, most notably in the use of digraphs in certain cases (e.g. sh instead of š), and in incorporating the solar letters when writing the definite article al- (Arabic: ال) according to pronunciation (e.g. ar-Rahim, as-Saddiq, instead of al-Rahim, al-Saddiq).
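The treatment of the definite article can be sketched in Python. This is a simplified illustration, not the Bahá'í system itself: the function name `with_article` is hypothetical, the list of assimilating onsets is the standard set of Arabic sun letters restricted to plain ASCII spellings (an assumption of this sketch), and dotted or underlined letters are not handled.

```python
# Sun-letter onsets in plain ASCII spelling; digraphs come first so that
# "sh" is matched before "s". This list is an assumption of the sketch.
SUN_ONSETS = ("sh", "th", "dh", "t", "d", "r", "z", "s", "l", "n")


def with_article(word: str) -> str:
    """Prefix the definite article, assimilating it before a sun letter."""
    lowered = word.lower()
    for onset in SUN_ONSETS:
        if lowered.startswith(onset):
            # The l of al- is replaced by the word's initial consonant.
            return f"a{onset}-{word}"
    # Any other initial letter leaves the article unchanged.
    return f"al-{word}"


print(with_article("Rahim"))   # ar-Rahim
print(with_article("Saddiq"))  # as-Saddiq
print(with_article("Karim"))   # al-Karim
```

A system that writes the article according to spelling rather than pronunciation would skip the loop and always return `al-` plus the word.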

A detailed introduction to the Bahá'í Persian romanization can usually be found at the back of a Bahá'í scripture.

ASCII Internet romanizations

It is common to write the Persian language with only the Latin alphabet (as opposed to the Persian alphabet), especially in online chat, social networks, emails and SMS. The practice developed and spread because software formerly lacked support for the Persian alphabet, or because users did not know about the software that was available. Although Persian writing is supported in recent operating systems, there are still many cases where the Persian alphabet is unavailable and an alternative way to write Persian with the basic Latin alphabet is needed. This way of writing is sometimes called Fingilish or Pingilish (a portmanteau of Farsi or Persian and English). In most cases it is an ad hoc simplification of the scientific systems listed above (such as ALA-LC or BGN/PCGN) that ignores any special letters or diacritical signs. ع may be written using the numeral "3", as in the Arabic chat alphabet.
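Reducing a scholarly romanization to plain ASCII can be sketched as follows. Fingilish has no fixed rules, so this is only one hypothetical convention: the function name `to_chat_ascii` is invented for illustration, ʿ (U+02BF) is assumed to be the sign standing for ع, and every other non-ASCII sign is simply dropped.

```python
import unicodedata

AYN_SIGN = "\u02bf"  # the sign ʿ, assumed here to stand for the letter ayn


def to_chat_ascii(romanized: str) -> str:
    """Reduce a romanization with diacritics to plain ASCII chat spelling."""
    # Write ayn as the numeral 3 before anything else is stripped.
    text = romanized.replace(AYN_SIGN, "3")
    # Decompose accented letters, then drop every non-ASCII character.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_chat_ascii("\u02bfAl\u012b"))  # 3Ali
```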

Tajik Latin alphabet

The Tajik language or Tajik Persian is a variety of the Persian language. It was written in the Tajik SSR in a standardized Latin script from 1926 until the late 1930s, when the script was officially changed to Cyrillic. However, Tajik phonology differs slightly from that of Persian in Iran. As a result of these two factors, romanization schemes for the Tajik Cyrillic script follow rather different principles. [14]

The Tajik alphabet in Latin (1928-1940) [15]
A a, B ʙ, C c, Ç ç, D d, E e, F f, G g, Ƣ ƣ, H h, I i, Ī ī,
J j, K k, L l, M m, N n, O o, P p, Q q, R r, S s, Ş ş, T t,
U u, Ū ū, V v, X x, Z z, Ƶ ƶ, ʼ


