
Transliteration is a type of conversion of a text from one script to another that involves swapping letters (thus trans- + liter-) in predictable ways, such as Greek ⟨α⟩ → ⟨a⟩, Cyrillic ⟨д⟩ → ⟨d⟩, Greek ⟨χ⟩ → the digraph ⟨ch⟩, Armenian ⟨ն⟩ → ⟨n⟩ or Latin ⟨æ⟩ → ⟨ae⟩.
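At its simplest, this kind of letter swapping is a lookup table from source characters to target strings. The sketch below is a minimal illustration, not any standard system: the table `LETTER_MAP` and the function `transliterate` are hypothetical names, the table holds only the five pairs quoted above, and characters missing from it are passed through unchanged (an assumption made for convenience).

```python
# Hypothetical lookup table built only from the example pairs above.
# Values are strings, so one source letter may become a digraph (χ -> ch).
LETTER_MAP = {
    "α": "a",   # Greek
    "д": "d",   # Cyrillic
    "χ": "ch",  # Greek, rendered as a digraph
    "ն": "n",   # Armenian
    "æ": "ae",  # Latin
}


def transliterate(text: str) -> str:
    """Swap each mapped character; leave unmapped characters as they are."""
    return "".join(LETTER_MAP.get(ch, ch) for ch in text)


print(transliterate("χα"))  # cha
```

Because the values are strings rather than single characters, one source letter can map to a digraph such as ⟨ch⟩.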


For instance, for the Modern Greek term "Ελληνική Δημοκρατία", which is usually translated as "Hellenic Republic", the usual transliteration to Latin script is Ellēnikḗ Dēmokratía, and the name for Russia in Cyrillic script, "Россия", is usually transliterated as Rossija.

Transliteration is not primarily concerned with representing the sounds of the original but rather with representing the characters, ideally accurately and unambiguously. Thus, in the Greek example above, λλ is transliterated ll though it is pronounced [l], Δ is transliterated D though pronounced [ð], and η is transliterated ē, though it is pronounced [i] (exactly like ι) and is not long.
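Because transliteration works on characters, accented letters can be handled by splitting each one into a base letter plus a combining accent (Unicode normalization form NFD), mapping only the base letter, and recombining (NFC). The sketch below uses Python's standard `unicodedata` module with a hypothetical table that covers just the letters of "Ελληνική Δημοκρατία"; under that assumption it reproduces the transliteration given above, writing ē for η regardless of pronunciation.

```python
import unicodedata

# Hypothetical partial table (lowercase only), enough for the example phrase.
GREEK_TO_LATIN = {
    "α": "a", "δ": "d", "ε": "e", "η": "ē", "ι": "i", "κ": "k",
    "λ": "l", "μ": "m", "ν": "n", "ο": "o", "ρ": "r", "τ": "t",
}


def transliterate_greek(text: str) -> str:
    out = []
    # NFD splits e.g. "ή" into "η" + U+0301 COMBINING ACUTE ACCENT.
    for ch in unicodedata.normalize("NFD", text):
        latin = GREEK_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # accents, spaces and unmapped letters pass through
        elif ch.isupper():
            out.append(latin[0].upper() + latin[1:])
        else:
            out.append(latin)
    # NFC recombines "ē" + acute into the single character "ḗ".
    return unicodedata.normalize("NFC", "".join(out))


print(transliterate_greek("Ελληνική Δημοκρατία"))  # Ellēnikḗ Dēmokratía
```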

Conversely, transcription notes the sounds rather than the orthography of a text. So "Ελληνική Δημοκρατία" could be transcribed as [elinikí ðimokratía], which does not specify which of the [i] sounds are written with the Greek letter η and which with ι.
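The information a transcription discards can be made concrete by inverting it. In the hypothetical sketch below, a sound-based table sends both η and ι to [i], so looking [i] back up yields two candidate letters, whereas a letter-based table (η → ē, ι → i) inverts to exactly one. The tables and the `invert` helper are illustrative assumptions restricted to three vowel letters.

```python
from collections import defaultdict

# Hypothetical tables restricted to three Greek vowel letters.
TRANSCRIPTION = {"η": "i", "ι": "i", "ο": "o"}    # sound-based: η and ι merge
TRANSLITERATION = {"η": "ē", "ι": "i", "ο": "o"}  # letter-based: all distinct


def invert(table: dict[str, str]) -> dict[str, list[str]]:
    """Map each output back to every source letter that produces it."""
    inverse: dict[str, list[str]] = defaultdict(list)
    for source, target in table.items():
        inverse[target].append(source)
    return dict(inverse)


print(invert(TRANSCRIPTION)["i"])    # ['η', 'ι']  (ambiguous)
print(invert(TRANSLITERATION)["i"])  # ['ι']       (unique)
```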

Angle brackets may be used to set off transliteration, as opposed to slashes and square brackets for phonetic transcription. Angle brackets may also be used to set off characters in the original script. Conventions and author preferences vary.


Systematic transliteration is a mapping from one system of writing into another, typically grapheme to grapheme. Most transliteration systems are one-to-one, so a reader who knows the system can reconstruct the original spelling.
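One way to see why a one-to-one system lets a reader reconstruct the original spelling is to decode it mechanically. The sketch below uses a hypothetical partial Greek-to-Latin table (not a published standard) and decodes by always taking the longest matching Latin chunk. Decoding is unambiguous here only because the table was chosen so that no output can be confused with a combination of others: ⟨h⟩ never occurs on its own, so ⟨th⟩, ⟨ph⟩ and ⟨ch⟩ can only come from θ, φ and χ.

```python
# Hypothetical Greek -> Latin table; every output decodes back uniquely.
FORWARD = {
    "α": "a", "γ": "g", "δ": "d", "ε": "e", "θ": "th", "κ": "k",
    "λ": "l", "ρ": "r", "σ": "s", "τ": "t", "φ": "ph", "χ": "ch",
}
BACKWARD = {latin: greek for greek, latin in FORWARD.items()}
MAX_LEN = max(len(latin) for latin in BACKWARD)


def encode(text: str) -> str:
    return "".join(FORWARD[ch] for ch in text)


def decode(text: str) -> str:
    """Rebuild the original by always taking the longest matching chunk."""
    out, i = [], 0
    while i < len(text):
        for size in range(MAX_LEN, 0, -1):
            chunk = text[i:i + size]
            if chunk in BACKWARD:
                out.append(BACKWARD[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"cannot decode {text[i:]!r}")
    return "".join(out)


word = "θαλασσα"
latin = encode(word)
print(latin)                  # thalassa
print(decode(latin) == word)  # True
```

Adding ψ → ps together with π → p would break this property, since "ps" could then come either from ψ or from π followed by σ.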

Transliteration is opposed to transcription, which maps the sounds of one language into a writing system. Still, most systems of transliteration map the letters of the source script to letters pronounced similarly in the target script, for some specific pair of source and target language. If the relations between letters and sounds are similar in both languages, a transliteration may be very close to a transcription. In practice, there are some mixed transliteration/transcription systems that transliterate a part of the original script and transcribe the rest.

For many script pairs, there are one or more standard transliteration systems. However, unsystematic transliteration is common.

Difference from transcription

In Modern Greek, the letters ⟨η⟩ ⟨ι⟩ ⟨υ⟩ and the letter combinations ⟨ει⟩ ⟨οι⟩ ⟨υι⟩ are pronounced [i] (except when pronounced as semivowels), and a modern transcription renders them all as ⟨i⟩; but a transliteration distinguishes them, for example by rendering them as ⟨ē⟩ ⟨i⟩ ⟨y⟩ and ⟨ei⟩ ⟨oi⟩ ⟨yi⟩. (As the ancient pronunciation of ⟨η⟩ was [ɛː], it is often transliterated as an ⟨e⟩ with a macron, even for modern texts.) On the other hand, ⟨ευ⟩ is sometimes pronounced [ev] and sometimes [ef], depending on the following sound. A transcription distinguishes them, but a transliteration need not. The initial letter ⟨h⟩ reflecting the historical rough breathing in words such as Ellēnikē should logically be omitted in transcription from Koine Greek on,[1] and in transliteration from 1982 on, but it is nonetheless frequently encountered.

| Greek word | Transliteration | Transcription | English translation |
|---|---|---|---|
| Ελληνική Δημοκρατία | Ellēnikē Dēmokratia | Elinikí Dhimokratía | Hellenic Republic |
| των υιών | tōn uiōn | ton ion | of the sons |
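The context dependence of ⟨ευ⟩ is exactly what a transliteration can ignore (it always writes ⟨eu⟩) and a transcription must resolve. A minimal sketch, assuming the usual Modern Greek rule that ⟨ευ⟩ is read [ef] before a voiceless consonant or at the end of a word and [ev] elsewhere (before vowels and voiced consonants). The function name `eu_readings` is hypothetical, and input is assumed to be lowercase Greek without accents or diaeresis.

```python
# Greek letters for voiceless consonants (ξ = [ks] and ψ = [ps] start voiceless).
VOICELESS = set("κπτθφχσςξψ")


def eu_readings(word: str) -> list[str]:
    """Return 'ev' or 'ef' for each ευ in a lowercase, unaccented Greek word."""
    readings = []
    for i in range(len(word) - 1):
        if word[i:i + 2] == "ευ":
            following = word[i + 2:i + 3]  # empty string at the end of the word
            voiceless_next = following == "" or following in VOICELESS
            readings.append("ef" if voiceless_next else "ev")
    return readings


for word in ["ευκολος", "ευγενης", "ευαγγελιο"]:
    # A transliteration writes "eu" in every case; the transcription varies.
    print(word, eu_readings(word))
```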


A simple example of difficulties in transliteration is the Arabic letter qāf. In literary Arabic it is pronounced approximately like English [k], except that the tongue makes contact not with the soft palate but with the uvula; its pronunciation also varies between Arabic dialects. The letter is sometimes transliterated as "g", sometimes as "q", and rarely even as "k" in English.[2] Another example is the Russian letter "Х" (kha). It is pronounced as the voiceless velar fricative /x/, like the Scottish pronunciation of ch in "loch". This sound is not present in most forms of English and is often transliterated as "kh", as in Nikita Khrushchev. Many languages have phonemic sounds, such as click consonants, which are quite unlike any phoneme in the language into which they are being transliterated.
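Variation of this kind comes down to which table a given convention uses. The sketch below compares two hypothetical tables for the letters of Хрущёв (Khrushchev) that differ only in their choice for х: the familiar English ⟨kh⟩ and a single letter ⟨x⟩. Rendering ё as plain ⟨e⟩ mirrors the usual English spelling of the name but is a simplification assumed here; neither table is meant to reproduce a particular published standard.

```python
# Two hypothetical tables that differ only in how they render х.
ENGLISH_STYLE = {"х": "kh", "р": "r", "у": "u", "щ": "shch", "ё": "e", "в": "v"}
SINGLE_LETTER = {**ENGLISH_STYLE, "х": "x"}


def romanize(word: str, table: dict[str, str]) -> str:
    out = []
    for ch in word:
        latin = table[ch.lower()]
        # Keep the capital on the first Latin letter of a digraph (Х -> Kh).
        out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


print(romanize("Хрущёв", ENGLISH_STYLE))  # Khrushchev
print(romanize("Хрущёв", SINGLE_LETTER))  # Xrushchev
```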

Some languages and scripts present particular difficulties to transcribers. These are discussed on separate pages.



Related Research Articles

Alphabet: standard set of letters that represent phonemes of a spoken language

An alphabet is a standardized set of basic written symbols or graphemes that represent the phonemes of certain spoken languages. Not all writing systems represent language in this way; in a syllabary, each character represents a syllable, for instance, and logographic systems use characters to represent words, morphemes, or other semantic units.

A diacritic is a glyph added to a letter or basic glyph. The term derives from the Ancient Greek διακριτικός, from διακρίνω. Diacritic is primarily an adjective, though sometimes used as a noun, whereas diacritical is only ever an adjective. Some diacritical marks, such as the acute ( ´ ) and grave ( ` ), are often called accents. Diacritical marks may appear above or below a letter, or in some other position such as within the letter or between two letters.

Greeklish, a portmanteau of the words Greek and English, also known as Grenglish, Latinoellinika/Λατινοελληνικά or ASCII Greek, is the Greek language written using the Latin alphabet. Unlike standardized systems of Romanization of Greek, as used internationally for purposes such as rendering Greek proper names or place names, or for bibliographic purposes, the term Greeklish mainly refers to informal, ad hoc practices of writing Greek text in environments where the use of the Greek alphabet is technically impossible or cumbersome, especially in electronic media. Greeklish was commonly used on the Internet when Greek people communicated by forum, e-mail, IRC, instant messaging and occasionally SMS, mainly because older operating systems could not write in Greek or did not support Unicode encodings such as UTF-8. Nowadays most Greek-language content appears in the Greek alphabet.

In the polytonic orthography of Ancient Greek, the rough breathing character ⟨῾⟩ is a diacritical mark used to indicate the presence of an /h/ sound before a vowel or diphthong, or after rho. It remained in the polytonic orthography even after the Hellenistic period, when the sound disappeared from the Greek language. In the monotonic orthography of Modern Greek, in use since 1982, it is not used at all.
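The rough breathing survives as the ⟨h⟩ of traditional romanizations (polytonic Ἑλληνική gives Hellēnikē). A minimal sketch: Unicode normalization form NFD splits a precomposed letter such as Ἑ into a base letter plus U+0314 COMBINING REVERSED COMMA ABOVE, so the breathing can be detected and turned into ⟨h⟩ (or ⟨rh⟩ after rho). The table and function name are hypothetical, accents are simply dropped, and the whole word is checked for the mark, which also covers diphthongs where the breathing sits on the second vowel.

```python
import unicodedata

ROUGH_BREATHING = "\u0314"  # COMBINING REVERSED COMMA ABOVE (Greek dasia)

# Hypothetical partial table (lowercase only), enough for the examples.
TABLE = {"α": "a", "δ": "d", "ε": "e", "η": "ē", "ι": "i", "κ": "k",
         "λ": "l", "ν": "n", "ο": "o", "ρ": "r", "ς": "s"}


def romanize_word(word: str) -> str:
    decomposed = unicodedata.normalize("NFD", word)
    rough = ROUGH_BREATHING in decomposed
    letters = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # this sketch drops breathings and accents
        latin = TABLE[ch.lower()]
        letters.append(latin.capitalize() if ch.isupper() else latin)
    result = "".join(letters)
    if not rough:
        return result
    if result[0] in "rR":
        return result[0] + "h" + result[1:]  # ῥ -> rh
    # Initial vowel or diphthong: prefix h, moving any capital onto it.
    if result[0].isupper():
        return "H" + result[0].lower() + result[1:]
    return "h" + result


for word in ["Ἑλληνική", "ὁδός", "ῥόδον", "ἐν"]:
    print(word, "->", romanize_word(word))
```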


Romanization or romanisation, in linguistics, is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and transcription, for representing the spoken word, and combinations of both. Transcription methods can be subdivided into phonemic transcription, which records the phonemes (the units of sound that distinguish meaning) in speech, and stricter phonetic transcription, which records speech sounds with precision.

A caron, háček or haček, also known as a hachek, wedge, check, kvačica, strešica, mäkčeň, paukščiukas, inverted circumflex, inverted hat, or flying bird, is a diacritic (ˇ) commonly placed over certain letters in the orthography of some Baltic, Slavic, Finnic, Samic, Berber, and other languages to indicate a change in the related letter's pronunciation.

Devanāgarī is an Indian script used for languages including Hindi, Marathi, Nepali and Sanskrit. There are several somewhat similar methods of transliteration from Devanāgarī to the Roman script, including the influential and lossless IAST notation.
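IAST can be lossless because each Devanāgarī character has its own Latin rendering, but a converter still has to handle the inherent vowel: a consonant letter is read with a following ⟨a⟩ unless a dependent vowel sign or the virama (which cancels the vowel) follows it. A minimal sketch with a hypothetical table covering only the characters in the two examples; independent vowel letters and other IAST details are not handled.

```python
# Hypothetical partial IAST tables covering only the example words.
CONSONANTS = {"क": "k", "त": "t", "द": "d", "न": "n", "स": "s", "ह": "h"}
VOWEL_SIGNS = {"\u093f": "i", "\u0940": "ī", "\u0943": "ṛ"}  # dependent vowel signs
ANUSVARA = "\u0902"  # written ṃ in IAST
VIRAMA = "\u094d"    # suppresses the inherent vowel


def to_iast(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt == VIRAMA:
                i += 1  # consume the virama: bare consonant
            elif nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])
                i += 1  # consume the vowel sign
            else:
                out.append("a")  # inherent vowel
        elif ch == ANUSVARA:
            out.append("ṃ")
        else:
            out.append(ch)  # anything else passes through unchanged
        i += 1
    return "".join(out)


print(to_iast("संस्कृत"))  # saṃskṛta
print(to_iast("हिन्दी"))  # hindī
```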

When used as a diacritic mark, the term dot is usually reserved for the interpunct, or for the glyphs 'combining dot above' ( ◌̇ ) and 'combining dot below' ( ◌̣ ), which may be combined with some letters of the extended Latin alphabets in use in Central European languages and Vietnamese.
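In Unicode these two marks are separate combining characters, U+0307 COMBINING DOT ABOVE and U+0323 COMBINING DOT BELOW; a base letter followed by one of them can be normalized (NFC) into a precomposed letter where one exists. A short sketch using Python's standard `unicodedata` module, with Polish ⟨ż⟩ and Vietnamese ⟨ạ⟩ as examples:

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW

# NFC composes base letter + combining mark into one code point when possible.
polish = unicodedata.normalize("NFC", "z" + DOT_ABOVE)      # Polish ż
vietnamese = unicodedata.normalize("NFC", "a" + DOT_BELOW)  # Vietnamese ạ

print(polish, hex(ord(polish)))          # ż 0x17c
print(vietnamese, hex(ord(vietnamese)))  # ạ 0x1ea1
print(unicodedata.name(DOT_ABOVE))       # COMBINING DOT ABOVE
```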


Cyrillization or Cyrillisation is the process of rendering words of a language that normally uses a writing system other than Cyrillic script into the Cyrillic alphabet. Although such a process has often been carried out in an ad hoc fashion, the term "cyrillization" usually refers to a consistent system applied, for example, to transcribe names of German, Chinese, or English people and places for use in Russian, Ukrainian, Serbian, Macedonian or Bulgarian newspapers and books. Cyrillization is analogous to romanization, when words from a non-Latin-script-using language are rendered in the Latin alphabet for use

A ring diacritic may appear above or below letters. It may be combined with some letters of the extended Latin alphabets in various contexts.

I (Cyrillic)

I is a letter used in almost all Cyrillic alphabets.

The Greek alphabet has been used to write the Greek language since the late ninth or early eighth century BC. It is derived from the earlier Phoenician alphabet, and was the first alphabetic script in history to have distinct letters for vowels as well as consonants. In Archaic and early Classical times, the Greek alphabet existed in many local variants, but, by the end of the fourth century BC, the Euclidean alphabet, with twenty-four letters, ordered from alpha to omega, had become standard and it is this version that is still used to write Greek today. These twenty-four letters are: Α α, Β β, Γ γ, Δ δ, Ε ε, Ζ ζ, Η η, Θ θ, Ι ι, Κ κ, Λ λ, Μ μ, Ν ν, Ξ ξ, Ο ο, Π π, Ρ ρ, Σ σ/ς, Τ τ, Υ υ, Φ φ, Χ χ, Ψ ψ, and Ω ω.

The romanization or Latinization of Ukrainian is the representation of the Ukrainian language using Latin letters. Ukrainian is natively written in its own Ukrainian alphabet, which is based on the Cyrillic script. Romanization may be employed to represent Ukrainian text or pronunciation for non-Ukrainian readers, on computer systems that cannot reproduce Cyrillic characters, or for typists who are not familiar with the Ukrainian keyboard layout. Methods of romanization include transliteration, representing written text, and transcription, representing the spoken word.
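Ukrainian romanization illustrates rules that depend on a letter's position, not only on the letter itself. The sketch below is modeled on the position-dependent rules of Ukraine's official 2010 transliteration system as commonly described (for example, я as ⟨ya⟩ at the start of a word and ⟨ia⟩ elsewhere). It covers only the letters in the examples, the names are hypothetical, and it is not a complete implementation of that system.

```python
# Letters whose rendering depends on position: (word-initial, elsewhere).
POSITIONAL = {"є": ("ye", "ie"), "ї": ("yi", "i"), "й": ("y", "i"),
              "ю": ("yu", "iu"), "я": ("ya", "ia")}
# Position-independent letters (hypothetical subset for the examples).
SIMPLE = {"а": "a", "в": "v", "и": "y", "\u0456": "i",  # \u0456 is Ukrainian і
          "к": "k", "л": "l", "р": "r", "т": "t"}


def romanize_uk(word: str) -> str:
    out = []
    for index, ch in enumerate(word):
        lower = ch.lower()
        if lower in POSITIONAL:
            initial, elsewhere = POSITIONAL[lower]
            latin = initial if index == 0 else elsewhere
        else:
            latin = SIMPLE[lower]
        out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


for word in ["Київ", "Ялта"]:
    print(word, "->", romanize_uk(word))
```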

Romanization of Greek is the transliteration (letter-mapping) or transcription (sound-mapping) of text from the Greek alphabet into the Latin alphabet. The conventions for writing and romanizing Ancient Greek and Modern Greek differ markedly, which can create confusion. The sound of the English letter B was written as β in ancient Greek but is now written as the digraph μπ, while the modern β sounds like the English letter V instead. The Greek name Ἰωάννης became Johannes in Latin and then John in English, but in modern Greek has become Γιάννης; this might be written as Yannis, Jani, Ioannis, Yiannis, or Giannis, but not Giannes or Giannēs as it would be for ancient Greek. The word Άγιος might variously appear as Hagios, Agios, Aghios, or Ayios, or simply be translated as "Holy" or "Saint" in English forms of Greek placenames.

The romanization of Arabic refers to the standard norms for rendering written and spoken Arabic in the Latin script in one of various systematic ways. Romanized Arabic is used for a number of different purposes, among them transcription of names and titles, cataloging Arabic language works, language education when used instead of or alongside the Arabic script, and representation of the language in scientific publications by linguists. These formal systems, which often make use of diacritics and non-standard Latin characters and are used in academic settings or for the benefit of non-speakers, contrast with informal means of written communication used by speakers such as the Latin-based Arabic chat alphabet.

The Arabic chat alphabet, Arabizi, Franco-Arabic, Arabish, Araby, and Mu'arrab (معرب) refer to the romanized alphabets for informal Arabic dialects in which Arabic script is transcribed or encoded into a combination of Latin script and Arabic numerals. These informal chat alphabets were originally used primarily by youth in the Arab world in very informal settings, especially for communicating over the Internet or for sending messages via cellular phones, though use is no longer necessarily restricted by age and these chat alphabets have been used in other media such as advertising.

The diaeresis and the umlaut are two different homoglyphic diacritical marks. They both consist of two dots ( ¨ ) placed over a letter, usually a vowel. When that letter is an i or a j, the diacritic replaces the tittle: ï.

Latin script: writing system used for most European languages

Latin script, also known as Roman script, is a set of graphic signs (script) based on the letters of the classical Latin alphabet. The classical Latin alphabet is derived from a form of the Cumaean Greek version of the Greek alphabet used by the Etruscans. Several Latin-script alphabets exist, which differ in graphemes, collation and phonetic values from the classical Latin alphabet.

Heta is a conventional name for the historical Greek alphabet letter Eta (Η) and several of its variants, when used in their original function of denoting the consonant /h/.

Greek orthography has used a variety of diacritics starting in the Hellenistic period. The more complex polytonic orthography, which includes five diacritics, notates Ancient Greek phonology. The simpler monotonic orthography, introduced in 1982, corresponds to Modern Greek phonology, and requires only two diacritics.
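Converting polytonic text to monotonic can be sketched as a diacritic filter: decompose with NFD, drop the breathings and the iota subscript, turn grave and circumflex accents into the acute (tonos), keep the diaeresis, and recompose with NFC. This is a simplified sketch under those assumptions, not a full converter: real monotonic spelling also removes the accent from most monosyllables, which the code below does not attempt.

```python
import unicodedata

DROP = {"\u0313", "\u0314", "\u0345"}  # smooth breathing, rough breathing, iota subscript
TO_ACUTE = {"\u0300", "\u0342"}        # grave (varia) and circumflex (perispomeni)
ACUTE = "\u0301"                       # the monotonic tonos


def to_monotonic(text: str) -> str:
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in DROP:
            continue
        # Diaeresis (U+0308) and acute are kept; other accents become acute.
        out.append(ACUTE if ch in TO_ACUTE else ch)
    return unicodedata.normalize("NFC", "".join(out))


print(to_monotonic("Ἑλληνική"))  # Ελληνική
print(to_monotonic("ψυχῇ"))      # ψυχή
```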