Scientific transliteration of Cyrillic


Scientific transliteration, variously called academic, linguistic, international, or scholarly transliteration, is an international system for transliteration of text from the Cyrillic script to the Latin script (romanization). This system is most often seen in linguistics publications on Slavic languages.



The scientific transliteration system is roughly as phonemic as the orthography of the language being transliterated. The exceptions are щ, where the transliteration makes clear that two phonemes are involved, and џ, where it fails to represent the (monophonemic) affricate with a single letter. The system is based on Gaj's Latin alphabet used in Serbo-Croatian, in which each letter corresponds directly to a Cyrillic letter in the Bosnian, Montenegrin and Serbian official standards, and which was itself heavily based on the earlier Czech alphabet. The Cyrillic letter х, representing the sound [χ] as in Bach, was romanized h in Serbo-Croatian, but in German-speaking countries the native digraph ch was used instead. [1] The system was codified in the 1898 Prussian Instructions for libraries, or Preußische Instruktionen (PI), which were adopted in Central Europe and Scandinavia. Scientific transliteration can also be used to romanize the early Glagolitic alphabet, which has a close correspondence to Cyrillic.

Scientific transliteration is often adapted to serve as a phonetic alphabet. [2]

Scientific transliteration was the basis for the ISO 9 transliteration standard. While linguistic transliteration tries to preserve the original language's pronunciation to a certain degree, the latest version of the ISO standard (ISO 9:1995) has abandoned this concept, which was still found in ISO/R 9:1968, and is now restricted to a one-to-one mapping of letters. It thus allows for unambiguous reverse transliteration into the original Cyrillic text and is language-independent.
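
The difference in reversibility can be illustrated with a small sketch. The two mappings below cover only a handful of Russian letters, with values taken from the table in this article; the names SCIENTIFIC, ISO9, transliterate and reverse are hypothetical, and the greedy longest-match reverse routine is an assumption made for illustration, not something prescribed by either system.

```python
# Partial lowercase mappings for a few Russian letters (values from the
# table in this article). Neither dictionary is a complete implementation.
SCIENTIFIC = {"м": "m", "а": "a", "й": "j", "и": "i", "я": "ja"}
ISO9 = {"м": "m", "а": "a", "й": "j", "и": "i", "я": "\u00e2"}  # я -> â


def transliterate(text, table):
    """Replace each Cyrillic letter by its Latin equivalent; pass others through."""
    return "".join(table.get(ch, ch) for ch in text)


def reverse(text, table):
    """Greedy longest-match reverse transliteration (illustrative assumption)."""
    inverse = {lat: cyr for cyr, lat in table.items()}
    keys = sorted(inverse, key=len, reverse=True)  # try digraphs first
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(inverse[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # unknown character: copy unchanged
            i += 1
    return "".join(out)


word = "майами"  # contains the sequence й + а
print(transliterate(word, SCIENTIFIC), reverse(transliterate(word, SCIENTIFIC), SCIENTIFIC))
print(transliterate(word, ISO9), reverse(transliterate(word, ISO9), ISO9))
```

In this sketch the digraph ja produced by й + а is indistinguishable from the ja produced by я, so the round trip through the scientific mapping does not restore the original spelling, whereas the single-letter ISO 9 mapping does.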

The previous official Soviet romanization system, GOST 16876-71, was also based on scientific transliteration but used Latin h for Cyrillic х instead of Latin x, used ssh and sth for Cyrillic щ, and had a number of other differences. Most countries using Cyrillic script have now adopted GOST 7.79 instead, which is not the same as ISO 9 but close to it.

Representing all of the necessary diacritics on computers requires Unicode, Latin-2, Latin-4, or Latin-7 encoding.
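
Whether a given transliterated string fits one of these encodings can be checked programmatically. The following is a minimal sketch using Python's standard codecs; the helper names encodable and describe are hypothetical, and reading "Latin-7" as ISO 8859-13 is an assumption about the code page intended above.

```python
import unicodedata

# Python codec names for the encodings named above. "Latin-7" is taken here
# to mean ISO 8859-13 (an assumption).
CODECS = {"Latin-2": "iso8859_2", "Latin-4": "iso8859_4", "Latin-7": "iso8859_13"}


def encodable(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def describe(text):
    """Return (code point, Unicode name) pairs for each character of text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


carons = "\u010d\u0161\u017e"  # č š ž
prime = "\u02b9"               # ʹ, used in the table for ь

for label, codec in CODECS.items():
    print(label, encodable(carons, codec), encodable(prime, codec))
print(describe(carons))
```

In this sketch the caron letters č, š and ž are accepted by all three 8-bit code pages, while the modifier prime ʹ is rejected by each of them and is available only in Unicode.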


Prussian Instructions, scientific transliteration, and ISO 9
Cyrillic    scientific transliteration    PI [3]    ISO 9
А а aaaaaaaaa
Б б bbbbbbbbb
В в vvvvvvvvv
Г г ggghhggg (h BEUK)g
Ґ ґ g [lower-alpha 1] gġ
Д д ddddddddd
Ѓ ѓ ǵǵ
Ђ ђ đ (dj)đ
Е е eeeeeeee
Ё ё ëëëë
Є є ejejeê
Ж ж žžžžžžžžž
З з zzzzzzzzz
Ѕ ѕ dzdz
И и iiiyiiii
I і ii [lower-alpha 1] iiīì
Ї ї iji (ï)ï
Й й jjjjjj
Ј ј jjjǰ
К к kkkkkkkkk
Л л lllllllll
Љ љ lj (ļ)lj
М м mmmmmmmmm
Н н nnnnnnnnn
Њ њ nj (ń)njń
О о ooooooooo
П п ppppppppp
Р р rrrrrrrrr
С с sssssssss
Т т ttttttttt
Ќ ќ
Ћ ћ ǵććć
У у uuuuuuuu
ОУ оу u
Ў ў ŭ (w)ŭ
Ф ф fffffffff
Х х xhxxx (ch)hhchh
Ц ц ccccccccc
Ч ч ččččččččč
Џ џ dž (ģ)ǵ
Ш ш ššššššššš
Щ щ šč (št)štščščšč (št BG)ŝ
Ъ ъ ъ (ǔ)ǎʺ- [lower-alpha 2] BG)ʺ
Ы ы y (ū)yyyy
Ь ь ь (ǐ)jʹʹʹʹʹ
Ѣ ѣ ěě [lower-alpha 1] ě [lower-alpha 1] ě [lower-alpha 1] ěě
Э э èèėè
Ю ю jujujujujujuû
Я я jajajajajaâ
 ʼ  ʼ
Ѡ ѡ o, ô
Ѧ ѧ ę
Ѩ ѩ
Ѫ ѫ ǫăǎ
Ѭ ѭ
Ѯ ѯ ks
Ѱ ѱ ps
Ѳ ѳ th (θ)f [lower-alpha 1] f [lower-alpha 1] f [lower-alpha 1]
Ѵ ѵ ü(i) [lower-alpha 1] (i) [lower-alpha 1] (i) [lower-alpha 1]
Ѥ ѥ je
Ꙗ ꙗ ja
  1. archaic letter
  2. Indicated by - (hyphen) if medial, disregarded if final.

( ) Letters in parentheses are older or alternate transliterations. The Ukrainian and Belarusian apostrophes are not transcribed. The early Cyrillic letter koppa (Ҁ, ҁ) was used only for transliterating Greek and for its numeric value, and was thus omitted. Prussian Instructions and ISO 9:1995 are provided for comparison.

Unicode encoding is:



  1. Hans H. Wellisch (1978), The Conversion of Scripts, New York City: Wiley, p. 257.
  2. Timberlake 2004, p. 24.
  3. Hans H. Wellisch (1978), The Conversion of Scripts, New York City: Wiley, pp. 260–62.
