ISO 9

ISO 9 is an international standard establishing a system for the transliteration into Latin characters of Cyrillic characters constituting the alphabets of many Slavic and non-Slavic languages. [1]

ISO 9:1995 was published on February 23, 1995 by the International Organization for Standardization. [2] Its major advantage over competing systems is that it is univocal: each Cyrillic character corresponds to exactly one Latin character (using diacritics where necessary). This faithfully represents the original spelling and allows reverse transliteration, even if the language is unknown.
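The reversibility property can be illustrated with a minimal Python sketch. The mapping is a hand-picked lowercase subset of the ISO 9:1995 table, not the full standard, and the names `ISO9_SUBSET`, `INVERSE`, `DIGRAPH` and `convert` are illustrative assumptions. Non-ASCII Latin letters are written as escapes so that each value is guaranteed to be a single code point.

```python
# Hand-picked lowercase subset of the ISO 9:1995 table; not the full standard.
ISO9_SUBSET = {
    "и": "i",
    "т": "t",
    "ш": "\u0161",  # š
    "ч": "\u010d",  # č
    "щ": "\u015d",  # ŝ
}

# One Latin character per Cyrillic letter, so the table can simply be inverted.
INVERSE = {latin: cyr for cyr, latin in ISO9_SUBSET.items()}
assert len(INVERSE) == len(ISO9_SUBSET)  # no two letters share a Latin character


def convert(text, table):
    """Map character by character; characters not in the table pass through."""
    return "".join(table.get(ch, ch) for ch in text)


word = "щит"
latin = convert(word, ISO9_SUBSET)
print(latin)                            # ŝit
print(convert(latin, INVERSE) == word)  # True

# Contrast: a digraph scheme such as ISO/R 9's "šč" for щ collides with ш + ч.
DIGRAPH = {**ISO9_SUBSET, "щ": "\u0161\u010d"}
print(convert("щ", DIGRAPH) == convert("шч", DIGRAPH))  # True
```

The last line shows why a digraph-based scheme cannot be inverted character by character: two different Cyrillic spellings produce the same Latin string.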

Earlier versions of the standard, ISO/R 9:1954, ISO/R 9:1968 and ISO 9:1986, were more closely based on the international scholarly system for linguistics (scientific transliteration); the standard has since diverged from that system in favour of unambiguous transliteration over phonemic representation. The 1995 edition supersedes the 1986 edition. [1]

ISO 9:1995

The standard features three mapping tables: the first covers contemporary Slavic languages, the second older Slavic orthographies (excluding letters from the first), and the third non-Slavic languages (including most letters from the first). Several Cyrillic characters included in ISO 9 are not available as pre-composed characters in Unicode, nor are some of the transliterations; combining diacritical marks have to be used in these cases. Unicode, on the other hand, includes some historic characters that are not dealt with in ISO 9.
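The difference between pre-composed characters and combining sequences can be checked with Python's standard `unicodedata` module. This is a small sketch; the two letters are examples taken from the transliteration table, and the variable names are illustrative.

```python
import unicodedata

# "Ä" has a pre-composed code point, so NFC folds base + combining mark into one.
a_diaeresis = unicodedata.normalize("NFC", "A\u0308")
print(len(a_diaeresis), hex(ord(a_diaeresis)))  # 1 0xc4

# Z with macron (the ISO 9 equivalent of Ӝ) has no pre-composed form,
# so it remains a base letter followed by a combining mark.
z_macron = unicodedata.normalize("NFC", "Z\u0304")
print(len(z_macron), [hex(ord(c)) for c in z_macron])  # 2 ['0x5a', '0x304']
```

Software that compares or indexes ISO 9 output would therefore need to treat such two-code-point sequences as a single letter.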

Transliteration table

The following combined table shows characters for various Slavic, Iranian, Romance, Turkic, Uralic, Mongolic, Caucasian, Tungusic, Paleosiberian and other languages of the former USSR which are written in Cyrillic.

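ISO 9:1995
Cyrillic | Latin | Unicode (hex) | Description
А а | A a
Ӓ ӓ | Ä ä | 00C4 00E4 | a diaeresis
Ӓ̄ ӓ̄ | Ạ̈ ạ̈ | 00C4+0323 00E4+0323 | a diaeresis and dot below
Ӑ ӑ | Ă ă | 0102 0103 | a breve
А̄ а̄ | Ā ā | 0100 0101 | a macron
Ӕ ӕ | Æ æ | 00C6 00E6 | ae ligature
А́ а́ | Á á | 00C1 00E1 | a acute
А̊ а̊ | Å å | 00C5 00E5 | a ring
Б б | B b
В в | V v
Г г | G g
Ґ ґ | G̀ g̀ | 0047+0300 0067+0300 | g grave
Ѓ ѓ | Ǵ ǵ | 01F4 01F5 | g acute
Ғ ғ | Ġ ġ | 0120 0121 | g dot
Ҕ ҕ | Ğ ğ | 011E 011F | g breve
Һ һ | Ḥ ḥ | 1E24 1E25 | h dot
Д д | D d
Ђ ђ | Đ đ | 0110 0111 | d stroke
Е е | E e
Ӗ ӗ | Ĕ ĕ | 0114 0115 | e breve
Ё ё | Ë ë | 00CB 00EB | e diaeresis
Є є | Ê ê | 00CA 00EA | e circumflex
Ж ж | Ž ž | 017D 017E | z caron
Җ җ | Ž̦ ž̦ | 017D+0326 017E+0326 | z caron and comma below [3]
(as above) | Ž̧ ž̧ | 017D+0327 017E+0327 | z caron and cedilla [3]
Ӝ ӝ | Z̄ z̄ | 005A+0304 007A+0304 | z macron
Ӂ ӂ | Z̆ z̆ | 005A+0306 007A+0306 | z breve
З з | Z z
Ӟ ӟ | Z̈ z̈ | 005A+0308 007A+0308 | z diaeresis
Ӡ ӡ | Ź ź | 0179 017A | z acute
Ѕ ѕ | Ẑ ẑ | 1E90 1E91 | z circumflex
И и | I i
Ӣ ӣ | Ī ī | 012A 012B | i macron
И́ и́ | Í í | 00CD 00ED | i acute
Ӥ ӥ | Î î | 00CE 00EE | i circumflex
Й й | J j
І і | Ì ì | 00CC 00EC | i grave
Ї ї | Ï ï | 00CF 00EF | i diaeresis
І̄ і̄ | Ǐ ǐ | 01CF (012C) 01D0 (012D) | i caron (or breve)
Ј ј | J̌ ǰ | 004A+030C 01F0 | j caron
Ј̵ ј̵ | J́ j́ | 004A+0301 006A+0301 | j acute
К к | K k
Ќ ќ | Ḱ ḱ | 1E30 1E31 | k acute
Ӄ ӄ | Ḳ ḳ | 1E32 1E33 | k dot below
Ҝ ҝ | K̂ k̂ | 004B+0302 006B+0302 | k circumflex
Ҡ ҡ | Ǩ ǩ | 01E8 01E9 | k caron
Ҟ ҟ | K̄ k̄ | 004B+0304 006B+0304 | k macron
Қ қ | K̦ k̦ | 004B+0326 006B+0326 | k comma below [3]
(as above) | Ķ ķ | 0136 0137 | k cedilla [3]
К̨ к̨ | K̀ k̀ | 004B+0300 006B+0300 | k grave
Ԛ ԛ | Q q
Л л | L l
Љ љ | L̂ l̂ | 004C+0302 006C+0302 | l circumflex
Ԡ ԡ | L̦ l̦ | 004C+0326 006C+0326 | l comma below [3]
(as above) | Ļ ļ | 013B 013C | l cedilla [3]
М м | M m
Н н | N n
Њ њ | N̂ n̂ | 004E+0302 006E+0302 | n circumflex
Ң ң | N̦ n̦ | 004E+0326 006E+0326 | n comma below [3]
(as above) | Ņ ņ | 0145 0146 | n cedilla [3]
Ӊ ӊ | Ṇ ṇ | 1E46 1E47 | n dot below
Ҥ ҥ | Ṅ ṅ | 1E44 1E45 | n dot
Ԋ ԋ | Ǹ ǹ | 01F8 01F9 | n grave
Ԣ ԣ | Ń ń | 0143 0144 | n acute
Ӈ ӈ | Ň ň | 0147 0148 | n caron
Н̄ н̄ | N̄ n̄ | 004E+0304 006E+0304 | n macron
О о | O o
Ӧ ӧ | Ö ö | 00D6 00F6 | o diaeresis
Ө ө | Ô ô | 00D4 00F4 | o circumflex
Ӫ ӫ | Ő ő | 0150 0151 | o double acute
Ӧ̄ о̄̈ | Ọ̈ ọ̈ | 00D6+0323 00F6+0323 | o diaeresis and dot below
Ҩ ҩ | Ò ò | 00D2 00F2 | o grave
О́ о́ | Ó ó | 00D3 00F3 | o acute
О̄ о̄ | Ō ō | 014C 014D | o macron
Œ œ | Œ œ | 0152 0153 | oe ligature
П п | P p
Ҧ ҧ | Ṕ ṕ | 1E54 1E55 | p acute
Ԥ ԥ | P̀ p̀ | 0050+0300 0070+0300 | p grave
Р р | R r
С с | S s
Ҫ ҫ | Ș ș | 0218 0219 | s comma below [3]
(as above) | Ş ş | 015E 015F | s cedilla [3]
С̀ с̀ | S̀ s̀ | 0053+0300 0073+0300 | s grave
Т т | T t
Ћ ћ | Ć ć | 0106 0107 | c acute
Ԏ ԏ | T̀ t̀ | 0054+0300 0074+0300 | t grave
Т̌ т̌ | Ť ť | 0164 0165 | t caron
Ҭ ҭ | Ț ț | 021A 021B | t comma below [3]
(as above) | Ţ ţ | 0162 0163 | t cedilla [3]
У у | U u
Ӱ ӱ | Ü ü | 00DC 00FC | u diaeresis
Ӯ ӯ | Ū ū | 016A 016B | u macron
Ў ў | Ŭ ŭ | 016C 016D | u breve
Ӳ ӳ | Ű ű | 0170 0171 | u double acute
У́ у́ | Ú ú | 00DA 00FA | u acute
Ӱ̄ ӱ̄ | Ụ̈ ụ̈ | 00DC+0323 00FC+0323 | u diaeresis and dot below
(as above) | Ụ̄ ụ̄ | 016A+0323 016B+0323 | u macron and dot below
Ү ү | Ù ù | 00D9 00F9 | u grave
Ұ ұ | U̇ u̇ | 0055+0307 0075+0307 | u dot
Ԝ ԝ | W w
Ф ф | F f
Х х | H h
Ҳ ҳ | H̦ h̦ | 0048+0326 0068+0326 | h comma below [3]
(as above) | Ḩ ḩ | 1E28 1E29 | h cedilla [3]
Ц ц | C c
Ҵ ҵ | C̄ c̄ | 0043+0304 0063+0304 | c macron
Џ џ | D̂ d̂ | 0044+0302 0064+0302 | d circumflex
Ч ч | Č č | 010C 010D | c caron
Ҷ ҷ | C̦ c̦ | 0043+0326 0063+0326 | c comma below [3]
(as above) | Ç ç | 00C7 00E7 | c cedilla [3]
Ӌ ӌ | C̣ c̣ | 0043+0323 0063+0323 | c dot below
Ӵ ӵ | C̈ c̈ | 0043+0308 0063+0308 | c diaeresis
Ҹ ҹ | Ĉ ĉ | 0108 0109 | c circumflex
Ч̀ ч̀ | C̀ c̀ | 0043+0300 0063+0300 | c grave
Ҽ ҽ | C̆ c̆ | 0043+0306 0063+0306 | c breve
Ҿ ҿ | C̨̆ c̨̆ | 0043+0328+0306 0063+0328+0306 | c ogonek [3] and breve
Ш ш | Š š | 0160 0161 | s caron
Щ щ | Ŝ ŝ | 015C 015D | s circumflex
Ъ ъ | ʺ | 02BA | modifier letter double prime [4]
Ы ы | Y y
Ӹ ӹ | Ÿ ÿ | 0178 00FF | y diaeresis
Ы̄ ы̄ | Ȳ ȳ | 0232 0233 | y macron
Ь ь | ʹ | 02B9 | modifier letter prime [4]
Э э | È è | 00C8 00E8 | e grave
Ә ә | A̋ a̋ | 0041+030B 0061+030B | a double acute
Ӛ ӛ | À à | 00C0 00E0 | a grave
Ю ю | Û û | 00DB 00FB | u circumflex
Ю̄ ю̄ | Û̄ û̄ | 00DB+0304 00FB+0304 | u circumflex and macron
Я я | Â â | 00C2 00E2 | a circumflex
Ѣ ѣ | Ě ě | 011A 011B | e caron
Ѫ ѫ | Ǎ ǎ | 01CD 01CE | a caron
Ѳ ѳ | F̀ f̀ | 0046+0300 0066+0300 | f grave
Ѵ ѵ | Ỳ ỳ | 1EF2 1EF3 | y grave
Ӏ | ‡ | 2021 | double dagger
ʼ | ʼ | 02BC | modifier apostrophe
ˮ | ˮ | 02EE | modifier double apostrophe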
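The Unicode column of the table gives code points in hexadecimal and joins a base letter and its combining marks with "+" (for example 017D+0326). A short Python sketch, with the hypothetical helper names `from_codepoints` and `describe`, shows how such an entry can be turned into the actual string and checked against the official Unicode character names:

```python
import unicodedata


def from_codepoints(spec):
    """Turn a table entry such as '017D+0326' into the string it denotes."""
    return "".join(chr(int(part, 16)) for part in spec.split("+"))


def describe(spec):
    """Return the Unicode character name of each code point in the entry."""
    return [unicodedata.name(ch) for ch in from_codepoints(spec)]


print(describe("017D+0326"))
# ['LATIN CAPITAL LETTER Z WITH CARON', 'COMBINING COMMA BELOW']
print(describe("0047+0300"))
# ['LATIN CAPITAL LETTER G', 'COMBINING GRAVE ACCENT']
```

Comparing these names with the Description column is a quick way to catch a mistyped code point when the table is copied by hand.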

National adoptions

Date | Region | Name | Descriptive name
1995-06-01 | France | NF ISO 9:1995-06-01 [5] | Information et documentation - Translittération des caractères cyrilliques en caractères latins - Langues slaves et non slaves.
1995-09-29 | Sweden | SS-ISO 9 [6] | Translitterering av kyrilliska bokstäver till latinska - Slaviska och icke-slaviska språk
1997 | Romania | SR ISO 9:1997 [7] | Informare şi documentare. Transliterarea caracterelor chirilice în caractere latine. Limbi slave şi neslave
1997-12-11 | Croatia | HRN ISO 9:1997 [8] | Informacije i dokumentacija—Transliteracija ćiriličnih u latinične znakove za slavenske i neslavenske jezike (ISO 9:1995)
2000 | Poland | PN-ISO 9:2000 [9] | Informacja i dokumentacja. Transliteracja znaków cyrylickich na znaki łacińskie — Języki słowiańskie i niesłowiańskie
2002 | Lithuania | LST ISO 9:2002 | Informacija ir dokumentai. Kirilicos rašmenų transliteravimas lotyniškais rašmenimis. Slavų ir ne slavų kalbos
2002-07-01 | Russia | GOST 7.79-2000 System A | Система стандартов по информации, библиотечному и издательскому делу. Правила транслитерации кирилловского письма латинским алфавитом
2002-10 | Czechia | ČSN ISO 9 (010185) [10] | Informace a dokumentace - Transliterace cyrilice do latinky - slovanské a neslovanské jazyky
2005-03-01 | Italy | UNI ISO 9:2005 [11] | Informazione e documentazione - Traslitterazione dei caratteri cirillici in caratteri latini - Linguaggi slavi e non slavi
2005-11-01 | Slovenia | SIST ISO 9:2005 [12] | Informatika in dokumentacija – Transliteracija ciriličnih znakov v latinične znake – Slovanski in neslovanski jeziki
2011 | Estonia | EVS-ISO 9:2011 [13] | Informatsioon ja dokumentatsioon. Kirillitsa translitereerimine ladina keelde. Slaavi ja mitte-slaavi keeled
2013 | GCC: Bahrain, Kuwait, Oman, Qatar, Saudi Arabia, United Arab Emirates | GSO ISO 9:2013 [14] | التوثيق والمعلومات - الحروف السير يليه بترجمة إلى اللغة اللاتينية - السلافيه وغير اللغات السلافيه

Sample text

The following text is a fragment of the Preamble of the Universal Declaration of Human Rights in Bulgarian: [15]

Като взе предвид, че признаването на достойнството, присъщо на всички членове на човешкия род,
на техните равни и неотменими права представлява основа на свободата, справедливостта и мира в света,
Kato vze predvid, če priznavaneto na dostojnstvoto, prisʺŝo na vsički členove na čoveškiâ rod,
na tehnite ravni i neotmenimi prava predstavlâva osnova na svobodata, spravedlivostta i mira v sveta,
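
The sample can be reproduced with a short Python sketch. The mapping covers only the modern Bulgarian alphabet, with values taken from the ISO 9:1995 table above; the names `BG_ISO9` and `transliterate` are illustrative assumptions, and the case handling is one simple choice rather than something the standard prescribes.

```python
# Lowercase Bulgarian letters with their ISO 9:1995 equivalents from the table.
BG_ISO9 = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "\u017e",  # ž
    "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ф": "f", "х": "h", "ц": "c",
    "ч": "\u010d",  # č
    "ш": "\u0161",  # š
    "щ": "\u015d",  # ŝ
    "ъ": "\u02ba",  # ʺ (modifier letter double prime)
    "ь": "\u02b9",  # ʹ (modifier letter prime)
    "ю": "\u00fb",  # û
    "я": "\u00e2",  # â
}


def transliterate(text):
    """Transliterate Bulgarian Cyrillic to Latin, preserving letter case."""
    out = []
    for ch in text:
        latin = BG_ISO9.get(ch.lower())
        if latin is None:
            out.append(ch)  # punctuation, spaces, characters outside the table
        elif ch.isupper():
            out.append(latin.upper())
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Като взе предвид, че признаването на достойнството,"))
# Kato vze predvid, če priznavaneto na dostojnstvoto,
```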

ISO/R 9

ISO Recommendation No. 9, published in 1954 and revised in 1968, is an older version of the standard. It prescribes different transliterations for different Slavic languages, reflecting their phonemic differences, and is closer to the original international system of Slavist scientific transliteration.

A German adaptation of this standard was published by the Deutsches Institut für Normung as DIN 1460 (1982) for Slavic languages and supplemented by DIN 1460-2 (2010) for non-Slavic languages.

The languages covered are Russian (RU), Belarusian (BE), Ukrainian (UK), Bulgarian (BG), Serbo-Croatian (SH) and Macedonian (MK). For comparison, ISO 9:1995 is shown in the table below.

Alternative schemes: ISO/R 9:1968 permits some deviations from the main standard. In the table below, they are listed in the columns alt. 1 and alt. 2.

  1. The first sub-standard defines some language-dependent transliterations for Russian (RU), Ukrainian (UK), Belarusian (BE) and Bulgarian (BG).
  2. The second sub-standard permits, in countries where tradition favours it, a set of alternative transliterations, but only as a group. It is identical to the British Standard 2979:1958 for Cyrillic romanization. [16]
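ISO/R 9:1954, ISO/R 9:1968 and ISO 9:1995
Cyrillic | ISO/R 9 1954 | ISO/R 9 1968 (base, alt. 1, alt. 2) | ISO 9 1995 | Usage per language (RU, BE, UK, BG, SH, MK)
A value that spans several adjacent columns is listed once.
А а | A a | A a | Yes
Б б | B b | B b | Yes
В в | V v | V v | Yes
Г г | G g | G g | H h (BE, UK) | G g | Yes | Regional | Yes
Ґ ґ | Ġ ġ | G̀ g̀ | No | Yes | No
Д д | D d | D d | Yes
Ѓ ѓ | Ǵ ǵ | Ǵ ǵ | No | Yes
Ђ ђ | Đ đ | Đ đ | No | Yes | No
Е е | E e | E e | Yes
Ё ё | Ë ë | Ë ë | Yes | No
Є є | Je je | Ê ê | No | Yes | No
Ж ж | Ž ž | Zh zh | Ž ž | Yes
З з | Z z | Z z | Yes
Ѕ ѕ | Dz dz | Ẑ ẑ | No | Yes
И и | I i, Y y | I i | Y y (UK) | I i | Yes | No | Regional | Yes
І і | I i | Ī ī | I i (BE, UK) | Ì ì | Archaic | Yes | No
Ї ї | Ji ji | Ï ï | Ï ï | No | Yes | No
Й й | J j | Ĭ ĭ | J j | Yes | No
Ј ј | J j | Ĵ ĵ | Y y | J̌ ǰ | No | Yes
К к | K k | K k | Yes
Л л | L l | L l | Yes
Љ љ | Lj lj | Ĺ ĺ | L̂ l̂ | No | Yes
М м | M m | M m | Yes
Н н | N n | N n | Yes
Њ њ | Nj nj | Ń ń | N̂ n̂ | No | Yes
О о | O o | O o | Yes
П п | P p | P p | Yes
Р р | R r | R r | Yes
С с | S s | S s | Yes
Т т | T t | T t | Yes
Ќ ќ | Ḱ ḱ | Ḱ ḱ | No | Yes
Ћ ћ | Ć ć | Ć ć | No | Yes | No
У у | U u | U u | Yes
Ў ў | Ŭ ŭ | Ŭ ŭ | No | Yes | No
Ф ф | F f | F f | Yes
Х х | H h | Ch ch (BE, RU, UK) | Kh kh | H h | Regional | Yes
Ц ц | C c | Ts ts | C c | Yes
Ч ч | Č č | Ch ch | Č č | Yes
Џ џ | Dž dž | Dj dj | Dĵ dĵ | D̂ d̂ | No | Yes
Ш ш | Š š | Sh sh | Š š | Yes
Щ щ | Šč šč, Št št | Šč šč | Št št (BG) | Shch shch | Ŝ ŝ | Yes | No | Yes | Regional | No
Ъ ъ | Ă ă, " | ʺ | Ă ă (BG) | ʺ | Yes [17] | Archaic [17] | Regional [18] | No
Ы ы | Y y | Y y | Yes | No
Ь ь | ʹ | ʹ | Yes | No
Ѣ ѣ | Ě ě | Ě ě | Archaic | No
Э э | E̊ e̊ | È è | Yes | No
Ю ю | Ju ju | Yu yu | Û û | Yes | No
Я я | Ja ja | Ya ya | Â â | Yes | No
", ’ | Archaic | Yes | No | Regional
Ѫ ѫ | Ȧ ȧ | ʺ̣ | Ȧ ȧ (BG) | Ǎ ǎ | No | Archaic [18] | No
Ѳ ѳ | Ḟ ḟ | F̀ f̀ | Archaic | No
Ѵ ѵ | Ẏ ẏ | Ỳ ỳ | Archaic | No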
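
The language-dependent first alternative of ISO/R 9:1968 can be sketched in Python as a base table plus per-language overrides. The four letters and their language restrictions are read from the rows for Г, И, Х and Щ above; because the table layout is flattened, that reading is an assumption, and the names `BASE_1968`, `ALT1_1968` and `isor9_1968` are hypothetical.

```python
# Base ISO/R 9:1968 values for four letters (lowercase only), as read from the table.
BASE_1968 = {
    "г": "g",
    "и": "i",
    "х": "h",
    "щ": "\u0161\u010d",  # šč
}

# Alternative 1: replacement value and the languages it is listed for.
ALT1_1968 = {
    "г": ("h", {"BE", "UK"}),
    "и": ("y", {"UK"}),
    "х": ("ch", {"BE", "RU", "UK"}),
    "щ": ("\u0161t", {"BG"}),  # št
}


def isor9_1968(ch, lang, alt1=False):
    """Transliterate one lowercase letter, optionally applying alternative 1."""
    if alt1 and ch in ALT1_1968:
        latin, langs = ALT1_1968[ch]
        if lang in langs:
            return latin
    return BASE_1968.get(ch, ch)


print(isor9_1968("г", "UK", alt1=True))  # h
print(isor9_1968("г", "RU", alt1=True))  # g
print(isor9_1968("х", "RU", alt1=True))  # ch
```

Unlike ISO 9:1995, this scheme needs to know the source language before it can pick a Latin equivalent, which is what the later edition was designed to avoid.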

Notes

  1. "ISO 9:1995: Information and documentation -- Transliteration of Cyrillic characters into Latin characters -- Slavic and non-Slavic languages". International Organization for Standardization. Retrieved 13 April 2012.
  2. "ISO 9:1995". www.standard.no.
  3. The "informative" Annex A of ISO 9:1995 uses the ISO 5426 character 0x52 (hook to left), which can be mapped to Unicode's comma below (U+0326), while ISO 5426 also has 0x50 (cedilla), which can be mapped to Unicode's cedilla (U+0327). It also uses ISO 5426 0x53 (hook to right), which can be mapped to Unicode's ogonek (U+0328). See, for example, Evertype.com's ISO 5426 mapping to Unicode (archived 2020-10-21 at the Wayback Machine) or Joan M. Aliprand's Finalized Mapping between Characters of ISO 5426 and ISO/IEC 10646-1 (archived 2020-08-02 at the Wayback Machine).
  4. Evertype.com: ISO 5426 mapping to Unicode (archived 2020-10-21 at the Wayback Machine); Joan M. Aliprand: Finalized Mapping between Characters of ISO 5426 and ISO/IEC 10646-1 (archived 2020-08-02 at the Wayback Machine); The Unicode Standard: Spacing Modifier Letters (archived 2019-06-13 at the Wayback Machine).
  5. "DIN - German Institute for Standardization". Archived from the original on 3 February 2019. Retrieved 3 February 2019.
  6. "Standard - Information and documentation - Transliteration of Cyrillic characters into Latin characters - Slavic and non-Slavic languages SS-ISO 9 - Swedish Institute for Standards, SIS".
  7. "Magazin ASRO". magazin.asro.ro. Archived from the original on 13 February 2019.
  8. "HRVATSKI NORMATIVNI DOKUMENT" . Retrieved 21 December 2022.
  9. "Sklep PKN". sklep.pkn.pl.
  10. "ČSN ISO 9 (010185)". www.technicke-normy-csn.cz.
  11. "Uni Iso 9:2005".
  12. "Spletna trgovina SIST - SIST ISO 9:2005".
  13. "EVS-ISO 9:2011". EVS.
  14. "GSO ISO 9:2013 - متجر المواصفات - Ministry of Industry, Commerce & Tourism - Kingdom of Bahrain". Archived from the original on 21 December 2022. Retrieved 3 February 2019.
  15. "OHCHR | Universal Declaration of Human Rights - Bulgarian (Balgarski)". OHCHR.
  16. Hans H. Wellisch (1978), The Conversion of Scripts: Its Nature, History, and Utilization, New York City: Wiley, p. 262, Wikidata Q104231343
  17. In Russian and Belarusian, ъ is not transliterated at the end of a word (where it occurred in the pre-1918 orthography).
  18. In Bulgarian, ъ and ѫ are not transliterated at the end of a word (where they occurred in the pre-1945 orthography).
