Informal romanizations of Cyrillic

Informal or ad hoc romanizations of Cyrillic have been in use since the early days of electronic communications, starting from early e-mail and bulletin board systems. [1] Their use faded with the advances in the Russian internet that made support of Cyrillic script standard, [1] but resurfaced with the proliferation of instant messaging, SMS and mobile phone messaging in Russia.

Development

Because these romanizations were informal, there was neither a well-established standard nor a common name for them. In the early days of e-mail, the humorous term "Volapuk encoding" (Russian: кодировка "воляпюк" or "волапюк", romanized: kodirovka volapyuk) was sometimes used. [1]

More recently, the term "translit" emerged to refer indiscriminately both to programs that transliterate Cyrillic (and other non-Latin alphabets) into Latin and to the result of such transliteration. The word is an abbreviation of "transliteration", and its usage most probably originated in several places. An example of early "translit" is the DOS program TRANSLIT [2] by Jan Labanowski, which runs from the command prompt and converts a Cyrillic file to a Latin one using a specified transliteration table.

There are two basic varieties of informal romanization of Russian: transliteration and Leetspeak-type rendering of Russian text. The latter is often heavily saturated with common English words, which are often much shorter than the corresponding Russian ones, and is sometimes referred to as Runglish or Russlish.

Translit

Translit is a method of encoding Cyrillic letters with Latin ones. The term is derived from transliteration, the system of replacing letters of one alphabet with letters of another. Translit found its way into web forums, chats, messengers, emails, MMORPGs and other network games. Some Cyrillic web sites had a translit version for cases of encoding problems.

As computer and network technologies developed support for the Cyrillic script, translit fell into disrepute. [citation needed]

Translit received its last boost from the increasing availability of mobile phones in Cyrillic-using countries. At first, the situation was the same as with computers: neither mobile phones nor mobile network operators supported Cyrillic. Although mobile phone technology now supports Unicode, including all variants of Cyrillic alphabets, a single SMS in Cyrillic is limited to 70 characters, [note 1] whereas a Latin-script SMS can have up to 160 characters. If a message exceeds the character limit, it is split into multiple parts. That makes messages written in Cyrillic more expensive.
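The cost difference can be illustrated with a short Python sketch. It is a rough estimate rather than an operator's actual billing logic: the function name `sms_parts` and the `GSM7_SUBSET` character set are hypothetical, the set is only a simplified stand-in for the real GSM 7-bit alphabet, and the per-part sizes of 153 and 67 characters for split messages are an assumption taken from the usual concatenation overhead, not from the text above.

```python
import math
import string

# Simplified stand-in for the GSM 7-bit default alphabet (assumption: the real
# table contains more characters than this subset).
GSM7_SUBSET = set(string.ascii_letters + string.digits + " .,!?'\"()-:;/+*#<>=%&@\n")


def sms_parts(text):
    """Estimate how many SMS parts a message needs."""
    if all(ch in GSM7_SUBSET for ch in text):
        # Latin-only text: 7 bits per character, 160 characters in one SMS.
        units, single, multi = len(text), 160, 153
    else:
        # Anything else (e.g. Cyrillic): 16 bits per character, 70 per SMS.
        units = len(text.encode("utf-16-be")) // 2
        single, multi = 70, 67
    if units <= single:
        return 1
    # Assumption: each part of a split message loses a few characters
    # to the header that links the parts together.
    return math.ceil(units / multi)


cyrillic = "Привет, как дела? " * 5   # 90 characters
latin = "Privet, kak dela? " * 5      # 90 characters
print(sms_parts(cyrillic), sms_parts(latin))
```

Under these assumptions the same 90-character message takes two parts in Cyrillic but only one in translit.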

Common transliterations in translit
Letter – Transliteration
а – a
б – b, 6
в – v, w
г – g, Γ, s
д – d, Δ, g
е – e, ye, je, ie
ё – yo, jo, io, e
ж – zh, ž, j, z, g, *, >|<
з – z, 3
и – i, u
й – y, j, i, u, or absent; ий → iy, ый → yy; -ий/ый → i, y
к – k, c
л – l, Jl, Λ
м – m
н – n
о – o
п – p, Π, n
р – r
с – s
т – t, m
у – u, y
ф – f, Φ
х – h, kh, x
ц – c, z, ts, tc, u
ч – ch, č, Ψ, 4
ш – sch, sh, š, w, Ψ, 6
щ – sh, shch, sch, šč, shh, w
ъ – ', y, j, '', #, or absent
ы – y, i, q, bl
ь – ', y, j, b, or absent
э – e, e', eh
ю – yu, ju, iu, u
я – ya, ja, ia, ea, a, q, 9, 9I
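
As an illustration of the table above, the following Python sketch picks one variant per letter and applies it character by character. The particular choices (for example `sh` for ш and `shch` for щ) and the function name `to_translit` are assumptions made for this example; since translit has no standard, any other variant from the table would be equally valid.

```python
# One variant per letter, chosen from the table; other choices are equally valid.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "''", "ы": "y", "ь": "'", "э": "e", "ю": "yu", "я": "ya",
}


def to_translit(text):
    """Replace each Cyrillic letter with its chosen Latin rendering."""
    out = []
    for ch in text:
        latin = TRANSLIT.get(ch.lower())
        if latin is None:
            out.append(ch)                  # not a Russian letter: keep as-is
        elif ch.isupper():
            out.append(latin.capitalize())  # preserve initial capitals
        else:
            out.append(latin)
    return "".join(out)


print(to_translit("Привет, мир!"))  # Privet, mir!
```

The mapping is lossy: with these choices both й and ы become `y`, so the original Cyrillic cannot always be recovered from the Latin text.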

Volapuk encoding

Volapuk encoding (Russian: кодировка "волапюк", kodirovka "volapük") or latinitsa (латиница) is a slang term for rendering the letters of the Cyrillic script with Latin ones. Unlike Translit, in which characters are replaced to sound the same, in volapuk characters can be replaced to look or sound the same.

Etymology

The name Volapuk encoding comes from the constructed language Volapük, for two reasons. Cyrillic text written in this way looks strange and often funny, just as a Volapük-language text may appear. At the same time, the word "Volapük" ("Волапюк/Воляпюк" Volapyuk/Volyapyuk in Russian) itself sounds close to the words "воля" (will) and "пук" (fart), funny enough for the name to have stuck.

The term was popularized by its use in RELCOM, the first commercially available Soviet UUCP and TCP/IP network. A typical networking software package included utilities called tovol and fromvol, originally implemented by Vadim Antonov, for transcoding between Cyrillic KOI-8 and Volapuk. This makes RELCOM the likely origin of the usage of Volapuk as applied to Cyrillic encoding.

History

Volapuk and Translit have been in use since the early days of the Internet to write e-mail messages and other texts in Russian where the support of Cyrillic fonts was limited: either the sender did not have a keyboard with Cyrillic letters or the receiver did not necessarily have Cyrillic screen fonts. In the early days, the situation was aggravated by a number of mutually incompatible computer encodings for the Cyrillic script, so that the sender and receiver were not guaranteed to have the same one. Also, the 7-bit character encoding of the early days was an additional hindrance.

Some Russian e-mail providers even included Volapuk encoding in the list of available options for the e-mails routed abroad, e.g.,

"MIME/BASE64, MIME/Quoted-Printable, volapuk, uuencode" [3]

By the late 1990s, the encoding problem had been almost completely resolved, due to increasing support from software manufacturers and Internet service providers. [3] Volapuk still maintains a level of use for SMS text messages, because it is possible to fit more characters in a Latinized SMS message than a Cyrillic one. It is also used in computer games that do not allow Cyrillic text in chat, particularly Counter-Strike.

Rules

Volapuk often replaces Cyrillic letters with Latin ones so that the result looks the same as, or at least similar to, typed or handwritten Cyrillic letters.

  1. Replace "the same" letters: a, e, K, M, T, o. Capitalize when necessary for closer resemblance (к: K is better than k; м: M is better than m; т: T is better than t, because handwritten Cyrillic т looks exactly like 'm').
  2. Replace similar-looking letters: в – B, г – r (handwritten resemblance), з – 3 (i.e. number three), л – J| or /\ (the last is again handwritten resemblance), н – H, п – n (handwritten resemblance), р – p, с – c, у – y, х – x, ч – 4, я – R, и – N or u (handwritten resemblance). This may vary.
  3. Replace all other non-obvious, hard-to-represent characters using leet (any combination of Latin letters, numbers or punctuation that might bear a passing resemblance to the Cyrillic letter in question); there are many options for each letter. For example, the letter 'щ' can be encoded in more than 15 different ways. Examples: ж – *, щ – LLI_, э – -) and so on. The choice for each letter depends on the preferences of the individual user.
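
The three rules above can be sketched in Python as a simple look-alike table. This is only one possible reading of the rules: where several replacements are listed, the sketch arbitrarily picks one (for instance `J|` for л and `u` for и), the function name `to_volapuk` is hypothetical, and letters the rules do not mention are passed through unchanged.

```python
# Look-alike replacements taken from the three rules; one option per letter.
VOLAPUK = {
    # Rule 1: "the same" letters, capitalized where that looks closer.
    "а": "a", "е": "e", "к": "K", "м": "M", "т": "T", "о": "o",
    # Rule 2: similar-looking letters, typed or handwritten.
    "в": "B", "г": "r", "з": "3", "л": "J|", "н": "H", "п": "n",
    "р": "p", "с": "c", "у": "y", "х": "x", "ч": "4", "я": "R", "и": "u",
    # Rule 3: leet-style renderings for the remaining letters.
    "ж": "*", "щ": "LLI_", "э": "-)",
}


def to_volapuk(text):
    """Render Cyrillic text with visually similar Latin characters."""
    # Assumption: input case is ignored, since the table fixes the output case.
    return "".join(VOLAPUK.get(ch, ch) for ch in text.lower())


print(to_volapuk("привет"))  # npuBeT
print(to_volapuk("москва"))  # MocKBa
```

Unlike a sound-based transliteration, the output is meant to be read by shape, which is why the mixed capitalization is kept.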

Encoding depends on the language as well. For example, Ukrainian-speaking users [4] have their own traditions, distinct from the Russian ones.

Example

See also

Notes

  1. This is because the UCS-2 encoding used for Cyrillic in SMS requires 16 bits per character, whereas the Latin encoding requires just seven. For the long explanation, see SMS#Message size. (Most web sites use UTF-8, which encodes Latin in 8 bits but still requires 16 for Cyrillic and even more for East Asian scripts.)

References

  1. Notice of cancellation of automatic volapuk encoding (1997) (in Russian, in KOI8-R encoding)
  2. Translit of early 1990s (Wayback Machine archived version)
  3. A note of cancellation of automatic volapuk encoding (1997) (in Russian)
  4. Instructions at the Ukrainian chat server Nyshporka. Archived 2007-01-01 at the Wayback Machine (in Russian)

Bibliography