ITRANS

The "Indian languages TRANSliteration" (ITRANS) is an ASCII transliteration scheme for Indic scripts, particularly for the Devanagari script.

The need for a simple encoding scheme that used only the keys available on an ordinary keyboard was felt in the early days of the rec.music.indian.misc (RMIM) Usenet newsgroup, where lyrics and trivia about Indian popular movie songs were discussed. In parallel, a Sanskrit mailing list quickly felt the need for an exact and unambiguous encoding. ITRANS emerged on the RMIM newsgroup as early as 1994. [1] The effort was spearheaded by Avinash Chopde, who developed a transliteration package. [2] Its latest version is v5.34. The package also enables automatic conversion of Roman-script input into the corresponding Indic script. [3] [4]

ITRANS has been used for encoding Indian etexts. It is wider in scope than the Harvard-Kyoto scheme for Devanagari transliteration, with which it coincides largely but not entirely. The Sanskrit mailing list of the early 1990s, which began at almost the same time as RMIM, developed into the full-blown Sanskrit Documents project, which now uses ITRANS extensively and hosts thousands of encoded texts. With the wider implementation of Unicode, the traditional IAST is increasingly used for electronic texts as well.

Like the Harvard-Kyoto scheme, ITRANS romanization uses only letters and signs found on a common English-language computer keyboard, and it is quite easy to read and pick up.

ITRANS transliteration scheme

The ITRANS transliteration scheme [3] is given in the tables below. Unlike most other transliteration methods, ITRANS does not use diacritics. Because ITRANS uses both capital and small letters as distinct symbols in its lettering scheme, the first letter of a proper noun cannot be capitalized.
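The capitalization constraint can be illustrated with a minimal Python sketch. The dictionary below is a hand-picked subset of the scheme in which the lower-case and upper-case codes select different Devanagari letters. The names `CASE_SENSITIVE` and `capitalization_changes_reading` are hypothetical and are not part of the ITRANS package.

```python
# Subset of ITRANS codes (consonants shown without the trailing 'a') whose
# lower- and upper-case forms map to different Devanagari letters.
CASE_SENSITIVE = {
    "a": "अ", "A": "आ",
    "i": "इ", "I": "ई",
    "u": "उ", "U": "ऊ",
    "t": "त", "T": "ट",
    "d": "द", "D": "ड",
    "n": "न", "N": "ण",
}


def capitalization_changes_reading(word: str) -> bool:
    """Return True if capitalizing the first letter selects a different letter.

    Only the letters listed in CASE_SENSITIVE are checked; any other first
    letter yields False in this sketch.
    """
    first = word[0]
    return CASE_SENSITIVE.get(first) != CASE_SENSITIVE.get(first.upper())
```

For example, a name typed as `tulasI` starts with dental `t`; writing it as `Tulasi` would turn the first consonant into retroflex `T`, which is why proper nouns keep their lower-case initial in ITRANS.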

Vowels

Table: Vowels
Devanagari  Gurmukhi  Telugu  Kannada  Tamil  Malayalam  Bangla  Sinhala  ITRANS
a
A/aa
i
I/ii/ee
u
U/uu
RRi/R^i
RRI/R^I
LLi/L^i
LLI/L^I
[note 1]  ^e
e
E
ae
aE
ai
[note 2]  ^o
o
O
au
अं  [note 3]  ਂ/ ੰ  అం  ಅಂ  অং  M/.m/.n
अः  ಅಃ  ಅಃ  অঃ  H
अँ  ਂ/ ੰ  అఁ  অঁ  .N
[note 4]  .h  [note 5]
[note 6]  .a
[note 7]  ఓం  OM, AUM

Consonants

The Devanāgarī consonant letters include an implicit 'a' sound. In all of the transliteration systems, that 'a' sound must be represented explicitly.

Standard Indic consonants

Table: ITRANS Devanagari consonants
Velar: ka kha ga gha ~Na
Palatal: cha Cha ja jha ~na
Retroflex: Ta Tha Da Dha Na
Dental: ta tha da dha na
Labial: pa pha ba bha ma
Semi-vowel: ya ra la va
Fricative: sha Sha sa ha La

Irregular consonant clusters

Devanāgarī  ITRANS
क्ष  kSa/kSha/xa
त्र  tra
ज्ञ  GYa/j~na
श्र  shra

Prenasalized consonants

Sanskrit  Sinhala  ITRANS
ङ्ग  ~Nga
ञ्ज  ~nja
ण्ड  NDa
न्द  nda
म्ब  mba

Consonants with Nuqta

Devanāgarī  ITRANS
क़  qa
ख़  Ka
ग़  Ga
ज़  za
फ़  fa
ड़  .Da/Ra
ढ़  .Dha/Rha
व़  wa
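In Unicode, each nuqta consonant in the table above can be written as its base consonant followed by the nukta sign (U+093C). The sketch below builds the letters that way. It is only an illustration under that assumption: `NUQTA_BASES` and `nuqta_consonant` are hypothetical names, and the ITRANS package itself may store these mappings differently.

```python
import unicodedata

NUQTA = "\u093c"  # DEVANAGARI SIGN NUKTA

# ITRANS code (without the trailing 'a') -> base Devanagari consonant.
# Both alternative spellings from the table (.D/R and .Dh/Rh) are included.
NUQTA_BASES = {
    "q": "क", "K": "ख", "G": "ग", "z": "ज", "f": "फ",
    ".D": "ड", "R": "ड", ".Dh": "ढ", "Rh": "ढ", "w": "व",
}


def nuqta_consonant(code: str) -> str:
    """Return the base consonant followed by the combining nukta sign."""
    return NUQTA_BASES[code] + NUQTA


def same_letter(a: str, b: str) -> bool:
    """Compare two strings after Unicode NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Normalizing before comparison matters because Unicode also has precomposed code points for some of these letters (for example U+0958 for the letter written `qa`), and NFC decomposes them into the base-plus-nukta form used here.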

Dravidian consonants

Devanagari  Tamil  Malayalam  ITRANS
zha
^ra
^na

Examples
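As a hedged illustration of how the vowel and consonant tables and the implicit 'a' convention fit together, the following Python sketch converts a small subset of ITRANS into Devanagari. It is not the ITRANS package's implementation: the function name `itrans_to_devanagari`, the greedy longest-match tokenization, and the choice of supported codes are all assumptions made for this example. A consonant not followed by a vowel receives a virāma, in line with note 5.

```python
VIRAMA = "\u094d"  # halant: suppresses the inherent 'a'

# ITRANS consonant codes (without the trailing 'a') -> Devanagari letter.
CONSONANTS = {
    "k": "क", "kh": "ख", "g": "ग", "gh": "घ", "~N": "ङ",
    "ch": "च", "Ch": "छ", "j": "ज", "jh": "झ", "~n": "ञ",
    "T": "ट", "Th": "ठ", "D": "ड", "Dh": "ढ", "N": "ण",
    "t": "त", "th": "थ", "d": "द", "dh": "ध", "n": "न",
    "p": "प", "ph": "फ", "b": "ब", "bh": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "sh": "श", "Sh": "ष", "s": "स", "h": "ह", "L": "ळ",
}

# ITRANS vowel code -> (independent vowel, dependent vowel sign).
VOWELS = {
    "a": ("अ", ""),
    "A": ("आ", "ा"), "aa": ("आ", "ा"),
    "i": ("इ", "ि"),
    "I": ("ई", "ी"), "ii": ("ई", "ी"),
    "u": ("उ", "ु"),
    "U": ("ऊ", "ू"), "uu": ("ऊ", "ू"),
    "RRi": ("ऋ", "ृ"), "R^i": ("ऋ", "ृ"),
    "e": ("ए", "े"), "ai": ("ऐ", "ै"),
    "o": ("ओ", "ो"), "au": ("औ", "ौ"),
}

# Anusvara, visarga and chandrabindu.
MARKS = {"M": "ं", "H": "ः", ".N": "ँ"}

TOKENS = set(CONSONANTS) | set(VOWELS) | set(MARKS)
MAX_LEN = max(len(token) for token in TOKENS)


def itrans_to_devanagari(text: str) -> str:
    """Convert a subset of ITRANS to Devanagari (illustrative sketch only)."""
    out = []
    pending = False  # True while the last consonant still awaits a vowel
    i = 0
    while i < len(text):
        # Greedy longest match, so 'kh' wins over 'k' followed by 'h'.
        for size in range(MAX_LEN, 0, -1):
            token = text[i:i + size]
            if token in TOKENS:
                break
        else:
            token = text[i]  # unknown character: passed through unchanged
        if token in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # consonant cluster: kill the schwa
            out.append(CONSONANTS[token])
            pending = True
        elif token in VOWELS:
            independent, sign = VOWELS[token]
            out.append(sign if pending else independent)
            pending = False
        else:
            if pending:
                out.append(VIRAMA)
                pending = False
            out.append(MARKS.get(token, token))
        i += len(token)
    if pending:
        out.append(VIRAMA)  # word-final consonant without 'a'
    return "".join(out)
```

With this sketch, `itrans_to_devanagari("namaste")` yields नमस्ते and `itrans_to_devanagari("saMskRRita")` yields संस्कृत. Codes outside the subset, such as the nuqta consonants or the Dravidian letters, are passed through unchanged.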

Limitations

Since ITRANS was primarily designed for Sanskrit (and for modern Indo-Aryan languages), it lacks full coverage of the Indic scripts used for other languages. Specifically, its support for the Dravidian short vowels 'e' and 'o' is considered ambiguous, since Indo-Aryan phonology does not distinguish them from the long vowels 'E' and 'O'. Also, the schwa of languages like Bengali ([ɔ]) and Assamese ([ɒ]) differs from that of other languages ([ə]), which makes typing those languages in ITRANS feel unnatural. Moreover, although Bengali and Assamese both use the Eastern Nagari script, Assamese phonology differs significantly from that of Bengali, which causes further friction when typing Assamese.

Support for many phones of other languages, such as Dravidian sounds, Hindustani nuqta consonants and Sinhala, is considered patchy and is inconsistent across implementations because of a lack of standardization. In addition, almost no ITRANS implementation fully supports languages such as Kashmiri and Sindhi.

New version

The ambiguity around the Dravidian short vowels 'e' and 'o' has been addressed with a new ISO 15919-compliant coding scheme, which is uniform across all supported languages and scripts, including nuqta consonants. The old version, ITRANS 5.3, is maintained for backward compatibility.

The changed mappings are listed below:

Changed mapping
Devanagari  Gurmukhi  Telugu  Kannada  Tamil  Malayalam  Bangla  ITRANS
ऎ e
E
o
O
Ra

The revamped package supports both the old ITRANS 5.3 scheme and the ISO 15919 scheme.

In addition, the new package can be customized for any specific input codes. [5]

Notes

  1. Introduced in Parivardhita Devanagari to write the Dravidian short vowel 'e'
  2. Introduced in Parivardhita Devanagari to write the Dravidian short vowel 'o'
  3. Added as anusvāra
  4. Virāma/halant
  5. The absence of 'a' after a consonant should also kill the schwa automatically
  6. Avagraha: elision during sandhi
  7. Om symbol

References

  1. "An early post from 1995 referring to ITRANS effort going on RMIM newsgroup". Retrieved 15 December 2015.
  2. Aksharamukha transliteration tool. Aksharamukha is a freeware two-way converter for Asian scripts. It converts between 20 different South Asian and East Asian scripts. It also supports 5 major Latin transliteration conventions such as IAST, ISO, Harvard-Kyoto, ITRANS and Velthuis. In the tool, 'source' can be set to, for example, ITRANS or Harvard-Kyoto, and 'target' can be set to a particular script such as Devanagari-Hindi. (When using a north Indian script, tick the box: Remove 'a'.) It also works in reverse, for example from Hindi to Latin by ISO transliteration.
  3. "ITRANS (version 5.34) website describing scheme. (Avinash Chopde)". www.aczoom.com. Retrieved 15 December 2015. Online Interface to ITRANS (online converter tool from Latin script using ITRANS to various Indic scripts). The converter tool page gives the mapping spreadsheet (which shows the tilde sign clearly), the scheme for Devanagari, and tables for all the languages covered. Ultimately the conversion tool follows the mapping spreadsheet. Source code at GitHub: itrans.
  4. Google Transliteration (supports Indic languages) Online and downloadable tool for transliteration by Google. (Also additionally uses ITRANS but older version 1)
  5. "Online Interface to ITRANS".