This article needs additional citations for verification .(February 2014) |
WX notation is a transliteration scheme for representing Indian languages in ASCII. This scheme originated at IIT Kanpur for computational processing of Indian languages, and is widely used among the natural language processing (NLP) community in India. The notation (though unidentified) is used, for example, in a textbook on NLP from IIT Kanpur. [1] The salient features of this transliteration scheme are: Every consonant and every vowel has a single mapping into Roman. Hence it is a prefix code, [2] advantageous from a computation point of view. Typically the small case letters are used for un-aspirated consonants and short vowels while the capital case letters are used for aspirated consonants and long vowels. While the retroflexed voiceless and voiced consonants are mapped to 't, T, d and D', the dentals are mapped to 'w, W, x and X'. Hence the name of the scheme "WX", referring to the idiosyncratic mapping. Ubuntu Linux provides a keyboard support for WX notation.
In the following tables, such idiosynratic assignment of letters to phonemes with which they are normally not associated are boldfaced. Besides 'w', 'W', 'x', and 'X', these are 'f', 'F', 'q', 'Q', 'L', and 'R'.
अ | आ | इ | ई | उ | ऊ | ए | ऐ | ओ | औ |
a | A | i | I | u | U | e | E | o | O |
ऋ | ॠ | ऌ |
q | Q | L |
ं | ः |
M | H |
The Anunasika is represented by 'z'. For example, अँ = az. In Sanskrit, the Avagraha is represented by 'Z'. For example, वमतोऽन्तः = vamawoZnwaH. This may cause confusion as 'Z' is also used for another purpose in the case of other Indic languages (see below, last paragraph).
क् | ख् | ग् | घ् | ङ् | Velar |
k | K | g | G | f | |
च् | छ् | ज् | झ् | ञ् | Palatal |
c | C | j | J | F | |
ट् | ठ् | ड् | ढ् | ण् | Retroflex |
t | T | d | D | N | |
त् | थ् | द् | ध् | न् | Dental |
w | W | x | X | n | |
प् | फ् | ब् | भ् | म् | Labial |
p | P | b | B | m | |
य् | र् | ल् | व् | Semi-vowel | |
y | r | l | v | ||
श् | ष् | स् | ह् | Fricative | |
S | R | s | h |
This scheme was further extended to represent all the Indian scripts derived from Brahmi. To account for the characters from other Indian languages that are missing in Devanagari, three operators are used: 'Y' to get the next ISCII character, 'V' to get the previous ISCII character and 'Z' to add the nuqta. Thus for example, 'l' represents ल (U0932) of Devanagari, and 'lY' represents ळ (U0933) in Marathi. 'e' represents ए (U090F) of Devanagari or ఏ (U0C0F) of Telugu and eV represents ऎ (U090E) or ఎ (U0C0E) of Telugu. Similarly 'ka' represents क of Devanagari, and 'kZa' represents क़.
Devanagari is an Indic script used in the northern Indian subcontinent. Also simply called Nāgari, it is a left-to-right abugida, based on the ancient Brāhmi script. It is one of the official scripts of the Republic of India and Nepal. It was developed and in regular use by the 8th century CE and achieved its modern form by 1200 CE. The Devanāgari script, composed of 48 primary characters, including 14 vowels and 34 consonants, is the fourth most widely adopted writing system in the world, being used for over 120 languages.
The Gujarati script is an abugida for the Gujarati language, Kutchi language, and various other languages. It is one of the official scripts of the Indian Republic. It is a variant of the Devanagari script differentiated by the loss of the characteristic horizontal line running above the letters and by a number of modifications to some characters.
The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient India and are used by various languages in several language families in South, East and Southeast Asia: Indo-Aryan, Dravidian, Tibeto-Burman, Mongolic, Austroasiatic, Austronesian, and Tai. They were also the source of the dictionary order (gojūon) of Japanese kana.
Devanagari is an Indic script used for many Indo-Aryan languages of North India and Nepal, including Hindi, Marathi and Nepali, which was the script used to write Classical Sanskrit. There are several somewhat similar methods of transliteration from Devanagari to the Roman script, including the influential and lossless IAST notation. Romanised Devanagari is also called Romanagari.
Indian Standard Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.
The Harvard-Kyoto Convention is a system for transliterating Sanskrit and other languages that use the Devanāgarī script into ASCII. It is predominantly used informally in e-mail, and for electronic texts.
Virama is a Sanskrit phonological concept to suppress the inherent vowel that otherwise occurs with every consonant letter, commonly used as a generic term for a codepoint in Unicode, representing either
Ka is the first consonant of the Indic abugidas. In modern Indic scripts, ka is derived from the Brāhmī letter , which is derived from the Aramaic ("K").
Kha is the second consonant of Indic abugidas. In modern Indic scripts, kha is derived from the Brahmi letter , which is probably derived from the Aramaic ("Q").
Ga is the third consonant of Indic abugidas. In modern Indic scripts, ga is derived from the early "Ashoka" Brahmi letter , which is probably derived from the Aramaic letter after having gone through the Gupta letter .
Ṅa is the fifth consonant of Indic abugidas. In modern Indic scripts, It is derived from the early "Ashoka" Brahmi letter after having gone through the Gupta letter .
Cha is the seventh consonant of Indic abugidas. In modern Indic scripts, cha is derived from the early "Ashoka" Brahmi letter , which is probably derived from the Aramaic letter ("Q") after having gone through the Gupta letter .
Ja is the eighth consonant of Indic abugidas. In modern Indic scripts, ja is derived from the early "Ashoka" Brahmi letter after having gone through the Gupta letter .
Jha is the ninth consonant of Indic abugidas. In modern Indic scripts, jha is derived from the early "Ashoka" Brahmi letter after having gone through the Gupta letter .
The Velthuis system of transliteration is an ASCII transliteration scheme for the Sanskrit language from and to the Devanagari script. It was developed in about 1983 by Frans Velthuis, a scholar living in Groningen, Netherlands, who created a popular, high-quality software package in LaTeX for typesetting Devanāgarī. The primary documentation for the scheme is the system's clearly-written software manual. It is based on using the ISO 646 repertoire to represent mnemonically the accents used in standard scholarly transliteration. It does not use diacritics as IAST does. It may optionally use capital letters in a manner similar but not identical to the Harvard-Kyoto or ITRANS schemes.manual para 4.1
Ṭha is a consonant of Indic abugidas. In modern Indic scripts, Ṭha is derived from the early "Ashoka" Brahmi letter after having gone through the Gupta letter . As with the other cerebral consonants, ṭha is not found in most scripts for Tai, Sino-Tibetan, and other non-Indic languages, except for a few scripts, which retain these letters for transcribing Sanskrit religious terms.
Ḍha is a consonant of Indic abugidas. In modern Indic scripts, Ḍha is derived from the early "Ashoka" Brahmi letter after having gone through the Gupta letter . As with the other cerebral consonants, ḍha is not found in most scripts for Tai, Sino-Tibetan, and other non-Indic languages, except for a few scripts, which retain these letters for transcribing Sanskrit religious terms.
Va or Wa is a consonant of Indic abugidas. In modern Indic scripts, Va is derived from the early "Ashoka" Brahmi letter after having gone through the Gupta letter . It is generally romanized as "Va" in scripts for Indic languages, but as "Wa" in many scripts for other language families.
Ṣa is a consonant of Indic abugidas. In modern Indic scripts, Ssa is derived from the early "Ashoka" Brahmi letter after having gone through the Gupta letter .
Hindi–Urdu is the lingua franca of modern-day Northern India and Pakistan. Modern Standard Hindi is officially registered in India as a standard written using the Devanagari script, and Standard Urdu is officially registered in Pakistan as a standard written using an extended Perso-Arabic script.