WX notation

[Figure: Ubuntu Hindi (WX) keyboard layout]

WX notation is a transliteration scheme for representing Indian languages in ASCII. It originated at IIT Kanpur for the computational processing of Indian languages and is widely used in the natural language processing (NLP) community in India. For example, the notation is used, though not named as such, in a textbook on NLP from IIT Kanpur. [1] The salient features of this transliteration scheme are:

  1. Every consonant and every vowel maps to a single Roman letter. Hence it is a prefix code, [2] which is advantageous from a computational point of view.
  2. Lowercase letters are typically used for unaspirated consonants and short vowels, while uppercase letters are used for aspirated consonants and long vowels.
  3. The retroflex voiceless and voiced consonants are mapped to 't, T, d and D', while the dentals are mapped to 'w, W, x and X'. The name of the scheme, "WX", refers to this idiosyncratic mapping.

Ubuntu Linux provides keyboard support for WX notation.
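Because each core WX symbol is a single character, no codeword can be the beginning of another, so text can be decoded left to right without lookahead. The minimal sketch below checks that property for a list of codewords. The WX inventory is copied from the tables in this article (extension operators excluded); the Harvard-Kyoto vowels, where ऐ is written "ai", are included only as a contrasting example. The function name `is_prefix_code` is illustrative and not taken from any WX tool.

```python
# Sketch: test whether a list of codewords forms a prefix code.

def is_prefix_code(codes):
    """Return True if no codeword is a prefix of another (duplicates also fail)."""
    ordered = sorted(codes)
    # In sorted order, a word that is a prefix of some later word is also a
    # prefix of its immediate successor, so checking adjacent pairs is enough.
    return all(not nxt.startswith(cur) for cur, nxt in zip(ordered, ordered[1:]))


# Core WX letters from the vowel, sonorant, anusvara/visarga and consonant tables.
WX_CODES = list("aAiIuUeEoO" "qQL" "MH"
                "kKgGf" "cCjJF" "tTdDN" "wWxXn" "pPbBm" "yrlv" "SRsh")

# The ten basic Harvard-Kyoto vowels, for contrast.
HK_VOWELS = ["a", "A", "i", "I", "u", "U", "e", "ai", "o", "au"]

print(is_prefix_code(WX_CODES))   # True
print(is_prefix_code(HK_VOWELS))  # False: "a" is a prefix of "ai"
```

In a scheme like Harvard-Kyoto, a decoder that reads "a" must look ahead to see whether an "i" or "u" follows; in core WX, every character can be mapped as soon as it is read.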


In the following tables, some letters are assigned idiosyncratically to phonemes with which they are not normally associated. Besides 'w', 'W', 'x', and 'X', these are 'f', 'F', 'q', 'Q', 'L', and 'R'.

Vowels

अ आ इ ई उ ऊ ए ऐ ओ औ = a A i I u U e E o O

Sonorants

ऋ ॠ ऌ = q Q L

Anusvāra and visarga

ं (anusvāra) = M, ः (visarga) = H

The anunasika (candrabindu) is represented by 'z'; for example, अँ = az. In Sanskrit, the avagraha is represented by 'Z'; for example, वमतोऽन्तः = vamawoZnwaH. This can cause confusion, because 'Z' is also used as the nukta operator for other Indic languages (see below).

Consonants

Velar: क् ख् ग् घ् ङ् = k K g G f
Palatal: च् छ् ज् झ् ञ् = c C j J F
Retroflex: ट् ठ् ड् ढ् ण् = t T d D N
Dental: त् थ् द् ध् न् = w W x X n
Labial: प् फ् ब् भ् म् = p P b B m
Semi-vowel: य् र् ल् व् = y r l v
Fricative: श् ष् स् ह् = S R s h
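To make the tables concrete, here is a minimal Python sketch of WX-to-Devanagari conversion. It is an illustration under stated assumptions, not the IIT Kanpur converter: it covers only the letters listed above, treats 'Z' as the Sanskrit avagraha, ignores the 'Y' and 'V' operators, and handles the abugida structure with a single rule: a consonant takes a dependent vowel sign if a vowel follows it (nothing for the inherent 'a'), and a virama otherwise. The function name `wx_to_devanagari` is hypothetical.

```python
# Minimal WX -> Devanagari sketch covering only the core letters in the tables.
# Assumptions: 'Z' is read as the Sanskrit avagraha (not the nukta operator);
# 'Y' and 'V' are not handled; any other character passes through unchanged.

VIRAMA = "\u094d"

CONSONANTS = {
    "k": "क", "K": "ख", "g": "ग", "G": "घ", "f": "ङ",
    "c": "च", "C": "छ", "j": "ज", "J": "झ", "F": "ञ",
    "t": "ट", "T": "ठ", "d": "ड", "D": "ढ", "N": "ण",
    "w": "त", "W": "थ", "x": "द", "X": "ध", "n": "न",
    "p": "प", "P": "फ", "b": "ब", "B": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    "S": "श", "R": "ष", "s": "स", "h": "ह",
}

# Independent vowel letters, used when no consonant precedes the vowel.
VOWELS = {
    "a": "अ", "A": "आ", "i": "इ", "I": "ई", "u": "उ", "U": "ऊ",
    "e": "ए", "E": "ऐ", "o": "ओ", "O": "औ", "q": "ऋ", "Q": "ॠ", "L": "ऌ",
}

# Dependent vowel signs used after a consonant; 'a' is inherent, so it adds nothing.
MATRAS = {
    "a": "", "A": "\u093e", "i": "\u093f", "I": "\u0940", "u": "\u0941",
    "U": "\u0942", "e": "\u0947", "E": "\u0948", "o": "\u094b", "O": "\u094c",
    "q": "\u0943", "Q": "\u0944", "L": "\u0962",
}

# Anusvara, visarga, candrabindu (anunasika) and avagraha.
SIGNS = {"M": "\u0902", "H": "\u0903", "z": "\u0901", "Z": "\u093d"}


def wx_to_devanagari(text: str) -> str:
    out = []
    pending = False  # True while the last consonant has not yet received a vowel
    for ch in text:
        if ch in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # consonant followed by consonant: conjunct
            out.append(CONSONANTS[ch])
            pending = True
        elif ch in VOWELS:
            out.append(MATRAS[ch] if pending else VOWELS[ch])
            pending = False
        else:
            if pending:
                out.append(VIRAMA)  # consonant not followed by a vowel
                pending = False
            out.append(SIGNS.get(ch, ch))
    if pending:
        out.append(VIRAMA)  # bare consonant at the end of the input
    return "".join(out)


print(wx_to_devanagari("vamawoZnwaH"))  # वमतोऽन्तः
```

Because every WX symbol is one character, the converter never needs lookahead beyond remembering whether the previous character was a consonant.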

This scheme was further extended to represent all the Indian scripts derived from Brahmi. To account for characters of other Indian languages that are missing from Devanagari, three operators are used: 'Y' to get the next ISCII character, 'V' to get the previous ISCII character, and 'Z' to add the nukta. For example, 'l' represents ल (U+0932) in Devanagari, and 'lY' represents ळ (U+0933) in Marathi. Likewise, 'e' represents ए (U+090F) in Devanagari or ఏ (U+0C0F) in Telugu, and 'eV' represents ऎ (U+090E) in Devanagari or ఎ (U+0C0E) in Telugu. Similarly, 'ka' represents क in Devanagari, and 'kZa' represents क़.
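The three operators can be sketched as code point arithmetic on an already transliterated base character. This rests on an assumption that the scheme itself does not state: that stepping to the next or previous ISCII character corresponds to stepping one code point in Unicode, whose Indic blocks largely follow the ISCII layout. The examples in the paragraph above fit that assumption. The nukta is handled for Devanagari only, and `apply_operator` is a hypothetical helper.

```python
# Sketch of the WX extension operators applied to a single base character.
# Assumption: next/previous ISCII character == next/previous Unicode code point.

DEVANAGARI_NUKTA = "\u093c"


def apply_operator(base: str, op: str) -> str:
    if op == "Y":  # next ISCII character, e.g. ल (U+0932) -> ळ (U+0933)
        return chr(ord(base) + 1)
    if op == "V":  # previous ISCII character, e.g. ए (U+090F) -> ऎ (U+090E)
        return chr(ord(base) - 1)
    if op == "Z":  # add the nukta as a combining mark (Devanagari only here)
        return base + DEVANAGARI_NUKTA
    raise ValueError(f"unknown operator: {op!r}")
```

Applied to the examples above, `apply_operator("ल", "Y")` gives ळ and `apply_operator("ఏ", "V")` gives ఎ. The nukta result is the two-code-point sequence क + ़ rather than the precomposed character U+0958.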

See also

Related Research Articles

Abugida (writing system)

An abugida, sometimes known as alphasyllabary, neosyllabary or pseudo-alphabet, is a segmental writing system in which consonant–vowel sequences are written as units; each unit is based on a consonant letter, and vowel notation is secondary, like a diacritical mark. This contrasts with a full alphabet, in which vowels have status equal to consonants, and with an abjad, in which vowel marking is absent, partial, or optional – in less formal contexts, all three types of script may be termed "alphabets". The terms also contrast them with a syllabary, in which a single symbol denotes the combination of one consonant and one vowel.

Devanagari (writing script for many North Indian and Nepalese languages)

Devanāgarī or Devanagari, also called Nāgarī, is a left-to-right abugida, based on the ancient Brāhmī script, used in the northern Indian subcontinent. It is one of the official scripts of the Republic of India and Nepal. It was developed and in regular use by the 7th century CE and achieved its modern form by 1000 CE. The Devanāgarī script, composed of 48 primary characters, including 14 vowels and 34 consonants, is the fourth most widely adopted writing system in the world, being used for over 120 languages.

Gujarati script (Indian script)

The Gujarati script is an abugida for the Gujarati language, Kutchi language, and various other languages. It is one of the official scripts of the Indian Republic. It is a variant of the Devanagari script differentiated by the loss of the characteristic horizontal line running above the letters and by a number of modifications to some characters.

Brahmic scripts (family of abugida writing systems)

The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient India and are used by various languages in several language families in South, East and Southeast Asia: Indo-Aryan, Dravidian, Tibeto-Burman, Mongolic, Austroasiatic, Austronesian, and Tai. They were also the source of the dictionary order (gojūon) of Japanese kana.

Devanagari is an Indic script used for many Indo-Aryan languages of North India and Nepal, including Hindi, Marathi and Nepali, and was the script used to write Classical Sanskrit. There are several somewhat similar methods of transliteration from Devanagari to the Roman script, including the influential and lossless IAST notation. Romanized Devanagari is also called Romanagari.

Indian Standard Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.

The Harvard-Kyoto Convention is a system for transliterating Sanskrit and other languages that use the Devanāgarī script into ASCII. It is predominantly used informally in e-mail, and for electronic texts.

Virama is a Sanskrit phonological concept to suppress the inherent vowel that otherwise occurs with every consonant letter, commonly used as a generic term for a codepoint in Unicode, representing either

  1. halanta, hasanta or explicit virāma, a diacritic in many Brahmic scripts, including the Devanagari and Bengali scripts, or
  2. saṃyuktākṣara or implicit virama, a conjunct consonant or ligature.

Ka is the first consonant of the Indic abugidas. In modern Indic scripts, ka is derived from the corresponding Brāhmī letter, which is in turn derived from the Aramaic letter kaph ("K").

Ga is the third consonant of Indic abugidas. In modern Indic scripts, ga is derived, by way of the corresponding Gupta letter, from the early "Ashoka" Brahmi letter, which is probably derived from an Aramaic letter.

Ṅa is the fifth consonant of Indic abugidas. In modern Indic scripts, it is derived from the corresponding early "Ashoka" Brahmi letter by way of the corresponding Gupta letter.

Cha is the seventh consonant of Indic abugidas. In modern Indic scripts, cha is derived, by way of the corresponding Gupta letter, from the early "Ashoka" Brahmi letter, which is probably derived from the Aramaic letter qoph ("Q").

Ja is the eighth consonant of Indic abugidas. In modern Indic scripts, ja is derived from the corresponding early "Ashoka" Brahmi letter by way of the corresponding Gupta letter.

Jha is the ninth consonant of Indic abugidas. In modern Indic scripts, jha is derived from the corresponding early "Ashoka" Brahmi letter by way of the corresponding Gupta letter.

The Velthuis system of transliteration is an ASCII transliteration scheme for the Sanskrit language from and to the Devanagari script. It was developed in about 1983 by Frans Velthuis, a scholar living in Groningen, Netherlands, who created a popular, high-quality software package in LaTeX for typesetting Devanāgarī. The primary documentation for the scheme is the system's clearly written software manual. It uses the ISO 646 repertoire to represent mnemonically the accents used in standard scholarly transliteration, and it does not use diacritics as IAST does. It may optionally use capital letters in a manner similar, but not identical, to the Harvard-Kyoto or ITRANS schemes.

Ṭha is a consonant of Indic abugidas. In modern Indic scripts, ṭha is derived from the corresponding early "Ashoka" Brahmi letter by way of the corresponding Gupta letter. As with the other cerebral consonants, ṭha is not found in most scripts for Tai, Sino-Tibetan, and other non-Indic languages, except for a few scripts that retain these letters for transcribing Sanskrit religious terms.

Ḍha is a consonant of Indic abugidas. In modern Indic scripts, ḍha is derived from the corresponding early "Ashoka" Brahmi letter by way of the corresponding Gupta letter. As with the other cerebral consonants, ḍha is not found in most scripts for Tai, Sino-Tibetan, and other non-Indic languages, except for a few scripts that retain these letters for transcribing Sanskrit religious terms.

Va or Wa is a consonant of Indic abugidas. In modern Indic scripts, va is derived from the corresponding early "Ashoka" Brahmi letter by way of the corresponding Gupta letter. It is generally romanized as "Va" in scripts for Indic languages, but as "Wa" in many scripts for other language families.

Ṣa is a consonant of Indic abugidas. In modern Indic scripts, ṣa is derived from the corresponding early "Ashoka" Brahmi letter by way of the corresponding Gupta letter.

Hindi–Urdu is the lingua franca of modern-day Northern India and Pakistan. Modern Standard Hindi is officially registered in the Republic of India as a standard written in the Devanagari script, and Urdu is officially registered in Pakistan as a standard written in an extended Perso-Arabic script.

References

  1. Akshar Bharati; Vineet Chaitanya; Rajeev Sangal (1996). "Appendix B". Natural Language Processing: A Paninian Perspective (PDF). Prentice-Hall of India. pp. 191–193. ISBN 9788120309210. Retrieved 16 February 2014.
  2. Gérard Huet; Amba Kulkarni; Peter Scharf (eds.). Sanskrit Computational Linguistics: First and Second International Symposia, Rocquencourt, France, October 29–31, 2007, Providence, RI, USA, May 15–17, 2008, Revised Selected Papers. Springer. See Appendix B. https://www.springer.com/gp/book/9783642001543