CSX Indic character set

The CSX Indic character set, or the Classical Sanskrit eXtended Indic Character Set, is used by LaTeX to represent text in the Romanization of Sanskrit. [1] [2] [3] It has no association with the American railroad company CSX Transportation. It is an extension of the CS Indic character set and is based on Code Page 437. [4] An extended version is the CSX+ Indic character set. [5] Michael Everson made a font in this character set for the Macintosh. [6]

Code page layout

CSX
0123456789ABCDEF
8x Ç ü é â ä à å ç ê ë è ï î ì Ä Å
9x É æ Æ ô ö ò û ù ÿ Ö Ü ¢ £ ¥
Ax á í ó ú ñ Ñ ā̆ ī̆ ū̆ ā̃ ī̃ « »
Bx ā́ ā̀ ī́ ī̀ ū́ ū̀
Cx ṛ́ ṛ̀ ṝ́
Dx ã ĩ ũ õ ĕ ŏ ū̃
Ex ā ß Ā ī Ī ū Ū
Fx ś Ś
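The 8x row above is identical to Code Page 437, so a decoder can be sketched as a CP437 fallback plus an override table for the slots where CSX differs. The following is a minimal Python sketch under that assumption. The function name `decode_csx` and its `overrides` parameter are hypothetical, and the override entries are deliberately left empty here: they must be filled in from the CSX specification itself rather than inferred from this layout.

```python
from typing import Dict


def decode_csx(data: bytes, overrides: Dict[int, str]) -> str:
    """Decode CSX-encoded bytes to a Unicode string (hypothetical helper).

    Bytes listed in `overrides` map to the given string, which may be a
    multi-code-point sequence such as a letter plus a combining mark.
    Every other byte is decoded with Python's built-in cp437 codec, on the
    assumption that CSX agrees with Code Page 437 at that position.
    """
    parts = []
    for b in data:
        if b in overrides:
            parts.append(overrides[b])
        else:
            parts.append(bytes([b]).decode("cp437"))
    return "".join(parts)


# The 8x row of the table matches Code Page 437 exactly,
# so no overrides are needed to decode it.
row_8x = decode_csx(bytes(range(0x80, 0x90)), overrides={})
print(row_8x)  # ÇüéâäàåçêëèïîìÄÅ
```

Keeping the CSX-specific positions in a separate dictionary means the same function also covers CSX+ or font variants by swapping in a different override table.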

Note that some fonts have ā̃ (U+0101 LATIN SMALL LETTER A WITH MACRON, U+0303 COMBINING TILDE) at code point 171 (0xAB), ī̃ (U+012B LATIN SMALL LETTER I WITH MACRON, U+0303 COMBINING TILDE) at code point 172 (0xAC), and ū̃ (U+016B LATIN SMALL LETTER U WITH MACRON, U+0303 COMBINING TILDE) at code point 216 (0xD8). [3]
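Each of these glyphs occupies a single byte in such a font, but Unicode has no precomposed character for a vowel carrying both a macron and a tilde, so converting it to Unicode yields a two-code-point sequence. The short Python sketch below, using only the standard `unicodedata` module, illustrates this: NFC normalization keeps each sequence at two code points, while NFD expands it to three (base letter, U+0304 COMBINING MACRON, U+0303 COMBINING TILDE).

```python
import unicodedata

# The three sequences named in the note: a precomposed vowel with macron
# followed by U+0303 COMBINING TILDE.
sequences = {
    "a macron tilde": "\u0101\u0303",
    "i macron tilde": "\u012b\u0303",
    "u macron tilde": "\u016b\u0303",
}

for label, s in sequences.items():
    nfc = unicodedata.normalize("NFC", s)  # no single precomposed form exists
    nfd = unicodedata.normalize("NFD", s)  # base letter + macron + tilde
    names = [unicodedata.name(ch) for ch in s]
    print(label, len(nfc), len(nfd), names)
```

A converter from these fonts to Unicode therefore has to map one byte to a multi-character string rather than to a single code point.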

History

See the shared history of the CS character set.

References

  1. Anshuman Pandey (December 1998). "Romanized Indic and LaTeX" (PDF). TUGboat. TeX Users Group. 19 (4): 417.
  2. "Classical Sanskrit eXtended encoding for the representation of Indian languages in Roman script".
  3. "The CSX encoding".
  4. "CTAN: /Tex-archive/Fonts/CSX/Fonts/Charter".
  5. "The CSX+ encoding (Classical Sanskrit eXtended Plus) encoding used in (La)TeX".
  6. "Everson Mono for Macintosh".