CS Indic character set

Last updated

The CS Indic character set, or the Classical Sanskrit Indic Character Set, is used by LaTeX represent text used in the Romanization of Sanskrit. [1] It is used in fonts, and is based on Code Page 437. [2] Extended versions are the CSX Indic character set and the CSX+ Indic character set. [3] [4]

Contents

Code page layout

CS Indic [5]
0123456789ABCDEF
8x
9x
Axtitle="Alt+164U+00F1 LATIN SMALL LETTER N WITH TILDE" style="padding:0px;"}}|ñ title="Alt+165U+00D1 LATIN CAPITAL LETTER N WITH TILDE" style="padding:0px;"}}|Ñ title="Alt+166U+006C LATIN SMALL LETTER L, U+0303 COMBINING TILDE" style="padding:0px;"}}| title="Alt+167U+1E41 LATIN SMALL LETTER M WITH DOT ABOVE" style="padding:0px;"}}|
Bx
Cx
Dx
Extitle="Alt+224U+0101 LATIN SMALL LETTER A WITH MACRON" style="padding:0px;"}}|ā title="Alt+225" style="padding:0px;background:#DDD"}}|title="Alt+226U+0100 LATIN CAPITAL LETTER A WITH MACRON" style="padding:0px;"}}|Ā title="Alt+227U+012B LATIN SMALL LETTER I WITH MACRON" style="padding:0px;"}}|ī title="Alt+228U+012A LATIN CAPITAL LETTER I WITH MACRON" style="padding:0px;"}}|Ī title="Alt+229U+016B LATIN SMALL LETTER U WITH MACRON" style="padding:0px;"}}|ū title="Alt+230U+016A LATIN CAPITAL LETTER U WITH MACRON" style="padding:0px;"}}|Ū title="Alt+231U+1E5B LATIN SMALL LETTER R WITH DOT BELOW" style="padding:0px;"}}| title="Alt+232U+1E5A LATIN CAPITAL LETTER R WITH DOT BELOW" style="padding:0px;"}}| title="Alt+233U+1E5D LATIN SMALL LETTER R WITH DOT BELOW AND MACRON" style="padding:0px;"}}| title="Alt+234U+1E5C LATIN CAPITAL LETTER R WITH DOT BELOW AND MACRON" style="padding:0px;"}}| title="Alt+235U+1E37 LATIN SMALL LETTER L WITH DOT BELOW" style="padding:0px;"}}| title="Alt+236U+1E36 LATIN CAPITAL LETTER L WITH DOT BELOW" style="padding:0px;"}}| title="Alt+237U+1E39 LATIN SMALL LETTER L WITH DOT BELOW AND MACRON" style="padding:0px;"}}| title="Alt+238U+1E38 LATIN CAPITAL LETTER L WITH DOT BELOW AND MACRON" style="padding:0px;"}}| title="Alt+239U+1E45 LATIN SMALL LETTER N WITH DOT ABOVE" style="padding:0px;"}}|
Fxtitle="Alt+240U+1E44 LATIN CAPITAL LETTER N WITH DOT ABOVE" style="padding:0px;"}}| title="Alt+241U+1E6D LATIN SMALL LETTER T WITH DOT BELOW" style="padding:0px;"}}| title="Alt+242U+1E6C LATIN CAPITAL LETTER T WITH DOT BELOW" style="padding:0px;"}}| title="Alt+243U+1E0D LATIN SMALL LETTER D WITH DOT BELOW" style="padding:0px;"}}| title="Alt+244U+1E0C LATIN CAPITAL LETTER D WITH DOT BELOW" style="padding:0px;"}}| title="Alt+245U+1E47 LATIN SMALL LETTER N WITH DOT BELOW" style="padding:0px;"}}| title="Alt+246U+1E46 LATIN CAPITAL LETTER N WITH DOT BELOW" style="padding:0px;"}}| title="Alt+247U+015B LATIN SMALL LETTER S WITH ACUTE" style="padding:0px;"}}|ś title="Alt+248U+015A LATIN CAPITAL LETTER S WITH ACUTE" style="padding:0px;"}}|Ś title="Alt+249U+1E63 LATIN SMALL LETTER S WITH DOT BELOW" style="padding:0px;"}}| title="Alt+250U+1E62 LATIN CAPITAL LETTER S WITH DOT BELOW" style="padding:0px;"}}| title="Alt+251" style="padding:0px;background:#DDD"}}|title="Alt+252U+1E43 LATIN SMALL LETTER M WITH DOT BELOW" style="padding:0px;"}}| title="Alt+253U+1E42 LATIN CAPITAL LETTER M WITH DOT BELOW" style="padding:0px;"}}| title="Alt+254U+1E25 LATIN SMALL LETTER H WITH DOT BELOW" style="padding:0px;"}}| title="Alt+255U+1E24 LATIN CAPITAL LETTER H WITH DOT BELOW" style="padding:0px;"}}|

History

The CS and CSX character set was defined during an informal discussion over a beer between John Smith, Dominik Wujastyk and Ronald E. Emmerick during the World Sanskrit Conference in Vienna, 1990. A few months later they were endorsed by several other Indologists including Harry Falk, Richard Lariviere, G. Jan Meulenbeld, Hideaki Nakatani, Muneo Tokunaga, and Michio Yano. [5]

Related Research Articles

Devanagari Writing script for many Indian and Nepalese languages

Devanagari, also called Nagari, is a left-to-right abugida, based on the ancient Brāhmī script, used in the Indian subcontinent. It was developed in ancient India from the 1st to the 4th century CE and was in regular use by the 7th century CE. The Devanagari script, composed of 47 primary characters including 14 vowels and 33 consonants, is the fourth most widely adopted writing system in the world, being used for over 120 languages.

TeX, stylized within the system as TeX, is a typesetting system which was designed and written by Donald Knuth and first released in 1978. TeX is a popular means of typesetting complex mathematical formulae; it has been noted as one of the most sophisticated digital typographical systems.

Pali is a Middle Indo-Aryan liturgical language native to the Indian subcontinent. It is widely studied because it is the language of the Pāli Canon or Tipiṭaka as well as the sacred language of Theravāda Buddhism. In early time, it was written in Brahmi script.

Metafont is a description language used to define raster fonts. It is also the name of the interpreter that executes Metafont code, generating the bitmap fonts that can be embedded into e.g. PostScript. Metafont was devised by Donald Knuth as a companion to his TeX typesetting system.

OpenType is a format for scalable computer fonts. It was built on its predecessor TrueType, retaining TrueType's basic structure and adding many intricate data structures for prescribing typographic behavior. OpenType is a registered trademark of Microsoft Corporation.

Device independent file format

The device independent file format (DVI) is the output file format of the TeX typesetting program, designed by David R. Fuchs and implemented by Donald E. Knuth in 1982. Unlike the TeX markup files used to generate them, DVI files are not intended to be human-readable; they consist of binary data describing the visual layout of a document in a manner not reliant on any specific image format, display hardware or printer. DVI files are typically used as input to a second program which translates DVI files to graphical data. For example, most TeX software packages include a program for previewing DVI files on a user's computer display; this program is a driver. Drivers are also used to convert from DVI to popular page description languages and for printing.

Tibetan script writing system used to write certain Tibetic languages

The Tibetan script is a segmental writing system (abugida) of Indic origin used to write certain Tibetic languages, including Tibetan, Dzongkha, Sikkimese, Ladakhi, Jirel and Balti. It has also been used for some non-Tibetic languages in close cultural contact with Tibet, such as Thakali. The printed form is called uchen script while the hand-written cursive form used in everyday writing is called umê script. This writing system is used across the Himalayas, and Tibet.

Devanāgarī is an Indian script used for many languages of India and Nepal, including Hindi, Marathi, Nepali and Sanskrit. There are several somewhat similar methods of transliteration from Devanāgarī to the Roman script, including the influential and lossless IAST notation.

A ring diacritic may appear above or below letters. It may be combined with some letters of the extended Latin alphabets in various contexts.

Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.

The International Alphabet of Sanskrit Transliteration (IAST) is a transliteration scheme that allows the lossless romanisation of Indic scripts as employed by Sanskrit and related Indic languages. It is based on a scheme that emerged during the nineteenth century from suggestions by Charles Trevelyan, William Jones, Monier Monier-Williams and other scholars, and formalised by the Transliteration Committee of the Geneva Oriental Congress, in September 1894. IAST makes it possible for the reader to read the Indic text unambiguously, exactly as if it were in the original Indic script. It is this faithfulness to the original scripts that accounts for its continuing popularity amongst scholars.

Computer Modern Family of typefaces

Computer Modern is the original family of typefaces used by the typesetting program TeX. It was created by Donald Knuth with his Metafont program, and was most recently updated in 1992. Computer Modern, or variants of it, remains very widely used in scientific publishing, especially in disciplines that make frequent use of mathematical notation.

The Harvard-Kyoto Convention is a system for transliterating Sanskrit and other languages that use the Devanāgarī script into ASCII. It is predominantly used informally in e-mail, and for electronic texts.

ISO 15919 "Transliteration of Devanagari and related Indic scripts into Latin characters" is one of a series of international standards for romanization by the International Organization for Standardization. It was published in 2001 and uses diacritics to map the much larger set of consonants and vowels in Brahmic and Nastaliq scripts to the Latin script.

Open-source Unicode typefaces

A few projects exist to provide free and open-source Unicode typefaces, i.e. Unicode typefaces which are open-source and designed to contain glyphs of all Unicode characters, or at least a broad selection of Unicode scripts. There are also numerous projects aimed at providing only a certain script, such as the Arabeyes Arabic font. The advantage of targeting only some scripts with a font was that certain Unicode characters should be rendered differently depending on which language they are used in, and that a font that only includes the characters a certain user needs will be much smaller in file size compared to one with many glyphs. Unicode fonts in modern formats such as OpenType can in theory cover multiple languages by including multiple glyphs per character, though very few actually cover more than one language's forms of the unified Han characters.

The Cork encoding is a character encoding used for encoding glyphs in fonts. It is named after the city of Cork in Ireland, where during a TeX Users Group (TUG) conference in 1990 a new encoding was introduced for LaTeX. It contains 256 characters supporting most west and east-European languages with the Latin alphabet.

EB Garamond Typeface family

EB Garamond is a free and open source implementation of Claude Garamont’s Antiqua typeface Garamond and the matching Italic, Greek and Cyrillic characters designed by Robert Granjon. Its name is shortening of Egenolff–Berner Garamond which refers to the fact that the letter forms are taken from the Egenolff–Berner specimen printed in 1592.

The Velthuis system of transliteration is an ASCII transliteration scheme for the Sanskrit language from and to the Devanagari script. It was developed in about 1983 by Frans Velthuis, a scholar living in Groningen, Netherlands, who created a popular, high-quality software package in LaTeX for typesetting Devanāgarī. The primary documentation for the scheme is the system's clearly-written software manual. It is based on using the ISO 646 repertoire to represent mnemonically the accents used in standard scholarly transliteration. It does not use diacritics as IAST does. It may optionally use capital letters in a manner similar but not identical to the Harvard-Kyoto or ITRANS schemes.manual para 4.1

The CSX Indic character set, or the Classical Sanskrit eXtended Indic Character Set, is used by LaTeX represent text used in the Romanization of Sanskrit. It has no association with American railroad company CSX Transportation. It is an extension of the CS Indic character set, and is based on Code Page 437. An extended version is the CSX+ Indic character set. Michael Everson made a font in this character set for the Macintosh.

The CSX+ Indic character set, or the Classical Sanskrit eXtended Plus Indic Character Set, is used by LaTeX to represent text used in the Romanization of Sanskrit. It is an extension of the CSX Indic character set, which in turn is an extension of the CS Indic character set, and is based on Code Page 437. It fixes an issue with Windows programs, by moving á from code point 160 (0xA0), to code point 158 (0x9E).

References

  1. Anshuman Pandey (December 1998). "Romanized Indix and LaTex" (PDF). TUGboat . TeX Users Group. 19 (4): 417.
  2. "CTAN: /Tex-archive/Fonts/CSX/Fonts/Charter".
  3. "Classical Sanskrit eXtended encoding for the representation of Indian languages in Roman script".
  4. "The CSX+ encoding (Classical Sanskrit eXtended Plus) encoding used in (La)TeX".
  5. 1 2 Wujastyk, Dominik (1990). "HUMANIST listserv report". HUMANIST.