CS Indic character set


The CS Indic character set, or the Classical Sanskrit Indic Character Set, is used by LaTeX to represent romanized Sanskrit text. [1] It is used as a font encoding and is based on Code Page 437. [2] Extended versions are the CSX Indic character set and the CSX+ Indic character set. [3] [4]


Code page layout

CS Indic [5]
Row   Characters
8x
9x
Ax    ñ Ñ
Bx
Cx
Dx
Ex    ā Ā ī Ī ū Ū
Fx    ś Ś
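Because CS Indic is described as based on Code Page 437, one way to picture it is as CP437 with some byte positions redefined. The Python sketch below decodes bytes that way. The names `CS_OVERLAY` and `decode_cs` are hypothetical, and the overlay is deliberately left empty because the exact CS byte positions are not reproduced here; only CP437's own mapping is exercised (CP437 places ñ at 0xA4 and Ñ at 0xA5, which is consistent with their appearance in row Ax above).

```python
from typing import Dict, Optional

# ASSUMPTION: hypothetical overlay of byte -> character for the positions that
# CS Indic redefines relative to CP437. Intentionally empty; fill it in from
# the published CS specification before relying on it.
CS_OVERLAY: Dict[int, str] = {}


def decode_cs(data: bytes, overlay: Optional[Dict[int, str]] = None) -> str:
    """Decode bytes as a CP437-based encoding with an overlay of redefined positions."""
    table = CS_OVERLAY if overlay is None else overlay
    chars = []
    for b in data:
        if b in table:
            chars.append(table[b])  # position redefined by the overlay layer
        else:
            chars.append(bytes([b]).decode("cp437"))  # inherited from CP437
    return "".join(chars)


print(decode_cs(b"\xa4\xa5"))  # prints "ñÑ" via the CP437 base layer
```

Layering the overlay on Python's built-in `cp437` codec keeps the sketch small: every byte not listed in the overlay falls back to its CP437 meaning, which mirrors the "based on Code Page 437" description without asserting any CS-specific positions.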

History

The CS and CSX character sets were defined during an informal discussion over a beer between John Smith, Dominik Wujastyk and Ronald E. Emmerick at the World Sanskrit Conference in Vienna in 1990. A few months later they were endorsed by several other Indologists, including Harry Falk, Richard Lariviere, G. Jan Meulenbeld, Hideaki Nakatani, Muneo Tokunaga, and Michio Yano. [5]


References

  1. Anshuman Pandey (December 1998). "Romanized Indic and LaTeX" (PDF). TUGboat. 19 (4). TeX Users Group: 417.
  2. "CTAN: /Tex-archive/Fonts/CSX/Fonts/Charter".
  3. "Classical Sanskrit eXtended encoding for the representation of Indian languages in Roman script".
  4. "The CSX+ encoding (Classical Sanskrit eXtended Plus) encoding used in (La)TeX".
  5. Wujastyk, Dominik (1990). "HUMANIST listserv report". HUMANIST.