CS Indic character set

Last updated December 28, 2021

The CS Indic character set, or the Classical Sanskrit Indic Character Set, is used by LaTeX to represent text in the romanization of Sanskrit.^[1] It is used in fonts and is based on Code Page 437.^[2] Extended versions are the CSX Indic character set and the CSX+ Indic character set.^[3]^[4]

Code page layout

CS Indic^[5]
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
8x
9x
Ax					ñ	Ñ	l̃	ṁ
Bx
Cx
Dx
Ex	ā		Ā	ī	Ī	ū	Ū	ṛ	Ṛ	ṝ	Ṝ	ḷ	Ḷ	ḹ	Ḹ	ṅ
Fx	Ṅ	ṭ	Ṭ	ḍ	Ḍ	ṇ	Ṇ	ś	Ś	ṣ	Ṣ		ṃ	Ṃ	ḥ	Ḥ
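
The layout above can be read as a byte-to-Unicode lookup. The following is a minimal sketch of a decoder, not an official codec: the names `CS_INDIC_OVERRIDES` and `decode_cs_indic` are hypothetical, and falling back to Code Page 437 for every byte the table does not list (including the two blank cells at 0xE1 and 0xFB) is an assumption based only on the statement that the set is based on Code Page 437.

```python
# Hypothetical mapping built from rows Ax, Ex and Fx of the table above.
# Byte 0xA6 yields two code points: "l" followed by U+0303 COMBINING TILDE.
CS_INDIC_OVERRIDES = {
    0xA4: "\u00F1", 0xA5: "\u00D1", 0xA6: "l\u0303", 0xA7: "\u1E41",
    0xE0: "\u0101", 0xE2: "\u0100", 0xE3: "\u012B", 0xE4: "\u012A",
    0xE5: "\u016B", 0xE6: "\u016A", 0xE7: "\u1E5B", 0xE8: "\u1E5A",
    0xE9: "\u1E5D", 0xEA: "\u1E5C", 0xEB: "\u1E37", 0xEC: "\u1E36",
    0xED: "\u1E39", 0xEE: "\u1E38", 0xEF: "\u1E45",
    0xF0: "\u1E44", 0xF1: "\u1E6D", 0xF2: "\u1E6C", 0xF3: "\u1E0D",
    0xF4: "\u1E0C", 0xF5: "\u1E47", 0xF6: "\u1E46", 0xF7: "\u015B",
    0xF8: "\u015A", 0xF9: "\u1E63", 0xFA: "\u1E62",
    0xFC: "\u1E43", 0xFD: "\u1E42", 0xFE: "\u1E25", 0xFF: "\u1E24",
}


def decode_cs_indic(data: bytes) -> str:
    """Decode CS Indic bytes to a Unicode string.

    Bytes listed in the table use the table's value; all other bytes
    fall back to Python's built-in cp437 codec (an assumption).
    """
    pieces = []
    for byte in data:
        if byte in CS_INDIC_OVERRIDES:
            pieces.append(CS_INDIC_OVERRIDES[byte])
        else:
            pieces.append(bytes([byte]).decode("cp437"))
    return "".join(pieces)


# The bytes 6B E7 F9 F5 61 spell the romanized word "kṛṣṇa".
print(decode_cs_indic(b"k\xe7\xf9\xf5a"))
```

A plain dictionary keeps the sketch close to the table; a production codec would more likely be registered through Python's `codecs` module, which is outside the scope of this illustration.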

History

The CS and CSX character sets were defined during an informal discussion over a beer between John Smith, Dominik Wujastyk and Ronald E. Emmerick at the World Sanskrit Conference in Vienna in 1990. A few months later they were endorsed by several other Indologists, including Harry Falk, Richard Lariviere, G. Jan Meulenbeld, Hideaki Nakatani, Muneo Tokunaga, and Michio Yano.^[5]

Related Research Articles

Devanagari, also called Nagari, is a left-to-right abugida, based on the ancient Brāhmī script, used in the Indian subcontinent. It was developed in ancient India from the 1st to the 4th century CE and was in regular use by the 7th century CE. The Devanagari script, composed of 47 primary characters including 14 vowels and 33 consonants, is the fourth most widely adopted writing system in the world, being used for over 120 languages.

TeX (stylized within the system as a logo with a lowered "E") is a typesetting system which was designed and written by Donald Knuth and first released in 1978. TeX is a popular means of typesetting complex mathematical formulae; it has been noted as one of the most sophisticated digital typographical systems.

Pali is a Middle Indo-Aryan liturgical language native to the Indian subcontinent. It is widely studied because it is the language of the Pāli Canon or Tipiṭaka as well as the sacred language of Theravāda Buddhism. In early times, it was written in the Brahmi script.

Metafont is a description language used to define raster fonts. It is also the name of the interpreter that executes Metafont code, generating the bitmap fonts that can be embedded into e.g. PostScript. Metafont was devised by Donald Knuth as a companion to his TeX typesetting system.

OpenType is a format for scalable computer fonts. It was built on its predecessor TrueType, retaining TrueType's basic structure and adding many intricate data structures for prescribing typographic behavior. OpenType is a registered trademark of Microsoft Corporation.

The device independent file format (DVI) is the output file format of the TeX typesetting program, designed by David R. Fuchs and implemented by Donald E. Knuth in 1982. Unlike the TeX markup files used to generate them, DVI files are not intended to be human-readable; they consist of binary data describing the visual layout of a document in a manner not reliant on any specific image format, display hardware or printer. DVI files are typically used as input to a second program which translates DVI files to graphical data. For example, most TeX software packages include a program for previewing DVI files on a user's computer display; this program is a driver. Drivers are also used to convert from DVI to popular page description languages and for printing.

The Tibetan script is a segmental writing system (abugida) of Indic origin used to write certain Tibetic languages, including Tibetan, Dzongkha, Sikkimese, Ladakhi, Jirel and Balti. It has also been used for some non-Tibetic languages in close cultural contact with Tibet, such as Thakali. The printed form is called uchen script, while the hand-written cursive form used in everyday writing is called umê script. This writing system is used across the Himalayas and Tibet.

Devanāgarī is an Indian script used for many languages of India and Nepal, including Hindi, Marathi, Nepali and Sanskrit. There are several somewhat similar methods of transliteration from Devanāgarī to the Roman script, including the influential and lossless IAST notation.

A ring diacritic may appear above or below letters. It may be combined with some letters of the extended Latin alphabets in various contexts.

Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.

The International Alphabet of Sanskrit Transliteration (IAST) is a transliteration scheme that allows the lossless romanisation of Indic scripts as employed by Sanskrit and related Indic languages. It is based on a scheme that emerged during the nineteenth century from suggestions by Charles Trevelyan, William Jones, Monier Monier-Williams and other scholars, and formalised by the Transliteration Committee of the Geneva Oriental Congress, in September 1894. IAST makes it possible for the reader to read the Indic text unambiguously, exactly as if it were in the original Indic script. It is this faithfulness to the original scripts that accounts for its continuing popularity amongst scholars.

Computer Modern is the original family of typefaces used by the typesetting program TeX. It was created by Donald Knuth with his Metafont program, and was most recently updated in 1992. Computer Modern, or variants of it, remains very widely used in scientific publishing, especially in disciplines that make frequent use of mathematical notation.

The Harvard-Kyoto Convention is a system for transliterating Sanskrit and other languages that use the Devanāgarī script into ASCII. It is predominantly used informally in e-mail, and for electronic texts.

ISO 15919 "Transliteration of Devanagari and related Indic scripts into Latin characters" is one of a series of international standards for romanization by the International Organization for Standardization. It was published in 2001 and uses diacritics to map the much larger set of consonants and vowels in Brahmic and Nastaliq scripts to the Latin script.

A few projects exist to provide free and open-source Unicode typefaces, i.e. Unicode typefaces which are open-source and designed to contain glyphs of all Unicode characters, or at least a broad selection of Unicode scripts. There are also numerous projects aimed at providing only a certain script, such as the Arabeyes Arabic font. The advantage of targeting only some scripts with a font is that certain Unicode characters should be rendered differently depending on which language they are used in, and that a font that only includes the characters a certain user needs will be much smaller in file size than one with many glyphs. Unicode fonts in modern formats such as OpenType can in theory cover multiple languages by including multiple glyphs per character, though very few actually cover more than one language's forms of the unified Han characters.

The Cork encoding is a character encoding used for encoding glyphs in fonts. It is named after the city of Cork in Ireland, where during a TeX Users Group (TUG) conference in 1990 a new encoding was introduced for LaTeX. It contains 256 characters supporting most west and east-European languages with the Latin alphabet.

EB Garamond is a free and open source implementation of Claude Garamont’s Antiqua typeface Garamond and the matching Italic, Greek and Cyrillic characters designed by Robert Granjon. Its name is a shortening of Egenolff–Berner Garamond, which refers to the fact that the letter forms are taken from the Egenolff–Berner specimen printed in 1592.

The Velthuis system of transliteration is an ASCII transliteration scheme for the Sanskrit language from and to the Devanagari script. It was developed in about 1983 by Frans Velthuis, a scholar living in Groningen, Netherlands, who created a popular, high-quality software package in LaTeX for typesetting Devanāgarī. The primary documentation for the scheme is the system's clearly-written software manual. It is based on using the ISO 646 repertoire to represent mnemonically the accents used in standard scholarly transliteration. It does not use diacritics as IAST does. It may optionally use capital letters in a manner similar but not identical to the Harvard-Kyoto or ITRANS schemes.

The CSX Indic character set, or the Classical Sanskrit eXtended Indic Character Set, is used by LaTeX to represent text used in the Romanization of Sanskrit. It has no association with American railroad company CSX Transportation. It is an extension of the CS Indic character set, and is based on Code Page 437. An extended version is the CSX+ Indic character set. Michael Everson made a font in this character set for the Macintosh.

The CSX+ Indic character set, or the Classical Sanskrit eXtended Plus Indic Character Set, is used by LaTeX to represent text used in the Romanization of Sanskrit. It is an extension of the CSX Indic character set, which in turn is an extension of the CS Indic character set, and is based on Code Page 437. It fixes an issue with Windows programs by moving á from code point 160 (0xA0) to code point 158 (0x9E).

References

[1] Pandey, Anshuman (December 1998). "Romanized Indic and LaTeX" (PDF). TUGboat. TeX Users Group. 19 (4): 417.
[2] "CTAN: /Tex-archive/Fonts/CSX/Fonts/Charter".
[3] "Classical Sanskrit eXtended encoding for the representation of Indian languages in Roman script".
[4] "The CSX+ encoding (Classical Sanskrit eXtended Plus) encoding used in (La)TeX".
[5] Wujastyk, Dominik (1990). "HUMANIST listserv report". HUMANIST.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
