C0 and C1 control codes

Last updated May 30, 2024

The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received.

C0 codes are the range 00_HEX–1F_HEX and the default C0 set was originally defined in ISO 646 (ASCII). C1 codes are the range 80_HEX–9F_HEX and the default C1 set was originally defined in ECMA-48 (harmonized later with ISO 6429). The ISO/IEC 2022 system of specifying control and graphic characters allows other C0 and C1 sets to be available for specialized applications, but they are rarely used.

C0 controls

ASCII defined 32 control characters, plus the DEL character, 7F_HEX or 01111111_BIN, which was needed so that a character on paper tape could be erased by punching out all of its holes.

This large number of codes was desirable at the time, as multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals.

Only a few codes have remained in common use: BEL, ESC, and the "Format Effector" (FE_n) characters BS, TAB, LF, VT, FF, and CR. Others are unused or have acquired different meanings, such as NUL serving as the C string terminator. Some data transfer protocols, such as ANPA-1312, Kermit, and XMODEM, do make extensive use of SOH, STX, ETX, EOT, ACK, NAK and SYN for purposes approximating their original definitions, and some software uses the "Information Separators" (IS_n), for example the Unix info format^[1] and Python's splitlines string method.^[2]
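
As a small illustration of the last point, the sketch below shows how Python's `str.splitlines` treats the information separators. The sample string is made up for the example. The behaviour relied on here is the one listed in the Python documentation for `str.splitlines`, which names FS, GS and RS as line boundaries but not US.

```python
# Hypothetical sample text: five fields joined by the four information separators.
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"
sample = "a" + FS + "b" + GS + "c" + RS + "d" + US + "e"

# splitlines() breaks at FS, GS and RS, but leaves US inside the last piece.
parts = sample.splitlines()
print(parts)  # ['a', 'b', 'c', 'd\x1fe']
```

Code that relies on US as a field delimiter therefore still has to split on it explicitly, for example with `str.split("\x1f")`.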

The names of some codes were changed in ISO 6429:1992 (or ECMA-48:1991) to be neutral with respect to writing direction. The abbreviations used were not changed, as the standard had already specified that those would remain unchanged when the standard is translated to other languages. In this table both new and old names are shown for the renamed controls (the old name is the one matching the abbreviation).

ASCII control codes, originally defined in ANSI X3.4.^[3]
Caret notation	Decimal	Hexadecimal	Abbreviations	Symbol	Name	C escape	Description
^@	0	00	NUL	␀	Null	\0	Does nothing. The code of blank paper tape, and also used for padding to slow transmission.
^A	1	01	TC₁, SOH	␁	Start of Heading		First character of the heading of a message.^[4]
^B	2	02	TC₂, STX	␂	Start of Text		Terminates the header and starts the message text.
^C	3	03	TC₃, ETX	␃	End of Text		Ends the message text, starts a footer (up to the next TC character).^[4]^[5]
^D	4	04	TC₄, EOT	␄	End of Transmission		Ends the transmission of one or more messages.^[4]^[5] May place terminals on standby.^[5]
^E	5	05	TC₅, ENQ, WRU^{[lower-alpha 1]}	␅	Enquiry		Trigger a response at the receiving end, to see if it is still present.
^F	6	06	TC₆, ACK	␆	Acknowledge		Indication of successful receipt of a message.
^G	7	07	BEL^{[lower-alpha 2]}	␇	Bell, Alert	\a	Call for attention from an operator.
^H	8	08	FE₀, BS	␈	Backspace	\b	Move one position leftwards. Next character may overprint or replace the character that was there.
^I	9	09	FE₁, HT	␉	Character Tabulation, Horizontal Tabulation	\t	Move right to the next tab stop.
^J	10	0A	FE₂, LF	␊	Line Feed	\n	Move down to the same position on the next line (some devices also moved to the left column).
^K	11	0B	FE₃, VT	␋	Line Tabulation, Vertical Tabulation	\v	Move down to the next vertical tab stop.
^L	12	0C	FE₄, FF	␌	Form Feed	\f	Move down to the top of the next page.
^M	13	0D	FE₅, CR	␍	Carriage Return	\r	Move to column zero while staying on the same line.
^N	14	0E	SO, LS₁^{[lower-alpha 3]}	␎	Shift Out		Switch to an alternative character set.
^O	15	0F	SI, LS₀^{[lower-alpha 3]}	␏	Shift In		Return to regular character set after SO.
^P	16	10	TC₇, DC₀,^{[lower-alpha 4]} DLE	␐	Data Link Escape		Cause a limited number of contiguously following characters to be interpreted in some different way.^[14]^[15]
^Q	17	11	DC₁, XON	␑	Device Control One		Turn on (DC₁ and DC₂) or off (DC₃ and DC₄) devices. Teletype^[6] used these for the paper tape reader and the paper tape punch. The first use became the de facto standard for software flow control.^[16]
^R	18	12	DC₂, TAPE	␒	Device Control Two
^S	19	13	DC₃, XOFF	␓	Device Control Three
^T	20	14	DC₄, ~~TAPE~~	␔	Device Control Four
^U	21	15	TC₈, NAK	␕	Negative Acknowledge		Negative response to a sender, such as a detected error.
^V	22	16	TC₉, SYN	␖	Synchronous Idle		Sent in synchronous transmission systems when no other character is being transmitted.
^W	23	17	TC₁₀, ETB	␗	End of Transmission Block		End of a transmission block of data when data are divided into such blocks for transmission purposes.
^X	24	18	CAN	␘	Cancel		Indicates that the data preceding it are in error or are to be disregarded.
^Y	25	19	EM	␙	End of Medium		Indicates on paper or magnetic tapes that the end of the usable portion of the tape has been reached.^[3]
^Z	26	1A	SUB	␚	Substitute		Replaces a character that was found to be invalid or in error. Should be ignored.
^[	27	1B	ESC	␛	Escape	\e ^{[lower-alpha 5]}	Alters the meaning of a limited number of following bytes. Nowadays this is almost always used to introduce an ANSI escape sequence.
^\	28	1C	IS₄, FS	␜	File Separator		Can be used as delimiters to mark fields of data structures. US is the lowest level, while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it. SP (space) could be considered an even lower level.
^]	29	1D	IS₃, GS	␝	Group Separator
^^	30	1E	IS₂, RS	␞	Record Separator
^_	31	1F	IS₁, US	␟	Unit Separator
While not technically part of the C0 control character range, the following two characters can be thought of as having some characteristics of control characters.
	32	20	SP	␠	Space		Move right one character position.
^?	127	7F	DEL	␡	Delete		Should be ignored. Used to delete characters on punched tape by punching out all the holes.

↑ Teletype labelled the key WRU for 'who are you?'^[6]
↑ The name BELL is assigned by Unicode to the unrelated emoji character 🔔 (U+1F514). While C0 and C1 control characters were not formally named by the Unicode standard itself at the time, this collided with existing use of BELL as the name of this control character in software following the previous versions of UTS#18 (the Unicode Regular Expressions standard),^[7] e.g. in Perl.^[8] Unicode now accepts ALERT and BEL (but not BELL) as formal aliases for the control character,^[9] although the code chart still lists BELL as the ISO 6429 alias,^[10] and the corresponding control picture code point is called SYMBOL FOR BELL. Perl subsequently switched to using BELL for the emoji in version 5.18.^[11]
1 2 ISO/IEC 2022 (ECMA-35) refers to these as LS0 and LS1 in 8-bit environments, and as SI and SO in 7-bit environments.^[12]
↑ The first, 1963 edition of ASCII classified DLE as a device control, rather than a transmission control, and gave it the abbreviation DC0 ("device control reserved for data link escape").^[13]
↑ The '\e' escape sequence is not part of ISO C and many other language specifications. However, it is understood by several compilers, including GCC.
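
The caret notation column in the table above follows a simple rule: the character after the caret is the control code with bit 6 (0x40) toggled. The following is a minimal sketch of that rule; the function name `caret` is made up for illustration.

```python
def caret(code: int) -> str:
    """Return caret notation for a C0 control code (0x00-0x1F) or DEL (0x7F)."""
    if not (0x00 <= code <= 0x1F or code == 0x7F):
        raise ValueError("not a C0 control code or DEL")
    # Toggling bit 6 maps 0x00 to '@', 0x1B to '[' and 0x7F to '?'.
    return "^" + chr(code ^ 0x40)


for code in (0x00, 0x07, 0x1B, 0x7F):
    print(f"{code:02X} {caret(code)}")
```

This prints `00 ^@`, `07 ^G`, `1B ^[` and `7F ^?`, matching the corresponding table rows.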

C1 controls

In 1973, ECMA-35 and ISO 2022^[17] attempted to define a method by which an 8-bit "extended ASCII" code could be converted to a corresponding 7-bit code, and vice versa.^[18] In a 7-bit environment, Shift Out (SO) would change the meaning of the 96 bytes 0x20 through 0x7F^{[lower-alpha 1]}^[20] (i.e. all but the C0 control codes) to be the characters that an 8-bit environment would print if it used the same code with the high bit set. This meant that the range 0x80 through 0x9F could not be printed in a 7-bit environment,^[18] so it was decided that no alternative character set could use it, and that these codes should instead be additional control codes, which became known as the C1 control codes. To allow a 7-bit environment to use these new controls, the sequences ESC @ through ESC _ were to be considered equivalent to them.^[18] The later ISO 8859 standards abandoned support for 7-bit codes, but preserved this range of control characters.
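
The equivalence between an 8-bit C1 code and its 7-bit escape sequence amounts to subtracting 0x40 from the C1 byte and prefixing ESC, so that 0x80 becomes ESC @ and 0x9F becomes ESC _. A minimal sketch, with helper names that are made up for illustration:

```python
ESC = 0x1B


def c1_to_7bit(byte: int) -> bytes:
    """Return the two-byte ESC sequence equivalent to an 8-bit C1 code."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError("not a C1 control code")
    return bytes([ESC, byte - 0x40])


def seven_bit_to_c1(seq: bytes) -> int:
    """Return the 8-bit C1 code equivalent to ESC followed by '@' through '_'."""
    if len(seq) != 2 or seq[0] != ESC or not 0x40 <= seq[1] <= 0x5F:
        raise ValueError("not an ESC @ .. ESC _ sequence")
    return seq[1] + 0x40


print(c1_to_7bit(0x9B))                 # b'\x1b['  (CSI)
print(hex(seven_bit_to_c1(b"\x1b]")))   # 0x9d      (OSC)
```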

The first C1 control code set to be registered for use with ISO 2022 was DIN 31626,^[21] a specialised set for bibliographic use which was registered in 1979.^[22]

The more common general-use ISO/IEC 6429 set was registered in 1983,^[23] although the ECMA-48 specification upon which it was based had first been published in 1976.^[24] The set is also specified by JIS X 0211 (formerly JIS C 6323).^[25] Symbolic names defined by RFC 1345 and early drafts of ISO 10646, but not in ISO/IEC 6429 (PAD, HOP and SGC), are also used.^[8]^[26]

Except for SS2 and SS3 in EUC-JP text, and NEL in text transcoded from EBCDIC, the 8-bit forms of these codes were almost never used. CSI, DCS and OSC are used to control text terminals and terminal emulators, but almost always by using their 7-bit escape code representations. Nowadays, if bytes in this range are encountered, it is far more likely that they are intended to be the printing characters at those positions in Windows-1252 or Mac OS Roman.
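
The ambiguity described above can be seen by decoding the same bytes two ways. In this sketch the sample bytes are made up for the example: under ISO 8859-1 the bytes 0x93 and 0x94 land on C1 control code points, while under Windows-1252 they are curly quotation marks.

```python
import unicodedata

raw = b"\x93quoted\x94"  # hypothetical input of unknown encoding

as_latin1 = raw.decode("latin-1")  # 0x93, 0x94 become U+0093, U+0094 (C1 controls)
as_cp1252 = raw.decode("cp1252")   # 0x93, 0x94 become U+201C, U+201D (curly quotes)

print(unicodedata.category(as_latin1[0]))                    # Cc
print([hex(ord(c)) for c in (as_cp1252[0], as_cp1252[-1])])  # ['0x201c', '0x201d']
```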

ISO/IEC 6429 and RFC 1345 C1 control codes
ESC+	Decimal	Hex	Abbr	Name	Description^[27]
@	128	80	PAD^[9]	Padding Character^{[lower-alpha 2]}	Proposed as a "padding" or "high byte" for single-byte characters to make them two bytes long for easier interoperability with multiple byte characters. Extended Unix Code (EUC) occasionally uses this.^[31]
A	129	81	HOP^[9]	High Octet Preset^{[lower-alpha 2]}	Proposed to set the high byte of a sequence of multiple byte characters so they only need one byte each, as a simple form of data compression.
B	130	82	BPH	Break Permitted Here^{[lower-alpha 3]}	Follows a graphic character where a line break is permitted. Roughly equivalent to a soft hyphen or zero-width space except it does not define what is printed at the line break.
C	131	83	NBH	No Break Here^{[lower-alpha 3]}	Follows the graphic character that is not to be broken. See also word joiner.
D	132	84	IND	Index^{[lower-alpha 4]}	Move down one line without moving horizontally, to eliminate ambiguity about the meaning of LF.
E	133	85	NEL	Next Line	Equivalent to CR+LF, to match the EBCDIC control character.
F	134	86	SSA	Start of Selected Area	Used by block-oriented terminals. In xterm `ESC F` moves to the lower-left corner of the screen, since certain software assumes this behaviour.^[34]
G	135	87	ESA	End of Selected Area
H	136	88	HTS	Character Tabulation Set, Horizontal Tabulation Set	Set a tab stop at the current position.
I	137	89	HTJ	Character Tabulation With Justification, Horizontal Tabulation With Justification	Right-justify the text since the last tab against the next tab stop.
J	138	8A	VTS	Line Tabulation Set, Vertical Tabulation Set	Set a vertical tab stop.
K	139	8B	PLD	Partial Line Forward, Partial Line Down	To produce subscripts and superscripts in ISO/IEC 6429. Subscripts use `PLD text PLU` while superscripts use `PLU text PLD`.
L	140	8C	PLU	Partial Line Backward, Partial Line Up
M	141	8D	RI	Reverse Line Feed, Reverse Index	Move up one line.
N	142	8E	SS2	Single-Shift 2	Next character is from the G2 or G3 sets, respectively.
O	143	8F	SS3	Single-Shift 3	Next character is from the G2 or G3 sets, respectively.
P	144	90	DCS	Device Control String	Followed by a string of printable characters (0x20 through 0x7E) and format effectors (0x08 through 0x0D), terminated by ST (0x9C). Xterm defined a number of these.^[35]
Q	145	91	PU1	Private Use 1	Reserved for private function agreed on between the sender and the recipient of the data.
R	146	92	PU2	Private Use 2
S	147	93	STS	Set Transmit State
T	148	94	CCH	Cancel Character	Destructive backspace, to eliminate ambiguity about the meaning of BS.
U	149	95	MW	Message Waiting
V	150	96	SPA	Start of Protected Area	Used by block-oriented terminals.
W	151	97	EPA	End of Protected Area	Used by block-oriented terminals.
X	152	98	SOS	Start of String^{[lower-alpha 3]}	Followed by a control string terminated by ST (0x9C) which (unlike DCS, OSC, PM or APC) may contain any character except SOS or ST.
Y	153	99	SGC,^[9] SGCI^[36]	Single Graphic Character Introducer^{[lower-alpha 2]}	Intended to allow an arbitrary Unicode character to be printed; it would be followed by that character, most likely encoded in UTF-1.^[36]
Z	154	9A	SCI	Single Character Introducer^{[lower-alpha 3]}	To be followed by a single printable character (0x20 through 0x7E) or format effector (0x08 through 0x0D), and to print it as ASCII no matter what graphic or control sets were in use.
[	155	9B	CSI	Control Sequence Introducer	Used to introduce control sequences that take parameters. Used for ANSI escape sequences.
\	156	9C	ST	String Terminator	Terminates a string started by DCS, SOS, OSC, PM or APC.
]	157	9D	OSC	Operating System Command	Followed by a string of printable characters (0x20 through 0x7E) and format effectors (0x08 through 0x0D), terminated by ST (0x9C), intended for use to allow in-band signaling of protocol information, but rarely used for that purpose. Some terminal emulators, including xterm, use OSC sequences for setting the window title and changing the colour palette. They may also support terminating an OSC sequence with BEL instead of ST.^[37] Kermit used APC to transmit commands.^[38]
^	158	9E	PM	Privacy Message
_	159	9F	APC	Application Program Command

↑ In early versions the range excluded SP and DEL^[19]
1 2 3 Not part of ISO/IEC 6429 (ECMA-48)^[8]^[26]^[28]^: 4^[29]^: 5^[30]^: 8
1 2 3 4 Not part of the first edition of ISO/IEC 6429.^[23]^[28]^: 4
↑ Deprecated in 1988 and withdrawn in 1992 from ISO/IEC 6429^[30]^: 87 (1986^[32] and 1991^[33] respectively for ECMA-48).
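
As an illustration of the OSC row above, the following sketch builds a window-title sequence using the 7-bit forms ESC ] for OSC and ESC \ for ST, with BEL as the optional alternative terminator. The function name `set_title` is made up, and the `0;` selector is an assumption taken from xterm's control sequence conventions rather than from ISO/IEC 6429.

```python
ESC = "\x1b"
BEL = "\x07"
OSC_7BIT = ESC + "]"   # 7-bit equivalent of OSC (0x9D)
ST_7BIT = ESC + "\\"   # 7-bit equivalent of ST (0x9C)


def set_title(title: str, use_bel: bool = False) -> str:
    """Build an xterm-style OSC sequence that sets the window title."""
    # Keep the payload to printable characters (0x20 through 0x7E) for simplicity.
    if not all(0x20 <= ord(ch) <= 0x7E for ch in title):
        raise ValueError("title must contain only printable ASCII")
    terminator = BEL if use_bel else ST_7BIT
    return OSC_7BIT + "0;" + title + terminator


print(repr(set_title("demo")))                # '\x1b]0;demo\x1b\\'
print(repr(set_title("demo", use_bel=True)))  # '\x1b]0;demo\x07'
```

Writing the returned string to a terminal that supports the sequence should change its title; terminals that do not recognise it may ignore it or print part of it.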

Other control code sets

The ISO/IEC 2022 (ECMA-35) extension mechanism allowed escape sequences to change the C0 and C1 sets. The standard C0 control character set shown above is chosen with the sequence ESC ! @ and the above C1 set chosen with the sequence ESC " C.^[23]
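
Spelled out as bytes, the two designating sequences mentioned above look as follows. This is only a sketch of the byte values; the constant names are made up for illustration.

```python
ESC = b"\x1b"

# ESC ! @ selects the standard (ASCII) C0 set.
SELECT_C0_ASCII = ESC + b"!@"
# ESC " C selects the ISO/IEC 6429 C1 set.
SELECT_C1_6429 = ESC + b'"C'

print(SELECT_C0_ASCII.hex(" "))  # 1b 21 40
print(SELECT_C1_6429.hex(" "))   # 1b 22 43
```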

Several official and unofficial alternative sets have been defined, but the mechanism is now largely obsolete. Most were forced to retain a good deal of compatibility with the ASCII controls for interoperability. The standard makes ESC,^[39]^[40] SP and DEL^{[lower-alpha 1]} "fixed" coded characters, which are available in their ASCII locations in all encodings that conform to the standard.^[42] It also specifies that if a C0 set included transmission control (TC_n) codes, they must be encoded at their ASCII locations^[39] and could not be put in a C1 set,^[43] and any new transmission controls must be in a C1 set.^[39]

Other C0 control code sets

ANPA-1312, a text markup language used for news transmission, replaces several C0 control characters.
IPTC 7901, the newer international version of the above, has its own variations.
Videotex has a completely different set.
Teletext also defines a set similar to Videotex.
T.61/T.51,^[44] and others^[45] replaced EM and GS with SS2 and SS3 so these functions could be used in a 7-bit environment.
Some sets replaced FS with SS2,^[46] the same as ANPA-1312.
The now-withdrawn JIS C 6225, designated JIS X 0207 in later sources,^[47] replaced FS with CEX or "Control Extension",^[48] which introduces control sequences for vertical text behaviour, superscripts and subscripts^[49] and for transmitting custom character graphics.^[47]

Replacement C1 character sets

A specialized C1 control code set is registered for bibliographic use (including string collation), such as by MARC-8.^[22]^[50]^[51]
Various specialised C1 control code sets are registered for use by Videotex formats.^[21]
EBCDIC defines up to 29 additional control codes besides those present in ASCII. When translating EBCDIC to Unicode (or to ISO 8859), these codes are mapped to C1 control characters in a manner specified by IBM's Character Data Representation Architecture (CDRA).^[52]^[53] The New Line (NL) control does translate to the ISO/IEC 6429 NEL (although it is often swapped with LF, following the UNIX line ending convention),^[52] but the remainder of the control codes do not correspond. For example, the EBCDIC control SPS and the ECMA-48 control PLU are both used to begin a superscript or end a subscript, but are not mapped to one another. Extended-ASCII-mapped EBCDIC can therefore be regarded as having its own C1 set, although it is not registered with the ISO-IR registry for ISO/IEC 2022.^[21]
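
The NL to NEL mapping can be observed with an EBCDIC codec. The sketch below assumes that Python's bundled `cp037` codec follows the default (unswapped) pairing, decoding EBCDIC NL (0x15) to U+0085 and EBCDIC LF (0x25) to U+000A; other EBCDIC code pages and converters may swap the two.

```python
# EBCDIC code page 037: 0x15 is New Line (NL), 0x25 is Line Feed (LF).
nl = b"\x15".decode("cp037")
lf = b"\x25".decode("cp037")

print(f"NL -> U+{ord(nl):04X}")  # NL -> U+0085 (the C1 control NEL)
print(f"LF -> U+{ord(lf):04X}")  # LF -> U+000A
```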

Unicode

Unicode reserves the 65 code points described above for compatibility with the C0 and C1 control codes, giving them the general category Cc (control). These are:

U+0000–U+001F (C0 controls) and U+007F (DEL) assigned to the C0 Controls and Basic Latin block, and
U+0080–U+009F (C1 controls) assigned to the C1 Controls and Latin-1 Supplement block.

Unicode only specifies semantics for the C0 format controls HT, LF, VT, FF, and CR (note that BS is missing); the C0 information separators FS, GS, RS, US (and SP); and the C1 control NEL.^[54] The rest of the codes are transparent to Unicode and their meanings are left to higher-level protocols, with ISO/IEC 6429 suggested as a default.^[54]

Unicode includes many additional format effector characters besides these, such as marks, embeds, isolates and pops for explicit bidirectional formatting, and the zero-width joiner and non-joiner for controlling ligature use. However these are given the general category Cf (format) rather than Cc.
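
The split between the Cc and Cf categories can be checked against the Unicode Character Database. The following sketch uses Python's `unicodedata` module to confirm that exactly the 65 code points listed above carry the category Cc, and that the zero-width joiner is Cf.

```python
import sys
import unicodedata

# Collect every code point whose general category is Cc (control).
cc = [cp for cp in range(sys.maxunicode + 1)
      if unicodedata.category(chr(cp)) == "Cc"]

# C0 controls, DEL, and C1 controls.
expected = list(range(0x00, 0x20)) + [0x7F] + list(range(0x80, 0xA0))

print(len(cc))                         # 65
print(cc == expected)                  # True
print(unicodedata.category("\u200d"))  # Cf (zero-width joiner)
```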

Footnotes

↑ ISO/IEC 4873 extends this requirement to the C1 SS2 and SS3,^[41] although ISO/IEC 2022 itself does not.

References

↑ Fox, Brian. "Adding a new node to Info". Info: The online, menu-driven GNU documentation system. GNU Project.
↑ "Built-in Types § str.splitlines". The Python Standard Library. Python Software Foundation.
1 2 ISO/TC 97/SC 2 (1975). The set of control characters of the ISO 646 (PDF). ITSCJ/IPSJ. ISO-IR-1.
1 2 3 IPTC (1995). The IPTC Recommended Message Format (PDF) (5th ed.). IPTC TEC 7901.
1 2 3 "end-of-transmission character (EOT)". Federal Standard 1037C. 1996. Archived from the original on 2016-03-09.
1 2 Robert McConnell; James Haynes; Richard Warren (December 2002). "Understanding ASCII Codes". NADCOMM.
↑ Williamson, Karl. "Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0".
1 2 3 Ken Whistler (July 20, 2011). "Formal Name Aliases for Control Characters, L2/11-281". Unicode Consortium.
1 2 3 4 "Name Aliases". Unicode Character Database. Unicode Consortium.
↑ "C0 Controls and Basic Latin" (PDF). Unicode Consortium.
↑ "charnames". Perl Programming Documentation.
↑ ECMA (1994). "7.3: Invocation of character-set code elements". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 14. ECMA-35.
↑ American Standards Association (1963). American Standard Code for Information Interchange: 4. Legend. p. 6. ASA X3.4-1963.
↑ "data link escape character (DLE)". Federal Standard 1037C. 1996. Archived from the original on 2016-08-01.
↑ "Supplementary transmission control functions (an extension of the basic mode control procedures for data communication systems)". European Computer Manufacturers Association. 1972. ECMA-37.
↑ "What is the point of Ctrl-S?". Unix and Linux Stack exchange. Retrieved 14 February 2019.
↑ ECMA/TC 1 (1973). "Brief History". 7-bit Input/Output Coded Character Set (PDF) (4th ed.). ECMA. ECMA-6:1973.
1 2 3 ECMA/TC 1 (1971). "8.2: Correspondence between the 7-bit Code and an 8-bit Code". Extension of the 7-bit Coded Character Set (PDF) (1st ed.). ECMA. pp. 21–24. ECMA-35:1971.
↑ ECMA/TC 1 (1973). "4.2: Specific Control Characters". 7-bit Input/Output Coded Character Set (PDF) (4th ed.). ECMA. p. 16. ECMA-6:1973.
↑ ECMA/TC 1 (1985). "5.3.8: Sets of 96 graphic characters". Code Extension Techniques (PDF) (4th ed.). ECMA. pp. 17–18. ECMA-35:1985.
1 2 3 ISO/IEC International Register of Coded Character Sets To Be Used With Escape Sequences (PDF), ITSCJ/IPSJ, ISO-IR
1 2 DIN (1979-07-15). Additional Control Codes for Bibliographic Use according to German Standard DIN 31626 (PDF). ITSCJ/IPSJ. ISO-IR-40.
1 2 3 ISO/TC97/SC2 (1983-10-01). C1 Control Set of ISO 6429:1983 (PDF). ITSCJ/IPSJ. ISO-IR-77.
↑ ECMA/TC 1 (1979). "Brief History". Additional Control Functions for Character-Imaging I/O Devices (PDF) (2nd ed.). ECMA. ECMA-48:1979.
↑ "JIS X 02xx 符号" (in Japanese).
1 2 Ken Whistler (2015-10-05). "Why Nothing Ever Goes Away". Unicode Mailing List.
↑ ECMA/TC 1 (June 1991). Control Functions for Coded Character Sets (PDF) (5th ed.). ECMA. ECMA-48:1991.
1 2 ISO 6429:1983 Information processing — ISO 7-bit and 8-bit coded character sets — Additional control functions for character-imaging devices. ISO. 1983-05-01.
↑ ISO 6429:1988 Information processing — Control functions for 7-bit and 8-bit coded character sets. ISO. 1988-11-15.
1 2 ISO/IEC 6429:1992 Information technology — Control functions for coded character sets. ISO. 1992-12-15. Retrieved 2024-05-29.
↑ Lunde, Ken (2008). CJKV Information Processing: Chinese, Japanese, Korean, and Vietnamese Computing. O'Reilly. p. 244. ISBN 9780596800925.
↑ ECMA/TC 1 (December 1986). "Appendix E: Changes Made in this Edition". Control Functions for Coded Character Sets (PDF) (4th ed.). ECMA. ECMA-48:1986.
↑ ECMA/TC 1 (June 1991). "F.8 Eliminated control functions". Control Functions for Coded Character Sets (PDF) (5th ed.). ECMA. ECMA-48:1991.
↑ "VT100 Widget Resources (§ hpLowerleftBugCompat)". xterm - terminal emulator for X.
↑ Moy, Edward; Gildea, Stephen; Dickey, Thomas. "Device-Control functions". XTerm Control Sequences.
1 2 Brender, Ronald F. (1989). "Ada 9x Project Report: Character Set Issues for Ada 9x". Carnegie Mellon University.
↑ Moy, Edward; Gildea, Stephen; Dickey, Thomas. "Operating System Commands". XTerm Control Sequences.
↑ Frank da Cruz; Christine Gianone (1997). Using C-Kermit. Digital Press. p. 278. ISBN 978-1-55558-164-0.
1 2 3 ECMA (1994). "6.4.2: Primary sets of coded control functions". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 11. ECMA-35.
↑ ISO/TC97/SC2/WG-7; ECMA (1985-08-01). Minimum C0 set for ISO 4873 (PDF). ITSCJ/IPSJ. ISO-IR-104.
↑ ISO/TC97/SC2/WG-7; ECMA (1985-08-01). Minimum C1 Set for ISO 4873 (PDF). ITSCJ/IPSJ. ISO-IR-105.
↑ ECMA (1994). "6.2: Fixed coded characters". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 7. ECMA-35.
↑ ECMA (1994). "6.4.3: Supplementary sets of coded control functions". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 11. ECMA-35.
↑ ITU (1985). Teletex Primary Set of Control Functions (PDF). ITSCJ/IPSJ. ISO-IR-106.
↑ Úřad pro normalizaci a měřeni (1987). The set of control characters of ISO 646, with EM replaced by SS2 (PDF). ITSCJ/IPSJ. ISO-IR-140.
↑ ISO/TC 97/SC 2 (1977). The set of control characters of ISO 646, with IS4 replaced by Single Shift for G2 (SS2) (PDF). ITSCJ/IPSJ. ISO-IR-36.{{citation}}: CS1 maint: numeric names: authors list (link)
1 2 ISO/TC97/SC2/WG6. "Liaison statement to ISO/TC97/SC2/WG8 and ISO/TC97/SC18/WG8" (PDF). ISO/TC97/SC2/WG6 N317.rev. Archived from the original (PDF) on 2020-10-26.{{cite web}}: CS1 maint: numeric names: authors list (link)
↑ ISO/TC 97/SC 2 (1982). The C0 set of Control Characters of Japanese Standard JIS C 6225-1979 (PDF). ITSCJ/IPSJ. ISO-IR-74.{{citation}}: CS1 maint: numeric names: authors list (link)
↑ Printronix (2012). OKI® Programmer's Reference Manual (PDF). p. 26.
↑ ISO/TC 46 (1983-06-01). Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 (PDF). ITSCJ/IPSJ. ISO-IR-67.{{citation}}: CS1 maint: numeric names: authors list (link)
↑ ISO/TC 46 (1986-02-01). Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 (PDF). ITSCJ/IPSJ. ISO-IR-124.{{citation}}: CS1 maint: numeric names: authors list (link)
1 2 Umamaheswaran, V.S. (1999-11-08). "3.3 Step 2: Byte Conversion". UTF-EBCDIC. Unicode Consortium. Unicode Technical Report #16. The 64 control characters […], the ASCII DELETE character (U+007F)[…] are mapped respecting EBCDIC conventions, as defined in IBM Character Data Representation Architecture, CDRA, with one exception -- the pairing of EBCDIC Line Feed and New Line control characters are swapped from their CDRA default pairings to ISO/IEC 6429 Line Feed (U+000A) and Next Line (U+0085) control characters
↑ Steele, Shawn (1996-04-24). cp037_IBMUSCanada to Unicode table. Microsoft/Unicode Consortium.
1 2 "23.1: Control Codes" (PDF). The Unicode Standard (15.0.0 ed.). Unicode Consortium. 2022. pp. 914–916. ISBN 978-1-936213-32-0.

The Unicode Standard
- C0 Controls and Basic Latin
- C1 Controls and Latin-1 Supplement
- Control Pictures
- The Unicode Standard, Version 6.1.0, Chapter 16: Special Areas and Format Characters
ATIS Telecom Glossary 2007
De litteris regentibus C1 quaestiones septem or Are C1 characters legal in XHTML 1.0?
W3C I18N FAQ: HTML, XHTML, XML and Control Codes
International register of coded character sets to be used with escape sequences


[7] Teletype labelled the key WRU for 'who are you?'^[6]

[13] The name BELL is assigned by Unicode to the unrelated emoji character 🔔 (U+1F514). While C0 and C1 control characters were not formally named by the Unicode standard itself at the time, this collided with existing use of BELL as the name of this control character in software following the previous versions of UTS#18 (the Unicode Regular Expressions standard),^[7] e.g. in Perl.^[8] Unicode now accepts ALERT and BEL (but not BELL) as formal aliases for the control character,^[9] although the code chart still lists BELL as the ISO 6429 alias,^[10] and the corresponding control picture code point is called SYMBOL FOR BELL. Perl subsequently switched to using BELL for the emoji in version 5.18.^[11]

[lockingshifts-15] ISO/IEC 2022 (ECMA-35) refers to these as LS0 and LS1 in 8-bit environments, and as SI and SO in 7-bit environments.^[12]

[17] The first, 1963 edition of ASCII classified DLE as a device control, rather than a transmission control, and gave it the abbreviation DC0 ("device control reserved for data link escape").^[13]

[lower-alpha-21] The '\e' escape sequence is not part of ISO C and many other language specifications. However, it is understood by several compilers, including GCC.

[25] In early versions the range excluded SP and DEL.^[19]

[notpart-37] Not part of ISO/IEC 6429 (ECMA-48).^[8]^[26]^[28]:4^[29]:5^[30]:8

[notfirst-39] Not part of the first edition of ISO/IEC 6429.^[23]^[28]:4

[42] Deprecated in 1988 and withdrawn in 1992 from ISO/IEC 6429^[30]:87 (1986^[32] and 1991^[33] respectively for ECMA-48).

[51] ISO/IEC 4873 extends this requirement to the C1 SS2 and SS3,^[41] although ISO/IEC 2022 itself does not.

[1] Fox, Brian. "Adding a new node to Info". Info: The online, menu-driven GNU documentation system. GNU Project.

[2] "Built-in Types § str.splitlines". The Python Standard Library. Python Software Foundation.

[ir001-3] ISO/TC 97/SC 2 (1975). The set of control characters of the ISO 646 (PDF). ITSCJ/IPSJ. ISO-IR-1.

[iptc7901-4] IPTC (1995). The IPTC Recommended Message Format (PDF) (5th ed.). IPTC TEC 7901.

[1037-EOT-5] "end-of-transmission character (EOT)". Federal Standard 1037C. 1996. Archived from the original on 2016-03-09.

[teletype-6] Robert McConnell; James Haynes; Richard Warren (December 2002). "Understanding ASCII Codes". NADCOMM.

[8] Williamson, Karl. "Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0".

[Whistler2011-9] Ken Whistler (July 20, 2011). "Formal Name Aliases for Control Characters, L2/11-281". Unicode Consortium.

[aliases-10] "Name Aliases". Unicode Character Database. Unicode Consortium.

[11] "C0 Controls and Basic Latin" (PDF). Unicode Consortium.

[12] "charnames". Perl Programming Documentation.

[lockingshifts-14] ECMA (1994). "7.3: Invocation of character-set code elements". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 14. ECMA-35.

[16] American Standards Association (1963). American Standard Code for Information Interchange: 4. Legend. p. 6. ASA X3.4-1963.

[18] "data link escape character (DLE)". Federal Standard 1037C. 1996. Archived from the original on 2016-08-01.

[19] "Supplementary transmission control functions (an extension of the basic mode control procedures for data communication systems)". European Computer Manufacturers Association. 1972. ECMA-37.

[20] "What is the point of Ctrl-S?". Unix and Linux Stack exchange. Retrieved 14 February 2019.

[22] ECMA/TC 1 (1973). "Brief History". 7-bit Input/Output Coded Character Set (PDF) (4th ed.). ECMA. ECMA-6:1973.

[firstecma35-23] ECMA/TC 1 (1971). "8.2: Correspondence between the 7-bit Code and an 8-bit Code". Extension of the 7-bit Coded Character Set (PDF) (1st ed.). ECMA. pp. 21–24. ECMA-35:1971.

[24] ECMA/TC 1 (1973). "4.2: Specific Control Characters". 7-bit Input/Output Coded Character Set (PDF) (4th ed.). ECMA. p. 16. ECMA-6:1973.

[new96escs-26] ECMA/TC 1 (1985). "5.3.8: Sets of 96 graphic characters". Code Extension Techniques (PDF) (4th ed.). ECMA. pp. 17–18. ECMA-35:1985.

[iso-ir-27] ISO/IEC International Register of Coded Character Sets To Be Used With Escape Sequences (PDF), ITSCJ/IPSJ, ISO-IR

[din31626-28] DIN (1979-07-15). Additional Control Codes for Bibliographic Use according to German Standard DIN 31626 (PDF). ITSCJ/IPSJ. ISO-IR-40.

[1stE-29] ISO/TC97/SC2 (1983-10-01). C1 Control Set of ISO 6429:1983 (PDF). ITSCJ/IPSJ. ISO-IR-77.

[30] ECMA/TC 1 (1979). "Brief History". Additional Control Functions for Character-Imaging I/O Devices (PDF) (2nd ed.). ECMA. ECMA-48:1979.

[31] "JIS X 02xx 符号" (in Japanese).

[Whistler2015-32] Ken Whistler (2015-10-05). "Why Nothing Ever Goes Away". Unicode Mailing List.

[ecma-48-33] ECMA/TC 1 (June 1991). Control Functions for Coded Character Sets (PDF) (5th ed.). ECMA. ECMA-48:1991.

[ISO_6429:1983-34] ISO 6429:1983 Information processing — ISO 7-bit and 8-bit coded character sets — Additional control functions for character-imaging devices. ISO. 1983-05-01.

[ISO_6429:1988-35] ISO 6429:1988 Information processing — Control functions for 7-bit and 8-bit coded character sets. ISO. 1988-11-15.

[ISO/IEC_6429:1992-36] ISO/IEC 6429:1992 Information technology — Control functions for coded character sets. ISO. 1992-12-15. Retrieved 2024-05-29.

[38] Lunde, Ken (2008). CJKV Information Processing: Chinese, Japanese, Korean, and Vietnamese Computing. O'Reilly. p. 244. ISBN 9780596800925.

[40] ECMA/TC 1 (December 1986). "Appendix E: Changes Made in this Edition". Control Functions for Coded Character Sets (PDF) (4th ed.). ECMA. ECMA-48:1986.

[41] ECMA/TC 1 (June 1991). "F.8 Eliminated control functions". Control Functions for Coded Character Sets (PDF) (5th ed.). ECMA. ECMA-48:1991.

[43] "VT100 Widget Resources (§ hpLowerleftBugCompat)". xterm - terminal emulator for X.

[44] Moy, Edward; Gildea, Stephen; Dickey, Thomas. "Device-Control functions". XTerm Control Sequences.

[ada-45] Brender, Ronald F. (1989). "Ada 9x Project Report: Character Set Issues for Ada 9x". Carnegie Mellon University.

[46] Moy, Edward; Gildea, Stephen; Dickey, Thomas. "Operating System Commands". XTerm Control Sequences.

[CruzGianone1997-47] Frank da Cruz; Christine Gianone (1997). Using C-Kermit. Digital Press. p. 278. ISBN 978-1-55558-164-0.

[tc-c0-48] ECMA (1994). "6.4.2: Primary sets of coded control functions". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 11. ECMA-35.

[49] ISO/TC97/SC2/WG-7; ECMA (1985-08-01). Minimum C0 set for ISO 4873 (PDF). ITSCJ/IPSJ. ISO-IR-104.

[50] ISO/TC97/SC2/WG-7; ECMA (1985-08-01). Minimum C1 Set for ISO 4873 (PDF). ITSCJ/IPSJ. ISO-IR-105.

[52] ECMA (1994). "6.2: Fixed coded characters". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 7. ECMA-35.

[tc-c1-53] ECMA (1994). "6.4.3: Supplementary sets of coded control functions". Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.). p. 11. ECMA-35.

[T61C0-54] ITU (1985). Teletex Primary Set of Control Functions (PDF). ITSCJ/IPSJ. ISO-IR-106.

[55] Úřad pro normalizaci a měřeni (1987). The set of control characters of ISO 646, with EM replaced by SS2 (PDF). ITSCJ/IPSJ. ISO-IR-140.

[56] ISO/TC 97/SC 2 (1977). The set of control characters of ISO 646, with IS4 replaced by Single Shift for G2 (SS2) (PDF). ITSCJ/IPSJ. ISO-IR-36.

[wg6-57] ISO/TC97/SC2/WG6. "Liaison statement to ISO/TC97/SC2/WG8 and ISO/TC97/SC18/WG8" (PDF). ISO/TC97/SC2/WG6 N317.rev. Archived from the original (PDF) on 2020-10-26.

[58] ISO/TC 97/SC 2 (1982). The C0 set of Control Characters of Japanese Standard JIS C 6225-1979 (PDF). ITSCJ/IPSJ. ISO-IR-74.

[59] Printronix (2012). OKI® Programmer's Reference Manual (PDF). p. 26.

[iso6630-old-60] ISO/TC 46 (1983-06-01). Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 (PDF). ITSCJ/IPSJ. ISO-IR-67.

[iso6630-1985-61] ISO/TC 46 (1986-02-01). Additional Control Codes for Bibliographic Use according to International Standard ISO 6630 (PDF). ITSCJ/IPSJ. ISO-IR-124.

[utr16cdra-62] Umamaheswaran, V.S. (1999-11-08). "3.3 Step 2: Byte Conversion". UTF-EBCDIC. Unicode Consortium. Unicode Technical Report #16. "The 64 control characters […], the ASCII DELETE character (U+007F)[…] are mapped respecting EBCDIC conventions, as defined in IBM Character Data Representation Architecture, CDRA, with one exception -- the pairing of EBCDIC Line Feed and New Line control characters are swapped from their CDRA default pairings to ISO/IEC 6429 Line Feed (U+000A) and Next Line (U+0085) control characters"

[ms037-63] Steele, Shawn (1996-04-24). cp037_IBMUSCanada to Unicode table. Microsoft/Unicode Consortium.

[unicode-23-1-64] "23.1: Control Codes" (PDF). The Unicode Standard (15.0.0 ed.). Unicode Consortium. 2022. pp. 914–916. ISBN 978-1-936213-32-0.

