In Unicode, characters can have a unique name. A character can also have one or more alias names. An alias name can be an abbreviation, a C0 or C1 control name, a correction, an alternate name or a figment. Like a name, an alias is unique across all names and aliases, and therefore identifies its character unambiguously.
The formal, primary Unicode name is unique across all names, uses only a restricted set of characters and a fixed format, and is guaranteed never to change. It consists only of the characters A–Z (uppercase), 0–9, " " (space) and "-" (hyphen). In addition to this name, a character can have one or more formal (normative) alias names. An alias name follows the same rules as a name: it uses only A–Z, 0–9, space and hyphen, and never characters such as a–z, % or $. Alias names are formally described in the Unicode Standard. [1] [2] In this sense, an abbreviation is also considered a Unicode name.
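As a rough illustration, the character-repertoire rule can be checked with a regular expression. This is a sketch under a simplifying assumption: the hypothetical helper `uses_name_charset` only tests which characters occur, not the Standard's further constraints on how names are formatted.

```python
import re

# Hypothetical helper: checks only the repertoire described above
# (uppercase A-Z, digits 0-9, space, hyphen). The Standard imposes further
# format rules (e.g. on leading/trailing spaces) that this sketch ignores.
_NAME_CHARS = re.compile(r"[A-Z0-9 -]+")


def uses_name_charset(candidate: str) -> bool:
    """Return True if every character of candidate is in the formal-name set."""
    return _NAME_CHARS.fullmatch(candidate) is not None


print(uses_name_charset("LATIN CAPITAL LETTER GHA"))  # True
print(uses_name_charset("BS"))                        # True
print(uses_name_charset("Latin small letter a"))      # False: lowercase
print(uses_name_charset("50% OFF"))                   # False: contains '%'
```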
There are five possible reasons (types) for assigning an alias name to a code point: correction, control, alternate, figment and abbreviation. [1] A character can have multiple aliases: for example, U+0008 <control-0008> has the control alias BACKSPACE and the abbreviation alias BS.
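In Python, the standard `unicodedata` module resolves normative aliases through `unicodedata.lookup()` (supported since Python 3.3), which gives a quick way to see the U+0008 example in practice. The sketch assumes a Python build whose bundled Unicode database includes the control and abbreviation aliases (Unicode 6.1 or later).

```python
import unicodedata

# lookup() resolves formal names and also the normative aliases
# (control names, abbreviations, corrections, ...).
backspace = unicodedata.lookup("BACKSPACE")  # control alias
bs = unicodedata.lookup("BS")                # abbreviation alias
assert backspace == bs == "\x08"

# U+0008 has no formal name, only the label <control-0008>, so
# unicodedata.name() has nothing to return and falls back to the default.
print(unicodedata.name("\x08", "<control-0008>"))  # <control-0008>

# Aliases are also accepted by the \N{...} escape in string literals.
print(repr("\N{BACKSPACE}"))  # '\x08'
```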
Code point | HTML decimal | Name or <label> | Alias (abbreviation) | Alias (name) | Reason | Chart | Note |
---|---|---|---|---|---|---|---|
U+0000 | `&#0;` | <control-0000> | NUL | NULL | Control | C0 Controls and Basic Latin | |
U+0001 | `&#1;` | <control-0001> | SOH | START OF HEADING | Control | C0 Controls and Basic Latin | |
U+0002 | `&#2;` | <control-0002> | STX | START OF TEXT | Control | C0 Controls and Basic Latin | |
U+0003 | `&#3;` | <control-0003> | ETX | END OF TEXT | Control | C0 Controls and Basic Latin | |
U+0004 | `&#4;` | <control-0004> | EOT | END OF TRANSMISSION | Control | C0 Controls and Basic Latin | |
U+0005 | `&#5;` | <control-0005> | ENQ | ENQUIRY | Control | C0 Controls and Basic Latin | |
U+0006 | `&#6;` | <control-0006> | ACK | ACKNOWLEDGE | Control | C0 Controls and Basic Latin | |
U+0007 | `&#7;` | <control-0007> | BEL | ALERT | Control | C0 Controls and Basic Latin | |
U+0008 | `&#8;` | <control-0008> | BS | BACKSPACE | Control | C0 Controls and Basic Latin | |
U+0009 | `&#9;` | <control-0009> | TAB | CHARACTER TABULATION | Control | C0 Controls and Basic Latin | |
U+0009 | | | HT | HORIZONTAL TABULATION | Control | | |
U+000A | `&#10;` | <control-000A> | LF | LINE FEED | Control | C0 Controls and Basic Latin | |
U+000A | | | NL | NEW LINE | Control | | |
U+000A | | | EOL | END OF LINE | Control | | |
U+000B | `&#11;` | <control-000B> | | LINE TABULATION | Control | C0 Controls and Basic Latin | |
U+000B | | | VT | VERTICAL TABULATION | Control | | |
U+000C | `&#12;` | <control-000C> | FF | FORM FEED | Control | C0 Controls and Basic Latin | |
U+000D | `&#13;` | <control-000D> | CR | CARRIAGE RETURN | Control | C0 Controls and Basic Latin | |
U+000E | `&#14;` | <control-000E> | SO | SHIFT OUT | Control | C0 Controls and Basic Latin | |
U+000E | | | | LOCKING-SHIFT ONE | Control | | |
U+000F | `&#15;` | <control-000F> | SI | SHIFT IN | Control | C0 Controls and Basic Latin | |
U+000F | | | | LOCKING-SHIFT ZERO | Control | | |
U+0010 | `&#16;` | <control-0010> | DLE | DATA LINK ESCAPE | Control | C0 Controls and Basic Latin | |
U+0011 | `&#17;` | <control-0011> | DC1 | DEVICE CONTROL ONE | Control | C0 Controls and Basic Latin | |
U+0012 | `&#18;` | <control-0012> | DC2 | DEVICE CONTROL TWO | Control | C0 Controls and Basic Latin | |
U+0013 | `&#19;` | <control-0013> | DC3 | DEVICE CONTROL THREE | Control | C0 Controls and Basic Latin | |
U+0014 | `&#20;` | <control-0014> | DC4 | DEVICE CONTROL FOUR | Control | C0 Controls and Basic Latin | |
U+0015 | `&#21;` | <control-0015> | NAK | NEGATIVE ACKNOWLEDGE | Control | C0 Controls and Basic Latin | |
U+0016 | `&#22;` | <control-0016> | SYN | SYNCHRONOUS IDLE | Control | C0 Controls and Basic Latin | |
U+0017 | `&#23;` | <control-0017> | ETB | END OF TRANSMISSION BLOCK | Control | C0 Controls and Basic Latin | |
U+0018 | `&#24;` | <control-0018> | CAN | CANCEL | Control | C0 Controls and Basic Latin | |
U+0019 | `&#25;` | <control-0019> | EOM | END OF MEDIUM | Control | C0 Controls and Basic Latin | |
U+0019 | | | EM | | Abbreviation | | added in version 15.0 |
U+001A | `&#26;` | <control-001A> | SUB | SUBSTITUTE | Control | C0 Controls and Basic Latin | |
U+001B | `&#27;` | <control-001B> | ESC | ESCAPE | Control | C0 Controls and Basic Latin | |
U+001C | `&#28;` | <control-001C> | | INFORMATION SEPARATOR FOUR | Control | C0 Controls and Basic Latin | |
U+001C | | | FS | FILE SEPARATOR | Control | | |
U+001D | `&#29;` | <control-001D> | | INFORMATION SEPARATOR THREE | Control | C0 Controls and Basic Latin | |
U+001D | | | GS | GROUP SEPARATOR | Control | | |
U+001E | `&#30;` | <control-001E> | | INFORMATION SEPARATOR TWO | Control | C0 Controls and Basic Latin | |
U+001E | | | RS | RECORD SEPARATOR | Control | | |
U+001F | `&#31;` | <control-001F> | | INFORMATION SEPARATOR ONE | Control | C0 Controls and Basic Latin | |
U+001F | | | US | UNIT SEPARATOR | Control | | |
U+0020 | `&#32;` | SPACE | SP | | Abbreviation | C0 Controls and Basic Latin | |
U+007F | `&#127;` | <control-007F> | DEL | DELETE | Control | C0 Controls and Basic Latin | |
U+0080 | `&#128;` | <control-0080> | PAD | PADDING CHARACTER | Figment | C1 Controls and Latin-1 Supplement | Aliases are not widely published by Unicode; chart shows non-unique XXX |
U+0081 | `&#129;` | <control-0081> | HOP | HIGH OCTET PRESET | Figment | C1 Controls and Latin-1 Supplement | Aliases are not widely published by Unicode; chart shows non-unique XXX |
U+0082 | `&#130;` | <control-0082> | BPH | BREAK PERMITTED HERE | Control | C1 Controls and Latin-1 Supplement | |
U+0083 | `&#131;` | <control-0083> | NBH | NO BREAK HERE | Control | C1 Controls and Latin-1 Supplement | |
U+0084 | `&#132;` | <control-0084> | IND | INDEX | Control | C1 Controls and Latin-1 Supplement | |
U+0085 | `&#133;` | <control-0085> | NEL | NEXT LINE | Control | C1 Controls and Latin-1 Supplement | |
U+0086 | `&#134;` | <control-0086> | SSA | START OF SELECTED AREA | Control | C1 Controls and Latin-1 Supplement | |
U+0087 | `&#135;` | <control-0087> | ESA | END OF SELECTED AREA | Control | C1 Controls and Latin-1 Supplement | |
U+0088 | `&#136;` | <control-0088> | | CHARACTER TABULATION SET | Control | C1 Controls and Latin-1 Supplement | |
U+0088 | | | HTS | HORIZONTAL TABULATION SET | Control | | |
U+0089 | `&#137;` | <control-0089> | | CHARACTER TABULATION WITH JUSTIFICATION | Control | C1 Controls and Latin-1 Supplement | |
U+0089 | | | HTJ | HORIZONTAL TABULATION WITH JUSTIFICATION | Control | | |
U+008A | `&#138;` | <control-008A> | | LINE TABULATION SET | Control | C1 Controls and Latin-1 Supplement | |
U+008A | | | VTS | VERTICAL TABULATION SET | Control | | |
U+008B | `&#139;` | <control-008B> | | PARTIAL LINE FORWARD | Control | C1 Controls and Latin-1 Supplement | |
U+008B | | | PLD | PARTIAL LINE DOWN | Control | | |
U+008C | `&#140;` | <control-008C> | | PARTIAL LINE BACKWARD | Control | C1 Controls and Latin-1 Supplement | |
U+008C | | | PLU | PARTIAL LINE UP | Control | | |
U+008D | `&#141;` | <control-008D> | | REVERSE LINE FEED | Control | C1 Controls and Latin-1 Supplement | |
U+008D | | | RI | REVERSE INDEX | Control | | |
U+008E | `&#142;` | <control-008E> | | SINGLE SHIFT TWO | Control | C1 Controls and Latin-1 Supplement | |
U+008E | | | SS2 | SINGLE-SHIFT-2 | Control | | |
U+008F | `&#143;` | <control-008F> | | SINGLE SHIFT THREE | Control | C1 Controls and Latin-1 Supplement | |
U+008F | | | SS3 | SINGLE-SHIFT-3 | Control | | |
U+0090 | `&#144;` | <control-0090> | DCS | DEVICE CONTROL STRING | Control | C1 Controls and Latin-1 Supplement | |
U+0091 | `&#145;` | <control-0091> | | PRIVATE USE ONE | Control | C1 Controls and Latin-1 Supplement | |
U+0091 | | | PU1 | PRIVATE USE-1 | Control | | |
U+0092 | `&#146;` | <control-0092> | | PRIVATE USE TWO | Control | C1 Controls and Latin-1 Supplement | |
U+0092 | | | PU2 | PRIVATE USE-2 | Control | | |
U+0093 | `&#147;` | <control-0093> | STS | SET TRANSMIT STATE | Control | C1 Controls and Latin-1 Supplement | |
U+0094 | `&#148;` | <control-0094> | CCH | CANCEL CHARACTER | Control | C1 Controls and Latin-1 Supplement | |
U+0095 | `&#149;` | <control-0095> | MW | MESSAGE WAITING | Control | C1 Controls and Latin-1 Supplement | |
U+0096 | `&#150;` | <control-0096> | | START OF GUARDED AREA | Control | C1 Controls and Latin-1 Supplement | |
U+0096 | | | SPA | START OF PROTECTED AREA | Control | | |
U+0097 | `&#151;` | <control-0097> | | END OF GUARDED AREA | Control | C1 Controls and Latin-1 Supplement | |
U+0097 | | | EPA | END OF PROTECTED AREA | Control | | |
U+0098 | `&#152;` | <control-0098> | SOS | START OF STRING | Control | C1 Controls and Latin-1 Supplement | |
U+0099 | `&#153;` | <control-0099> | SGC | SINGLE GRAPHIC CHARACTER INTRODUCER | Figment | C1 Controls and Latin-1 Supplement | Aliases are not widely published by Unicode; chart shows non-unique XXX |
U+009A | `&#154;` | <control-009A> | SCI | SINGLE CHARACTER INTRODUCER | Control | C1 Controls and Latin-1 Supplement | |
U+009B | `&#155;` | <control-009B> | CSI | CONTROL SEQUENCE INTRODUCER | Control | C1 Controls and Latin-1 Supplement | |
U+009C | `&#156;` | <control-009C> | ST | STRING TERMINATOR | Control | C1 Controls and Latin-1 Supplement | |
U+009D | `&#157;` | <control-009D> | OSC | OPERATING SYSTEM COMMAND | Control | C1 Controls and Latin-1 Supplement | |
U+009E | `&#158;` | <control-009E> | PM | PRIVACY MESSAGE | Control | C1 Controls and Latin-1 Supplement | |
U+009F | `&#159;` | <control-009F> | APC | APPLICATION PROGRAM COMMAND | Control | C1 Controls and Latin-1 Supplement | |
U+00A0 | `&#160;` | NO-BREAK SPACE | NBSP | | Abbreviation | C1 Controls and Latin-1 Supplement | |
U+00AD | `&#173;` | SOFT HYPHEN | SHY | | Abbreviation | C1 Controls and Latin-1 Supplement | |
U+01A2 | `&#418;` | LATIN CAPITAL LETTER OI | | LATIN CAPITAL LETTER GHA | ※ Correction | Latin Extended-B | |
U+01A3 | `&#419;` | LATIN SMALL LETTER OI | | LATIN SMALL LETTER GHA | ※ Correction | Latin Extended-B | |
U+034F | `&#847;` | COMBINING GRAPHEME JOINER | CGJ | | Abbreviation | Combining Diacritical Marks | The name of this character is misleading; it does not actually join graphemes |
U+0616 | `&#1558;` | ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH | | ARABIC SMALL HIGH LIGATURE ALEF WITH YEH BARREE | ※ Correction | Arabic | added in version 15.0 |
U+061C | `&#1564;` | ARABIC LETTER MARK | ALM | | Abbreviation | Arabic | See RLM |
U+0709 | `&#1801;` | SYRIAC SUBLINEAR COLON SKEWED RIGHT | | SYRIAC SUBLINEAR COLON SKEWED LEFT | ※ Correction | Syriac | |
U+0CDE | `&#3294;` | KANNADA LETTER FA | | KANNADA LETTER LLLA | ※ Correction | Kannada | |
U+0E9D | `&#3741;` | LAO LETTER FO TAM | | LAO LETTER FO FON | ※ Correction | Lao | |
U+0E9F | `&#3743;` | LAO LETTER FO SUNG | | LAO LETTER FO FAY | ※ Correction | Lao | |
U+0EA3 | `&#3747;` | LAO LETTER LO LING | | LAO LETTER RO | ※ Correction | Lao | |
U+0EA5 | `&#3749;` | LAO LETTER LO LOOT | | LAO LETTER LO | ※ Correction | Lao | |
U+0FD0 | `&#4048;` | TIBETAN MARK BSKA- SHOG GI MGO RGYAN | | TIBETAN MARK BKA- SHOG GI MGO RGYAN | ※ Correction | Tibetan | |
U+11EC | `&#4588;` | HANGUL JONGSEONG IEUNG-KIYEOK | | HANGUL JONGSEONG YESIEUNG-KIYEOK | ※ Correction | Hangul Jamo | |
U+11ED | `&#4589;` | HANGUL JONGSEONG IEUNG-SSANGKIYEOK | | HANGUL JONGSEONG YESIEUNG-SSANGKIYEOK | ※ Correction | Hangul Jamo | |
U+11EE | `&#4590;` | HANGUL JONGSEONG SSANGIEUNG | | HANGUL JONGSEONG SSANGYESIEUNG | ※ Correction | Hangul Jamo | |
U+11EF | `&#4591;` | HANGUL JONGSEONG IEUNG-KHIEUKH | | HANGUL JONGSEONG YESIEUNG-KHIEUKH | ※ Correction | Hangul Jamo | |
U+180B | `&#6155;` | MONGOLIAN FREE VARIATION SELECTOR ONE | FVS1 | | Abbreviation | Mongolian | |
U+180C | `&#6156;` | MONGOLIAN FREE VARIATION SELECTOR TWO | FVS2 | | Abbreviation | Mongolian | |
U+180D | `&#6157;` | MONGOLIAN FREE VARIATION SELECTOR THREE | FVS3 | | Abbreviation | Mongolian | |
U+180E | `&#6158;` | MONGOLIAN VOWEL SEPARATOR | MVS | | Abbreviation | Mongolian | |
U+180F | `&#6159;` | MONGOLIAN FREE VARIATION SELECTOR FOUR | FVS4 | | Abbreviation | Mongolian | |
U+1BBD | `&#7101;` | SUNDANESE LETTER BHA | | SUNDANESE LETTER ARCHAIC I | ※ Correction | Sundanese | added in version 15.0 |
U+200B | `&#8203;` | ZERO WIDTH SPACE | ZWSP | | Abbreviation | General Punctuation | |
U+200C | `&#8204;` | ZERO WIDTH NON-JOINER | ZWNJ | | Abbreviation | General Punctuation | |
U+200D | `&#8205;` | ZERO WIDTH JOINER | ZWJ | | Abbreviation | General Punctuation | |
U+200E | `&#8206;` | LEFT-TO-RIGHT MARK | LRM | | Abbreviation | General Punctuation | |
U+200F | `&#8207;` | RIGHT-TO-LEFT MARK | RLM | | Abbreviation | General Punctuation | |
U+202A | `&#8234;` | LEFT-TO-RIGHT EMBEDDING | LRE | | Abbreviation | General Punctuation | |
U+202B | `&#8235;` | RIGHT-TO-LEFT EMBEDDING | RLE | | Abbreviation | General Punctuation | |
U+202C | `&#8236;` | POP DIRECTIONAL FORMATTING | PDF | | Abbreviation | General Punctuation | |
U+202D | `&#8237;` | LEFT-TO-RIGHT OVERRIDE | LRO | | Abbreviation | General Punctuation | |
U+202E | `&#8238;` | RIGHT-TO-LEFT OVERRIDE | RLO | | Abbreviation | General Punctuation | |
U+202F | `&#8239;` | NARROW NO-BREAK SPACE | NNBSP | | Abbreviation | General Punctuation | |
U+205F | `&#8287;` | MEDIUM MATHEMATICAL SPACE | MMSP | | Abbreviation | General Punctuation | |
U+2060 | `&#8288;` | WORD JOINER | WJ | | Abbreviation | General Punctuation | |
U+2066 | `&#8294;` | LEFT-TO-RIGHT ISOLATE | LRI | | Abbreviation | General Punctuation | |
U+2067 | `&#8295;` | RIGHT-TO-LEFT ISOLATE | RLI | | Abbreviation | General Punctuation | |
U+2068 | `&#8296;` | FIRST STRONG ISOLATE | FSI | | Abbreviation | General Punctuation | |
U+2069 | `&#8297;` | POP DIRECTIONAL ISOLATE | PDI | | Abbreviation | General Punctuation | |
U+2118 | `&#8472;` | SCRIPT CAPITAL P | | WEIERSTRASS ELLIPTIC FUNCTION | ※ Correction | Letterlike Symbols | |
U+2448 | `&#9288;` | OCR DASH | | MICR ON US SYMBOL | ※ Correction | Optical Character Recognition | |
U+2449 | `&#9289;` | OCR CUSTOMER ACCOUNT NUMBER | | MICR DASH SYMBOL | ※ Correction | Optical Character Recognition | |
U+2B7A | `&#11130;` | LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE | | LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE VERTICAL STROKE | ※ Correction | Miscellaneous Symbols and Arrows | |
U+2B7C | `&#11132;` | RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE | | RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE VERTICAL STROKE | ※ Correction | Miscellaneous Symbols and Arrows | |
U+A015 | `&#40981;` | YI SYLLABLE WU | | YI SYLLABLE ITERATION MARK | ※ Correction | Yi Syllables | |
U+AA6E | `&#43630;` | MYANMAR LETTER KHAMTI HHA | | MYANMAR LETTER KHAMTI LLA | ※ Correction | Myanmar Extended-A | |
U+FE00 ... U+FE0F (16 code points) | `&#65024;` ... `&#65039;` | VARIATION SELECTOR-1 ... VARIATION SELECTOR-16 | VS1 ... VS16 | | Abbreviation | Variation Selectors | |
U+FE18 | `&#65048;` | PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET | | PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET | ※ Correction | Vertical Forms | |
U+FEFF | `&#65279;` | ZERO WIDTH NO-BREAK SPACE | BOM | BYTE ORDER MARK | Alternate | Arabic Presentation Forms-B | |
U+FEFF | | | ZWNBSP | | Abbreviation | | |
U+122D4 | `&#74452;` | CUNEIFORM SIGN SHIR TENU | | CUNEIFORM SIGN NU11 TENU | ※ Correction | Cuneiform | |
U+122D5 | `&#74453;` | CUNEIFORM SIGN SHIR OVER SHIR BUR OVER BUR | | CUNEIFORM SIGN NU11 OVER NU11 BUR OVER BUR | ※ Correction | Cuneiform | |
U+12327 | `&#74535;` | CUNEIFORM SIGN UN GUNU | | CUNEIFORM SIGN KALAM | ※ Correction | Cuneiform | |
U+1680B | `&#92171;` | BAMUM LETTER PHASE-A MAEMBGBIEE | | BAMUM LETTER PHASE-A MAEMGBIEE | ※ Correction | Bamum Supplement | |
U+16E56 | `&#93782;` | MEDEFAIDRIN CAPITAL LETTER HP | | MEDEFAIDRIN CAPITAL LETTER H | ※ Correction | Medefaidrin | |
U+16E57 | `&#93783;` | MEDEFAIDRIN CAPITAL LETTER NY | | MEDEFAIDRIN CAPITAL LETTER NG | ※ Correction | Medefaidrin | |
U+16E76 | `&#93814;` | MEDEFAIDRIN SMALL LETTER HP | | MEDEFAIDRIN SMALL LETTER H | ※ Correction | Medefaidrin | |
U+16E77 | `&#93815;` | MEDEFAIDRIN SMALL LETTER NY | | MEDEFAIDRIN SMALL LETTER NG | ※ Correction | Medefaidrin | |
U+1B001 | `&#110593;` | HIRAGANA LETTER ARCHAIC YE | | HENTAIGANA LETTER E-1 | ※ Correction | Kana Supplement | |
U+1D0C5 | `&#118981;` | BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS | | BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS | ※ Correction | Byzantine Musical Symbols | |
U+1E899 | `&#125081;` | MENDE KIKAKUI SYLLABLE M172 MBOO | | MENDE KIKAKUI SYLLABLE M172 MBO | ※ Correction | Mende Kikakui | |
U+1E89A | `&#125082;` | MENDE KIKAKUI SYLLABLE M174 MBO | | MENDE KIKAKUI SYLLABLE M174 MBOO | ※ Correction | Mende Kikakui | |
U+E0100 ... U+E01EF (240 code points) | `&#917760;` ... `&#917999;` | VARIATION SELECTOR-17 ... VARIATION SELECTOR-256 | VS17 ... VS256 | | Abbreviation | Variation Selectors Supplement | |
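The correction rows can be checked the same way. In this sketch, which again relies on Python's `unicodedata` module, both the frozen formal name and its correction alias resolve to the same code point, while `unicodedata.name()` keeps reporting the formal name.

```python
import unicodedata

# A correction alias fixes a flawed name, but the formal name stays frozen.
# Both resolve to the same code point.
ch = unicodedata.lookup("LATIN CAPITAL LETTER GHA")  # correction alias
assert ch == unicodedata.lookup("LATIN CAPITAL LETTER OI") == "\u01A2"

# name() always reports the formal (unchanging) name, never an alias.
print(unicodedata.name(ch))  # LATIN CAPITAL LETTER OI

# The misspelled formal name of U+FE18 and its corrected alias both work.
misspelled = "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET"
corrected = "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"
assert unicodedata.lookup(misspelled) == unicodedata.lookup(corrected) == "\uFE18"
```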
The Unicode Standard also uses and publishes alternative names that are not formal and are not listed as normative alias names. These informal labels need not be unique and may contain characters outside the formal name set. They appear in the Unicode code charts; for example, U+070F SYRIAC ABBREVIATION MARK is labeled SAM. [3]
In computing and telecommunications, a control character or non-printing character (NPC) is a code point in a character set that does not represent a written character or symbol. Control characters are used as in-band signaling to cause effects other than adding a symbol to the text. All other characters are mainly graphic characters, also known as printing characters, with the possible exception of "space" characters. In the ASCII standard there are 33 control characters, such as code 7, BEL, which rings a terminal bell.
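A small sketch, assuming Python's `unicodedata` general categories, that counts the ASCII control characters mentioned above:

```python
import unicodedata

# General category "Cc" marks control characters. In the 7-bit ASCII range
# these are codes 0-31 plus 127 (DEL): 32 + 1 = 33 characters.
ascii_controls = [c for c in range(128) if unicodedata.category(chr(c)) == "Cc"]
print(len(ascii_controls))  # 33

# Code 7 is BEL; in Unicode its control alias is ALERT and its abbreviation BEL.
assert 7 in ascii_controls
assert unicodedata.lookup("ALERT") == unicodedata.lookup("BEL") == chr(7)
```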
Extended Binary Coded Decimal Interchange Code is an eight-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding six-bit binary-coded decimal code used with most of IBM's computer peripherals of the late 1950s and early 1960s. It is supported by various non-IBM platforms, such as Fujitsu-Siemens' BS2000/OSD, OS-IV, MSP, and MSP-EX, the SDS Sigma series, Unisys VS/9, Unisys MCP and ICL VME.
Unicode, formally The Unicode Standard, is a text encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 of the standard defines 154998 characters and 168 scripts used in various ordinary, literary, academic, and technical contexts.
The tab key (Tab ↹) on a keyboard is used to advance the cursor to the next tab stop.
ISO/IEC 8859-9:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 9: Latin alphabet No. 5, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings; its first edition was published in 1989. It is designated ECMA-128 by Ecma International and TS 5881 as a Turkish standard, and is informally referred to as Latin-5 or Turkish. It was designed to cover the Turkish language more usefully than the ISO/IEC 8859-3 encoding. It is identical to ISO/IEC 8859-1 except that six Icelandic characters are replaced with characters unique to the Turkish alphabet. In Turkish, the uppercase form of i is İ, and the lowercase form of I is ı.
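The six-character difference can be seen by decoding all 256 byte values with Python's built-in `latin-1` and `iso8859_9` codecs. The casing helpers `turkish_upper` and `turkish_lower` are hypothetical, minimal illustrations of the İ/ı rule, not a full locale-aware implementation.

```python
import unicodedata

# Decode every byte value with both codecs and report where they disagree.
raw = bytes(range(256))
latin1 = raw.decode("latin-1")    # ISO/IEC 8859-1
latin5 = raw.decode("iso8859_9")  # ISO/IEC 8859-9 (Latin-5)

diffs = [(f"0x{b:02X}", a, t) for b, a, t in zip(raw, latin1, latin5) if a != t]
for byte, old, new in diffs:
    # Print names rather than glyphs to keep the output plain ASCII.
    print(byte, unicodedata.name(old), "->", unicodedata.name(new))


def turkish_upper(s: str) -> str:
    """Hypothetical helper: dotted i must map to U+0130 (capital I with dot)."""
    return s.replace("i", "\u0130").upper()


def turkish_lower(s: str) -> str:
    """Hypothetical helper: I maps to dotless U+0131, and U+0130 to plain i.
    Plain str.lower() would turn U+0130 into i + U+0307 COMBINING DOT ABOVE."""
    return s.replace("\u0130", "i").replace("I", "\u0131").lower()


assert turkish_upper("istanbul") == "\u0130STANBUL"
assert turkish_lower("D\u0130YARBAKIR") == "diyarbak\u0131r"
```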
ISO/IEC 2022, Information technology — Character code structure and extension techniques, is an ISO/IEC standard in the field of character encoding. It is equivalent to the ECMA standard ECMA-35, the ANSI standard ANSI X3.41 and the Japanese Industrial Standard JIS X 0202. Originating in 1971, it was most recently revised in 1994.
T.61 is an ITU-T Recommendation for a Teletex character set. T.61 predated Unicode, and was the primary character set in ASN.1 used in early versions of X.500 and X.509 for encoding strings containing characters used in Western European languages. It is also used by older versions of LDAP. While T.61 continues to be supported in modern versions of X.500 and X.509, it has been deprecated in favor of Unicode. It is also called Code page 1036, CP1036, or IBM 01036.
A bell character is a device control code originally sent to ring a small electromechanical bell on tickers and other teleprinters and teletypewriters to alert operators at the other end of the line, often of an incoming message. Though tickers punched the bell codes into their tapes, printers generally do not print a character when the bell code is received. Bell codes are usually represented by the label "BEL". They have been used since 1870.
The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received.
Many Unicode characters are used to control the interpretation or display of text, but these characters themselves have no visual or spatial representation. For example, the null character is used in C-programming application environments to indicate the end of a string of characters. In this way, these programs only require a single starting memory address for a string, since the string ends once the program reads the null character.
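The same idea can be observed from Python through the standard `ctypes` module, which exposes C-style buffers. This sketch only illustrates NUL termination; it is not how C code itself would be written.

```python
import ctypes

# A C-style buffer holding "hello", a NUL, then leftover bytes.
buf = ctypes.create_string_buffer(b"hello\x00world")

# .raw shows every byte; create_string_buffer appends one more NUL at the end.
print(buf.raw)    # b'hello\x00world\x00'

# .value, like C string functions, stops at the first NUL.
print(buf.value)  # b'hello'

# Given only the starting address, string_at() reads until it meets a NUL.
print(ctypes.string_at(ctypes.addressof(buf)))  # b'hello'
```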
The Basic Latin Unicode block, sometimes informally called C0 Controls and Basic Latin, is the first block of the Unicode standard, and the only block whose characters are each encoded in a single byte in UTF-8. The block contains all the letters and control codes of the ASCII encoding. It ranges from U+0000 to U+007F, contains 128 characters, and includes the C0 controls, ASCII punctuation and symbols, ASCII digits, the uppercase and lowercase letters of the English alphabet, and one further control character (DEL, U+007F).
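A short check with Python's UTF-8 codec illustrates the single-byte property of the block:

```python
# Every Basic Latin code point (U+0000-U+007F) encodes to one UTF-8 byte
# equal to its code point; the next code point, U+0080, needs two bytes.
basic_latin_one_byte = all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(0x80))
print(basic_latin_one_byte)       # True
print(chr(0x80).encode("utf-8"))  # b'\xc2\x80'
```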
The Latin-1 Supplement is the second Unicode block in the Unicode standard. It encodes the upper range of ISO 8859-1, bytes 0x80 to 0xFF (U+0080 to U+00FF). The C1 controls (U+0080–U+009F) are not graphic. The block contains 128 characters and includes the C1 controls, Latin-1 punctuation and symbols, 30 pairs of majuscule and minuscule accented Latin characters, and 2 mathematical operators.
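The one-to-one correspondence between ISO 8859-1 bytes and this block, and the non-graphic status of the C1 range, can be sketched with Python's `latin-1` codec and `unicodedata.category()`:

```python
import unicodedata

# ISO 8859-1 (Python codec "latin-1") maps each byte 0xNN straight to U+00NN,
# so the upper half of the code page lines up with the Latin-1 Supplement.
upper_half = bytes(range(0x80, 0x100)).decode("latin-1")
assert [ord(c) for c in upper_half] == list(range(0x80, 0x100))

# The C1 controls U+0080-U+009F all have general category Cc (not graphic).
c1 = upper_half[:0x20]
print(all(unicodedata.category(c) == "Cc" for c in c1))  # True
print(unicodedata.category("\u00E9"))  # Ll: e with acute is a graphic letter
```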
The programming language APL uses a number of symbols, rather than words from natural language, to identify operations, similarly to mathematical symbols. Prior to the wide adoption of Unicode, a number of special-purpose EBCDIC and non-EBCDIC code pages were used to represent the symbols required for writing APL.
The Unicode Standard assigns various properties to each Unicode character and code point.
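For example, Python's `unicodedata` module exposes some of these properties, such as General_Category and Bidi_Class, here for a few characters that also appear in the alias table. The values come from whatever Unicode version the Python build bundles.

```python
import unicodedata

# name, General_Category and Bidi_Class for a few characters from the alias table.
props = {
    f"U+{ord(ch):04X}": (
        unicodedata.name(ch),           # formal name (never an alias)
        unicodedata.category(ch),       # General_Category
        unicodedata.bidirectional(ch),  # Bidi_Class
    )
    for ch in "\u200D\u200F\u00A0\u01A2"
}
for cp, (name, gc, bidi) in props.items():
    print(cp, name, gc, bidi)
```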
This article describes and classifies the Unicode characters that may validly appear in XML.
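A minimal sketch of the XML 1.0 (Fifth Edition) `Char` production written as a Python predicate; the helper names are hypothetical, and XML 1.1 uses different rules for control characters.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 (Fifth Edition) Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def invalid_xml_chars(text: str):
    """Return (index, 'U+XXXX') pairs for characters XML 1.0 does not allow."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text)
            if not is_xml10_char(ord(ch))]


# U+0008 BACKSPACE (a C0 control) is rejected; TAB and LINE FEED are allowed.
print(invalid_xml_chars("a\tb\nc\x08d"))  # [(5, 'U+0008')]
```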
The ZX Spectrum character set is the variant of ASCII used in the ZX Spectrum family computers. It is based on ASCII-1967, but the characters ^, ` and DEL are replaced with ↑, £ and ©. It also differs in its use of the C0 control codes other than the common BS and CR, and it makes use of the 128 high-bit characters beyond the ASCII range. The ZX Spectrum's main set of printable characters and system font are also used by the Jupiter Ace computer.
Microsoft Windows code page 932, also called Windows-31J amongst other names, is the Microsoft Windows code page for the Japanese language, which is an extended variant of the Shift JIS Japanese character encoding. It contains standard 7-bit ASCII codes, and Japanese characters are indicated by the high bit of the first byte being set to 1. Some code points in this page require a second byte, so characters use either 8 or 16 bits for encoding.
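Python's standard `cp932` codec can be used to see the mix of single-byte and double-byte sequences. Note that half-width katakana are an exception to the high-bit rule of thumb: their byte has the high bit set, but they occupy a single byte.

```python
# Python's standard "cp932" codec implements Windows code page 932.
ascii_b = "A".encode("cp932")        # ASCII stays a single 7-bit byte
hira_b = "\u3042".encode("cp932")    # HIRAGANA LETTER A: two bytes
kana_b = "\uff71".encode("cp932")    # HALFWIDTH KATAKANA LETTER A: one byte
print(ascii_b, hira_b, kana_b)       # b'A' b'\x82\xa0' b'\xb1'

# The lead byte of the double-byte character has its high bit set.
print((hira_b[0] & 0x80) != 0)       # True
```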
VSCII, also known as TCVN 5712, ISO-IR-180, .VN, ABC or simply the TCVN encodings, is a set of three closely related Vietnamese national standard character encodings for using the Vietnamese language with computers, developed by the TCVN Technical Committee on Information Technology (TCVN/TC1) and first adopted in 1993.
This article covers technical details of the character encoding system defined by ETS 300 706 of the ETSI, a standard for World System Teletext, and used for the Viewdata and Teletext variants of Videotex in Europe.