Unicode alias names and abbreviations

In Unicode, a character can have a unique name. A character can also have one or more alias names. An alias name can be an abbreviation, a C0 or C1 control name, a correction, an alternate name, or a figment. Like a name, an alias is unique over all names and aliases, and therefore identifies exactly one character.
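Because an alias identifies a character just as a name does, a lookup by alias should resolve to the same code point as a lookup by primary name. The following is a minimal sketch using Python's standard `unicodedata` module, which has accepted formal aliases in `lookup()` since Python 3.3. The helper `resolve` is a hypothetical name introduced here for illustration.

```python
import unicodedata


def resolve(name: str) -> str:
    """Resolve a Unicode name or formal alias to a 'U+XXXX' string."""
    return f"U+{ord(unicodedata.lookup(name)):04X}"


# The primary name and the correction alias identify the same character.
primary = resolve("SCRIPT CAPITAL P")
alias = resolve("WEIERSTRASS ELLIPTIC FUNCTION")

# An alternate alias resolves as well.
bom = resolve("BYTE ORDER MARK")

# Control characters have no primary name, so name() falls back to the default.
control_name = unicodedata.name("\x08", None)

print(primary, alias, bom, control_name)
```

Note that `unicodedata.name()` reports only primary names; it does not return aliases.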

Background

The formal, primary Unicode name is unique over all names, uses only a restricted set of characters in a fixed format, and is guaranteed never to change. The formal name consists of the characters A-Z (uppercase), 0-9, " " (space), and "-" (hyphen). In addition to this name, a character can have one or more formal (normative) alias names. An alias name follows the same rules as a name, both for the characters used (A-Z, -, 0-9, <space>) and for those not used (a-z, %, $, etc.). Alias names are also unique in the full name set: all names and alias names together contain no duplicates. Alias names are formally described in the Unicode Standard. [1] [2] In this sense, an abbreviation is also considered a Unicode name.
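The two rules above (restricted character repertoire, uniqueness over the combined set) can be sketched as simple checks. This is an illustrative sketch only: the function names are hypothetical, and the pattern tests only which characters appear, not any further formatting rules the Standard places on names.

```python
import re

# Characters permitted in a formal name or alias: A-Z, 0-9, space, hyphen.
NAME_REPERTOIRE = re.compile(r"[A-Z0-9 -]+")


def uses_name_repertoire(candidate: str) -> bool:
    """True if the string contains only characters allowed in names."""
    return NAME_REPERTOIRE.fullmatch(candidate) is not None


def duplicated(names: list[str]) -> set[str]:
    """Return every entry that occurs more than once in the combined set."""
    seen: set[str] = set()
    repeats: set[str] = set()
    for name in names:
        (repeats if name in seen else seen).add(name)
    return repeats


print(uses_name_repertoire("NO-BREAK SPACE"))  # True
print(uses_name_repertoire("nbsp"))            # False
print(duplicated(["BACKSPACE", "BS", "NBSP", "BS"]))  # {'BS'}
```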

Reason to add an alias

There are five possible reasons to assign an alias name to a code point. [1] A character can have multiple aliases: for example, U+0008 <control-0008> has the control alias BACKSPACE and the abbreviation alias BS.

1. Abbreviation
Commonly occurring abbreviations (or acronyms) for control codes, format characters, spaces, and variation selectors.
There are 354 such aliases, including 256 aliases for variation selectors (VS1 ... VS256).
For example, U+00A0 NO-BREAK SPACE has alias NBSP.
Presentation: in the code charts, the abbreviation is shown in a dashed box, for example NBSP.
2. Control
ISO 6429 names for C0 and C1 control functions, and similar commonly occurring names, are added as aliases to the character.
There are 84 such aliases.
For example, U+0008 <control-0008> has alias BACKSPACE.
Presentation: Control characters do not have a primary name; they are labeled like <control-0008>. An alias name such as BACKSPACE is used in the chart documentation, but never as a primary name. This prevents unintended (automated) replacement of the name by the actual, disruptive control character. For example, the alias name BEL used inline could be replaced by U+0007 <control-0007>, triggering the bell sound.
3. Correction
This is a correction for a "serious problem" in the primary character name, usually an error.
There are 31 such aliases.
For example, U+2118 SCRIPT CAPITAL P is actually a lowercase p, and so is given the alias name ※ WEIERSTRASS ELLIPTIC FUNCTION: "actually this has the form of a lowercase calligraphic p, despite its name, and through the alias the correct spelling is added."
Presentation: A corrected name is preceded by the symbol ※ (the reference mark).
4. Alternate
A widely used alternate name for a character.
There is 1 such alias.
For example, U+FEFF ZERO WIDTH NO-BREAK SPACE has the alternate name BYTE ORDER MARK.
Presentation: listed in the character chart descriptions.
5. Figment
Several documented labels for C1 control code points which were never actually approved in any standard (figment = feigned, in fiction).
There are 3 such aliases.
For example, U+0099 <control-0099> has the figment alias SINGLE GRAPHIC CHARACTER INTRODUCER. This name is an architectural concept from early drafts of ISO/IEC 10646-1, but it was never approved or standardized.
Presentation: These figment abbreviations are not published in the Standard; the chart informally shows "XXX" for each, which is not a unique or identifying abbreviation.
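The five alias types can be read from the semicolon-separated record format of NameAliases.txt (code point; alias; type). The sketch below parses a small hand-picked excerpt rather than the full file. The records are taken from the examples in this article, and `parse_aliases` is a hypothetical helper name.

```python
from collections import defaultdict

# Excerpt in the NameAliases.txt record format: code point;alias;type
SAMPLE = """\
# code point;alias;type
0008;BACKSPACE;control
0008;BS;abbreviation
0099;SINGLE GRAPHIC CHARACTER INTRODUCER;figment
0099;SGC;abbreviation
00A0;NBSP;abbreviation
2118;WEIERSTRASS ELLIPTIC FUNCTION;correction
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
FEFF;ZWNBSP;abbreviation
"""


def parse_aliases(text: str) -> dict[int, list[tuple[str, str]]]:
    """Group (alias, type) pairs by code point, skipping comments and blanks."""
    by_code_point: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        code, alias, kind = line.split(";")
        by_code_point[int(code, 16)].append((alias, kind))
    return dict(by_code_point)


aliases = parse_aliases(SAMPLE)
for code_point, entries in aliases.items():
    print(f"U+{code_point:04X}", entries)
```

Grouping by code point makes it easy to see that one character, such as U+0008, can carry aliases of more than one type.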

List of aliases

| Code point | HTML / decimal | Name or <label> | Alias abbr. | Alias name | Reason | Chart | Note |
|---|---|---|---|---|---|---|---|
| U+0000 | &#0; | <control-0000> | NUL | NULL | Control | C0 Controls and Basic Latin | |
| U+0001 | &#1; | <control-0001> | SOH | START OF HEADING | Control | C0 Controls and Basic Latin | |
| U+0002 | &#2; | <control-0002> | STX | START OF TEXT | Control | C0 Controls and Basic Latin | |
| U+0003 | &#3; | <control-0003> | ETX | END OF TEXT | Control | C0 Controls and Basic Latin | |
| U+0004 | &#4; | <control-0004> | EOT | END OF TRANSMISSION | Control | C0 Controls and Basic Latin | |
| U+0005 | &#5; | <control-0005> | ENQ | ENQUIRY | Control | C0 Controls and Basic Latin | |
| U+0006 | &#6; | <control-0006> | ACK | ACKNOWLEDGE | Control | C0 Controls and Basic Latin | |
| U+0007 | &#7; | <control-0007> | BEL | ALERT | Control | C0 Controls and Basic Latin | |
| U+0008 | &#8; | <control-0008> | BS | BACKSPACE | Control | C0 Controls and Basic Latin | |
| U+0009 | &Tab; &#9; | <control-0009> | TAB | CHARACTER TABULATION | Control | C0 Controls and Basic Latin | |
| | | | HT | HORIZONTAL TABULATION | Control | | |
| U+000A | &#10; | <control-000A> | LF | LINE FEED | Control | C0 Controls and Basic Latin | |
| | | | NL | NEW LINE | Control | | |
| | | | EOL | END OF LINE | Control | | |
| U+000B | &#11; | <control-000B> | | LINE TABULATION | Control | C0 Controls and Basic Latin | |
| | | | VT | VERTICAL TABULATION | Control | | |
| U+000C | &#12; | <control-000C> | FF | FORM FEED | Control | C0 Controls and Basic Latin | |
| U+000D | &#13; | <control-000D> | CR | CARRIAGE RETURN | Control | C0 Controls and Basic Latin | |
| U+000E | &#14; | <control-000E> | SO | SHIFT OUT | Control | C0 Controls and Basic Latin | |
| | | | | LOCKING-SHIFT ONE | Control | | |
| U+000F | &#15; | <control-000F> | SI | SHIFT IN | Control | C0 Controls and Basic Latin | |
| | | | | LOCKING-SHIFT ZERO | Control | | |
| U+0010 | &#16; | <control-0010> | DLE | DATA LINK ESCAPE | Control | C0 Controls and Basic Latin | |
| U+0011 | &#17; | <control-0011> | DC1 | DEVICE CONTROL ONE | Control | C0 Controls and Basic Latin | |
| U+0012 | &#18; | <control-0012> | DC2 | DEVICE CONTROL TWO | Control | C0 Controls and Basic Latin | |
| U+0013 | &#19; | <control-0013> | DC3 | DEVICE CONTROL THREE | Control | C0 Controls and Basic Latin | |
| U+0014 | &#20; | <control-0014> | DC4 | DEVICE CONTROL FOUR | Control | C0 Controls and Basic Latin | |
| U+0015 | &#21; | <control-0015> | NAK | NEGATIVE ACKNOWLEDGE | Control | C0 Controls and Basic Latin | |
| U+0016 | &#22; | <control-0016> | SYN | SYNCHRONOUS IDLE | Control | C0 Controls and Basic Latin | |
| U+0017 | &#23; | <control-0017> | ETB | END OF TRANSMISSION BLOCK | Control | C0 Controls and Basic Latin | |
| U+0018 | &#24; | <control-0018> | CAN | CANCEL | Control | C0 Controls and Basic Latin | |
| U+0019 | &#25; | <control-0019> | EOM | END OF MEDIUM | Control | C0 Controls and Basic Latin | |
| | | | EM | | Abbreviation | | added in version 15.0 |
| U+001A | &#26; | <control-001A> | SUB | SUBSTITUTE | Control | C0 Controls and Basic Latin | |
| U+001B | &#27; | <control-001B> | ESC | ESCAPE | Control | C0 Controls and Basic Latin | |
| U+001C | &#28; | <control-001C> | | INFORMATION SEPARATOR FOUR | Control | C0 Controls and Basic Latin | |
| | | | FS | FILE SEPARATOR | Control | | |
| U+001D | &#29; | <control-001D> | | INFORMATION SEPARATOR THREE | Control | C0 Controls and Basic Latin | |
| | | | GS | GROUP SEPARATOR | Control | | |
| U+001E | &#30; | <control-001E> | | INFORMATION SEPARATOR TWO | Control | C0 Controls and Basic Latin | |
| | | | RS | RECORD SEPARATOR | Control | | |
| U+001F | &#31; | <control-001F> | | INFORMATION SEPARATOR ONE | Control | C0 Controls and Basic Latin | |
| | | | US | UNIT SEPARATOR | Control | | |
| U+0020 | &#32; | SPACE | SP | | Abbreviation | C0 Controls and Basic Latin | |
| U+007F | &#127; | <control-007F> | DEL | DELETE | Control | C0 Controls and Basic Latin | |
| U+0080 | &#128; | <control-0080> | PAD | PADDING CHARACTER | Figment | C1 Controls and Latin-1 Supplement | Aliases are not widely published by Unicode; chart shows non-unique XXX |
| U+0081 | &#129; | <control-0081> | HOP | HIGH OCTET PRESET | Figment | C1 Controls and Latin-1 Supplement | Aliases are not widely published by Unicode; chart shows non-unique XXX |
| U+0082 | &#130; | <control-0082> | BPH | BREAK PERMITTED HERE | Control | C1 Controls and Latin-1 Supplement | |
| U+0083 | &#131; | <control-0083> | NBH | NO BREAK HERE | Control | C1 Controls and Latin-1 Supplement | |
| U+0084 | &#132; | <control-0084> | IND | INDEX | Control | C1 Controls and Latin-1 Supplement | |
| U+0085 | &#133; | <control-0085> | NEL | NEXT LINE | Control | C1 Controls and Latin-1 Supplement | |
| U+0086 | &#134; | <control-0086> | SSA | START OF SELECTED AREA | Control | C1 Controls and Latin-1 Supplement | |
| U+0087 | &#135; | <control-0087> | ESA | END OF SELECTED AREA | Control | C1 Controls and Latin-1 Supplement | |
| U+0088 | &#136; | <control-0088> | | CHARACTER TABULATION SET | Control | C1 Controls and Latin-1 Supplement | |
| | | | HTS | HORIZONTAL TABULATION SET | Control | | |
| U+0089 | &#137; | <control-0089> | | CHARACTER TABULATION WITH JUSTIFICATION | Control | C1 Controls and Latin-1 Supplement | |
| | | | HTJ | HORIZONTAL TABULATION WITH JUSTIFICATION | Control | | |
| U+008A | &#138; | <control-008A> | | LINE TABULATION SET | Control | C1 Controls and Latin-1 Supplement | |
| | | | VTS | VERTICAL TABULATION SET | Control | | |
| U+008B | &#139; | <control-008B> | | PARTIAL LINE FORWARD | Control | C1 Controls and Latin-1 Supplement | |
| | | | PLD | PARTIAL LINE DOWN | Control | | |
| U+008C | &#140; | <control-008C> | | PARTIAL LINE BACKWARD | Control | C1 Controls and Latin-1 Supplement | |
| | | | PLU | PARTIAL LINE UP | Control | | |
| U+008D | &#141; | <control-008D> | | REVERSE LINE FEED | Control | C1 Controls and Latin-1 Supplement | |
| | | | RI | REVERSE INDEX | Control | | |
| U+008E | &#142; | <control-008E> | | SINGLE SHIFT TWO | Control | C1 Controls and Latin-1 Supplement | |
| | | | SS2 | SINGLE-SHIFT-2 | Control | | |
| U+008F | &#143; | <control-008F> | | SINGLE SHIFT THREE | Control | C1 Controls and Latin-1 Supplement | |
| | | | SS3 | SINGLE-SHIFT-3 | Control | | |
| U+0090 | &#144; | <control-0090> | DCS | DEVICE CONTROL STRING | Control | C1 Controls and Latin-1 Supplement | |
| U+0091 | &#145; | <control-0091> | | PRIVATE USE ONE | Control | C1 Controls and Latin-1 Supplement | |
| | | | PU1 | PRIVATE USE-1 | Control | | |
| U+0092 | &#146; | <control-0092> | | PRIVATE USE TWO | Control | C1 Controls and Latin-1 Supplement | |
| | | | PU2 | PRIVATE USE-2 | Control | | |
| U+0093 | &#147; | <control-0093> | STS | SET TRANSMIT STATE | Control | C1 Controls and Latin-1 Supplement | |
| U+0094 | &#148; | <control-0094> | CCH | CANCEL CHARACTER | Control | C1 Controls and Latin-1 Supplement | |
| U+0095 | &#149; | <control-0095> | MW | MESSAGE WAITING | Control | C1 Controls and Latin-1 Supplement | |
| U+0096 | &#150; | <control-0096> | | START OF GUARDED AREA | Control | C1 Controls and Latin-1 Supplement | |
| | | | SPA | START OF PROTECTED AREA | Control | | |
| U+0097 | &#151; | <control-0097> | | END OF GUARDED AREA | Control | C1 Controls and Latin-1 Supplement | |
| | | | EPA | END OF PROTECTED AREA | Control | | |
| U+0098 | &#152; | <control-0098> | SOS | START OF STRING | Control | C1 Controls and Latin-1 Supplement | |
| U+0099 | &#153; | <control-0099> | SGC | SINGLE GRAPHIC CHARACTER INTRODUCER | Figment | C1 Controls and Latin-1 Supplement | Aliases are not widely published by Unicode; chart shows non-unique XXX |
| U+009A | &#154; | <control-009A> | SCI | SINGLE CHARACTER INTRODUCER | Control | C1 Controls and Latin-1 Supplement | |
| U+009B | &#155; | <control-009B> | CSI | CONTROL SEQUENCE INTRODUCER | Control | C1 Controls and Latin-1 Supplement | |
| U+009C | &#156; | <control-009C> | ST | STRING TERMINATOR | Control | C1 Controls and Latin-1 Supplement | |
| U+009D | &#157; | <control-009D> | OSC | OPERATING SYSTEM COMMAND | Control | C1 Controls and Latin-1 Supplement | |
| U+009E | &#158; | <control-009E> | PM | PRIVACY MESSAGE | Control | C1 Controls and Latin-1 Supplement | |
| U+009F | &#159; | <control-009F> | APC | APPLICATION PROGRAM COMMAND | Control | C1 Controls and Latin-1 Supplement | |
| U+00A0 | &nbsp; &NonBreakingSpace; &#160; | NO-BREAK SPACE | NBSP | | Abbreviation | C1 Controls and Latin-1 Supplement | |
| U+00AD | &shy; &#173; | SOFT HYPHEN | SHY | | Abbreviation | C1 Controls and Latin-1 Supplement | |
| U+01A2 | &#418; | LATIN CAPITAL LETTER OI | | LATIN CAPITAL LETTER GHA | ※ Correction | Latin Extended-B | |
| U+01A3 | &#419; | LATIN SMALL LETTER OI | | LATIN SMALL LETTER GHA | ※ Correction | Latin Extended-B | |
| U+034F | &#847; | COMBINING GRAPHEME JOINER | CGJ | | Abbreviation | Combining Diacritical Marks | The name of this character is misleading; it does not actually join graphemes |
| U+0616 | &#1558; | ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH | | ARABIC SMALL HIGH LIGATURE ALEF WITH YEH BARREE | ※ Correction | Arabic | added in version 15.0 |
| U+061C | &#1564; | ARABIC LETTER MARK | ALM | | Abbreviation | Arabic | See RLM |
| U+0709 | &#1801; | SYRIAC SUBLINEAR COLON SKEWED RIGHT | | SYRIAC SUBLINEAR COLON SKEWED LEFT | ※ Correction | Syriac | |
| U+0CDE | &#3294; | KANNADA LETTER FA | | KANNADA LETTER LLLA | ※ Correction | Kannada | |
| U+0E9D | &#3741; | LAO LETTER FO TAM | | LAO LETTER FO FON | ※ Correction | Lao | |
| U+0E9F | &#3743; | LAO LETTER FO SUNG | | LAO LETTER FO FAY | ※ Correction | Lao | |
| U+0EA3 | &#3747; | LAO LETTER LO LING | | LAO LETTER RO | ※ Correction | Lao | |
| U+0EA5 | &#3749; | LAO LETTER LO LOOT | | LAO LETTER LO | ※ Correction | Lao | |
| U+0FD0 | &#4048; | TIBETAN MARK BSKA- SHOG GI MGO RGYAN | | TIBETAN MARK BKA- SHOG GI MGO RGYAN | ※ Correction | Tibetan | |
| U+11EC | &#4588; | HANGUL JONGSEONG IEUNG-KIYEOK | | HANGUL JONGSEONG YESIEUNG-KIYEOK | ※ Correction | Hangul Jamo | |
| U+11ED | &#4589; | HANGUL JONGSEONG IEUNG-SSANGKIYEOK | | HANGUL JONGSEONG YESIEUNG-SSANGKIYEOK | ※ Correction | Hangul Jamo | |
| U+11EE | &#4590; | HANGUL JONGSEONG SSANGIEUNG | | HANGUL JONGSEONG SSANGYESIEUNG | ※ Correction | Hangul Jamo | |
| U+11EF | &#4591; | HANGUL JONGSEONG IEUNG-KHIEUKH | | HANGUL JONGSEONG YESIEUNG-KHIEUKH | ※ Correction | Hangul Jamo | |
| U+180B | &#6155; | MONGOLIAN FREE VARIATION SELECTOR ONE | FVS1 | | Abbreviation | Mongolian | |
| U+180C | &#6156; | MONGOLIAN FREE VARIATION SELECTOR TWO | FVS2 | | Abbreviation | Mongolian | |
| U+180D | &#6157; | MONGOLIAN FREE VARIATION SELECTOR THREE | FVS3 | | Abbreviation | Mongolian | |
| U+180E | &#6158; | MONGOLIAN VOWEL SEPARATOR | MVS | | Abbreviation | Mongolian | |
| U+180F | &#6159; | MONGOLIAN FREE VARIATION SELECTOR FOUR | FVS4 | | Abbreviation | Mongolian | |
| U+1BBD | &#7101; | SUNDANESE LETTER BHA | | SUNDANESE LETTER ARCHAIC I | ※ Correction | Sundanese | added in version 15.0 |
| U+200B | &NegativeMediumSpace; &NegativeThickSpace; &NegativeThinSpace; &NegativeVeryThinSpace; &ZeroWidthSpace; &#8203; | ZERO WIDTH SPACE | ZWSP | | Abbreviation | General Punctuation | |
| U+200C | &zwnj; &#8204; | ZERO WIDTH NON-JOINER | ZWNJ | | Abbreviation | General Punctuation | |
| U+200D | &zwj; &#8205; | ZERO WIDTH JOINER | ZWJ | | Abbreviation | General Punctuation | |
| U+200E | &lrm; &#8206; | LEFT-TO-RIGHT MARK | LRM | | Abbreviation | General Punctuation | |
| U+200F | &rlm; &#8207; | RIGHT-TO-LEFT MARK | RLM | | Abbreviation | General Punctuation | |
| U+202A | &#8234; | LEFT-TO-RIGHT EMBEDDING | LRE | | Abbreviation | General Punctuation | |
| U+202B | &#8235; | RIGHT-TO-LEFT EMBEDDING | RLE | | Abbreviation | General Punctuation | |
| U+202C | &#8236; | POP DIRECTIONAL FORMATTING | PDF | | Abbreviation | General Punctuation | |
| U+202D | &#8237; | LEFT-TO-RIGHT OVERRIDE | LRO | | Abbreviation | General Punctuation | |
| U+202E | &#8238; | RIGHT-TO-LEFT OVERRIDE | RLO | | Abbreviation | General Punctuation | |
| U+202F | &#8239; | NARROW NO-BREAK SPACE | NNBSP | | Abbreviation | General Punctuation | |
| U+205F | &MediumSpace; &#8287; | MEDIUM MATHEMATICAL SPACE | MMSP | | Abbreviation | General Punctuation | |
| U+2060 | &NoBreak; &#8288; | WORD JOINER | WJ | | Abbreviation | General Punctuation | |
| U+2066 | &#8294; | LEFT-TO-RIGHT ISOLATE | LRI | | Abbreviation | General Punctuation | |
| U+2067 | &#8295; | RIGHT-TO-LEFT ISOLATE | RLI | | Abbreviation | General Punctuation | |
| U+2068 | &#8296; | FIRST STRONG ISOLATE | FSI | | Abbreviation | General Punctuation | |
| U+2069 | &#8297; | POP DIRECTIONAL ISOLATE | PDI | | Abbreviation | General Punctuation | |
| U+2118 | &weierp; &wp; &#8472; | SCRIPT CAPITAL P | | WEIERSTRASS ELLIPTIC FUNCTION | ※ Correction | Letterlike Symbols | |
| U+2448 | &#9288; | OCR DASH | | MICR ON US SYMBOL | ※ Correction | Optical Character Recognition | |
| U+2449 | &#9289; | OCR CUSTOMER ACCOUNT NUMBER | | MICR DASH SYMBOL | ※ Correction | Optical Character Recognition | |
| U+2B7A | &#11130; | LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE | | LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE VERTICAL STROKE | ※ Correction | Miscellaneous Symbols and Arrows | |
| U+2B7C | &#11132; | RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE | | RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE VERTICAL STROKE | ※ Correction | Miscellaneous Symbols and Arrows | |
| U+A015 | &#40981; | YI SYLLABLE WU | | YI SYLLABLE ITERATION MARK | ※ Correction | Yi Syllables | |
| U+AA6E | &#43630; | MYANMAR LETTER KHAMTI HHA | | MYANMAR LETTER KHAMTI LLA | ※ Correction | Myanmar Extended-A | |
| U+FE00 ... U+FE0F | &#65024; ... &#65039; | VARIATION SELECTOR-1 ... VARIATION SELECTOR-16 | VS1 ... VS16 | | Abbreviation | Variation Selectors | (16 code points) |
| U+FE18 | &#65048; | PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET | | PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET | ※ Correction | Vertical Forms | |
| U+FEFF | &#65279; | ZERO WIDTH NO-BREAK SPACE | BOM | BYTE ORDER MARK | Alternate | Arabic Presentation Forms-B | |
| | | | ZWNBSP | | Abbreviation | | |
| U+122D4 | &#74452; | CUNEIFORM SIGN SHIR TENU | | CUNEIFORM SIGN NU11 TENU | ※ Correction | Cuneiform | |
| U+122D5 | &#74453; | CUNEIFORM SIGN SHIR OVER SHIR BUR OVER BUR | | CUNEIFORM SIGN NU11 OVER NU11 BUR OVER BUR | ※ Correction | Cuneiform | |
| U+16E56 | &#93782; | MEDEFAIDRIN CAPITAL LETTER HP | | MEDEFAIDRIN CAPITAL LETTER H | ※ Correction | Medefaidrin | |
| U+16E57 | &#93783; | MEDEFAIDRIN CAPITAL LETTER NY | | MEDEFAIDRIN CAPITAL LETTER NG | ※ Correction | Medefaidrin | |
| U+16E76 | &#93814; | MEDEFAIDRIN SMALL LETTER HP | | MEDEFAIDRIN SMALL LETTER H | ※ Correction | Medefaidrin | |
| U+16E77 | &#93815; | MEDEFAIDRIN SMALL LETTER NY | | MEDEFAIDRIN SMALL LETTER NG | ※ Correction | Medefaidrin | |
| U+1B001 | &#110593; | HIRAGANA LETTER ARCHAIC YE | | HENTAIGANA LETTER E-1 | ※ Correction | Kana Supplement | |
| U+1D0C5 | &#118981; | BYZANTINE MUSICAL SYMBOL FHTORA SKLIRON CHROMA VASIS | | BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS | ※ Correction | Byzantine Musical Symbols | |
| U+E0100 ... U+E01EF | &#917760; ... &#917999; | VARIATION SELECTOR-17 ... VARIATION SELECTOR-256 | VS17 ... VS256 | | Abbreviation | Variation Selectors Supplement | (240 code points) |
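The two variation selector ranges in the list are contiguous, so the code point for an abbreviation VS1 ... VS256 follows from simple arithmetic: VS1 to VS16 map onto U+FE00 to U+FE0F, and VS17 to VS256 onto U+E0100 to U+E01EF. A minimal sketch, with a hypothetical function name:

```python
def variation_selector_code_point(n: int) -> int:
    """Map the number in VARIATION SELECTOR-n (alias VSn) to its code point."""
    if 1 <= n <= 16:
        # Variation Selectors block: U+FE00 ... U+FE0F
        return 0xFE00 + (n - 1)
    if 17 <= n <= 256:
        # Variation Selectors Supplement block: U+E0100 ... U+E01EF
        return 0xE0100 + (n - 17)
    raise ValueError(f"no variation selector numbered {n}")


for n in (1, 16, 17, 256):
    print(f"VS{n} -> U+{variation_selector_code_point(n):04X}")
```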

Informal alternative names

The Unicode Standard also uses and publishes alternative names that are not formal and are not listed as normative alias names. These labels may not be unique and may use irregular characters. They are used in the Unicode code charts; for example, U+070F SYRIAC ABBREVIATION MARK is labeled SAM. [3]

References

  1. "NameAliases-15.1.0.txt". The Unicode Consortium. 2023-01-05. Retrieved 2023-09-12.
  2. The Unicode Standard (PDF). 15.0.0. The Unicode Consortium. 2022. ISBN 978-1-936213-32-0.
  3. "Unicode 14.0 Character Code Charts: Syriac" (PDF).