Videotex character set

In addition to sets of graphical characters and control characters, some Videotex formats included Picture Description Instructions (PDI) sets used for in-band vector graphics.

The character sets used by Videotex are based, to greater or lesser extents, on ISO/IEC 2022. Three Data Syntax systems are defined by ITU T.101, corresponding to the Videotex systems of different countries.


Data Syntax 1

Data Syntax 1 is defined in Annex B of T.101:1994. It is based on the CAPTAIN system used in Japan. Its graphical sets include JIS X 0201 and JIS X 0208.

The following G-sets are available through ISO/IEC 2022-based designation escapes: [1] :AnxB.2.3

Name | G-set escape type | F byte | ISO-IR for F byte
Primary Character set | Single byte 94-code | 0x4A (J) | ISO-IR-14 (JIS X 0201 Roman)
Katakana Character set | Single byte 94-code | 0x49 (I) | ISO-IR-13 (JIS X 0201 Kana)
Mosaic I set | Single byte 94-code | 0x33 (3) | (Occupies private-use F byte; also registered as ISO-IR-137 with F byte 0x79) [2]
Mosaic II set | Single byte 94-code | 0x63 (c) | ISO-IR-71 [3]
Display Control set | Single byte 96-code | 0x38 (8) | (Occupies private-use F byte)
PDI set | Single byte 96-code | 0x57 (W) | (F byte exceptionally reserved and not used in ISO-IR) [4]
MVI set | Single byte 96-code | 0x39 (9) | (Occupies private-use F byte)
Kanji set | Multiple byte 94n-code | 0x42 (B) | ISO-IR-87 (JIS X 0208:1983)
Macro set | Single byte DRCS 96-code | 0x40 (@) | (Uses a DRCS escape syntax)
DRCS I set | Single byte DRCS 94-code | 0x41 (A) | (Is a DRCS)
DRCS II set | Multiple byte DRCS 94n-code | 0x40 (@) | (Is a DRCS)
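
As a rough illustration of how the F bytes in the table above might be used, the following Python sketch builds designation escape sequences. The intermediate bytes are the generic ones from ISO/IEC 2022 and are assumed to apply unchanged; T.101 may restrict which sets can be designated to which of G0 to G3, and the DRCS sets (which use a separate escape syntax) are left out. The names `G_SETS` and `designate` are hypothetical.

```python
ESC = 0x1B

# Set type and F byte for each non-DRCS set, copied from the table above.
G_SETS = {
    "primary": ("94", 0x4A),
    "katakana": ("94", 0x49),
    "mosaic1": ("94", 0x33),
    "mosaic2": ("94", 0x63),
    "display_control": ("96", 0x38),
    "pdi": ("96", 0x57),
    "mvi": ("96", 0x39),
    "kanji": ("94n", 0x42),
}

# Generic ISO/IEC 2022 intermediate bytes, indexed by target element G0..G3.
# Assumption: Data Syntax 1 uses them without modification.
INTERMEDIATE_94 = (0x28, 0x29, 0x2A, 0x2B)
INTERMEDIATE_96 = (None, 0x2D, 0x2E, 0x2F)  # no 96-set designation to G0


def designate(name: str, g: int) -> bytes:
    """Return the escape sequence designating the named set to G0..G3."""
    kind, f_byte = G_SETS[name]
    if kind == "96":
        intermediate = INTERMEDIATE_96[g]
        if intermediate is None:
            raise ValueError("a 96-set cannot be designated to G0")
        return bytes([ESC, intermediate, f_byte])
    if kind == "94":
        return bytes([ESC, INTERMEDIATE_94[g], f_byte])
    # Multiple-byte 94n-set: ESC 0x24 precedes the intermediate byte.
    return bytes([ESC, 0x24, INTERMEDIATE_94[g], f_byte])
```

For example, `designate("katakana", 1)` yields the three bytes ESC 0x29 0x49.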

Mosaic sets for Data Syntax 1

The mosaic sets supply characters for use in semigraphics.

Videotex Mosaic set: First Mosaic set for Data Syntax 1 (ISO-IR-137; partial Unicode mapping) [2]
0123456789ABCDEF
0x
1x
2x🮛
3x🮚
4x
5x
6x🭒🭓🭔🭕🭖🭗🭘🭙🭚🭛🭜🭬🭭
7x🭝🭞🭟🭠🭡🭢🭣🭤🭥🭦🭧🭮🭯

� Not in Unicode

Videotex Mosaic set: Second Mosaic set for Data Syntax 1 (ISO-IR-71) [3]
0123456789ABCDEF
0x
1x
2x🬀🬁🬂🬃🬄🬅🬆🬇🬈🬉🬊🬋🬌🬍🬎
3x🬏🬐🬑🬒🬓🬔🬕🬖🬗🬘🬙🬚🬛🬜🬝
4x🬼🬽🬾🬿🭀🭁🭂🭃🭄🭅🭆🭨🭩🭰🮕
5x🭇🭈🭉🭊🭋🭌🭍🭎🭏🭐🭑🭪🭫🭵
6x🬞🬟🬠🬡🬢🬣🬤🬥🬦🬧🬨🬩🬪🬫🬬
7x🬭🬮🬯🬰🬱🬲🬳🬴🬵🬶🬷🬸🬹🬺🬻
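
The block-mosaic rows of the second mosaic set (2x, 3x, 6x and 7x) follow the order of the Unicode sextant characters, so a mapping can be computed rather than tabulated. The Python sketch below is illustrative only: it assumes that the two gaps in those rows (0x35 and 0x6A) hold the left and right half blocks, which Unicode encodes outside the sextant range, and it does not handle rows 4x and 5x. The function name is hypothetical.

```python
def mosaic2_block_to_unicode(code: int) -> str:
    """Map a 2x3 block mosaic code (rows 2x, 3x, 6x, 7x) to Unicode."""
    if 0x21 <= code <= 0x3F:
        pattern = code - 0x20      # six-bit patterns 1..31
    elif 0x60 <= code <= 0x7E:
        pattern = code - 0x40      # six-bit patterns 32..62
    else:
        raise ValueError("not a block mosaic code in rows 2x, 3x, 6x or 7x")

    # Assumed positions of the half blocks, which have their own code points.
    if pattern == 21:
        return "\u258C"            # LEFT HALF BLOCK
    if pattern == 42:
        return "\u2590"            # RIGHT HALF BLOCK

    # The sextant range starts at U+1FB00 with pattern 1 and skips
    # patterns 21 and 42.
    offset = pattern - 1 - (pattern > 21) - (pattern > 42)
    return chr(0x1FB00 + offset)
```

With this mapping, 0x21 gives U+1FB00 and 0x7E gives U+1FB3B, matching the first and last sextants shown in the table.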

Data Syntax 2

Data Syntax 2 is defined in Annex C of T.101:1994. It corresponds to some European Videotex systems such as CEPT T/CD 06-01. The graphical character coding of Data Syntax 2 is based on T.51.

The default G2 set of Data Syntax 2 is based on an older version of T.51, lacking the non-breaking space, soft hyphen, not sign (¬) and broken bar (¦) present in the current version, but adding a dialytika tonos (΅; the combining form is U+0344) at the beginning of the row of diacritical marks for combination with codes from a Greek primary set. [5] An umlaut diacritic code distinct from the diaeresis code, as included in some versions of T.61, is also sometimes included. [6]

The default G1 set is the second mosaic set, corresponding roughly to the second mosaic set of Data Syntax 1. [1] :AnxCpt1/TableC.11 The default G3 set is the third mosaic set, matching the first mosaic set of Data Syntax 1 for 0x60 through 0x6D and 0x70 through 0x7D, and otherwise differing. [1] :AnxCpt1/TableC.12 The first mosaic set matches the second except for 0x40 through 0x5E: 0x40 through 0x5A follow ASCII (supplying uppercase letters), whereas the remainder are national variant characters; the displaced full block is placed at 0x7F. [1] :AnxCpt1/TableC.10

Videotex Mosaic set: First Mosaic set for Data Syntax 2 [1] :AnxCpt1/TableC.10
0123456789ABCDEF
0x
1x
2x SP 🬀🬁🬂🬃🬄🬅🬆🬇🬈🬉🬊🬋🬌🬍🬎
3x🬏🬐🬑🬒🬓🬔🬕🬖🬗🬘🬙🬚🬛🬜🬝
4x@ABCDEFGHIJKLMNO
5xPQRSTUVWXYZ a ½ a a a ⌗/_ b
6x🬞🬟🬠🬡🬢🬣🬤🬥🬦🬧🬨🬩🬪🬫🬬
7x🬭🬮🬯🬰🬱🬲🬳🬴🬵🬶🬷🬸🬹🬺🬻

Data Syntax 3

Data Syntax 3 is defined in Annex D of T.101:1994. The graphical character coding of Data Syntax 3 is based on T.51.

The supplementary set for Data Syntax 3 is based on an older version of T.51, lacking the non-breaking space, soft hyphen, not sign (¬) and broken bar (¦) present in the current version. It also allocates non-spacing marks for a "vector overbar" and a solidus, as well as several semigraphic characters, to space that is unallocated in that set.

See the comments in the T.51 article for caveats about the combining mark Unicode mappings shown below. Unlike Unicode combining characters, T.51 diacritic codes precede the base character.

Supplementary Set for Videotex Data Syntax 3 [7]
0123456789ABCDEF
0x/8x
1x/9x
2x/Ax ¡ ¢ £ $ ¥ # § ¤ «
3x/Bx ° ± ² ³ × µ · ÷ » ¼ ½ ¾ ¿
4x/Cx ̀ ́ ̂ ̃ ̄ ̆ ̇ ̈ ̸ ̊ ̧ ̲ ̋ ̨ ̌
5x/Dx ¹ ® ©
6x/Ex Æ Đ/Ð ª Ħ IJ Ŀ Ł Ø Œ º Þ Ŧ Ŋ ʼn
7x/Fx ĸ æ đ ð ħ ı ij ŀ ł ø œ ß þ ŧ ŋ
  Differences from T.51 (1988 edition, first supplementary set)
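
Because T.51 diacritic codes precede the base character while Unicode combining characters follow it, a converter has to reorder them. The Python sketch below shows one way this might be done. It assumes an 8-bit coding with the supplementary set in the upper half, handles only ASCII base letters, and covers only a small illustrative subset of the diacritic row; `DIACRITICS` and `decode_with_diacritics` are hypothetical names.

```python
import unicodedata

# Illustrative subset of the non-spacing diacritic codes in row Cx.
DIACRITICS = {
    0xC1: "\u0300",  # combining grave accent
    0xC2: "\u0301",  # combining acute accent
    0xC3: "\u0302",  # combining circumflex accent
    0xC8: "\u0308",  # combining diaeresis
}


def decode_with_diacritics(data: bytes) -> str:
    """Decode ASCII text with leading diacritic codes into NFC Unicode."""
    out = []
    pending = ""  # combining marks waiting for their base character
    for byte in data:
        if byte in DIACRITICS:
            pending += DIACRITICS[byte]
        elif 0x20 <= byte <= 0x7E:
            # Emit the base character first, then the marks that preceded it.
            out.append(chr(byte) + pending)
            pending = ""
        else:
            raise ValueError(f"byte 0x{byte:02X} is outside this sketch")
    if pending:
        raise ValueError("diacritic code without a following base character")
    return unicodedata.normalize("NFC", "".join(out))
```

For instance, the bytes 0xC8 0x75 (diaeresis, then "u") decode to "ü".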

C0 control codes

C0 control codes for Videotex differ from ASCII as shown in the table below. The NUL, BEL, SO (LS1), SI (LS0) and ESC codes are also available in some or all data syntaxes, but without change in name or semantic from ASCII. [8] [9] [10]

Seq | Dec | Hex | Replaced | Syntaxes | Acronym | Name | Description
^H | 08 | 08 | BS | 1, [8] 2, [9] 3 [10] | APB | Active Position Backward | Moves cursor one position backward. If it is at the start of the line, moves it to the end of the line and back one line. This retains one possible semantic of the ASCII BS.
^I | 09 | 09 | HT | 1, [8] 2, [9] 3 [10] | APF | Active Position Forward | Moves cursor one position forward. If it is at the end of the line, moves it to the start of the line and forward one line.
^J | 10 | 0A | LF | 1, [8] 2, [9] 3 [10] | APD | Active Position Down | Moves cursor one line forward. If it is at the last line of the screen, moves it to the first line unless Data Syntax 3 scroll mode is active. This retains one possible semantic of the ASCII LF.
^K | 11 | 0B | VT | 1, [8] 2, [9] 3 [10] | APU | Active Position Up | Moves cursor one line backward. If it is at the first line of the screen, moves it to the last line unless Data Syntax 3 scroll mode is active.
^L | 12 | 0C | FF | 1, [8] 2, [9] 3 [10] | CS | Clear Screen | Resets entire display to spaces with default display attributes and returns the cursor to its initial position. In Data Syntax 1, also resets macros and DRCS. This retains one possible semantic of the ASCII FF.
^M | 13 | 0D | CR | 1, [8] 2, [9] 3 [10] | APR | Active Position Return | Moves the cursor to the start of the line. In Data Syntax 3, may instead move it to the start of the active field if it is entirely within it. This retains one possible semantic of the ASCII CR.
^Q | 17 | 11 | DC1/XON | 2 [9] | CON | Cursor On | Makes the cursor visible.
^R | 18 | 12 | DC2 | 2 [9] | RPT | Repeat | Repeats the immediately preceding graphic character a number of times indicated by the low six bits of the following byte (from 0x40 to 0x7F).
^T | 20 | 14 | DC4 | 1 [1] :AnxB.3.1 | KMC | Key-In-Monitor Conceal | Takes one parameter: 0x40 makes the key-in-monitor area unconcealed, 0x41 makes it concealed.
^T | 20 | 14 | DC4 | 2 [9] | COF | Cursor Off | Makes the cursor invisible.
^X | 24 | 18 | CAN | 1, [8] 2, [9] 3 [10] | CAN | Cancel | In Data Syntax 2, fills the rest of the current line (after the current position) with spaces (compare EL). In Data Syntax 1 and 3, immediately stops all running macros. Contrast the semantic of basic ASCII CAN.
^Y | 25 | 19 | EM | 1, [8] 2, [9] 3 [10] | SS2 | Single Shift Two | Non-locking shift code for G2.
^Z | 26 | 1A | SUB | 3 [10] | SDC | Service Delimitor Character | Implementation-defined but non-presentational.
^\ | 28 | 1C | FS | 1, [8] 3 [10] | APS | Active Position Set | Followed by two bytes (from 0x40 to 0x7F; may also be from 0xA0 to 0xFF in Data Syntax 3) respectively giving a row and column address in their low six bits. Compare CUP and HVP.
^] | 29 | 1D | GS | 1, [8] 2, [9] 3 [10] | SS3 | Single Shift Three | Non-locking shift code for G3.
^^ | 30 | 1E | RS | 1, [8] 2, [9] 3 [10] | APH | Active Position Home | Returns cursor to the initial position.
^_ | 31 | 1F | US | 1, [8] 3 [10] | NSR | Non-Selective Reset | Resets all display attributes (including ISO 2022 state, domain, text parameters, textures, colour mode but not macros, DRCS or programmable masks), then moves the cursor to a specified position. Followed by two bytes (from 0x40 to 0x7F; may also be from 0xA0 to 0xFF in Data Syntax 3) respectively giving a row and column address in their low six bits. Compare RIS.
^_ | 31 | 1F | US | 2 [9] | APA | Active Position Address | Followed by two or four bytes (from 0x40 to 0x7F) giving a row and column address in their low six bits. Four bytes are used if there are more than 63 rows and columns, with the most significant six bits being first for each parameter. Compare CUP and HVP. If the following byte is not in the range of 0x40 to 0x7F, indicates a switch to another coding scheme (contrast DOCS).
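
Several of the codes above (APS, APA, RPT) carry six-bit values in the low bits of parameter bytes from 0x40 to 0x7F. The Python sketch below illustrates that packing. The function names are hypothetical, the values are raw six-bit fields (whether addresses count from zero or one is not stated here and is left to the caller), and the Data Syntax 3 alternative range from 0xA0 to 0xFF is not generated.

```python
APS = 0x1C  # Active Position Set (Data Syntax 1 and 3)
APA = 0x1F  # Active Position Address (Data Syntax 2)
RPT = 0x12  # Repeat (Data Syntax 2)


def six_bit(value: int) -> int:
    """Pack a value of 0..63 into a parameter byte in 0x40..0x7F."""
    if not 0 <= value <= 0x3F:
        raise ValueError("value does not fit in six bits")
    return 0x40 | value


def encode_aps(row: int, col: int) -> bytes:
    return bytes([APS, six_bit(row), six_bit(col)])


def encode_apa(row: int, col: int, wide: bool = False) -> bytes:
    """Two parameter bytes normally; four when the format needs them."""
    if not wide:
        return bytes([APA, six_bit(row), six_bit(col)])
    # Most significant six bits come first for each parameter.
    return bytes([
        APA,
        six_bit(row >> 6), six_bit(row & 0x3F),
        six_bit(col >> 6), six_bit(col & 0x3F),
    ])


def encode_repeat(count: int) -> bytes:
    return bytes([RPT, six_bit(count)])


def decode_address(params: bytes) -> tuple[int, int]:
    """Recover (row, col) from two or four parameter bytes."""
    values = [b & 0x3F for b in params]
    if len(values) == 2:
        return values[0], values[1]
    if len(values) == 4:
        return (values[0] << 6) | values[1], (values[2] << 6) | values[3]
    raise ValueError("expected two or four parameter bytes")
```

The `wide` flag is passed explicitly because the choice between two and four bytes depends on the display format, not on the particular address being sent.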

C1 control codes

The following specialised C1 control codes are used in Videotex. There are four registered sets, with some differences between them.

8-bit | Escape | Data Syntax 1 [11] | Data Syntax 2, "Parallel" C1 set [12] [1] :AnxC.3.3.2 | Data Syntax 2, "Serial" C1 set [13] [1] :AnxC.3.3.1 | Data Syntax 3 [14]
0x80 | ESC 0x40 (@) | BKF, Black Foreground. | BKF, Black Foreground. | ABK, Alpha Black. Switch to alphabetic, black foreground. | DEFM, Define Macro. Next character (from 0x20 to 0x7F) gives macro name, rest is stored as part of macro until another DEF* or an END.
0x81 | ESC 0x41 (A) | RDF, Red Foreground. | RDF, Red Foreground. | ANR, Alpha Red. Switch to alphabetic, red foreground. | DEFP, Define P-Macro. Like DEFM, but simultaneously defines and executes the macro.
0x82 | ESC 0x42 (B) | GRF, Green Foreground. | GRF, Green Foreground. | ANG, Alpha Green. Switch to alphabetic, green foreground. | DEFT, Define Transmit-Macro. Like DEFM but defines a macro to be transmitted, not executed.
0x83 | ESC 0x43 (C) | YLF, Yellow Foreground. | YLF, Yellow Foreground. | ANY, Alpha Yellow. Switch to alphabetic, yellow foreground. | DEFD, Define DRCS. Defines a character in the Dynamically Redefinable Character Set. Expected to be followed by the character code defined (from 0x20 to 0x7F) unless it terminates a previous DEFD, in which case it defines the next code. Terminated by another DEF* or an END.
0x84 | ESC 0x44 (D) | BLF, Blue Foreground. | BLF, Blue Foreground. | ANB, Alpha Blue. Switch to alphabetic, blue foreground. | DEFX, Define Texture. Defines a texture mask. Expected to be followed by the texture mask ID defined (from 0x40 to 0x44). Terminated by another DEF* or an END.
0x85 | ESC 0x45 (E) | MGF, Magenta Foreground. | MGF, Magenta Foreground. | ANM, Alpha Magenta. Switch to alphabetic, magenta foreground. | END, End. Terminates a macro, DRCS character or texture definition. Also used in unprotected fields.
0x86 | ESC 0x46 (F) | CNF, Cyan Foreground. | CNF, Cyan Foreground. | ANC, Alpha Cyan. Switch to alphabetic, cyan foreground. | REP, Repeat. Repeats preceding spacing graphical character a number of times specified by the following byte (from 0x40 to 0x7F).
0x87 | ESC 0x47 (G) | WHF, White Foreground. | WHF, White Foreground. | ANW, Alpha White. Switch to alphabetic, white foreground. | REPE, Repeat to End of Line. Repeats preceding spacing graphical character until the end of the line is reached.
0x88 | ESC 0x48 (H) | SSZ, Small Size. Characters half normal width and height. | FSH, Flashing. Characters displayed flashing between foreground and background. | FSH, Flashing. Characters displayed flashing between foreground and background. | REVV, Reverse Video. Enables reverse video mode.
0x89 | ESC 0x49 (I) | MSZ, Medium Size. Characters normal height, half normal width. | STD, Steady. Terminates flashing. | STD, Steady. Terminates flashing. | NORV, Normal Video. Disables reverse video mode.
0x8A | ESC 0x4A (J) | NSZ, Normal Size. Characters normal width and height. | EBX, End Box. Terminates SBX. | EBX, End Box. Terminates SBX. | SMTX, Small Text. Text size 1/80 of screen width and 5/128 of screen height.
0x8B | ESC 0x4B (K) | SZX, Size Control. Followed by a one-byte parameter. 0x41 means double height (DBH), 0x44 means double width (DBW), 0x45 means doubled width and height (DBS). [1] :AnxB.3.2.2 | SBX, Start Box. Defines a non-alphanumeric area, with transparent background. Terminated by EBX. | SBX, Start Box. Defines a non-alphanumeric area, with transparent background. Terminated by EBX. | METX, Medium Text. Text size 1/32 of screen width and 3/64 of screen height.
0x8C | ESC 0x4C (L) | (not used) | NSZ, Normal Size. Characters normal width and height. | NSZ, Normal Size. Characters normal width and height. | NOTX, Normal Text. Text size 1/40 of screen width and 5/128 of screen height.
0x8D | ESC 0x4D (M) | (not used) | DBH, Double Height. Characters normal width and double normal height. Inactive on top line. | DBH, Double Height. Characters normal width and double normal height. Inactive on bottom line. | DBH, Double Height. Text size 1/40 of screen width and 5/64 of screen height.
0x8E | ESC 0x4E (N) | CON, Cursor On. Makes cursor visible. | DBW, Double Width. Characters normal height and double normal width. Inactive in last position of line. | DBW, Double Width. Characters normal height and double normal width. Inactive in last position of line. | BSTA, Blink Start.
0x8F | ESC 0x4F (O) | COF, Cursor Off. Makes cursor invisible. | DBS, Double Size. Characters double normal height and double normal width. Inactive on top line or in last position of line. | DBS, Double Size. Characters double normal height and double normal width. Inactive on bottom line or in last position of line. | DBS, Double Size. Text size 1/20 of screen width and 5/64 of screen height.
0x90 | ESC 0x50 (P) | COL, Background or Foreground Colour. Takes a one-byte parameter. 0x48–0x4F sets a reduced intensity foreground. 0x50–0x57 sets background colour. 0x58–0x5F sets a reduced intensity background. Colour order is the same as that of the individual foreground colour controls (black, red, green, yellow, blue, magenta, cyan, white), but transparent takes the place of reduced intensity black. [1] :AnxB.3.2.1 | BKB, Black Background. | MBK, Mosaic Black. Switch to mosaic, black foreground. | PRO, Protect. Makes all character fields within the active field protected.
0x91 | ESC 0x51 (Q) | FLC, Flashing Control. Takes one parameter: 0x40 for "normal" flashing, 0x41 through 0x47 for other flashing modes, 0x4F for steady (terminate flashing). [1] :AnxB.3.2.4 | RDB, Red Background. | MSR, Mosaic Red. Switch to mosaic, red foreground. | (EDC1, not used)
0x92 | ESC 0x52 (R) | CDC, Conceal Display Control. Takes a one-byte parameter defining conceal display attributes, which can make text invisible until user interaction. 0x40 is used to start a concealed range (CDY), 0x4F is used to terminate it (SCD). [1] :AppB.3.2.7 | GRB, Green Background. | MSG, Mosaic Green. Switch to mosaic, green foreground. | (EDC2, not used)
0x93 | ESC 0x53 (S) | (not used) | YLB, Yellow Background. | MSY, Mosaic Yellow. Switch to mosaic, yellow foreground. | (EDC3, not used)
0x94 | ESC 0x54 (T) | (not used) | BLB, Blue Background. | MSB, Mosaic Blue. Switch to mosaic, blue foreground. | (EDC4, not used)
0x95 | ESC 0x55 (U) | P-MACRO, Photo Macro. Followed by a single-byte parameter (0x40 for define, 0x41 for define and execute, 0x42 to define a transmit-macro, 0x4F to delimit the end of a macro definition). [1] :AppB.3.2.9 Second single-byte parameter (from 0x20 to 0x7F) identifies the photo macro being defined (from PM0 to PM95). | MGB, Magenta Background. | MSM, Mosaic Magenta. Switch to mosaic, magenta foreground. | WWON, Word Wrap On.
0x96 | ESC 0x56 (V) | (not used) | CNB, Cyan Background. | MSC, Mosaic Cyan. Switch to mosaic, cyan foreground. | WWOF, Word Wrap Off.
0x97 | ESC 0x57 (W) | (not used) | WHB, White Background. | MSW, Mosaic White. Switch to mosaic, white foreground. | SCON, Scroll On. Next-lining off the bottom of the screen moves the rest of the screen up to make space.
0x98 | ESC 0x58 (X) | RPC, Repeat Control. Repeats preceding spacing graphical character a number of times specified by the low six bits of the following byte (from 0x40 to 0x7F). Repeats to end of line if byte is 0x40. Compare REP from Data Syntax 3. | CDY, Conceal Display. Display characters as spaces (might be terminated by SCD). | CDY, Conceal Display. Display characters as spaces (might be terminated by SCD). | SCOF, Scroll Off. Next-lining off the bottom of the screen wraps around to the top of the screen.
0x99 | ESC 0x59 (Y) | SPL, Stop Lining. Terminates underlining. For mosaic characters, non-underlined font corresponds to contiguous display, with the blocks within a mosaic character joined together. | SPL, Stop Lining. Terminates underlining. For mosaic characters, non-underlined font corresponds to contiguous display, with the blocks within a mosaic character joined together. | SPL, Stop Lining. Terminates underlining. For mosaic characters, non-underlined font corresponds to contiguous display, with the blocks within a mosaic character joined together. | USTA, Underline Start. Begins underlined letters, and switches to separated display for mosaics.
0x9A | ESC 0x5A (Z) | STL, Start Lining. Begins underlined letters. For mosaics, this corresponds to separated display, with the blocks within a mosaic character shown separated. | STL, Start Lining. Begins underlined letters. For mosaics, this corresponds to separated display, with the blocks within a mosaic character shown separated. | STL, Start Lining. Begins underlined letters. For mosaics, this corresponds to separated display, with the blocks within a mosaic character shown separated. | USTO, Underline Stop. Terminates underlining, and switches to contiguous display for mosaics.
0x9B | ESC 0x5B ([) | (not used) | CSI, Control Sequence Introducer. | CSI, Control Sequence Introducer. | FLC, Flash Cursor. User input cursor turned on, flashing.
0x9C | ESC 0x5C (\) | (not used) | NPO, Normal Polarity. Foreground in foreground colour, background in background colour. | BBD, Black Background. | STC, Steady Cursor. User input cursor turned on, always visible.
0x9D | ESC 0x5D (]) | (not used) | IPO, Inverted Polarity. Foreground in background colour, background in foreground colour. | NBD, New Background. Set background colour to previous foreground colour. The current foreground colour is not affected. | COF, Cursor Off. User input cursor invisible, but still functional.
0x9E | ESC 0x5E (^) | UNP, Unprotected. Makes following characters unprotected from user input. | TRB, Transparent Background. | HMS, Hold Mosaic. Image subsequently stored control functions as the last received mosaic character. | BSTO, Blink Stop.
0x9F | ESC 0x5F (_) | PRT, Protected. Makes following characters protected from user input. | SCD, Stop Conceal. Terminate CDY. | RMS, Release Mosaic. Terminate HMS. | UNP, Unprotect. Makes a field unprotected (open to user input).
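
The first two columns of the table above reflect a fixed relationship: each 8-bit C1 code from 0x80 to 0x9F corresponds to ESC followed by a byte 0x40 lower. The Python sketch below converts between the two forms. The function names are hypothetical, and `to_7bit` naively treats every byte from 0x80 to 0x9F as a C1 control, which may not hold inside multi-byte or otherwise specially coded data.

```python
ESC = 0x1B


def c1_to_escape(code: int) -> bytes:
    """Return the 7-bit escape form of an 8-bit C1 control code."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError("not a C1 control code")
    return bytes([ESC, code - 0x40])


def escape_to_c1(seq: bytes) -> int:
    """Return the 8-bit C1 code for a two-byte ESC sequence."""
    if len(seq) != 2 or seq[0] != ESC or not 0x40 <= seq[1] <= 0x5F:
        raise ValueError("not a two-byte C1 escape sequence")
    return seq[1] + 0x40


def to_7bit(data: bytes) -> bytes:
    """Rewrite every 8-bit C1 code in a byte string as ESC plus final byte."""
    out = bytearray()
    for byte in data:
        if 0x80 <= byte <= 0x9F:
            out += c1_to_escape(byte)
        else:
            out.append(byte)
    return bytes(out)
```

For example, 0x9B becomes ESC 0x5B, the familiar two-byte form of CSI.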

References

  1. ITU-T (1994-11-11). International interworking for Videotex services. T.101:1994.
  2. CCITT (1987-07-31). Mosaic-1 Set of Data Syntax 1 of CCITT Rec. T.101 (PDF). ITSCJ/IPSJ. ISO-IR-137.
  3. CCITT (1983-10-01). Second Supplementary Set of Mosaic Characters (PDF). ITSCJ/IPSJ. ISO-IR-71.
  4. "2.9 Synopsis table". International Register of Coded Character Sets To Be Used With Escape Sequences (PDF). ITSCJ/IPSJ. p. 22. ISO-IR. Bit combination 5/7 of table 3 will not be allocated in order to avoid problems with an earlier usage by CCITT.
  5. CCITT (1988-11-01). Supplementary Set of Graphic Characters for Videotex (PDF). ITSCJ/IPSJ. ISO-IR-70.
  6. See Table C.9 in Annex C part 1 of T.101. [1] Caveat: the table itself is displayed in the PDF with severe mojibake (which is why the displayed table does not appear to correspond to the associated notes), and is supposed to look like ISO-IR-70 [5] (besides the additional highlighted umlaut code).
  7. CCITT (1986-11-30). Supplementary Set of Graphic Characters for CCITT Recommendation T.101, Data Syntax III (PDF). ITSCJ/IPSJ. ISO-IR-128.
  8. CCITT (1987-07-31). Primary Control Set of Data Syntax I of CCITT Rec. T.101 (PDF). ITSCJ/IPSJ. ISO-IR-132.
  9. CCITT (1987-07-31). Primary Control Set of Data Syntax II of CCITT Rec. T.101 (PDF). ITSCJ/IPSJ. ISO-IR-134.
  10. CCITT (1987-07-31). Primary Control Set of Data Syntax III of CCITT Rec. T.101 (PDF). ITSCJ/IPSJ. ISO-IR-135.
  11. CCITT (1987-07-31). Supplementary Control Set of Data Syntax I of CCITT Rec. T.101 (PDF). ITSCJ/IPSJ. ISO-IR-133.
  12. CCITT (1983-10-01). Attribute Control Set for Videotex (PDF). ITSCJ/IPSJ. ISO-IR-73.
  13. BSI (1982-06-01). Attribute Control Set for UK Videotex (PDF). ITSCJ/IPSJ. ISO-IR-56.
  14. CCITT (1987-07-31). The Supplementary Control Set of Data Syntax III of CCITT Rec. T.101 (PDF). ITSCJ/IPSJ. ISO-IR-136.