Teletext character set

This article covers technical details of the character encoding system defined by ETS 300 706, an ETSI standard for World System Teletext. The encoding is used for the Viewdata and Teletext variants of Videotex in Europe.

Character sets

The following tables show various Teletext character sets. Each character is shown with a potential Unicode equivalent if available. Space and control characters are represented by the abbreviations for their names.

Control characters

Control characters set the foreground and background color (black, red, green, yellow, blue, magenta, cyan, white), flashing, character size (normal, double height, double width, double size), the current default character set, and other attributes. [1] [2]

In formats where compatibility with ECMA-48's C0 control codes such as TAB and LF is not required, these control codes are sometimes mapped transparently to the Unicode C0 control code range (U+0000 through U+001F). [3] Among C1 control code sets, the ITU T.101 C1 control codes for "Serial" Data Syntax 2 [4] are mostly a transposition of the Teletext spacing controls, except for the inclusion of CSI at 0x9B.

Teletext spacing attributes [2]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x ABK ANR ANG ANY ANB ANM ANC ANW FSH STD EBX SBX NSZ DBH DBW DBS
1x MBK MSR MSG MSY MSB MSM MSC MSW CDY SPL [a] STL [b] ESC [c] BBD NBD HMS RMS
  a. The ETS 300 706 name of this control code is "Contiguous Mosaic Graphics"; it switches mosaic characters to contiguous (connected) display. [2] Its other name, "Stop Lining", comes from other formats, which use the code both to switch mosaic characters to connected display and to switch alphanumeric characters to non-underlined display. [4]
  b. The ETS 300 706 name of this control code is "Separated Mosaic Graphics"; it switches mosaic characters to separated display. [2] Its other name, "Start Lining", comes from other formats, which use the code both to switch mosaic characters to separated display and to switch alphanumeric characters to underlined display. [4]
  c. ETS 300 706 also gives ESC the alternative name "Switch". In certain contexts it toggles between two G0 sets previously designated by dedicated packets. [2]
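
As a rough illustration of how the table above can be used, the following Python sketch looks up the mnemonic for a spacing attribute code. The names `SPACING_ATTRIBUTES` and `describe` are hypothetical, and the mnemonics are simply copied from the two table rows; this is not taken from ETS 300 706 itself.

```python
# Mnemonics copied from rows 0x and 1x of the spacing attribute table.
SPACING_ATTRIBUTES = (
    "ABK ANR ANG ANY ANB ANM ANC ANW FSH STD EBX SBX NSZ DBH DBW DBS "
    "MBK MSR MSG MSY MSB MSM MSC MSW CDY SPL STL ESC BBD NBD HMS RMS"
).split()


def describe(code: int) -> str:
    """Return the mnemonic for a spacing attribute (0x00-0x1F).

    Codes from 0x20 to 0x7F are displayable characters, so they are
    reported as a hex value instead (a choice made for this sketch).
    """
    if not 0x00 <= code <= 0x7F:
        raise ValueError(f"not a 7-bit code: {code!r}")
    if code < 0x20:
        return SPACING_ATTRIBUTES[code]
    return f"0x{code:02X}"


print(describe(0x0D))  # DBH
print(describe(0x1B))  # ESC
```

The row label gives the high hex digit and the column gives the low hex digit, so the list index is the code value itself.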

Latin

G0

Teletext (Latin G0) [5] [6]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP   ! " # ¤ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~
  National option subset (see table below)


Latin G0 national option subsets [7]
Code (hex): 23 24 40 5B 5C 5D 5E 5F 60 7B 7C 7D 7E
Primary set # ¤ @ [ \ ] ^ _ ` { | } ~
Czech/Slovak # ů č ť ž ý í ř é á ě ú š
English £ $ @ ½ #/ ¼ ¾ ÷
Estonian # õ Š Ä Ö Ž Ü Õ š ä ö ž ü
French é ï à ë ê ù î # è â ô û ç
German # $ § Ä Ö Ü ^ _ ° ä ö ü ß
Italian £ $ é ° ç # ù à ò è ì
Latvian/Lithuanian # $ Š ė ę Ž č ū š ą ų ž į
Polish # ń ą Ƶ Ś Ł ć ó ę ż ś ł ź
Portuguese/Spanish ç $ ¡ á é í ó ú ¿ ü ñ è à
Romanian # ¤ Ţ/Ț Â Ş/Ș Ă Î ı ţ/ț â ş/ș ă î
Serbian/Croatian/Slovenian # Ë Č Ć Ž Đ Š ë č ć ž đ š
Swedish/Finnish/Hungarian # ¤ É Ä Ö Å Ü _ é ä ö å ü
Turkish ğ İ Ş Ö Ç Ü Ğ ı ş ö ç ü
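
The national option mechanism can be sketched as a substitution applied on top of the primary G0 set at the thirteen code positions listed in the table header. The Python below is a minimal sketch under that reading: the names `POSITIONS`, `SUBSETS`, and `decode_latin_g0` are hypothetical, only three rows are transcribed (the ones that appear complete in the table above), and codes outside those positions are assumed to match ASCII.

```python
# The thirteen code positions that a national option subset may replace.
POSITIONS = [0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E,
             0x5F, 0x60, 0x7B, 0x7C, 0x7D, 0x7E]

# Rows transcribed from the table above, one character per position.
SUBSETS = {
    "primary": "# ¤ @ [ \\ ] ^ _ ` { | } ~".split(),
    "german": "# $ § Ä Ö Ü ^ _ ° ä ö ü ß".split(),
    "swedish": "# ¤ É Ä Ö Å Ü _ é ä ö å ü".split(),
}


def decode_latin_g0(data: bytes, option: str = "primary") -> str:
    """Decode displayable Latin G0 codes (0x20-0x7E) for one national option."""
    table = dict(zip(POSITIONS, SUBSETS[option]))
    out = []
    for code in data:
        if not 0x20 <= code <= 0x7E:
            raise ValueError(f"not a displayable G0 code: 0x{code:02X}")
        # Outside the thirteen positions, the set is assumed to equal ASCII.
        out.append(table.get(code, chr(code)))
    return "".join(out)


print(decode_latin_g0(b"Gr}~", "german"))  # Grüß
```

The same byte sequence decodes differently under another option, which is why a decoder needs to know the selected subset before it can produce Unicode text.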

G2

Teletext (Latin G2) [8] [9]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP   ¡ ¢ £ $ ¥ # § ¤ «
3x ° ± ² ³ × µ · ÷ » ¼ ½ ¾ ¿
4x NBSP ̀ ́ ̂ ̃ ̄ ̆ ̇ ̈ ̣ ̊ ̧ ̲ ̋ ̨ ̌
5x ¹ ® © α
6x Ω Æ Ð ª Ħ IJ Ŀ Ł Ø Œ º Þ Ŧ Ŋ ʼn
7x ĸ æ đ ð ħ ı ij ŀ ł ø œ ß þ ŧ ŋ
  Diacritical marks for use with G0 characters
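
In Unicode, a combining mark follows its base letter, so a G2 diacritical mark paired with a G0 letter can be expressed as the base character plus the mark and then normalized. The sketch below assumes the marks in row 4x are the Unicode combining characters shown in the comments (my reading of the table, not a quotation from the standard); `G2_MARKS` and `compose` are hypothetical names. It does not model how a Teletext decoder receives the pairing.

```python
import unicodedata

# Combining marks for G2 codes 0x41-0x4F, read off row 4x of the table.
G2_MARKS = {
    0x41: "\u0300",  # grave
    0x42: "\u0301",  # acute
    0x43: "\u0302",  # circumflex
    0x44: "\u0303",  # tilde
    0x45: "\u0304",  # macron
    0x46: "\u0306",  # breve
    0x47: "\u0307",  # dot above
    0x48: "\u0308",  # diaeresis
    0x49: "\u0323",  # dot below
    0x4A: "\u030A",  # ring above
    0x4B: "\u0327",  # cedilla
    0x4C: "\u0332",  # low line
    0x4D: "\u030B",  # double acute
    0x4E: "\u0328",  # ogonek
    0x4F: "\u030C",  # caron
}


def compose(mark_code: int, base: str) -> str:
    """Combine a G0 base letter with a G2 diacritical mark, NFC-normalized."""
    return unicodedata.normalize("NFC", base + G2_MARKS[mark_code])


print(compose(0x48, "u"))  # ü
print(compose(0x4F, "c"))  # č
```

Where Unicode has no precomposed character for a pair, the normalized result simply keeps the base letter followed by the combining mark.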

Greek

G0

Teletext (Greek G0) [10]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP   ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; « = » ?
4x ΐ Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο
5x Π Ρ ʹ Σ Τ Υ Φ Χ Ψ Ω Ϊ Ϋ ά έ ή ί
6x ΰ α β γ δ ε ζ η θ ι κ λ μ ν ξ ο
7x π ρ ς σ τ υ φ χ ψ ω ϊ ϋ ό ύ ώ
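
Going by the Unicode equivalents shown in the table above, rows 4x to 7x of the Greek G0 set line up with the Unicode Greek block at a fixed offset of 0x350, apart from code 0x52. The sketch below relies on that observation; it is a pattern read off this table rather than a rule stated in ETS 300 706, and `decode_greek_g0` is a hypothetical name.

```python
def decode_greek_g0(code: int) -> str:
    """Map a displayable Greek G0 code (0x20-0x7E) to a Unicode character."""
    if not 0x20 <= code <= 0x7E:
        raise ValueError(f"not a displayable G0 code: 0x{code:02X}")
    if code == 0x3C:
        return "«"  # row 3x differs from ASCII at two positions
    if code == 0x3E:
        return "»"
    if code == 0x52:
        return "ʹ"  # the one cell that breaks the offset pattern
    if code >= 0x40:
        return chr(code + 0x350)  # 0x41 -> U+0391 (Α), 0x7E -> U+03CE (ώ)
    return chr(code)  # remaining cells of rows 2x and 3x match ASCII


print("".join(decode_greek_g0(c) for c in b"\x4a\x61\x6b\x67"))  # Καλη
```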

G2

Teletext (Greek G2) [11]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP   a b £ e h i § : k
3x ° ± ² ³ × m n p ÷ t ¼ ½ ¾ x
4x ̀ ́ ̂ ̃ ̄ ̆ ̇ ̈ ̣ ̊ ̧ ̲ ̋ ̨ ̌
5x ? ¹ ® © ɑ Ί Ύ Ώ
6x C D F G J L Q R S U V W Y Z Ά Ή
7x c d f g j l q r s u v w y z Έ
  Diacritical marks for use with G0 characters

Cyrillic

G0

Teletext (Cyrillic G0, Russian/Bulgarian) [12]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP   ! " # $ % ы ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x Ю А Б Ц Д Е Ф Г Х И Ѝ К Л М Н О
5x П Я Р С Т У Ж В Ь Ъ З Ш Э Щ Ч Ы
6x ю а б ц д е ф г х и ѝ к л м н о
7x п я р с т у ж в ь ъ з ш э щ ч
Teletext (Cyrillic G0, Serbian/Croatian) [13]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP   ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x Ч А Б Ц Д Е Ф Г Х И Ј К Л М Н О
5x П Ќ Р С Т У В Ѓ Љ Њ З Ћ Ж Ђ Ш Џ
6x ч а б ц д е ф г х и ј к л м н о
7x п ќ р с т у в ѓ љ њ з ћ ж ђ ш
Teletext (Cyrillic G0, Ukrainian) [14]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP   ! " # $ % ї ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x Ю А Б Ц Д Е Ф Г Х И Ѝ К Л М Н О
5x П Я Р С Т У Ж В Ь І З Ш Є Щ Ч Ї
6x ю а б ц д е ф г х и ѝ к л м н о
7x п я р с т у ж в ь і з ш є щ ч

G2

Teletext (Cyrillic G2) [15]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP   ¡ ¢ £ $ ¥ § «
3x ° ± ² ³ × µ · ÷ » ¼ ½ ¾ ¿
4x ̀ ́ ̂ ̃ ̄ ̆ ̇ ̈ ̣ ̊ ̧ ̲ ̋ ̨ ̌
5x ¹ ® © α Ł ł β
6x D E F G I J K L N Q R S U V W Z
7x d e f g i j k l n q r s u v w z
  Diacritical marks for use with G0 characters

Arabic

Note that each Arabic contextual/positional character in the tables below is shown with the non-positional Unicode equivalent if available.

G0

Teletext (Arabic G0) [16]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP   ! " £ $ % ) ( * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ؛ > = < ؟
4x
5x ﺿ #
6x ـ
7x

G2

Teletext (Arabic G2) [17]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP  
3x ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩
4x à A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z ë ê ù î
6x é a b c d e f g h i j k l m n o
7x p q r s t u v w x y z â ô û ç


Hebrew

Teletext (Hebrew G0) [18]
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  SP   ! " £ $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z ½ #
6x א ב ג ד ה ו ז ח ט י ך כ ל ם מ ן
7x נ ס ע ף פ ץ צ ק ר ש ת ¾ ÷
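
The Hebrew letters in rows 6x and 7x appear in the same order as the Unicode Hebrew block, so codes 0x60 to 0x7A can be mapped with a single offset from U+05D0. The following is a minimal sketch of that observation; `decode_hebrew_letter` is a hypothetical name, and the function deliberately covers only the letters, not the rest of the set.

```python
from typing import Optional


def decode_hebrew_letter(code: int) -> Optional[str]:
    """Return the Hebrew letter for G0 codes 0x60-0x7A, else None."""
    if 0x60 <= code <= 0x7A:
        return chr(0x05D0 + (code - 0x60))  # 0x60 -> alef, 0x7A -> tav
    return None


# Codes for shin, lamed, vav, final mem, given in logical (reading) order.
word = "".join(decode_hebrew_letter(c) for c in b"\x79\x6c\x65\x6d")
print(word)  # שלום
```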

Graphics character sets

G1 block mosaics

Teletext (G1) [19] [20]
0123456789ABCDEF
2x  SP  🬀🬁🬂🬃🬄🬅🬆🬇🬈🬉🬊🬋🬌🬍🬎
3x🬏🬐🬑🬒🬓🬔🬕🬖🬗🬘🬙🬚🬛🬜🬝
4x
5x
6x🬞🬟🬠🬡🬢🬣🬤🬥🬦🬧🬨🬩🬪🬫🬬
7x🬭🬮🬯🬰🬱🬲🬳🬴🬵🬶🬷🬸🬹🬺🬻
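
The G1 codes can also be converted arithmetically instead of through a lookup table. The sketch below assumes that the six cells of a block mosaic are carried in bits 0 to 4 and bit 6 of the code, and that Unicode's sextant characters start at U+1FB00 in pattern order while skipping the two half-block patterns; the empty, left-half, right-half, and full patterns are mapped to the space and the older block element characters. `g1_to_unicode` is a hypothetical name.

```python
def g1_to_unicode(code: int) -> str:
    """Map a G1 block mosaic code (0x20-0x3F, 0x60-0x7F) to Unicode."""
    if not (0x20 <= code <= 0x3F or 0x60 <= code <= 0x7F):
        raise ValueError(f"not a G1 mosaic code: 0x{code:02X}")
    # Pack the six cell bits (bits 0-4 and bit 6) into a pattern 0-63.
    pattern = (code & 0x1F) | ((code & 0x40) >> 1)
    if pattern == 0:
        return " "        # no cells set
    if pattern == 21:
        return "\u258C"   # left half block
    if pattern == 42:
        return "\u2590"   # right half block
    if pattern == 63:
        return "\u2588"   # full block
    # The sextant range omits patterns 0, 21, and 42, so shift accordingly.
    offset = pattern - 1 - (pattern > 21) - (pattern > 42)
    return chr(0x1FB00 + offset)


print(g1_to_unicode(0x21), g1_to_unicode(0x7E))
```

Under these assumptions the function reproduces the first and last sextant characters of every row in the table above.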

G3 smooth mosaics and line drawing

Teletext (G3) [21] [22]
0123456789ABCDEF
2x🬼🬽🬾🬿🭀🭁🭂🭃🭄🭅🭆🭨🭩🭰
3x🭇🭈🭉🭊🭋🭌🭍🭎🭏🭐🭑🭪🭫🭵
4x🮤🮥🮦🮧🮠🮡🮢🮣
5xNBSP
6x🭒🭓🭔🭕🭖🭗🭘🭙🭚🭛🭜🭬🭭
7x🭝🭞🭟🭠🭡🭢🭣🭤🭥🭦🭧🭮🭯

References

1. "4. Teletext and Minitel" (PDF), L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS, 2019-01-04, p. 2
2. "12.2 Spacing attributes" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, pp. 76–80, retrieved 4 April 2020
3. Ewell, Doug (2020-10-16). "Teletext separated mosaic graphics". Unicode Mailing List Archive. Unicode Consortium.
4. British Standards Institution (1982-06-01). Attribute Control Set for UK Videotex (PDF). ITSCJ/IPSJ. ISO-IR-56.
5. "15.6.1, 15.6.2" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, pp. 114–115, retrieved 4 April 2020
6. "TELTXTG0.TXT" (PDF), L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS, 2019-01-04
7. "15.6.2 Latin National Option Sub-Sets" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 115, retrieved 4 April 2020
8. "15.6.3 Latin G2 Set" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 116, retrieved 4 April 2020
9. "TELTXTG2.TXT" (PDF), L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS, 2019-01-04
10. "15.6.8 Greek G0 Set" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 121, retrieved 4 April 2020
11. "15.6.9 Greek G2 Set" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 122, retrieved 4 April 2020
12. "15.6.5 Cyrillic G0 Set - Option 2 - Russian/Bulgarian" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 118, retrieved 4 April 2020
13. "15.6.4 Cyrillic G0 Set - Option 1 - Serbian/Croatian" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 117, retrieved 4 April 2020
14. "15.6.6 Cyrillic G0 Set - Option 3 - Ukrainian" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 119, retrieved 4 April 2020
15. "15.6.7 Cyrillic G2 Set" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 120, retrieved 4 April 2020
16. "15.6.10 Arabic G0 Set" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 123, retrieved 4 April 2020
17. "15.6.11 Arabic G2 Set" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 124, retrieved 4 April 2020
18. "15.6.12 Hebrew G0 Set" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 125, retrieved 4 April 2020
19. "15.7.1 G1 Block Mosaics Set" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 126, retrieved 4 April 2020
20. "TELTXTG1.TXT" (PDF), L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS, 2019-01-04
21. "15.7.2 G3 Smooth Mosaics and Line Drawing Set" (PDF), Enhanced Teletext specification, European Telecommunications Standards Institute (ETSI), May 1997, p. 127, retrieved 4 April 2020
22. "TELTXTG3.TXT" (PDF), L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS, 2019-01-04