VT100 encoding

VT100
Language(s): English, various others
Classification: Extended ASCII, Mac OS script
Extends: ASCII

The VT100 code page is a character encoding used to represent text on Classic Mac OS for compatibility with the VT100 terminal. It encodes 256 characters: the first 128 are identical to ASCII, and the remaining 128 include mathematical symbols, letters with diacritics, and additional punctuation marks. It is suitable for English and several other Western languages. It is similar to Mac OS Roman, but it includes all characters in ISO 8859-1 except for the currency sign (which was superseded by the euro sign), the no-break space, and the soft hyphen. It also includes all characters in DEC Special Graphics (code page 1090) except for the new line and no-break space controls. The VT100 encoding is used only by the VT100 font on Classic Mac OS and is not an official Mac OS character encoding. [1]

Codepage layout

The following table shows how characters are encoded in the VT100 character set. Each character is shown with its Unicode equivalent.

VT100
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
1x DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
2x SP ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ DEL
8x Ä Å Ç É Ñ Ö Ü á à â ä ã å ç é è
9x ê ë í ì î ï ñ ó ò ô ö õ ú ù û ü
Ax Ý ° ¢ £ § ¸ ß ® © ´ ¨ Æ Ø
Bx × ± ¥ µ ¹ ² ³ π ¦ ª º æ ø
Cx ¿ ¡ ¬ ½ ƒ ¼ ¾ « » [note 2] À Ã Õ Œ œ
Dx ÷ ÿ Ÿ [note 3] Ð ð Þ þ
Ex ý · [note 4] Â Ê Á Ë È Í Î Ï Ì Ó Ô
Fx Ò Ú Û Ù
  1. The codes 0xA2, 0xA3, 0xA9, 0xB1, and 0xB5 coincidentally have the same character assignment as ISO 8859-1 (and thus Unicode).
  2. The character 0xCA is the replacement character, which is displayed as a reversed question mark in this encoding.
  3. Before Mac OS 8.5, the character 0xDB mapped to the currency sign (¤, U+00A4).
  4. The character 0xE4 is the horizontal scan line-5, which is unified with U+2500 in Unicode.
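
As a rough illustration of how the table can be read as a byte-to-Unicode mapping, the following Python sketch decodes only the positions that can be read unambiguously from this page: the ASCII range, rows 8x and 9x, and the codes named explicitly in the notes. It is not a complete or official mapping. The names `HIGH_ROWS`, `NOTED`, and `decode_vt100_partial` are hypothetical, and mapping every other byte to U+FFFD is an assumption made here for illustration only.

```python
# Rows 8x and 9x, copied from the table above (16 characters each).
HIGH_ROWS = "ÄÅÇÉÑÖÜáàâäãåçéè" "êëíìîïñóòôöõúùûü"

# Positions that the table notes state explicitly.
NOTED = {
    0xA2: "\u00a2",  # cent sign
    0xA3: "\u00a3",  # pound sign
    0xA9: "\u00a9",  # copyright sign
    0xB1: "\u00b1",  # plus-minus sign
    0xB5: "\u00b5",  # micro sign
    0xE4: "\u2500",  # horizontal scan line-5, unified with U+2500
}

# Assumption: bytes not covered by this sketch become U+FFFD.
REPLACEMENT = "\ufffd"


def decode_vt100_partial(data: bytes) -> str:
    """Decode bytes using only the positions covered by this sketch."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))              # identical to ASCII
        elif b < 0xA0:
            out.append(HIGH_ROWS[b - 0x80])  # rows 8x and 9x
        else:
            out.append(NOTED.get(b, REPLACEMENT))
    return "".join(out)


# The five codes from note 1 decode the same way under ISO 8859-1.
same_as_latin1 = all(
    bytes([code]).decode("latin-1") == NOTED[code]
    for code in (0xA2, 0xA3, 0xA9, 0xB1, 0xB5)
)
```

For example, `decode_vt100_partial(b"caf\x8e")` looks up 0x8E in row 8x, column E. A full decoder would need the complete 0xA0 to 0xFF assignments, which this sketch deliberately does not guess.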

References

  1. "Older Character Sets". whitefiles.org. Retrieved October 3, 2019.