Atari ST character set

The Atari ST character set as rendered in the 8×16 high-resolution system font.
The 8×8 low- and medium-resolution system font.

The Atari ST character set [1] is the character set of the Atari ST personal computer family, including the Atari STE, TT and Falcon. It is based on code page 437, the original character set of the IBM PC.

Like code page 437, it matches ASCII at code points 32–126 and adds further code points for letters with diacritics and other symbols. It differs from code page 437 in three ways: it uses different dingbats at code points 0–31; it replaces the box-drawing characters at 176–223 with the Hebrew alphabet and other symbols; and it replaces the characters at code points 158, 236, 254 and 255 with the sharp S, line integral, cubed and macron symbols, respectively.
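
The four individually replaced code points can be compared against Python's built-in `cp437` codec. This is a minimal sketch: the name `ATARI_ST_SWAPPED` is hypothetical, and the Unicode values chosen for the Atari ST side (in particular U+222E for the line integral) are assumed equivalents, not an official mapping.

```python
# Atari ST glyphs at the four code points that differ from code page 437
# outside the 0-31 and 176-223 ranges. The Unicode values are assumed
# equivalents for the symbols named in the text.
ATARI_ST_SWAPPED = {
    158: "\u00df",  # sharp S
    236: "\u222e",  # line (contour) integral
    254: "\u00b3",  # superscript three ("cubed")
    255: "\u00af",  # macron
}


def cp437_char(code: int) -> str:
    """Return the code page 437 character for one byte value."""
    return bytes([code]).decode("cp437")


for code, atari_char in ATARI_ST_SWAPPED.items():
    pc_char = cp437_char(code)
    print(
        f"{code:3d} (0x{code:02X}): Atari ST U+{ord(atari_char):04X}, "
        f"code page 437 U+{ord(pc_char):04X}"
    )
```

Each printed line shows a different Unicode code point on the two sides, which is what makes a plain `cp437` decode unsuitable for Atari ST text that uses these characters.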

The Atari ST family of computers stored this font in ROM in three sizes: an 8×16 pixels-per-character font used in the high-resolution graphics modes, an 8×8 pixels-per-character font used in the low- and medium-resolution graphics modes, and a 6×6 pixels-per-character font used for icon labels in any graphics mode. [1]

All 256 codes were assigned a graphical character in ROM, including the codes from 0 to 31 that in ASCII were reserved for non-graphical control characters.

Character set

The following table shows the Atari ST character set. Each character is shown with a potential Unicode equivalent if available. Differences from code page 437 are shown boxed.

Although the ROM provides a graphic for each of the 256 possible 8-bit codes, some APIs will not print all of them. In particular, they interpret the range 0–31 and the code 127 as control characters instead.
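
As a rough illustration, the split between codes that such an API prints and codes it treats as control characters can be written as a single predicate. The function name `treated_as_control` is hypothetical, and the exact set of affected codes is an assumption taken from the ranges named above; individual APIs may differ.

```python
def treated_as_control(code: int) -> bool:
    """Return True if the code falls in the ranges named above (0-31 or 127).

    Assumption: this models only the general rule stated in the text,
    not the behaviour of any specific Atari ST system call.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("code must be an 8-bit value")
    return code <= 31 or code == 127


# Codes that such an API would print as glyphs.
printable = [code for code in range(256) if not treated_as_control(code)]
print(len(printable))  # 223
```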

Atari ST character set [2] [3] [4] [5] [6] [7] [8]
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL [lower-alpha 1] 🮽 🮾 🮿 🕒 🔔 [image: 0x0E] [lower-alpha 2] [image: 0x0F] [lower-alpha 2]
1x 🯰 🯱 🯲 🯳 🯴 🯵 🯶 🯷 🯸 🯹 ə [image: 0x1C] [lower-alpha 3] [image: 0x1D] [lower-alpha 3] [image: 0x1E] [lower-alpha 3] [image: 0x1F] [lower-alpha 3]
2x  SP   ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~
8x Ç ü é â ä à å ç ê ë è ï î ì Ä Å
9x É æ Æ ô ö ò û ù ÿ Ö Ü ¢ £ ¥ ß [lower-alpha 4] ƒ
Ax á í ó ú ñ Ñ ª º ¿ ¬ ½ ¼ ¡ « »
Bx ã õ Ø ø œ Œ À Ã Õ ¨ ´ © ®
Cx ij IJ א ב ג ד ה ו ז ח ט י כ ל מ נ
Dx ס ע פ צ ק ר ש ת ן ך ם ף ץ §
Ex α β [lower-alpha 4] Γ π [lower-alpha 5] Σ [lower-alpha 6] σ µ [lower-alpha 7] τ Φ Θ Ω [lower-alpha 8] δ [lower-alpha 9] [lower-alpha 10] ϕ [lower-alpha 11] [lower-alpha 12]
Fx ± ÷ ° · ² ³ ¯ [lower-alpha 13]
  Differences from code page 437
  1. Rendered as a blank space, but used as the C string terminator.
  2. 14–15 (0x0E–0x0F) are two pieces that form an Atari "Fuji" logo, sometimes used together as an alternative to the title "Desk" for the leftmost menu in Atari ST software. They are not proposed for Unicode. [9]
  3. 28–31 (0x1C–0x1F) are four pieces that form the image of J. R. "Bob" Dobbs from the satirical Church of the SubGenius, a rarely used Easter egg. They are not proposed for Unicode. [9]
  4. Code point 158 (0x9E) is the German sharp S (U+00DF, ß), produced by the ß key on a German Atari ST keyboard. Code point 225 (0xE1) is its homoglyph, the Greek lowercase beta (U+03B2, β). Code page 437 uses code point 225 to represent both characters; the Unicode Consortium's code page 437 mapping recommends mapping code point 225 to sharp S (U+00DF), presumably because it is used more often as the sharp S, even though the surrounding code points are Greek characters. [10]
  5. 227 (0xE3) is the Greek lowercase pi (U+03C0, π), but early code page 437 fonts such as Terminal use a variant of pi that is ambiguous in case, and can therefore be used for the Greek capital pi (U+03A0, Π) or the n-ary product sign (U+220F, ∏).
  6. 228 (0xE4) is both the n-ary summation sign (U+2211, ∑) and the Greek uppercase sigma (U+03A3, Σ).
  7. 230 (0xE6) is both the micro sign (U+00B5, µ) and the Greek lowercase mu (U+03BC, μ).
  8. 234 (0xEA) is both the ohm sign (U+2126, Ω) and the Greek uppercase omega (U+03A9, Ω). (Unicode considers the ohm sign to be equivalent to uppercase omega, and suggests that the latter be used in both contexts. [11])
  9. 235 (0xEB) is the Greek lowercase delta (U+03B4, δ), but it has also been used as a surrogate for the Icelandic lowercase eth (U+00F0, ð) and the partial derivative sign (U+2202, ∂).
  10. 236 (0xEC) is used for the line integral sign (U+222E, ∮) on the Atari ST, while code page 437 uses it for the infinity sign (U+221E, ∞), which the Atari ST in turn places at 223 (0xDF).
  11. 237 (0xED) is used as the empty set sign (U+2205, ∅), the Greek lowercase phi (U+03C6, φ), the Greek phi symbol (U+03D5, ϕ) set in italics to name angles, and the diameter sign (U+2300, ⌀).
  12. 238 (0xEE) is used as both the Greek lowercase epsilon (U+03B5, ε) and the element-of sign (U+2208, ∈). Later it was often used for the euro sign (U+20AC, €).
  13. Used as a non-breaking space by much MS-DOS software.
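
The rows of the table that are complete can be turned into a small lookup for converting Atari ST bytes to Unicode. The following is a partial sketch only: it covers ASCII 0x20–0x7E, the Latin rows 0x80–0x9F and the Hebrew letters 0xC2–0xDC, and substitutes U+FFFD for every other byte. The names `TABLE` and `decode_partial` are hypothetical.

```python
# Partial Atari ST -> Unicode table built from the table rows above.
# Rows 8x and 9x, one character per code point from 0x80 to 0x9F.
LATIN_80_9F = "ÇüéâäàåçêëèïîìÄÅ" "ÉæÆôöòûùÿÖÜ¢£¥ßƒ"

# Hebrew letters at 0xC2-0xDC, written as escapes to avoid
# right-to-left display problems in editors.
HEBREW_C2_DC = (
    "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9"  # alef..yod
    "\u05db\u05dc\u05de\u05e0"  # kaf, lamed, mem, nun
    "\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea"  # samekh..tav
    "\u05df\u05da\u05dd\u05e3\u05e5"  # final nun, kaf, mem, pe, tsadi
)

# ASCII 32-126 maps to itself.
TABLE = {code: chr(code) for code in range(0x20, 0x7F)}
TABLE.update({0x80 + i: ch for i, ch in enumerate(LATIN_80_9F)})
TABLE.update({0xC2 + i: ch for i, ch in enumerate(HEBREW_C2_DC)})


def decode_partial(data: bytes) -> str:
    """Decode bytes, replacing any byte not covered here with U+FFFD."""
    return "".join(TABLE.get(byte, "\ufffd") for byte in data)


print(decode_partial(b"Gr\x81\x9ee"))  # Grüße
```

A full converter would also need the code points omitted here, including the ambiguous ones discussed in the notes, where a single byte has more than one plausible Unicode equivalent.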

Alt codes

Using Alt codes, users can enter a character by holding down the Alt key and entering the three-digit decimal code point on the Numpad. This provides a way to enter special characters not provided directly on the keyboard. [2]
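
The relationship between the typed digits and the resulting code point is a plain decimal conversion. A minimal sketch, assuming exactly three digits are required and that values above 255 are rejected (the function name `alt_code_to_byte` is hypothetical, and actual keyboard handling on the Atari ST is not modelled):

```python
def alt_code_to_byte(digits: str) -> int:
    """Convert a three-digit decimal Alt code to an 8-bit code point."""
    if len(digits) != 3 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly three decimal digits")
    value = int(digits)
    if value > 255:
        raise ValueError("code point must fit in 8 bits")
    return value


print(hex(alt_code_to_byte("158")))  # 0x9e
```

Under these assumptions, `158` selects the sharp S at 0x9E and `065` selects the letter A.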

Euro variants

The Atari ST character set long predates the introduction of the euro currency and thus does not provide a code point for the euro sign (U+20AC, €). However, some software (such as Calamus) utilizes code point 238 (0xEE) for this purpose. [12] [13] This code point is normally assigned to the mathematical element-of sign (U+2208, ∈), and to the Greek lowercase epsilon (U+03B5, ε) in code page 437. Alternatively, the rarely used logical conjunction sign (U+2227, ∧) at code point 222 (0xDE) could be replaced by the euro sign. [14]
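
Both euro conventions amount to overriding one entry of the standard mapping. The sketch below covers only the two affected code points; the variant names `euro_at_0xEE` and `euro_at_0xDE` and the function `decode_byte` are hypothetical labels for illustration, not identifiers used by Calamus or any other software.

```python
EURO = "\u20ac"

# Standard Atari ST meaning of the two code points discussed above.
STANDARD = {
    0xDE: "\u2227",  # logical conjunction sign
    0xEE: "\u2208",  # element-of sign
}

# Hypothetical variant names, each naming the code point that carries the euro.
EURO_VARIANTS = {
    "euro_at_0xEE": 0xEE,
    "euro_at_0xDE": 0xDE,
}


def decode_byte(code, variant=None):
    """Decode 0xDE or 0xEE, optionally applying one of the euro variants."""
    if variant is not None and code == EURO_VARIANTS[variant]:
        return EURO
    return STANDARD[code]


print(decode_byte(0xEE))                  # ∈
print(decode_byte(0xEE, "euro_at_0xEE"))  # €
print(decode_byte(0xDE, "euro_at_0xEE"))  # ∧
```

Because the two conventions disagree, a converter cannot detect the euro sign from the bytes alone; the variant has to be chosen by the caller.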

References

  1. Feagans, John (May 1986). "How do Europeans access special characters in the Atari ST character set? What is the 6x6 font used for?" (PDF). Atari ST Developers Question and Answer Bulletin. Sunnyvale, CA, USA: Atari Corp. Archived from the original (PDF) on 2017-02-19. Retrieved 2017-02-19.
  2. "The Atari character set". Atari Wiki. Archived from the original on 2017-01-16. Retrieved 2017-01-16.
  3. Bettencourt, Rebecca G. (2016-08-01). "Character Encodings - Legacy Encodings - Atari ST". Kreative Korporation. Retrieved 2016-08-09.
  4. Kostis, Kosta; Lehmann, Alexander. "Atari ST/TT Character Encoding". 1.56. Kostis Netzwerkberatung. Archived from the original on 2017-01-16. Retrieved 2017-01-16.
  5. "Codepages / Ascii Table Atari ST/TT Character Encoding". ASCII.ca. 2016 [2006]. Archived from the original on 2017-01-16. Retrieved 2017-01-16.
  6. "ATARISTV.TXT" (PDF), L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS, 2019-01-04
  7. Verdy, Philippe; Haible, Bruno (2015-10-08) [1998]. "AtariST to Unicode". 1.3. Retrieved 2023-11-29.
  8. Flohr, Guido (2016) [2006]. "Locale::RecodeData::ATARI_ST - Conversion routines for ATARI-ST". CPAN libintl-perl. 1.1. Archived from the original on 2017-01-14. Retrieved 2017-01-14.
  9. "7. Characters not proposed", L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS (PDF), 2019-01-04
  10. "cp437_DOSLatinUS to Unicode table" (TXT). The Unicode Consortium. Retrieved 2011-11-14.
  11. The Unicode Consortium, The Unicode Standard 4.0, Chapter 7, "European Alphabetic Scripts", p. 176.
  12. Dunkel, Ulf (July 1999). "Calamus (2)". ST Computer (in German). Retrieved 2017-01-16.
  13. Hädrich, Johannes (2002-12-14). "Calamus: RTF 3.0 mit grossem Qualitaetssprung" (in German). Archived from the original on 2017-01-16. Retrieved 2017-01-16.
  14. Flohr, Guido (2016) [2006]. "Locale::RecodeData::ATARI_ST_EURO - Conversion routines for ATARI-ST-EURO". CPAN libintl-perl. 1.1. Archived from the original on 2017-01-14. Retrieved 2017-01-14.