Code page 437

Code page 437, as rendered by an IBM PC using standard VGA
MIME / IANA IBM437
Alias(es) cp437, 437, csPC8CodePage437, [1] OEM-US
Language(s) English, German, Swedish
Classification Extended ASCII, OEM code page
Extends US-ASCII
Other related encoding(s) Code page 850, CWI-2

Code page 437 (CCSID 437) is the character set of the original IBM PC (personal computer). [2] It is also known as CP437, OEM-US, OEM 437, [3] PC-8, [4] or DOS Latin US. [5] The set includes all printable ASCII characters as well as some accented letters (diacritics), Greek letters, icons, and line-drawing symbols. It is sometimes referred to as the "OEM font" or "high ASCII", or as "extended ASCII" [4] (one of many mutually incompatible ASCII extensions).

This character set remains the primary set in the core of any EGA and VGA-compatible graphics card. As such, text shown when a PC reboots, before fonts can be loaded and rendered, is typically rendered using this character set. [note 1] Many file formats developed at the time of the IBM PC are based on code page 437 as well.

Display adapters

The original IBM PC contained this character set as a 9×14 pixels-per-character font stored in the ROM of the IBM Monochrome Display Adapter (MDA) and as an 8×8 pixels-per-character font in the ROM of the Color Graphics Adapter (CGA). [citation needed] The IBM Enhanced Graphics Adapter (EGA) contained an 8×14 pixels-per-character version, and the VGA contained a 9×16 version. [citation needed]

All these display adapters have text modes in which each character cell contains an 8-bit character code point, giving 256 possible values for graphic characters. All 256 codes were assigned a graphical character in ROM, including the codes from 0 to 31 that were reserved in ASCII for non-graphical control characters.

Various Eastern European PCs used different character sets, sometimes user-selectable via jumpers or CMOS setup. These sets were designed to match 437 as much as possible, for instance sharing the code points for many of the line-drawing characters, while still allowing text in a local language to be displayed.

Alt codes

A legacy of code page 437 is the number combinations used in Windows Alt codes. [6] [7] [8] A DOS user could enter a character by holding down the Alt key and entering the character code on the numpad, [6] and many users memorized the numbers needed for CP437 (or for the similar CP850). Although Microsoft Windows used different character sets such as CP1252, the original numbers were emulated so that users could continue to use them; Microsoft added the ability to type a code from the Windows character set by typing 0 before the digits. [6] [9]
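The two numbering schemes described above can be illustrated with a small sketch. It is a simplified model, not a description of how Windows implements Alt codes: the function name `alt_code` is hypothetical, the Windows character set is assumed to be CP1252 (which the text gives only as an example), and only values 0 to 255 are modelled. Python's `cp437` codec also returns control characters rather than graphics for codes below 32.

```python
def alt_code(digits: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Return the character an Alt+numpad sequence would yield in this simplified model."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("Alt codes consist of decimal digits only")
    value = int(digits)
    if not 0 <= value <= 255:
        raise ValueError("this sketch only models single-byte codes (0-255)")
    # Assumption: a leading zero selects the Windows code page,
    # otherwise the number is looked up in the OEM code page.
    codec = ansi if digits.startswith("0") else oem
    # Bytes that the chosen code page leaves undefined become U+FFFD.
    return bytes([value]).decode(codec, errors="replace")


print(alt_code("130"))   # byte 0x82 looked up in code page 437
print(alt_code("0233"))  # byte 0xE9 looked up in Windows-1252
```

Both calls yield "é", which is why a user who memorized 130 under DOS could keep using it, while 0233 reaches the same letter through the Windows character set.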

Character set

The following table shows code page 437. Each character is shown with its equivalent Unicode code point (when it is not equal to the character's code). See also the notes below, as there are multiple equivalent Unicode characters for some code points.

Although the ROM provides a graphic for all 256 different possible 8-bit codes, some APIs will not print some code points, in particular the range 0-31 and the code at 127. [10] Instead, they will interpret them as control characters. For instance, many methods of outputting text on the original IBM PC would interpret hex codes 07, 08, 0A, and 0D as BEL, BS, LF, and CR, respectively. Many printers were also unable to print these characters.
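The difference between the two interpretations can be sketched in Python. The standard `cp437` codec behaves like the APIs described above and passes codes 0 to 31 and 127 through as control characters; the helper names and the lookup string below are illustrative assumptions that reproduce the graphics listed in the character table instead.

```python
# Graphics shown by the display adapter ROM for codes 0-31 (index = code).
# Code 0 is kept as NUL here, following the table's own mapping.
C0_GRAPHICS = "\x00☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"
HOUSE = "\u2302"  # graphic shown for code 127


def decode_as_text(data: bytes) -> str:
    """Treat codes 0-31 and 127 as control characters (Python's codec does this)."""
    return data.decode("cp437")


def decode_as_screen(data: bytes) -> str:
    """Treat every code as the graphic character stored in ROM."""
    out = []
    for byte in data:
        if byte < 32:
            out.append(C0_GRAPHICS[byte])
        elif byte == 127:
            out.append(HOUSE)
        else:
            out.append(bytes([byte]).decode("cp437"))
    return "".join(out)


sample = b"\x01 Hi \x0d"
print(repr(decode_as_text(sample)))    # control characters are preserved
print(repr(decode_as_screen(sample)))  # the same bytes as ROM graphics
```

Which function is appropriate depends on where the bytes came from: a dump of text-mode video memory calls for the screen interpretation, while a text file normally calls for the control-character one.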

Code page 437 [11] [12] [13] [14]
          0           1           2           3           4           5           6           7           8           9           A           B           C           D           E           F
0x (0)    NUL [a]     ☺ 263A      ☻ 263B      ♥ 2665      ♦ 2666      ♣ 2663      ♠ 2660      • 2022      ◘ 25D8      ○ 25CB      ◙ 25D9      ♂ 2642      ♀ 2640      ♪ 266A      ♫ [b] 266B  ☼ 263C
1x (16)   ► 25BA      ◄ 25C4      ↕ 2195      ‼ 203C      ¶ 00B6      § 00A7      ▬ 25AC      ↨ 21A8      ↑ 2191      ↓ 2193      → 2192      ← 2190      ∟ 221F      ↔ 2194      ▲ 25B2      ▼ 25BC
2x (32)   SP          !           "           #           $           %           &           '           (           )           *           +           ,           -           .           /
3x (48)   0           1           2           3           4           5           6           7           8           9           :           ;           <           =           >           ?
4x (64)   @           A           B           C           D           E           F           G           H           I           J           K           L           M           N           O
5x (80)   P           Q           R           S           T           U           V           W           X           Y           Z           [           \           ]           ^           _
6x (96)   `           a           b           c           d           e           f           g           h           i           j           k           l           m           n           o
7x (112)  p           q           r           s           t           u           v           w           x           y           z           {           | [c]       }           ~           ⌂ [d] 2302
8x (128)  Ç 00C7      ü 00FC      é 00E9      â 00E2      ä 00E4      à 00E0      å 00E5      ç 00E7      ê 00EA      ë 00EB      è 00E8      ï 00EF      î 00EE      ì 00EC      Ä 00C4      Å 00C5
9x (144)  É 00C9      æ 00E6      Æ 00C6      ô 00F4      ö 00F6      ò 00F2      û 00FB      ù 00F9      ÿ 00FF      Ö 00D6      Ü 00DC      ¢ 00A2      £ 00A3      ¥ 00A5      ₧ 20A7      ƒ 0192
Ax (160)  á 00E1      í 00ED      ó 00F3      ú 00FA      ñ 00F1      Ñ 00D1      ª 00AA      º 00BA      ¿ 00BF      ⌐ 2310      ¬ 00AC      ½ 00BD      ¼ 00BC      ¡ 00A1      « 00AB      » 00BB
Bx (176)  ░ 2591      ▒ 2592      ▓ 2593      │ [e] 2502  ┤ 2524      ╡ 2561      ╢ 2562      ╖ 2556      ╕ 2555      ╣ 2563      ║ 2551      ╗ 2557      ╝ 255D      ╜ 255C      ╛ 255B      ┐ 2510
Cx (192)  └ 2514      ┴ 2534      ┬ 252C      ├ 251C      ─ 2500      ┼ 253C      ╞ 255E      ╟ 255F      ╚ 255A      ╔ 2554      ╩ 2569      ╦ 2566      ╠ 2560      ═ 2550      ╬ 256C      ╧ 2567
Dx (208)  ╨ 2568      ╤ 2564      ╥ 2565      ╙ 2559      ╘ 2558      ╒ 2552      ╓ 2553      ╫ 256B      ╪ 256A      ┘ 2518      ┌ 250C      █ 2588      ▄ 2584      ▌ 258C      ▐ 2590      ▀ 2580
Ex (224)  α 03B1      ß [f] 00DF  Γ 0393      π [g] 03C0  Σ [h] 03A3  σ 03C3      µ [i] 00B5  τ 03C4      Φ 03A6      Θ [j] 0398  Ω [k] 03A9  δ [l] 03B4  ∞ 221E      φ [m] 03C6  ε [n] 03B5  ∩ 2229
Fx (240)  ≡ 2261      ± 00B1      ≥ 2265      ≤ 2264      ⌠ [o] 2320  ⌡ 2321      ÷ 00F7      ≈ 2248      ° 00B0      ∙ [p] 2219  · 00B7      √ [q] 221A  ⁿ 207F      ² 00B2      ■ 25A0      NBSP [r] 00A0

When translating to Unicode some codes do not have a unique, single Unicode equivalent; the correct choice may depend upon context.

  a. 0 (00hex) draws a blank space, but usage as the C string terminator means it is more accurately translated as NUL. In their code-page-437-based implementation of C0-region graphics, Star Micronics printers re-purpose this code as a slashed zero. [15]
  b. 14 (0Ehex): mapping as shown, to the beamed quavers [U+266B, ♫], follows data provided by the Unicode Consortium. [16] In IBM's GCGID (Graphic Character Global IDentifier) system of character IDs, this is SM910000, simply annotated as "Two Musical Notes"; [12] [13] however, the reference glyph shows two beamed semiquavers [U+266C, ♬]. [12] In the specification for IBM Japanese Host code, SM910080 (i.e. SM910000 with the fullwidth attribute set) is explicitly mapped to U+266C, and accordingly shows two semiquavers. [17]
  c. 124 (7Chex): the actual glyph at this position is a broken bar [U+00A6, ¦] in the original IBM PC and compatibles font, as rendered by the original MDA. This rendering was later adopted for CGA, EGA and VGA. However, almost all software assumes this code is the ASCII vertical bar [U+007C, |]; for example, programming languages use it as "or". In the early 1990s, it was clarified [by whom?] that there is a vertical bar in ASCII at this position and that the broken bar symbol is not part of ASCII.
  d. 127 (7Fhex) is a "house" but was also sometimes used as Greek capital delta [U+0394, Δ].
  e. 179 (B3hex) could also serve as an integral extension [U+23AE, ⎮] in IBM's font.
  f. 225 (E1hex) is identified by IBM as Latin "Sharp s Small" [13] [U+00DF, ß] but is sometimes rendered in OEM fonts as Greek small beta [U+03B2, β]. The placement of this Latin character among Greek characters suggests intended multi-use.
    (Figure: comparison of characters in the E0 to EF range across various IBM products.)
  g. 227 (E3hex) is identified by IBM as Greek "Pi Small" [U+03C0, π] but is sometimes rendered in OEM fonts as Greek capital pi [U+03A0, Π] or the n-ary product sign [U+220F, ∏].
  h. 228 (E4hex) is identified by IBM as Greek "Sigma Capital" [U+03A3, Σ] but is also used as the n-ary summation sign [U+2211, ∑].
  i. 230 (E6hex) is identified by IBM as Greek "Mu Small" [U+03BC, μ] but is also used as the micro sign [U+00B5, µ]. IBM's Greek GCGID table [18] maps the character in this code page to the Greek letter, but the cp437_DOSLatinUS to Unicode table [11] maps it to the micro sign.
  j. 233 (E9hex) is identified by IBM as Greek "Theta Capital" [U+0398, Θ]. [12] [13] However, these symbols are for mathematics and physics, in which lowercase theta is much more commonly used (e.g. for polar coordinates).
  k. 234 (EAhex) is identified by IBM as Greek "Omega Capital" [U+03A9, Ω] but is also used as the ohm sign [U+2126, Ω]. Unicode considers the characters to be equivalent and suggests that U+03A9 be used in both contexts. [19]
  l. 235 (EBhex) is identified by IBM as Greek "Delta Small" [U+03B4, δ]. It was also unofficially used for the small eth [U+00F0, ð] and the partial derivative sign [U+2202, ∂].
  m. 237 (EDhex) is identified by IBM as Greek "Phi Small (Closed Form)" [U+03D5, ϕ; or, from the italicized math set, U+1D719, 𝜙], but Unicode maps it to the open (or "loopy") form [U+03C6, φ] in its cp437_DOSLatinUS table. [11] Comparison of IBM's Greek GCGID table [18] with Unicode's Greek code chart [20] shows where IBM, for example, reversed the open and closed forms when mapping to Unicode. This character is also used as the empty set sign [U+2205, ∅], the diameter sign [U+2300, ⌀], and the Latin letter O with stroke [U+00D8, Ø; and U+00F8, ø].
  n. 238 (EEhex) is identified by IBM as Greek "Epsilon Small" [U+03B5, ε] but is sometimes rendered in OEM fonts as the element-of sign [U+2208, ∈]. It was often used as the euro sign [U+20AC, €].
  o. 244 (F4hex) and 245 (F5hex) are the upper and lower portions of the integral symbol (∫), and they can be extended with the character 179 (B3hex), the vertical line of the box-drawing block. 244 could also be used for the long s character [U+017F, ſ].
  p. 249 (F9hex) and 250 (FAhex) are almost indistinguishable: the first is a slightly larger dot than the second. Both were used as bullets, middle dots, and multiplication dots [U+2219, ∙].
  q. 251 (FBhex) was also sometimes used as a check mark [U+2713, ✓].
  r. 255 (FFhex) draws a blank space; the use as non-breaking space (NBSP) has precedent in word processors designed for the IBM PC.
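Because the right Unicode equivalent can depend on context, a converter may need per-context overrides on top of a default mapping. The following is a minimal sketch under that assumption: the function `decode_cp437` and the two override tables are hypothetical names, the defaults are whatever Python's built-in `cp437` codec returns, and the alternatives are a few of those listed in the notes above.

```python
from typing import Mapping, Optional

# Hypothetical override tables (byte value -> preferred Unicode character).
GREEK_TEXT = {0xE1: "\u03b2", 0xE6: "\u03bc"}   # small beta, Greek small mu
MATH_TEXT = {0xE3: "\u220f", 0xE4: "\u2211"}    # n-ary product, n-ary summation


def decode_cp437(data: bytes, overrides: Optional[Mapping[int, str]] = None) -> str:
    """Decode CP437 bytes, letting a context-specific table win over the default."""
    table = overrides or {}
    return "".join(table.get(b, bytes([b]).decode("cp437")) for b in data)


print(decode_cp437(b"\xe6"))              # default: micro sign, U+00B5
print(decode_cp437(b"\xe6", GREEK_TEXT))  # Greek context: small mu, U+03BC
```

Keeping the overrides separate from the codec leaves the default mapping untouched, so the same bytes can be re-read later under a different assumption about their context.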

History

The repertoire of code page 437 was taken from the character set of Wang word-processing machines, according to Bill Gates in an interview with Gates and Paul Allen that appeared in the 2 October 1995 edition of Fortune Magazine:

"... We were also fascinated by dedicated word processors from Wang, because we believed that general-purpose machines could do that just as well. That's why, when it came time to design the keyboard for the IBM PC, we put the funny Wang character set into the machine—you know, smiley faces and boxes and triangles and stuff. We were thinking we'd like to do a clone of Wang word-processing software someday."

According to an interview with David J. Bradley (developer of the PC's ROM-BIOS) the characters were decided upon during a four-hour meeting on a plane trip from Seattle to Atlanta by Andy Saenz (responsible for the video card), Lew Eggebrecht (chief engineer for the PC) and himself. [21]

The selection of graphic characters has some internal logic:

Most fonts for Microsoft Windows include the special graphic characters at the Unicode code points shown, as they are part of the WGL4 set that Microsoft encourages font designers to support. (The monospaced raster font family Terminal was an early font that replicated all code page 437 characters, at least at some resolutions.) A Microsoft Windows font called MS Linedraw [24] replicates all of the code page 437 characters so that they can be drawn directly from these code points, thus providing one way to display DOS text on a modern Windows machine as it was shown in DOS, with limitations. [25]

Code page 1055, also known as HP symbol set 0L, [26] is a subset which includes the box-drawing, half-blocks, black circles (the black circle replaces the bullet, which replaces the middle dot in this code page), and black square, and moves them to the upper half; the space is also included. [27]

Internationalization

Code page 437 has a series of international characters, mainly values 128 to 175 (80hex to AFhex). However, it only covers a few major Western European languages in full, including English, German and Swedish, [note 2] and so lacks several characters (mostly capital letters) important to many major Western European languages:

Along with the cent (¢), pound sterling (£) and yen/yuan (¥) currency symbols, it has a couple of former European currency symbols: the florin (ƒ, Netherlands) and the peseta (₧, Spain). The presence of the last is unusual, since the Spanish peseta was never an internationally relevant currency, and also never had a symbol of its own; it was simply abbreviated as "Pt", "Pta", "Pts", or "Ptas". Spanish models of the IBM electric typewriter, however, also had a single position devoted to it.
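The coverage described in this section can be probed with Python's built-in `cp437` codec. This is a rough sketch: the helper name `unencodable` and the sample strings are illustrative and not taken from the sources cited here.

```python
def unencodable(text, codec="cp437"):
    """Return the distinct characters of `text` that `codec` cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


print(unencodable("Größe, Ärger, Übung"))  # German sample
print(unencodable("Smörgåsbord ÅÄÖ"))      # Swedish sample
print(unencodable("À LA CARTE"))           # capital A with grave accent
print(unencodable("¢ £ ¥ ƒ ₧ €"))          # currency symbols
```

The German and Swedish samples encode without loss, while the capital "À" and the euro sign are reported as missing; the florin and peseta symbols are present.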

Later DOS character sets, such as code page 850 (DOS Latin-1), code page 852 (DOS Central-European) and code page 737 (DOS Greek), filled the gaps for international use with some compatibility with code page 437 by retaining the single and double box-drawing characters, while discarding the mixed ones (e.g. horizontal double/vertical single). All code page 437 characters have similar glyphs in Unicode and in Microsoft's WGL4 character set, and therefore are available in most fonts in Microsoft Windows, and also in the default VGA font of the Linux kernel, and the ISO 10646 fonts for X11.
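The box-drawing claim can be checked for code page 850 with the codecs that ship with Python. The sketch below is only an illustration: the function names are hypothetical, "box-drawing" is taken to mean the Unicode Box Drawing block (U+2500 to U+257F), and "mixed" is inferred from Unicode character names that contain both SINGLE and DOUBLE.

```python
import unicodedata


def box_drawing_fate(other="cp850"):
    """Split CP437 box-drawing characters into those `other` keeps at the same byte and the rest."""
    kept, dropped = [], []
    for byte in range(0x80, 0x100):
        raw = bytes([byte])
        ch = raw.decode("cp437")
        if not 0x2500 <= ord(ch) <= 0x257F:  # outside the Box Drawing block
            continue
        same = raw.decode(other, errors="replace") == ch
        (kept if same else dropped).append(ch)
    return kept, dropped


def is_mixed(ch):
    """True when the Unicode name combines single and double strokes."""
    name = unicodedata.name(ch)
    return "SINGLE" in name and "DOUBLE" in name


kept, dropped = box_drawing_fate("cp850")
print("kept:   ", "".join(kept))
print("dropped:", "".join(dropped))
print("all dropped are mixed:", all(is_mixed(c) for c in dropped))
```

For code page 850 the characters that lose their position are exactly the mixed single/double ones, while the pure single-line and double-line characters stay at the same byte values.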

Notes

  1. Systems available in Eastern European, Arabic, and Asian countries often use a different set; however, these sets are designed to match 437 as much as possible. The designation "OEM", for "original equipment manufacturer", indicates that the set could be changed by the manufacturer to meet different markets.
  2. It also covers some less major Western European languages—as well as some other languages—in full, including Basque, Malay, and the pre-1999 Turkmen Latin alphabet, but this was likely unintended.

References

  1. Character Sets, Internet Assigned Numbers Authority (IANA), 12 December 2018
  2. "CCSID 437 information document". Archived from the original on 27 March 2016.
  3. "OEM 437". Go Global Developer Center. Microsoft. Archived from the original on 9 June 2016. Retrieved 22 September 2011.
  4. "OEM font". Encyclopedia. PCmag.com. Archived from the original on 27 November 2020. Retrieved 19 October 2021.
  5. Kano, Nadine. "Appendix H Code Pages". Globalization and Localization : Code Page 437 DOS Latin US. Microsoft. Archived from the original on 17 March 2016. Retrieved 14 November 2011.
  6. "Glossary of Terms Used on this Site". Microsoft. (Please see the description about the term "Alt+Numpad"). Archived from the original on 8 September 2012. Retrieved 17 August 2018.
  7. Murray Sargent. "Entering Unicode Characters – Murray Sargent: Math in Office". Retrieved 17 August 2018.
  8. "ALT+NUMPAD ASCII Key Combos: The α and Ω of Creating Obscure Passwords". Retrieved 17 August 2018.
  9. "Insert ASCII or Unicode Latin-based symbols and characters - Office Support". Microsoft. Retrieved 17 August 2018.
  10. "SBCS code page information document CPGID 00437". Coded character sets and related resources. IBM. 1986 [1984-05-01]. Archived from the original on 9 June 2016. Retrieved 14 November 2011.
  11. Steele, Shawn (24 April 1996). "cp437_DOSLatinUS to Unicode table" (TXT). 2.00. Unicode Consortium. Archived from the original on 9 June 2016. Retrieved 14 November 2011.
  12. Code Page CPGID 00437 (PDF), IBM
  13. "Code Page (CPGID): 00437". Coded character sets and related resources. IBM. 1984. Retrieved 3 August 2023.
  14. International Components for Unicode (ICU), ibm-437_P100-1995.ucm, 3 December 2002
  15. "Appendix D: Character Sets (§ IBM Special Character Set)" (PDF). User's Manual: LC-8021 Dot Matrix Printer. Star Micronics. 1997. p. 55. Archived (PDF) from the original on 8 September 2004. Retrieved 25 April 2024.
  16. Whistler, Ken (27 July 1999). "IBM PC memory-mapped video graphics to Unicode". Unicode Consortium.
  17. "IBM Japanese Graphic Character Set, Kanji: DBCS–Host and DBCS-PC" (PDF). IBM. 2002. C-H 3-3220-024 2002-11.
  18. "Graphic character identifiers: Alphabetics, Greek". Coded character sets and related resources. IBM. Retrieved 25 February 2017.
  19. The Unicode Consortium (21 May 2003). "Chapter 7: European Alphabetic Scripts". The Unicode Standard 4.0 (PDF). Addison-Wesley (published August 2003). p. 176. ISBN 0-321-18578-1. Retrieved 9 June 2016.
  20. "Greek and Coptic: Range: 0370–03FF" (PDF). The Unicode Standard, Version 9.0. Unicode Consortium. Retrieved 25 February 2017.
  21. Edwards, Benj (6 November 2015) [2011]. "Origins of the ASCII Smiley Character: An Email Exchange With Dr. David Bradley". Archived from the original on 28 November 2016. Retrieved 27 November 2016. If you look at the first 32 characters in the IBM PC character set you'll see lots of whimsical characters — smiley face, musical notes, playing card suits and others. These were intended for character based games [...] Since we were using 8-bit characters we had 128 new spots to fill. We put serious characters there — three columns of foreign characters, based on our Datamaster experience. Three columns of block graphic characters [...] many customers with Monochrome Display Adapter would have no graphics at all. [...] two columns had math symbols, greek letters (for math) and others [...] about the first 32 characters (x00-x1F)? [...] These characters originated with teletype transmission. But we could display them on the character based screens. So we added a set of "not serious" characters. They were intended as display only characters, not for transmission or storage. Their most probable use would be in character based games. [...] As in most things for the IBM PC, the one year development schedule left little time for contemplation and revision. [...] the character set was developed in a three person 4-hour meeting, and I was one of those on that plane from Seattle to Atlanta. There was some minor revision after that meeting, but there were many other things to design/fix/decide so that was about it. [...] the other participants in that plane trip were Andy Saenz — responsible for the video card, and Lew Eggebrecht — the chief engineer for the PC.
  22. Wilton, Richard (December 1987). Programmer's Guide to PC & PS/2 Video Systems: Maximum Video Performance From the EGA, VGA, HGC, and MCGA (1st ed.). Microsoft Press. ISBN 1-55615-103-9.
  23. Joshua D. Neal, Attribute Controller Registers: Attribute Mode Control Register, Hardware Level VGA and SVGA Video Programming Information Page: bit 2 is Line Graphics Enable.
  24. Mike Jacobs. "MS LineDraw font family - Typography | Microsoft Docs". Microsoft typography. 2.00. Microsoft Corporation. Retrieved 17 August 2018.
  25. Staff (26 October 2013). "WD97: MS LineDraw Font Not Usable in Word". Microsoft. 2.0. Microsoft. KB179422, Q179422. Archived from the original on 24 March 2016. Retrieved 1 July 2012.
  26. "HP Symbol sets".
  27. "Code Page 1055" (PDF). Archived from the original (PDF) on 21 January 2013.