PETSCII

PETSCII (shifted and unshifted)
Language(s): English with pseudographics
Classification: 8-bit extended early ASCII
Based on: US-ASCII (1963 version)

PETSCII (PET Standard Code of Information Interchange), also known as CBM ASCII, is the character set used in Commodore Business Machines' 8-bit home computers.

This character set was first used on the PET in 1977, and was subsequently used by the CBM-II, VIC-20, Commodore 64, Commodore 16, Commodore 116, Plus/4, and Commodore 128. The Amiga personal computer family, however, uses standard ISO/IEC 8859-1 instead.

History

The character set was largely designed by Leonard Tramiel (the son of Commodore CEO Jack Tramiel) and PET designer Chuck Peddle. [1] [2] [3] The graphic characters of PETSCII were one of the extensions Commodore specified for Commodore BASIC when laying out desired changes to Microsoft's existing 6502 BASIC to Microsoft's Ric Weiland in 1977. [4] The VIC-20 used the same pixel-for-pixel font as the PET, although the characters appeared wider due to the VIC's 22-column screen. The Commodore 64, however, used a slightly re-designed, heavy upper-case font, essentially a thicker version of the PET's, in order to avoid color artifacts created by the machine's higher resolution screen. The C64's lowercase characters are identical to the lowercase characters in the Atari 800's system font (released several years earlier).

Peddle claims the inclusion of card suit symbols was spurred by the demand that it should be easy to write card games on the PET (as part of the specification list he received). [2]

Specifications

PETSCII chart as displayed on the Commodore 64 in shifted and unshifted modes. (Not shown are control codes, as well as characters in the 0xC0–0xFF range, which are the standard uppercase keycodes returned from the keyboard, and which are duplicated to the range 0x60–0x7F.)

"Unshifted" PETSCII is based on the 1963 version of ASCII (rather than the 1967 version, which most if not all other computer character sets based on ASCII use). It has only uppercase letters, an up-arrow instead of the caret ^ at 0x5E, and a left-arrow instead of the underscore _ at 0x5F. In all versions except the original Commodore PET, it also has a British pound sign £ instead of the backslash \ at 0x5C. Other characters added in ASCII-1967 (lowercase letters, the grave accent, curly braces, vertical bar, and tilde) do not exist in PETSCII. Codes 0xA0–0xDF are allotted to CBM-specific block graphics characters: horizontal and vertical lines, hatches, shades, triangles, circles and card suits.
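The differences from ASCII described above can be sketched in a few lines of Python. This is only an illustration restricted to the range 0x20–0x5F; the function name `decode_unshifted` and the `original_pet` flag are hypothetical, and the Unicode arrows U+2191 and U+2190 are an assumed choice of display characters.

```python
# Overrides relative to ASCII for unshifted PETSCII codes in 0x20-0x5F,
# following the description in the text.
UNSHIFTED_OVERRIDES = {
    0x5C: "\u00a3",  # pound sign instead of backslash (not on the original PET)
    0x5E: "\u2191",  # up-arrow instead of caret
    0x5F: "\u2190",  # left-arrow instead of underscore
}


def decode_unshifted(data: bytes, original_pet: bool = False) -> str:
    """Decode unshifted PETSCII bytes in 0x20-0x5F to a Unicode string."""
    out = []
    for code in data:
        if not 0x20 <= code <= 0x5F:
            raise ValueError(f"code 0x{code:02X} is outside the range handled here")
        if code == 0x5C and original_pet:
            out.append("\\")  # the original PET keeps the backslash
        else:
            out.append(UNSHIFTED_OVERRIDES.get(code, chr(code)))
    return "".join(out)
```

Codes outside 0x20–0x5F (control codes and graphics) are deliberately rejected rather than guessed at.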

PETSCII also has a "shifted" mode (also called "business mode"), which changes the uppercase letters at 0x41–0x5A to lowercase and changes the graphics at 0xC1–0xDA to uppercase letters, so that upper and lower case are swapped from where ASCII has them. The mode is toggled by holding one of the SHIFT keys and then pressing and releasing the Commodore key. On the PET, the switch can be made by POKEing location 59468 with the value 14 to select the alternative set or 12 to revert to the standard one. On the Commodore 64, the sets are alternated by flipping the bit with value 2 (the second-lowest bit) of the byte at 53272. On some models of PET, the switch can also be made with the special control code PRINT CHR$(14), which adjusts the line spacing as well as changing the character set; the POKE method is still available and does not alter the line spacing. [5]
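As a rough illustration of the case swap in shifted mode, the following Python sketch converts ASCII text to shifted PETSCII bytes. The function name is hypothetical, and only letters plus the codes 0x20–0x3F (which match ASCII in the tables of this article) are handled.

```python
def ascii_to_shifted_petscii(text: str) -> bytes:
    """Encode ASCII text as shifted-mode PETSCII (letters, digits, basic punctuation)."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if "a" <= ch <= "z":
            out.append(code - 0x20)  # lowercase lives at 0x41-0x5A in shifted mode
        elif "A" <= ch <= "Z":
            out.append(code + 0x80)  # uppercase lives at 0xC1-0xDA in shifted mode
        elif 0x20 <= code <= 0x3F:
            out.append(code)         # space, digits and punctuation are unchanged
        else:
            raise ValueError(f"character {ch!r} is not handled by this sketch")
    return bytes(out)
```

For example, the string "Hi" becomes the two bytes 0xC8 and 0x49.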

Included in PETSCII are cursor and screen control codes, such as {HOME}, {CLR}, {RVS ON}, and {RVS OFF} (the latter two activating/deactivating reverse-video character display). The control codes appeared in program listings as reverse-video graphic characters, although some computer magazines, in their efforts to provide more clearly readable listings, pretty-printed the codes using their actual names in curly braces, like the above examples. This is unambiguous as PETSCII has no curly brace characters.
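A magazine-style listing with names in curly braces can be turned back into PETSCII bytes with a small lookup. The sketch below is illustrative only: the function name is hypothetical, the code values are taken from the control character table in this article, and it assumes that {CLR} corresponds to the CLEAR entry and {RVS ON}/{RVS OFF} to REVERSE ON/OFF. Text outside the braces is assumed to be plain unshifted characters that coincide with ASCII.

```python
import re

# Names as printed in listings, mapped to PETSCII control codes
# (values from the control character table; the name pairing is an assumption).
CONTROL_CODES = {
    "HOME": 0x13,
    "CLR": 0x93,
    "RVS ON": 0x12,
    "RVS OFF": 0x92,
}


def expand_listing(text: str) -> bytes:
    """Replace {NAME} tokens with control codes; copy other text as ASCII bytes."""
    out = bytearray()
    pos = 0
    for match in re.finditer(r"\{([^{}]+)\}", text):
        out += text[pos:match.start()].encode("ascii")
        name = match.group(1)
        if name not in CONTROL_CODES:
            raise ValueError(f"unknown control code name: {name}")
        out.append(CONTROL_CODES[name])
        pos = match.end()
    out += text[pos:].encode("ascii")
    return bytes(out)
```

Because PETSCII has no curly brace characters, any brace in the input can safely be treated as the start of a token.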

Different mappings are used for storing characters (the "interchange" mapping, as used by CHR$()) and displaying characters (the "video" mapping). For example, to display the characters "@ABC" on screen by directly writing into the screen memory, one would POKE the decimal values 0, 1, 2, and 3 rather than 64, 65, 66, and 67. [6] [7]
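The relationship between the two mappings can be sketched as a range-by-range offset. The table below is the conversion commonly documented for the Commodore 64 and is not spelled out in this article beyond the "@ABC" example, so treat it as an assumption; the function name is hypothetical.

```python
def petscii_to_screen_code(code: int) -> int:
    """Convert a printable PETSCII code (interchange mapping) to a screen code."""
    if not 0 <= code <= 0xFF:
        raise ValueError("PETSCII codes are single bytes")
    if code < 0x20 or 0x80 <= code <= 0x9F:
        raise ValueError("control codes have no screen code in this sketch")
    if code <= 0x3F:
        return code          # 0x20-0x3F: unchanged
    if code <= 0x5F:
        return code - 0x40   # 0x40-0x5F: '@' becomes 0, 'A' becomes 1, ...
    if code <= 0x7F:
        return code - 0x20   # 0x60-0x7F
    if code <= 0xBF:
        return code - 0x40   # 0xA0-0xBF
    if code <= 0xFE:
        return code - 0x80   # 0xC0-0xFE
    return 0x5E              # 0xFF shares the glyph at screen code 0x5E
```

Applied to the bytes of "@ABC" (64, 65, 66, 67), this yields 0, 1, 2, 3, matching the values to POKE given in the text.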

The keyboard by default provides access to the lower half of the code page. Pressing Shift and a key gives the corresponding upper half code point. Some PETSCII code points cannot be printed and are only used for keyboard input (e.g. F1, RUN/STOP).

PET 2001 keyboard layout, illustrating PETSCII graphics characters

Character set

The tables below represent the "interchange" PETSCII encoding, as used by CHR$().

Control characters are defined in the ranges 0x00–0x1F and 0x80–0x9F, although which control characters are defined and what they are defined as varies between systems. The tables below exclude control characters; their encoding is discussed in § Control characters.

The ranges 0x60–0x7F and 0xE0–0xFF are duplicate ranges, although what they duplicate varies between systems. On the Commodore PET, they duplicate 0x20–0x3F and 0xA0–0xBF, respectively; on the Commodore VIC-20, 64, 16, and 128 they duplicate 0xC0–0xDF and 0xA0–0xBF, respectively. [6] While these characters are visually duplicates, they are semantically different; for example, on the Commodore PET, code points 0x2C and 0x6C both produce a comma character, but only 0x2C functions as a delimiter between input fields. [8]
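The duplicate ranges can be expressed as a mapping from each duplicate code to the code it mirrors. This is a minimal sketch with a hypothetical function name and hypothetical system labels ("pet" for the Commodore PET, "standard" for the VIC-20, 64, 16, and 128). Because the duplicates are semantically distinct, collapsing them this way is only appropriate for display purposes.

```python
def visual_equivalent(code: int, system: str) -> int:
    """Return the code whose glyph a duplicate-range code mirrors."""
    if system not in ("pet", "standard"):
        raise ValueError("system must be 'pet' or 'standard'")
    if not 0 <= code <= 0xFF:
        raise ValueError("PETSCII codes are single bytes")
    if 0x60 <= code <= 0x7F:
        # PET: mirrors 0x20-0x3F; other systems: mirrors 0xC0-0xDF
        return code - 0x40 if system == "pet" else code + 0x60
    if 0xE0 <= code <= 0xFF:
        return code - 0x40   # mirrors 0xA0-0xBF on all systems
    return code
```

On the PET, for instance, 0x6C maps to 0x2C (the comma), while on the later machines it maps to 0xCC.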

Graphic characters are mostly identical across systems, with the exceptions of 0x5C (which is \ on the Commodore PET, and £ on other systems), 0xD0 (which is U+1FB95 CHECKER BOARD FILL on the Commodore PET and VIC-20, and U+1FB96 INVERSE CHECKER BOARD FILL on other systems), and the range 0x60–0x7F (which duplicates a different range on the Commodore PET). Additionally, in the Commodore PET 2001's shifted character set, uppercase and lowercase letters are swapped relative to other systems.

Compatibility symbols for PETSCII characters were added to Unicode 13.0 as part of the Symbols for Legacy Computing block. [9]

Standard

The following tables represent the PETSCII encoding used on the Commodore VIC-20, 64, 16, and 128.

Unshifted

Unshifted PETSCII [6] [10] [7] [11] [12]
0123456789ABCDEF
0x
1x
2x  SP  !"#$%&'()*+,-./
3x0123456789:;<=>?
4x@ABCDEFGHIJKLMNO
5xPQRSTUVWXYZ[£]
6x🭲🭸🭷🭶🭺🭱🭴🭼🭽
7x🭾🭻🭰🭵🮌π
8x
9x
Ax NBSP 🮏🮇
Bx🮈🮂🮃🭿
Cx🭲🭸🭷🭶🭺🭱🭴🭼🭽
Dx🭾🭻🭰🭵🮌π
Ex NBSP 🮏🮇
Fx🮈🮂🮃🭿
  Differs between PETSCII variants.

Shifted

Shifted PETSCII [6] [10] [7] [13] [14]
0123456789ABCDEF
0x
1x
2x  SP  !"#$%&'()*+,-./
3x0123456789:;<=>?
4x@abcdefghijklmno
5xpqrstuvwxyz[£]
6xABCDEFGHIJKLMNO
7xPQRSTUVWXYZ🮌🮕/🮖 [lower-alpha 1] 🮘
8x
9x
Ax NBSP 🮏🮙🮇
Bx🮈🮂🮃
CxABCDEFGHIJKLMNO
DxPQRSTUVWXYZ🮌🮕/🮖 [lower-alpha 1] 🮘
Ex NBSP 🮏🮙🮇
Fx🮈🮂🮃
  Differs between PETSCII variants.
  1. This is U+1FB95 CHECKER BOARD FILL on the VIC-20, and U+1FB96 INVERSE CHECKER BOARD FILL on the Commodore 64 and Commodore 128.

Commodore PET

Unshifted

Unshifted PETSCII (PET) [6] [15]
0123456789ABCDEF
0x
1x
2x  SP  !"#$%&'()*+,-./
3x0123456789:;<=>?
4x@ABCDEFGHIJKLMNO
5xPQRSTUVWXYZ[\]
6x  SP  !"#$%&'()*+,-./
7x0123456789:;<=>?
8x
9x
Ax NBSP 🮏🮇
Bx🮈🮂🮃🭿
Cx🭲🭸🭷🭶🭺🭱🭴🭼🭽
Dx🭾🭻🭰🭵🮌π
Ex NBSP 🮏🮇
Fx🮈🮂🮃🭿
  Differs from standard PETSCII.

Shifted

Shifted PETSCII (PET) [6] [16]
0123456789ABCDEF
0x
1x
2x  SP  !"#$%&'()*+,-./
3x0123456789:;<=>?
4x@abcdefghijklmno
5xpqrstuvwxyz[\]
6x  SP  !"#$%&'()*+,-./
7x0123456789:;<=>?
8x
9x
Ax NBSP 🮏🮙🮇
Bx🮈🮂🮃
CxABCDEFGHIJKLMNO
DxPQRSTUVWXYZ🮌🮕🮘
Ex NBSP 🮏🮙🮇
Fx🮈🮂🮃
  Displayed case matches the Commodore PET 8032. The opposite case is used on the Commodore PET 2001.
  Differs from standard PETSCII.

Control characters

While the graphic characters were mostly shared between Commodore systems, the control characters frequently varied. The following table describes what the control characters represent on the Commodore PET 2001, Commodore PET 8032, VIC-20, Commodore 64, Commodore 16, and Commodore 128 (40- and 80-column modes).

PETSCII control characters [6] [17]
Hex  Decimal  PET 2001 | PET 8032 | VIC-20 | C64 | C16 | C128 (40 col) | C128 (80 col)
00   0
01   1
02   2        UNDERLINE ON
03   3        STOP
04   4
05   5        WHITE
06   6
07   7        BELL | BELL
08   8        LOCK CASE
09   9        TAB | UNLOCK CASE | TAB
0A   10       LINE FEED
0B   11       UNLOCK CASE
0C   12       LOCK CASE
0D   13       RETURN
0E   14       LOWER CASE
0F   15       SET WINDOW TOP | FLASH ON
10   16
11   17       CURSOR DOWN
12   18       REVERSE ON
13   19       HOME
14   20       DEL
15   21       KILL LINE
16   22       ERASE TO RIGHT
17   23
18   24       TAB SET/CLEAR
19   25       SCROLL UP
1A   26
1B   27       ESC | ESC
1C   28       RED
1D   29       CURSOR RIGHT
1E   30       GREEN
1F   31       BLUE
80   128
81   129      ORANGE | DARK PURPLE
82   130      FLASH ON | UNDERLINE OFF
83   131      RUN
84   132      FLASH OFF
85   133      F1
86   134      F3
87   135      DOUBLE BELL | F5
88   136      F7
89   137      TAB SET/CLEAR | F2
8A   138      F4
8B   139      F6
8C   140      F8 | HELP | F8
8D   141      SHIFT + RETURN
8E   142      UPPER CASE
8F   143      SET WINDOW END | FLASH OFF
90   144      BLACK
91   145      CURSOR UP
92   146      REVERSE OFF
93   147      CLEAR
94   148      INST
95   149      INSERT LINE ABOVE | BROWN | DARK YELLOW
96   150      ERASE TO LEFT | PINK | YELLOW-GREEN | PINK
97   151      DARK GRAY | PINK | DARK GRAY | DARK CYAN
98   152      MEDIUM GRAY | BLUE-GREEN | MEDIUM GRAY
99   153      SCROLL DOWN | LIGHT GREEN | LIGHT BLUE | LIGHT GREEN
9A   154      LIGHT BLUE | DARK BLUE | LIGHT BLUE
9B   155      LIGHT GRAY | LIGHT GREEN | LIGHT GRAY
9C   156      PURPLE
9D   157      CURSOR LEFT
9E   158      YELLOW
9F   159      CYAN

The colors of the VIC-20 and C64/128 are listed in the VIC-II article.

Base 128

Out of PETSCII's first 192 codes, there are 128 graphic characters: 32–127 and 160–191. This permits "base128"-style encodings in DATA statements, or perhaps between PETSCII-speaking machines. Such an encoding can also include control characters, which are visible when quoted, although which control characters are defined varies between systems.
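One way to picture such an encoding is a mapping between 7-bit values and the graphic codes. The sketch below is hypothetical (the article does not define a specific digit order) and assumes the 128 digits are assigned in ascending order to codes 32–127 and then 160–191, which together give exactly 128 values.

```python
def digit_to_code(digit: int) -> int:
    """Map a 7-bit value (0-127) to a PETSCII graphic character code."""
    if not 0 <= digit < 128:
        raise ValueError("digit must be in 0..127")
    # The first 96 digits use codes 32-127; the remaining 32 use codes 160-191.
    return 32 + digit if digit < 96 else 160 + (digit - 96)


def code_to_digit(code: int) -> int:
    """Inverse of digit_to_code."""
    if 32 <= code <= 127:
        return code - 32
    if 160 <= code <= 191:
        return code - 160 + 96
    raise ValueError(f"code {code} is not one of the 128 graphic codes")
```

The two functions are inverses of each other over the full range of 128 digits.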

The primary application for a "Base 128" encoding is in DATA statements in Commodore BASIC. Binary data can be stored with relatively low overhead, since each character encodes seven bits of data. On a standard 80-character line, typically four characters are used for the line number and two for the tokenized DATA statement. Since the comma and colon are significant to BASIC, a quote character is also needed, leaving 73 characters for data. At seven bits per character, one DATA line can store 511 bits of binary data, which is about 79.8% of the line's 640 bits. With three-digit line numbers, efficiency rises to about 80.9%; with two-digit line numbers, it is about 82.0%.

  Line Numbers    Data chars per Line   Bits per Line   Efficiency   Max. Lines   Max. Total Data Bytes
  1-9*            76                    532             0.83125      9            598
  10-99           75                    525             0.8203125    90           5,906
  100-999         74                    518             0.809375     900          58,275
  1000-9999       73                    511             0.7984375    9,000        574,875
  10000-65535**   72                    504             0.7875       55,536**     3.5 MB (approx.)

For storing binary data in Commodore BASIC, it appears that two- or three-digit line numbers are typically the best choice.

(*) Assume line 0 is a GOTO.

(**) Maximum line number is probably off-by-one.
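The figures in the table follow from simple arithmetic, which the following Python sketch reproduces. The function names are hypothetical; the constants reflect the accounting used in the text (an 80-character line, two characters for the tokenized DATA statement, and one quote character).

```python
LINE_WIDTH = 80      # characters per program line
FIXED_OVERHEAD = 3   # two characters for DATA plus one quote character
BITS_PER_CHAR = 7    # payload bits carried by each Base 128 character


def data_line_stats(line_number_digits: int):
    """Return (data chars, payload bits, efficiency) for one DATA line."""
    chars = LINE_WIDTH - line_number_digits - FIXED_OVERHEAD
    bits = chars * BITS_PER_CHAR
    efficiency = bits / (LINE_WIDTH * 8)
    return chars, bits, efficiency


def max_total_bytes(line_number_digits: int, line_count: int) -> int:
    """Whole bytes storable across line_count lines of the given width."""
    _, bits, _ = data_line_stats(line_number_digits)
    return bits * line_count // 8
```

For four-digit line numbers this gives 73 data characters, 511 bits, and an efficiency of 0.7984375, and 9,000 such lines hold 574,875 bytes.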

Base 164

164 PETSCII characters are representable in quoted strings; theoretically, then, Base 164 is possible. This adds in the color values, the function keys, and the cursor controls.

References

  1. Reunanen, Markku; Heikkinen, Tero; Carlsson, Anders (22 November 2018). "PETSCII – A Character Set and a Creative Platform" (PDF). Replay. The Polish Journal of Game Studies. 5 (1): 27–47. doi:10.18778/2391-8551.05.02.
  2. Bagnall, Brian (2007). On the Edge: The Spectacular Rise and Fall of Commodore. Winnipeg: Variant Press. pp. 43, 54–55. ISBN 0-9738649-0-7.
  3. Tramiel, Leonard (27 December 2021). "Creating PETSCII". Vintage Computer Stories. Blogspot.
  4. "A Conversation with Chuck Peddle, Bil Herd, Jeri Ellsworth - part 3 - BIOS - blip.tv". blip.tv. 5 September 2010 [2009]. 6:30. Archived from the original on 9 January 2011.
  5. Andersson, Larry (25 November 2000). "THE COMMODORE PET COMPUTER FREQUENTLY ASKED QUESTIONS FILE". Zimmers.net. 1.7.
  6. Aivosto Oy (2014), "Commodore PETSCII character sets" (PDF), Aivosto
  7. Ewell, Doug; Bettencourt, Rebecca; Bánffy, Ricardo; Everson, Michael; Marín Silva, Eduardo; Mårtenson, Elias; Shoulson, Mark; Steele, Shawn; Turner, Rebecca (4 January 2019), "ReadMe.txt", L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS (PDF), The Unicode Consortium
  8. Brain, Jim (16 March 1996). "Commodore Trivia Edition #26 Answers for February 1996". Zimmers.net.
     Q $195) On CBM machines prior to the VIC-20, what chr$ code outputs the same character as chr$(44), the comma.
     A $195) 108.
     Q $196) Is the character described in $195 of any use?
     A $196) To put commas in strings read via INPUT. Remember, INPUT treats a comma (chr$(44)) as a delimiter between input fields, but chr$(108) does not produce the same effect, so you could replace 44 with 108 in data written to disk, and read it in with INPUT.
  9. Ewell, Doug; Bettencourt, Rebecca; Bánffy, Ricardo; Everson, Michael; Marín Silva, Eduardo; Mårtenson, Elias; Shoulson, Mark; Steele, Shawn; Turner, Rebecca (4 January 2019), L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS (PDF), The Unicode Consortium
  10. Bettencourt, Rebecca G. "PETSCII to Unicode Mapping". KreativeKorp.
  11. Bettencourt, Rebecca (20 April 2018), "CVICIPRI.TXT", L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS (PDF)
  12. Bettencourt, Rebecca (20 April 2018), "C64IPRI.TXT", L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS (PDF)
  13. Bettencourt, Rebecca (11 October 2018), "CVICIALT.TXT", L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS (PDF)
  14. Bettencourt, Rebecca (11 October 2018), "C64IALT.TXT", L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS (PDF)
  15. Bettencourt, Rebecca (20 April 2018), "CPETIPRI.TXT", L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS (PDF)
  16. Bettencourt, Rebecca (11 October 2018), "CPETIALT.TXT", L2/19-025: Proposal to add characters from legacy computers and teletext to the UCS (PDF)
  17. Commodore 128 Programmer's Reference Guide (PDF). Commodore Business Machines, Inc. February 1986. pp. 666–668. ISBN 0-553-34292-4.