BCD (character encoding)

BCD Interchange Codes
Classification: 6-bit alphanumeric basic Latin encodings
Succeeded by: EBCDIC

BCD (binary-coded decimal), also called alphanumeric BCD, alphameric BCD, BCD Interchange Code, [1] or BCDIC, [1] is a family of representations of numerals, uppercase Latin letters, and some special and control characters as six-bit character codes.

Unlike later encodings such as ASCII, BCD codes were not standardized. Different computer manufacturers, and even different product lines from the same manufacturer, often had their own variants, and sometimes included unique characters. Other six-bit encodings with completely different mappings, such as some FIELDATA [1] variants or Transcode, are sometimes incorrectly termed BCD.

Many variants of BCD encode the characters '0' through '9' as the corresponding binary values.

History

Technically, binary-coded decimal describes the encoding of decimal numbers where each decimal digit is represented by a fixed number of bits, usually four.

With the introduction of the IBM card in 1928, IBM created a code [a] capable of representing alphanumeric information, [2] later adopted by other manufacturers. This code represents the digits 0–9 by a single punch, and uses multiple punches for upper-case letters and special characters. [3] A letter has two punches (zone [12, 11, 0] + digit [1–9]); most special characters have two or three punches (zone [12, 11, 0, or none] + digit [2–7] + 8).

The BCD code is the adaptation of the punched card code to a six-bit binary code by encoding the digit rows (nine rows, plus unpunched) into the low four bits, and the zone rows (three rows, plus unpunched) into the high two bits. [4] The digit zero (a single punch in row 0) is usually handled specially in some way, and the digit code was extended to values 10 through 15 by combining a digit in the range 2–7 with a punch in row 8. IBM applied the terms binary-coded decimal and BCD to the variations of BCD alphamerics used in most early IBM computers, including the IBM 1620, IBM 1400 series, and non-Decimal Architecture members of the IBM 700/7000 series.
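As a rough illustration of the scheme just described, the Python sketch below translates the punched rows of one card column into a six-bit code. It assumes the tape-style zone assignment of the 48-character BCDIC table later in this article (12 → 11, 11 → 10, 0 → 01) and the convention that a lone 0 punch becomes code 10. Because that convention collides with the 8-2 combination, the sketch rejects 8-2 rather than guessing where a given variant moved it. The name `punches_to_bcd` is hypothetical.

```python
# Zone punch -> high two bits, in the tape-style order (0 -> 01, 11 -> 10,
# 12 -> 11) of the 48-character BCDIC table; None means no zone punch.
ZONE_BITS = {None: 0b00, 0: 0b01, 11: 0b10, 12: 0b11}


def punches_to_bcd(punches):
    """Translate the punched rows of one card column into a 6-bit BCD code.

    Sketch only: handles blank, digits, letters and the 8-3 to 8-7
    combinations. A lone 0 punch becomes code 10 (tape-style zero), so the
    8-2 combination, which would also give 10, is rejected here.
    """
    rows = set(punches)
    # Rows 12 and 11 are always zones; row 0 is a zone only when another
    # row in the same column is also punched.
    zones = rows & {12, 11}
    if len(zones) > 1:
        raise ValueError("12 and 11 together are not handled in this sketch")
    if zones:
        zone = zones.pop()
    elif 0 in rows and len(rows) > 1:
        zone = 0
    else:
        zone = None
    digits = rows - {zone}

    if not digits:
        low = 0                          # zone punch only, or a blank column
    elif digits == {0}:
        low = 10                         # digit 0 moved to code 10
    elif len(digits) == 1 and min(digits) in range(1, 10):
        low = min(digits)                # single punch in rows 1-9
    elif len(digits) == 2 and 8 in digits and min(digits) in range(3, 8):
        low = 8 + min(digits)            # 8-3 .. 8-7 give 11 .. 15
    else:
        raise ValueError(f"punches {sorted(rows)} not handled in this sketch")
    return (ZONE_BITS[zone] << 4) | low
```

For example, `punches_to_bcd({12, 1})` returns 0x31 ('A') and `punches_to_bcd({0, 2})` returns 0x12 ('S'), in line with the 48-character table below.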

Among the vendors using BCD were Burroughs, [5] Bull, CDC, [6] IBM, General Electric (the computer division was purchased by Honeywell in 1969), NCR, Siemens, and Sperry-UNIVAC.

IBM announced the 8-bit Extended Binary Coded Decimal Interchange Code (EBCDIC), based on BCDIC, in 1964 with the introduction of its System/360 line.

Special characters

Some early commercial computers [b] had the percent sign (%) and the lozenge (⌑, U+2311 SQUARE LOZENGE) at the same code points as the left and right parentheses in other [c] encodings.

The record mark (also written Recordmark) character, represented as ‡, is used to mark the end of a record. [7] Its BCD code is 32 (octal) in some BCD variants. The closest Unicode equivalent is U+29E7 ⧧ THERMODYNAMIC, but that character is missing from many fonts, so U+2021 ‡ DOUBLE DAGGER is often used instead. Functionally it corresponds to the EBCDIC IRS character (ASCII RS), X'1E'.

The group mark (also written Groupmark) character, represented as ⯒, is used to indicate the start or finish of a group of related fields. [8] Its BCD code is 77 (octal) in some BCD variants. The group mark was proposed for Unicode standardization in 2015 [9] and was assigned the code point U+2BD2 GROUP MARK, at which position it appears in Unicode 10.0. Functionally it corresponds to the EBCDIC IGS character (ASCII GS), X'1D'. Font support is sparse: only the Symbola and Unifont fonts support it.
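Since the tables later in this article use hexadecimal, note that octal 32 is hex 1A and octal 77 is hex 3F. The sketch below, which assumes a variant that uses these two code points, splits a stream of codes into records at each record mark and lists the display and ASCII substitutes described above; `split_records`, `DISPLAY` and `TO_ASCII_CONTROL` are hypothetical names.

```python
# Octal codes from the text: 0o32 == 0x1A and 0o77 == 0x3F in hex notation.
RECORD_MARK = 0o32
GROUP_MARK = 0o77

# Display substitutes (U+2021 stands in for the rarely supported U+29E7).
DISPLAY = {RECORD_MARK: "\u2021", GROUP_MARK: "\u2bd2"}

# Functional ASCII equivalents: RS (0x1E) and GS (0x1D).
TO_ASCII_CONTROL = {RECORD_MARK: 0x1E, GROUP_MARK: 0x1D}


def split_records(codes):
    """Split a stream of 6-bit codes at record marks, dropping the marks."""
    records, current = [], []
    for c in codes:
        if c == RECORD_MARK:
            records.append(current)
            current = []
        else:
            current.append(c)
    if current:
        records.append(current)   # trailing data without a closing record mark
    return records
```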

The word mark, by contrast, is not a BCD character. Rather, it is a flag bit used to mark the end of a word on some variable-word-length computers such as the IBM 1401.
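Because the word mark sits outside the six-bit character code, it has to be stored alongside the character rather than as one. On the 1401, a field is addressed by its low-order (rightmost) position and processed leftward until the position carrying the word mark. The sketch below models that scan under an assumption of its own: each memory position is an integer holding the six BCD bits plus a separate 0x40 flag for the word mark (a modelling choice, not the 1401's physical bit layout). `read_field` is a hypothetical name.

```python
# Hypothetical model: low six bits = BCD code, 0x40 = word mark flag.
WORD_MARK = 0x40
CODE_MASK = 0x3F


def read_field(memory, low_order_addr):
    """Collect a field leftward from its low-order (rightmost) position,
    stopping at the position that carries the word mark."""
    addr = low_order_addr
    codes = []
    while addr >= 0:
        cell = memory[addr]
        codes.append(cell & CODE_MASK)
        if cell & WORD_MARK:
            # Return the field in left-to-right order and its start address.
            return codes[::-1], addr
        addr -= 1
    raise ValueError("no word mark found")
```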

BCD code variations

There are many different versions of the six-bit BCD code. There are three major categories of difference:

  1. The mapping from zone punches to high-order bits. All codes translate no zone punches to a bit pattern of 00, but some encode the zone punches in 12-11-0 order, preserving alphabetical order, while others use 0-11-12 order, resulting in a partially reversed alphabet.
  2. The handling of the digit 0. The straightforward translation from punched form would place the blank before the digits 1–9 (as code 0) and encode the digit 0 (a lone punch in row 0, read as a zone) at the start of the row containing 'S'. All codes have some special-case handling which either translates the digit 0 to the all-zero binary code (and moves the blank elsewhere), or gives it binary code 001010 (decimal 10) and moves the 8+2 punch elsewhere.
  3. The assignment of special characters. The characters assigned to codes beyond the basic alphanumeric set varied widely, even within one model of computer.
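The first two differences can be seen in a small sketch that encodes only blank, digits and letters in two styles modelled on tables later in this article: a tape style (0-11-12 zone order, digit 0 moved to code 10), as in the 48-character BCDIC table, and an IBM 704 storage style (12-11-0 zone order, blank moved to 0x30). Sorting the same names by their codes shows the partially reversed alphabet of the tape style. The function and table names are hypothetical.

```python
# Zone punch -> high two bits in each style (None = no zone punch).
TAPE_ZONES = {None: 0b00, 0: 0b01, 11: 0b10, 12: 0b11}      # 0-11-12 order
STORAGE_ZONES = {None: 0b00, 12: 0b01, 11: 0b10, 0: 0b11}   # 12-11-0 order


def zone_and_digit(ch):
    """Card zone and digit row for the digits 1-9 and the letters A-Z."""
    if len(ch) == 1 and ch in "123456789":
        return None, int(ch)
    i = ord(ch) - ord("A") if len(ch) == 1 else -1
    if 0 <= i < 9:
        return 12, i + 1       # A-I: zone 12, rows 1-9
    if 9 <= i < 18:
        return 11, i - 8       # J-R: zone 11, rows 1-9
    if 18 <= i < 26:
        return 0, i - 16       # S-Z: zone 0, rows 2-9
    raise ValueError(f"{ch!r} is outside this sketch's character set")


def tape_code(ch):
    """Tape style: blank is 0x00 and the digit 0 is moved to 0x0A."""
    if ch == " ":
        return 0x00
    if ch == "0":
        return 0x0A
    zone, digit = zone_and_digit(ch)
    return (TAPE_ZONES[zone] << 4) | digit


def storage_code(ch):
    """IBM 704 storage style: the digit 0 is 0x00 and blank is moved to 0x30."""
    if ch == " ":
        return 0x30
    if ch == "0":
        return 0x00
    zone, digit = zone_and_digit(ch)
    return (STORAGE_ZONES[zone] << 4) | digit


names = ["SMITH", "JONES", "ADAMS"]
print(sorted(names, key=lambda w: [tape_code(c) for c in w]))     # ['SMITH', 'JONES', 'ADAMS']
print(sorted(names, key=lambda w: [storage_code(c) for c in w]))  # ['ADAMS', 'JONES', 'SMITH']
```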

In Spanish-speaking countries, the character "Ñ" did not exist in the original system, so most manufacturers (Bull, NCR, and Control Data) chose "@" to represent it. This caused an inconsistency when databases were merged into 7-bit ASCII code, where the "/" character had been chosen instead, resulting in two different codes for the same character.

Examples of BCD codes

The following charts show the numeric values of BCD characters in hexadecimal (base-16) notation, since that most clearly reflects their structure: four bits of binary-coded decimal plus two extra zone bits. For example, the code for 'A', in row 3x and column x1, is hexadecimal 31, or binary 11 0001.
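The row and column reading amounts to splitting the six-bit value into its high two bits and low four bits. A minimal sketch (the helper name `describe` is hypothetical):

```python
def describe(code):
    """Return (row label, column label, grouped binary) for a 6-bit code."""
    if not 0 <= code < 64:
        raise ValueError("not a six-bit code")
    zone, low = divmod(code, 16)      # high two bits, low four bits
    bits = format(code, "06b")
    return f"{zone}x", f"x{low:X}", f"{bits[:2]} {bits[2:]}"


print(describe(0x31))   # ('3x', 'x1', '11 0001')
```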

Tape style

48-character BCD code

The first versions of BCDIC had 48 characters, as they were based on card punch patterns and the character sets of printers, neither of which encouraged having a power-of-two number of characters.

IBM 48-character BCDIC code [1] :68
    x0     x1  x2  x3  x4  x5  x6  x7  x8  x9  xA  xB  xC  xD  xE  xF
0x  space  1   2   3   4   5   6   7   8   9   0   #   @
1x         /   S   T   U   V   W   X   Y   Z       ,   %
2x  -      J   K   L   M   N   O   P   Q   R       $   *
3x  &      A   B   C   D   E   F   G   H   I       .   ⌑

This was based on a 40-character punched card code: the original 37 characters (10 digits, 26 letters, and blank), plus three commercially important characters added around 1932. [1] :67 These were the hyphen-minus, used for printing credit balances and hyphenated names; the ampersand, used in many names and addresses (Procter & Gamble, Mr. & Mrs. Smith); and the asterisk, used to overprint unused fields when printing cheques.
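A decoding table for the 48-character code can be built directly from its rows. In this sketch, cell positions follow the table (blank at 0x00, digit 0 at 0x0A, unassigned cells skipped), and the lozenge ⌑ is assumed at 0x3C (12-8-4), which brings the set to 48 characters. `ROWS`, `encode` and `decode` are hypothetical names; characters outside the set raise `KeyError`.

```python
# "\x00" marks an unassigned cell; each string covers columns x0..xC.
ROWS = [
    " 1234567890#@",            # 0x00-0x0C: blank, 1-9, 0, #, @
    "\x00/STUVWXYZ\x00,%",      # 0x10-0x1C
    "-JKLMNOPQR\x00$*",         # 0x20-0x2C
    "&ABCDEFGHI\x00.\u2311",    # 0x30-0x3C (lozenge at 0x3C assumed)
]

DECODE = {
    (zone << 4) | low: ch
    for zone, row in enumerate(ROWS)
    for low, ch in enumerate(row)
    if ch != "\x00"
}
ENCODE = {ch: code for code, ch in DECODE.items()}


def decode(codes):
    """Six-bit codes -> text."""
    return "".join(DECODE[c] for c in codes)


def encode(text):
    """Text -> six-bit codes (upper-case letters, digits, listed symbols)."""
    return [ENCODE[ch] for ch in text]
```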

IBM 1401 BCD code

Rather than following the IBM 704's storage representation, the IBM 1401 followed the tape representation (descended from the 48-character BCD), thus using the all-zero code for blank and code 10 (0x0A) for the digit zero. It had defined character forms for all possible values, for documentation purposes, [10] but only 48 of the 63 non-blank characters were printable, and there was considerable variation in how the other code values were depicted in practice. Even the printable characters varied between the different print chains available for the IBM 1403 printer.

x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0xspace1234567890#@:>
1x¢/STUVWXYZ,%='"
2x-JKLMNOPQR!$*);Δ
3x&ABCDEFGHI?.(<

Code page 353

The BCDIC-A Code page was assigned as Code page 353, also known as CP353. Some of the characters in this code page are not in Unicode. (The duplication of '#' can be found in IBM's own documentation and is not a mistake here. [11] )

x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0xspace1234567890#@:>
1x/STUVWXYZ,%γ\
2x-JKLMNOPQR!#*];Δ
3x&ABCDEFGHI?.[<

At 0x1A is the record mark. At 0x3F is the group mark.

Code page 354

The BCDIC-B Code page was assigned as Code page 354, also known as CP354. [12] Some of the characters in this code page are not in Unicode.

x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0xspace1234567890':>
1x/STUVWXYZ,(γ\
2x-JKLMNOPQR!#*];Δ
3x+ABCDEFGHI?.)[<

At 0x1A is the record mark. At 0x3F is the group mark.

PTTC/BCD code pages

PTTC/BCD had five options, each with its own code page; they are shown below. The PTTC/BCD Standard Option was assigned as Code page 355, or CP355.

x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0xspace1234567890#
1x@/STUVWXYZ,γ
2x-JKLMNOPQR<$
3x&ABCDEFGHI).

The PTTC/BCD H Option was assigned as Code page 357, or CP357.

x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0xspace1234567890=
1x'/STUVWXYZ,
2x-JKLMNOPQR!$
3x+ABCDEFGHI?.

The PTTC/BCD Correspondence Option was assigned as Code page 358, or CP358.

x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0xspace1234567890'
1x !/STUVWXYZ,
2x-JKLMNOPQR<;
3x=ABCDEFGHI>.

The PTTC/BCD Monocase Option was assigned as Code page 359, or CP359.

x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0xspace1234567890#
1x@/STUVWXYZ,
2x-JKLMNOPQR$
3x&ABCDEFGHI.

The PTTC/BCD Duocase Option was assigned as Code page 360, or CP360.

x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0xspace1234567890#
1x@/STUVWXYZ,
2x-JKLMNOPQR$
3x&ABCDEFGHI.

IBM 704 storage style

IBM 704 BCD code

The IBM 704 reordered the BCDIC code to allow a normal alphabetic collating order internally, with 0 before 1 and A before Z. It could automatically translate between this internal form and the earlier BCDIC when reading and writing magnetic tapes. [13] :35
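Comparing the 48-character BCDIC table with the IBM 704 table below suggests a simple rule for this translation: the zone bits for the 12 and 0 punches trade places, the 11 zone and the no-zone case stay put, and blank and the digit 0 are handled separately. The sketch below applies that rule; it is inferred from the two tables in this article, not a description of the 704's actual tape circuitry, and `tape_to_704` is a hypothetical name.

```python
def tape_to_704(code):
    """Map a tape-style BCDIC code to the IBM 704 internal code (sketch).

    Rule inferred from the tables: zone bits 11 and 01 swap, 10 and 00
    stay, blank moves from 0x00 to 0x30 and the digit 0 from 0x0A to 0x00.
    """
    if not 0 <= code < 64:
        raise ValueError("not a six-bit code")
    if code == 0x00:
        return 0x30                 # blank
    if code == 0x0A:
        return 0x00                 # digit 0
    if code == 0x10:
        # The zone rule would send this to 0x30, which blank now occupies.
        raise ValueError("0x10 has no counterpart in this sketch")
    zone, low = code >> 4, code & 0x0F
    new_zone = 0 if zone == 0 else 4 - zone   # 3 <-> 1, 2 stays 2
    return (new_zone << 4) | low
```

For instance, 'A' goes from 0x31 to 0x11 and ',' from 0x1B to 0x3B, matching the 704 table.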

The following table shows the code assignments for the IBM 704 computer. Unassigned code positions appear as blanks. [13] :35

IBM 704 character set
x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0x0123456789#@
1x&ABCDEFGHI+0.
2x-JKLMNOPQR−0$*
3xspace/STUVWXYZ,%

(+0 and −0 were rarely used characters that corresponded to the punched-card convention of a digit 0 with a sign overpunched in row 12 or row 11, respectively.)
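The same convention shows up in the 704 code as a signed overpunch: a digit with a 12 punch lands in the 1x row (A–I, with +0 for zero) and a digit with an 11 punch in the 2x row (J–R, with −0 for zero). The sketch below decodes a numeric field on that basis. It assumes that only the last character carries the sign and that plain digits use codes 0x00–0x09; `decode_signed` is a hypothetical name.

```python
def decode_signed(codes):
    """Decode a numeric field in IBM 704 codes, sign overpunched on the last digit.

    Assumes: 0x00-0x09 are plain digits; on the last character only, zone 01
    (12 overpunch) means positive and zone 10 (11 overpunch) means negative,
    with zero appearing as +0 (0x1A) or -0 (0x2A).
    """
    if not codes:
        raise ValueError("empty field")
    *body, last = codes
    value = 0
    for c in body:
        if not 0x00 <= c <= 0x09:
            raise ValueError(f"0x{c:02X} is not a plain digit")
        value = value * 10 + c
    zone, low = last >> 4, last & 0x0F
    if zone == 0 and low <= 9:
        return value * 10 + low                  # unsigned: no overpunch
    if zone in (1, 2) and (1 <= low <= 9 or low == 0x0A):
        digit = 0 if low == 0x0A else low
        magnitude = value * 10 + digit
        return magnitude if zone == 1 else -magnitude
    raise ValueError(f"0x{last:02X} is not a digit or signed digit")
```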

The following table shows the code assignments for the type 716 printer, used with the IBM 704 through the IBM 7094. [13] :58 The 704 interface [d] sent virtual punched-card rows to this printer, two words (72 bits) at a time, so the mapping from 6-bit BCD characters was done by software rather than built into the printer.

IBM 716 printer character set G
x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0x*123456789+-
1x+ABCDEFGHI.
2x-JKLMNOPQR$*
3x0/STUVWXYZ,%

This is a repertoire of 45 characters (not counting blank, which is handled specially by the printer), as the characters +, - and * are duplicated.
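To make the software side of the 716 mapping concrete, here is a sketch that turns a line of text into card-row images: for each of the twelve card rows it builds a 72-bit mask with one bit per print position, then splits the mask into two 36-bit words. The bit order (leftmost print position in the most significant bit of the first word), the punch table limited to blank, digits and letters, and the names `punch_rows` and `row_images` are assumptions for illustration, not the 704's documented printer interface.

```python
CARD_ROWS = [12, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


def punch_rows(ch):
    """Hollerith punches for blank, digits and letters (sketch only)."""
    if ch == " ":
        return set()
    if len(ch) == 1 and ch in "0123456789":
        return {int(ch)}
    i = ord(ch) - ord("A") if len(ch) == 1 else -1
    if 0 <= i < 9:
        return {12, i + 1}     # A-I
    if 9 <= i < 18:
        return {11, i - 8}     # J-R
    if 18 <= i < 26:
        return {0, i - 16}     # S-Z
    raise ValueError(f"no punch pattern for {ch!r} in this sketch")


def row_images(line):
    """Map each card row to a pair of 36-bit words covering 72 print positions."""
    if len(line) > 72:
        raise ValueError("at most 72 characters per transfer in this sketch")
    punches = [punch_rows(ch) for ch in line.ljust(72)]
    images = {}
    for row in CARD_ROWS:
        mask = 0
        for rows_here in punches:
            # Shifting left as we go puts the leftmost position in the top bit.
            mask = (mask << 1) | (1 if row in rows_here else 0)
        images[row] = (mask >> 36, mask & ((1 << 36) - 1))
    return images
```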

Fortran character set

There was some variation; IBM 704 Fortran had a different set of special characters (preserving only the duplicated minus sign). [14]

IBM 716 printer Fortran character set
x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0x*123456789=-
1x+ABCDEFGHI.)
2x-JKLMNOPQR$*
3x0/STUVWXYZ,(

A similar code was used for the successor machines, the IBM 709, 7090 and 7094, [15] but with some of the special characters reassigned:

IBM 7090/7094 character set
x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0x0123456789="
1x&ABCDEFGHI+0.)
2x-JKLMNOPQR−0$*
3xspace/STUVWXYZ±,(

GBCD code

Below is the table of GE/Honeywell's GBCD code, a variant of BCD. [16]

x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0x0123456789[#@:>?
1xspaceABCDEFGHI&.](<\
2x^JKLMNOPQR-$*);'
3x+/STUVWXYZ_,%="!

Burroughs B5500 BCD code

The following table shows the code assignments for the Burroughs B5500 computer, sometimes referred to as BIC (Burroughs Interchange Code). [17]

x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF
0x0123456789#@?:>
1x+ABCDEFGHI.[&(<
2x×JKLMNOPQR$*-);
3xspace/STUVWXYZ,%=]"

Notes

  a. There were actually multiple card codes; e.g., by 1964 there were ten versions of the IBM 026 with slightly different character sets.
  b. E.g., IBM 702, IBM 705.
  c. E.g., IBM 701, IBM 704.
  d. The interface on, e.g., the 7090, is different, although the software still must do the mapping.

Related Research Articles

<span class="mw-page-title-main">ASCII</span> American character encoding standard

ASCII, an acronym for American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. ASCII has just 128 code points, of which only 95 are printable characters, which severely limit its scope. The set of available punctuation had significant impact on the syntax of computer languages and text markup. ASCII hugely influenced the design of character sets used by modern computers, including Unicode which has over a million code points, but the first 128 of these are the same as ASCII.

<span class="mw-page-title-main">Binary-coded decimal</span> System of digitally encoding numbers

In computing and electronic systems, binary-coded decimal (BCD) is a class of binary encodings of decimal numbers where each digit is represented by a fixed number of bits, usually four or eight. Sometimes, special bit patterns are used for a sign or other indications.

<span class="mw-page-title-main">Character encoding</span> Using numbers to represent text characters

Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers. The numerical values that make up a character encoding are known as code points and collectively comprise a code space, a code page, or character map.

Extended Binary Coded Decimal Interchange Code is an eight-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding six-bit binary-coded decimal code used with most of IBM's computer peripherals of the late 1950s and early 1960s. It is supported by various non-IBM platforms, such as Fujitsu-Siemens' BS2000/OSD, OS-IV, MSP, and MSP-EX, the SDS Sigma series, Unisys VS/9, Unisys MCP and ICL VME.

<span class="mw-page-title-main">Punched card</span> Paper-based recording medium

A punched card is a piece of card stock that stores digital data using punched holes. Punched cards were once common in data processing and the control of automated machines.

<span class="mw-page-title-main">Plain text</span> Term for computer data consisting only of unformatted characters of readable material

In computing, plain text is a loose term for data that represent only characters of readable material but not its graphical representation nor other objects. It may also include a limited number of "whitespace" characters that affect simple arrangement of text, such as spaces, line breaks, or tabulation characters. Plain text is different from formatted text, where style information is included; from structured text, where structural parts of the document such as paragraphs, sections, and the like are identified; and from binary files in which some portions must be interpreted as binary objects.

<span class="mw-page-title-main">Character (computing)</span> Primitive data type

In computing and telecommunications, a character is a unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language.

<span class="mw-page-title-main">IBM 1401</span> 1960s decimal computer

The IBM 1401 is a variable-wordlength decimal computer that was announced by IBM on October 5, 1959. The first member of the highly successful IBM 1400 series, it was aimed at replacing unit record equipment for processing data stored on punched cards and at providing peripheral services for larger computers. The 1401 is considered by IBM to be the Ford Model-T of the computer industry due to its mass appeal. Over 12,000 units were produced and many were leased or resold after they were replaced with newer technology. The 1401 was withdrawn on February 8, 1971.

In computing, a code page is a character encoding and as such it is a specific association of a set of printable characters and control characters with unique numbers. Typically each number represents the binary value in a single byte.

<span class="mw-page-title-main">IBM 700/7000 series</span> Mainframe computer systems made by IBM through the 1950s and early 1960s

The IBM 700/7000 series is a series of large-scale (mainframe) computer systems that were made by IBM through the 1950s and early 1960s. The series includes several different, incompatible processor architectures. The 700s use vacuum-tube logic and were made obsolete by the introduction of the transistorized 7000s. The 7000s, in turn, were eventually replaced with System/360, which was announced in 1964. However the 360/65, the first 360 powerful enough to replace 7000s, did not become available until November 1965. Early problems with OS/360 and the high cost of converting software kept many 7000s in service for years afterward.

<span class="mw-page-title-main">Binary code</span> Encoding for data, using 0s and 1s

A binary code represents text, computer processor instructions, or any other data using a two-symbol system. The two-symbol system used is often "0" and "1" from the binary number system. The binary code assigns a pattern of binary digits, also known as bits, to each character, instruction, etc. For example, a binary string of eight bits can represent any of 256 possible values and can, therefore, represent a wide variety of different items.

The Burroughs B2500 through Burroughs B4900 was a series of mainframe computers developed and manufactured by Burroughs Corporation in Pasadena, California, United States, from 1966 to 1991. They were aimed at the business world with an instruction set optimized for the COBOL programming language. They were also known as Burroughs Medium Systems, by contrast with the Burroughs Large Systems and Burroughs Small Systems.

Chen–Ho encoding is a memory-efficient alternate system of binary encoding for decimal digits.

Densely packed decimal (DPD) is an efficient method for binary encoding decimal digits.

A six-bit character code is a character encoding designed for use on computers with word lengths a multiple of 6. Six bits can only encode 64 distinct characters, so these codes generally include only the upper-case letters, the numerals, some punctuation characters, and sometimes control characters. The 7-track magnetic tape format was developed to store data in such codes, along with an additional parity bit.

<span class="mw-page-title-main">Decimal computer</span> Computer operating on base-10 numbers

A decimal computer is a computer that can represent numbers and addresses in decimal and that provides instructions to operate on those numbers and addresses directly in decimal, without conversion to a pure binary representation. Some also had a variable wordlength, which enabled operations on numbers with a large number of digits.

In computing, a signed overpunch is a coding scheme which stores the sign of a number by changing (usually) the last digit. It is used in character data on IBM mainframes by languages such as COBOL, PL/I, and RPG. Its purpose is to save a character that would otherwise be used by the sign digit. The code is derived from the Hollerith Punched Card Code, where both a digit and a sign can be entered in the same card column. It is called an overpunch because the digit in that column has a 12-punch or an 11-punch above it to indicate the sign. The top three rows of the card are called zone punches, and so numeric character data which may contain overpunches is called zoned decimal.

The programming language APL uses a number of symbols, rather than words from natural language, to identify operations, similarly to mathematical symbols. Prior to the wide adoption of Unicode, a number of special-purpose EBCDIC and non-EBCDIC code pages were used to represent the symbols required for writing APL.

Several mutually incompatible versions of the Extended Binary Coded Decimal Interchange Code (EBCDIC) have been used to represent the Japanese language on computers, including variants defined by Hitachi, Fujitsu, IBM and others. Some are variable-width encodings, employing locking shift codes to switch between single-byte and double-byte modes. Unlike other EBCDIC locales, the lowercase basic Latin letters are often not preserved in their usual locations.

References

  1. Mackenzie, Charles E. (1980). Coded Character Sets, History and Development (PDF). The Systems Programming Series (1 ed.). Addison-Wesley Publishing Company, Inc. ISBN 0-201-14460-3. LCCN 77-90165. Archived (PDF) from the original on 2016-05-26. Retrieved 2017-04-22.
  2. Pugh, Emerson W.; Heide, Lars. "STARS: Punched Card Equipment". IEEE Global History Network. Archived from the original on 2012-05-11. Retrieved 2012-06-09.
  3. Pugh, Emerson W. (1995). Building IBM: Shaping an Industry and Its Technology. MIT Press. pp. 50–51. ISBN 978-0-262-16147-3.
  4. Jones, Douglas W. "Punched Card Codes". Retrieved 2014-01-01.
  5. Burroughs B5500 Information Processing Systems: Reference Manual (PDF). Burroughs Corporation. 1964. Archived from the original (PDF) on 2020-07-29. Retrieved 2012-06-08.
  6. Control Data Corporation (1965). Codes/Control Data 6600 Computer System (PDF).
  7. "Record-mark". Encyclopedia. PC Magazine. Retrieved 2016-04-09.
  8. "group mark". Encyclopedia.com. Retrieved 2016-04-09.
  9. Shirriff, Ken. "Proposal for addition of Group Mark symbol" (PDF). unicode.org. Retrieved 2016-04-09.
  10. IBM 1401 Data Processing System: Reference Manual (PDF). IBM. April 1962. p. 170. A24-1403-5. Archived from the original (PDF) on 2012-03-14.
  11. "Systems i Software Globalization cp00353z" (PDF). www-03.ibm.com. Archived from the original (PDF) on 2013-01-21. Retrieved 2022-06-30.
  12. https://ccsids.net/ccsids.html#ccsid-354.
  13. IBM 704 electronic data-processing machine manual of operation (PDF). IBM. 1955. pp. 35, 58. Form 24-6661-2. Retrieved 2017-04-22.
  14. "Fortran Automatic Coding System for the IBM 704" (PDF). IBM. 1956-10-15. p. 49. Retrieved 2015-09-15.
  15. Harper, Jack (2001-08-21). "IBM 7090/94 Character Representation". Retrieved 2017-04-22.
  16. "Section: Tables of characters in BULL computers" (PDF). Archived from the original (PDF) on 2011-07-08. Retrieved 2010-11-15.
  17. Burroughs B 5500 Information Processing Systems Extended Algol Reference Manual (PDF). 1966. p. B-1.
