Six-bit character code

A six-bit character code is a character encoding designed for use on computers with word lengths that are a multiple of 6. Six bits can encode only 64 distinct characters, so these codes generally include only the upper-case letters, the numerals, some punctuation characters, and sometimes control characters. The 7-track magnetic tape format was developed to store data in such codes, along with an additional parity bit.

Types of six-bit codes

An early six-bit binary code was used for Braille, the reading system for the blind that was developed in the 1820s.

The earliest computers dealt with numeric data only, and made no provision for character data. Six-bit BCD, with several variants, was used by IBM on early computers such as the IBM 702 in 1953 and the IBM 704 in 1954. [1] :p.35 Six-bit encodings were replaced by the 8-bit EBCDIC code starting in 1964, when System/360 standardized on 8-bit bytes. There are some variants of this type of code (see below).

Six-bit character codes generally succeeded the five-bit Baudot code and preceded seven-bit ASCII.

Six-bit codes could encode more than 64 characters by the use of Shift Out and Shift In characters, essentially incorporating two distinct 62-character sets and switching between them. For example, the popular IBM 2741 communications terminal supported a variety of character sets of up to 88 printing characters plus control characters.
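The shift mechanism can be sketched as a small decoder that tracks which set is active. The code values chosen for Shift Out and Shift In and the contents of both tables are hypothetical, picked only for illustration; they do not reproduce any real terminal's code.

```python
# Hypothetical control codes and character sets, chosen only for illustration;
# real six-bit codes assigned different values.
SHIFT_OUT = 62  # switch to the alternate set
SHIFT_IN = 63   # switch back to the primary set

PRIMARY = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,-"
ALTERNATE = "abcdefghijklmnopqrstuvwxyz"


def decode_shifted(codes):
    """Decode six-bit codes, tracking which character set is active."""
    table = PRIMARY
    out = []
    for code in codes:
        if code == SHIFT_OUT:
            table = ALTERNATE
        elif code == SHIFT_IN:
            table = PRIMARY
        elif code < len(table):
            out.append(table[code])
        else:
            out.append("?")  # position unassigned in this toy table
    return "".join(out)


print(decode_shifted([7, SHIFT_OUT, 8, SHIFT_IN, 37]))
```

Because the two shift codes occupy positions in both sets, each set has 62 positions left for other characters.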

BCD six-bit code

Six-bit BCD code was the adaptation of the punched card code to binary code. IBM applied the terms binary-coded decimal and BCD to the variations of BCD alphamerics used in most early IBM computers, including the IBM 1620, IBM 1400 series, and non-decimal architecture members of the IBM 700/7000 series.

COBOL databases six-bit code

A six-bit code was also used in COBOL databases, where end-of-record information was stored separately.[ citation needed ]

Magnetic stripe card six-bit code

A six-bit code, with added odd parity bit, is used on Track 1 of magnetic stripe cards, as specified in ISO/IEC 7811-2.
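Odd parity means the parity bit is chosen so that the total number of 1 bits, including the parity bit itself, is odd. The sketch below assumes the six data bits are the ASCII code minus 32 and places the parity bit above them in the returned integer; both choices are assumptions made for illustration and should be checked against the standard before use.

```python
def track1_encode(ch):
    """Return a 7-bit value: six data bits plus an odd-parity bit on top.

    Assumes the data bits are ord(ch) - 0x20 (an illustrative assumption).
    """
    code = ord(ch) - 0x20
    if not 0 <= code < 64:
        raise ValueError(f"{ch!r} is outside the 64-character set")
    ones = bin(code).count("1")
    # Set the parity bit only when the data bits hold an even number of ones.
    parity = 0 if ones % 2 == 1 else 1
    return (parity << 6) | code


for ch in "%A!":
    print(ch, format(track1_encode(ch), "07b"))
```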

DEC SIXBIT code

A popular six-bit code was DEC SIXBIT. This is simply the ASCII character codes from 32 to 95 coded as 0 to 63 by subtracting 32 (i.e., columns 2, 3, 4, and 5 of the ASCII table (16 characters to a column), shifted to columns 0 through 3, by subtracting 2 from the high bits); it includes the space, punctuation characters, numbers, and capital letters, but no control characters. Since it included no control characters, not even end-of-line, it was not used for general text processing. However, six-character names such as filenames and assembler symbols could be stored in a single 36-bit word of the PDP-10, and three characters fit in each word of the PDP-1 and two characters fit in each word of the PDP-8. See table below.
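The subtract-32 mapping and the packing of six characters into one 36-bit word can be sketched as follows. The function names are made up for this example, and the layout (first character in the highest six bits, short names padded with spaces) is an assumption made for the sketch.

```python
def sixbit_encode(ch):
    """Map an ASCII character in the range 32..95 to a code in 0..63."""
    code = ord(ch) - 32
    if not 0 <= code <= 63:
        raise ValueError(f"{ch!r} has no SIXBIT code")
    return code


def sixbit_decode(code):
    """Map a code in 0..63 back to its ASCII character."""
    return chr(code + 32)


def pack_word(name):
    """Pack up to six characters into one 36-bit integer, padded with spaces."""
    if len(name) > 6:
        raise ValueError("at most six characters fit in a 36-bit word")
    word = 0
    for ch in name.ljust(6):
        word = (word << 6) | sixbit_encode(ch)
    return word


def unpack_word(word):
    """Split a 36-bit integer into six characters, highest bits first."""
    return "".join(
        sixbit_decode((word >> shift) & 0o77) for shift in range(30, -1, -6)
    )


word = pack_word("FOO")
print(oct(word), repr(unpack_word(word)))
```

Since the space character encodes as 0, a padded name simply leaves its low-order bits clear.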

Another, less common variant is obtained by simply stripping the high bit of an ASCII code in the range 32 to 95 (codes 32 to 63 keep their positions, while higher values have 64 subtracted from them). This variant was sometimes used on DEC's PDP-8 (1965).
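A minimal sketch of this high-bit-stripping variant, with function names invented for the example:

```python
def strip_high_bit(ch):
    """Keep only the low six bits of an ASCII code in the range 32..95."""
    code = ord(ch)
    if not 32 <= code <= 95:
        raise ValueError(f"{ch!r} is outside the range 32..95")
    return code & 0o77


def restore_high_bit(code):
    """Invert strip_high_bit: codes below 32 came from ASCII 64..95."""
    return chr(code if code >= 32 else code + 64)


for ch in " 0A_":
    print(repr(ch), strip_high_bit(ch))
```

Unlike the subtract-32 mapping, this keeps the digits and punctuation at their ASCII values and moves the letters to the bottom of the code space.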

ECMA six-bit code

A six-bit code similar to DEC's, but replacing a few punctuation characters with the most useful control characters (including SO/SI, allowing code extension), was specified as ECMA-1 in 1963 (see below).

FIELDATA six-bit code

FIELDATA was a seven-bit code (with optional parity) of which only 64 code positions (occupying six bits) were formally defined. [2] A variant was used by UNIVAC's 1100-series computers. [3] Treating the code as a six-bit code, these systems used a 36-bit word (capable of storing six such reduced FIELDATA characters). [4]

Braille six-bit code

Braille characters are represented using six dot positions, arranged in a rectangle. Each position may contain a raised dot or not, so Braille can be considered to be a six-bit binary code. Some more modern Braille systems add an extra two dots, making these systems an eight-bit code instead.
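Treating each dot position as one bit gives a direct mapping from a dot list to a six-bit value. The sketch below assigns dot 1 to the lowest bit and dot 6 to the highest, which is the order used by the Unicode Braille Patterns block starting at U+2800; the function names are invented for the example.

```python
def dots_to_bits(dots):
    """Convert dot numbers (1..6) to a six-bit value, dot 1 as the lowest bit."""
    bits = 0
    for dot in dots:
        if not 1 <= dot <= 6:
            raise ValueError(f"dot {dot} is not part of a six-dot cell")
        bits |= 1 << (dot - 1)
    return bits


def dots_to_glyph(dots):
    """Return the Unicode Braille pattern character for the given dots."""
    return chr(0x2800 + dots_to_bits(dots))


for dots in ([1], [1, 2], [1, 4], [1, 2, 3, 4, 5, 6]):
    print(dots, format(dots_to_bits(dots), "06b"), dots_to_glyph(dots))
```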

Six-bit codes for binary-to-text encoding

Transmission of binary data over systems which are designed for text only can sometimes introduce problems. For example, email historically supported only 7-bit ASCII codes and would strip the 8th bit, thus corrupting binary data sent directly through any troublesome mail server. Other systems can cause issues by improperly interpreting control characters during storage or transmission. A number of schemes exist to pack 8-bit data into text-only representations which can pass through text mail systems, to be decoded at the destination. Examples of 6-bit character subsets used for packing binary data include Uuencode and Base64. These sets contain no control characters (only printable numbers, letters, some punctuation, and maybe space) and allow data to be transmitted over any medium which is also able to transmit human-readable text.
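The packing step can be sketched for Base64: the input bytes are treated as one bit stream, cut into six-bit groups, and each group is used as an index into a 64-character alphabet. The helper names are invented for this example; Python's standard `base64` module is used only to cross-check the result.

```python
import base64
import string

# The standard Base64 alphabet: 64 printable characters, one per six-bit value.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def six_bit_groups(data):
    """Split bytes into six-bit values, zero-padding the final group."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 6)
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]


def to_base64(data):
    """Encode bytes as Base64 text, padding with '=' to a multiple of four."""
    text = "".join(B64_ALPHABET[value] for value in six_bit_groups(data))
    return text + "=" * (-len(text) % 4)


sample = b"Man"
print(six_bit_groups(sample), to_base64(sample))
assert to_base64(sample) == base64.b64encode(sample).decode("ascii")
```

Three 8-bit bytes hold 24 bits, which divide evenly into four six-bit groups, so every three input bytes become four output characters.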

Examples of BCD six-bit codes

IBM, which dominated commercial data processing, used a variety of six-bit codes that were tied to the character set used on punched cards; see BCD (character encoding).

Other vendor character codes are shown below, with their Unicode equivalents.

CDC 1604: Magnetic tape BCD codes
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x 1 2 3 4 5 6 7 8 9 0 # @ TAPE MARK
1x SP / S T U V W X Y Z REC MARK , %
2x - J K L M N O P Q R -0 $ *
3x & A B C D E F G H I +0 . ¤ GRP MARK

CDC 1604: Punched card codes
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x 1 2 3 4 5 6 7 8 9 0 =
1x SP / S T U V W X Y Z , (
2x J K L M N O P Q R -0 $ *
3x + A B C D E F G H I +0 . )

CDC 1612: Printer codes (business applications)
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x : 1 2 3 4 5 6 7 8 9 0 = ! [
1x SP / S T U V W X Y Z ] , ( ~
2x J K L M N O P Q R % $ * >
3x + A B C D E F G H I < . ) ? ;

Examples of six-bit ASCII variants

DEC SIXBIT
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x SP ! " # $ % & ' ( ) * + , - . /
1x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
2x @ A B C D E F G H I J K L M N O
3x P Q R S T U V W X Y Z [ \ ] ^ _

ECMA-1
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x SP HT LF VT FF CR SO SI ( ) * + , - . /
1x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
2x NUL A B C D E F G H I J K L M N O
3x P Q R S T U V W X Y Z [ \ ] ESC DEL

ICL Mainframes
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
1x SP ! " # £ % & ' ( ) * + , - . /
2x @ A B C D E F G H I J K L M N O
3x P Q R S T U V W X Y Z [ $ ]

SixBit ASCII (used by AIS) [5]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x @ A B C D E F G H I J K L M N O
1x P Q R S T U V W X Y Z [ \ ] ^ _
2x SP ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?

GOST 6-bit code

0 1 2 3 4 5 6 7 8 9 A B C D E F
0x 0 1 2 3 4 5 6 7 8 9 + - / , . SP
1x ( ) × = ; [ ] * < > :
2x А Б В Г Д Е Ж З И Й К Л М Н О П
3x Р С Т У Ф Х Ц Ч Ш Щ Ы Ь Э Ю Я DEL

Example of six-bit Braille codes

The following table shows the arrangement of characters, with the hex value, corresponding ASCII character, Braille 6-bit codes (dot combinations), Braille Unicode glyph, and general meaning (the actual meaning may change depending on context). [6] [7]

Hex  ASCII Glyph  Braille Dots  Braille Glyph  Braille Meaning
20   (space)      (none)        ⠀              (space)
21   !            2-3-4-6       ⠮              the
22   "            5             ⠐              (contraction)
23   #            3-4-5-6       ⠼              (number prefix)
24   $            1-2-4-6       ⠫              ed
25   %            1-4-6         ⠩              sh
26   &            1-2-3-4-6     ⠯              and
27   '            3             ⠄              '
28   (            1-2-3-5-6     ⠷              of
29   )            2-3-4-5-6     ⠾              with
2A   *            1-6           ⠡              ch
2B   +            3-4-6         ⠬              ing
2C   ,            6             ⠠              (uppercase prefix)
2D   -            3-6           ⠤              -
2E   .            4-6           ⠨              (italic prefix)
2F   /            3-4           ⠌              st
30   0            3-5-6         ⠴              "
31   1            2             ⠂              ,
32   2            2-3           ⠆              ;
33   3            2-5           ⠒              :
34   4            2-5-6         ⠲              .
35   5            2-6           ⠢              en
36   6            2-3-5         ⠖              !
37   7            2-3-5-6       ⠶              ( or )
38   8            2-3-6         ⠦              " or ?
39   9            3-5           ⠔              in
3A   :            1-5-6         ⠱              wh
3B   ;            5-6           ⠰              (letter prefix)
3C   <            1-2-6         ⠣              gh
3D   =            1-2-3-4-5-6   ⠿              for
3E   >            3-4-5         ⠜              ar
3F   ?            1-4-5-6       ⠹              th

Hex  ASCII Glyph  Braille Dots  Braille Glyph  Braille Meaning
40   @            4             ⠈              (accent prefix)
41   A            1             ⠁              a
42   B            1-2           ⠃              b
43   C            1-4           ⠉              c
44   D            1-4-5         ⠙              d
45   E            1-5           ⠑              e
46   F            1-2-4         ⠋              f
47   G            1-2-4-5       ⠛              g
48   H            1-2-5         ⠓              h
49   I            2-4           ⠊              i
4A   J            2-4-5         ⠚              j
4B   K            1-3           ⠅              k
4C   L            1-2-3         ⠇              l
4D   M            1-3-4         ⠍              m
4E   N            1-3-4-5       ⠝              n
4F   O            1-3-5         ⠕              o
50   P            1-2-3-4       ⠏              p
51   Q            1-2-3-4-5     ⠟              q
52   R            1-2-3-5       ⠗              r
53   S            2-3-4         ⠎              s
54   T            2-3-4-5       ⠞              t
55   U            1-3-6         ⠥              u
56   V            1-2-3-6       ⠧              v
57   W            2-4-5-6       ⠺              w
58   X            1-3-4-6       ⠭              x
59   Y            1-3-4-5-6     ⠽              y
5A   Z            1-3-5-6       ⠵              z
5B   [            2-4-6         ⠪              ow
5C   \            1-2-5-6       ⠳              ou
5D   ]            1-2-4-5-6     ⠻              er
5E   ^            4-5           ⠘              (contraction)
5F   _            4-5-6         ⠸              (contraction)

References

  1. IBM Corporation (1954). 704 electronic data-processing machine: manual of operation (PDF).
  2. Mackenzie, Charles E. (1980). Coded Character Sets, History and Development (PDF). The Systems Programming Series (1 ed.). Addison-Wesley Publishing Company, Inc. ISBN 978-0-201-14460-4. LCCN 77-90165. Archived (PDF) from the original on May 26, 2016. Retrieved August 25, 2019.
  3. Walker, John (1996-08-06). "UNIVAC 1100 Series FIELDATA Code". UNIVAC Memories. Archived from the original on 2016-05-22. Retrieved 2016-05-22.
  4. Jennings, Thomas Daniel (2016-04-20) [1999]. "An annotated history of some character codes or ASCII: American Standard Code for Information Infiltration". sensitive research (SR-IX). FIELDATA. Retrieved 2022-06-01.
  5. Raymond, Eric S. (2023-06-24). "AIVDM/AIVDO protocol decoding". AIS Payload Data Types. Retrieved 2024-03-14.
  6. "Representing and Displaying Braille". DotlessBraille.org. 2002-02-20. Retrieved 2024-03-14.
  7. Halleck, John (2000-08-24). "braille-ascii.ads". Braille.Ascii. Archived from the original on 2010-06-13. Retrieved 2009-08-10.