ISO 2033

The ISO 2033:1983 standard ("Coding of machine readable characters (MICR and OCR)") [1] defines character sets for use with Optical Character Recognition or Magnetic Ink Character Recognition systems. The Japanese standard JIS X 9010:1984 ("Coding of machine readable characters (OCR and MICR)", originally designated JIS C 6229-1984) is closely related. [2]

Character set for OCR-A

The version of the encoding for the OCR-A font registered with the ISO-IR registry as ISO-IR-91 is the Japanese (JIS X 9010 / JIS C 6229) version, which differs from the encoding defined by ISO 2033 only in the addition of a Yen sign at 0x5C. [2]

ISO 2033 and JIS C 6229 OCR-A set

    0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0x  NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
1x  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
2x  SP      "   £   $   %   &   '   {   }   *   +   ,   -   .   /
3x  0   1   2   3   4   5   6   7   8   9   :   ;   ⑀   =   ⑁   ?
4x      A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
5x  P   Q   R   S   T   U   V   W   X   Y   Z       ¥   ⑂
6x
7x                                                  |           DEL

Unicode code points: £ U+00A3, { U+007B, } U+007D, ⑀ U+2440, ⑁ U+2441, ¥ U+00A5, ⑂ U+2442.
Redefined compared to JIS-Roman: 0x23, 0x28, 0x29, 0x3C, 0x3E and 0x5D.

Character set for OCR-B

The version of the G0 set for the OCR-B font registered with the ISO-IR registry as ISO-IR-92 is the Japanese (JIS X 9010 / JIS C 6229) version, which differs from the encoding defined by ISO 2033 only in being based on JIS-Roman (with a dollar sign at 0x24 and a Yen sign at 0x5C) rather than on the ISO 646 IRV (with a backslash at 0x5C and, at the time, a universal currency sign (¤) at 0x24). [3] Besides those code points, it differs from ASCII only in omitting the backtick (`) and tilde (~). [3] An additional supplementary set registered as ISO-IR-93 assigns the pound sign (£), universal currency sign (¤) and section sign (§) to their ISO-8859-1 codepoints, and the backslash to the ISO-8859-1 codepoint for the Yen sign. [4]
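The relationship between the two OCR-B G0 variants and ASCII can be sketched in Python as follows. This is only an illustration of the description above, not a table taken from the standards: the helper name `ocr_b_g0` and its `variant` argument are hypothetical, and the sketch assumes that the two variants differ solely at 0x24 and 0x5C and both omit the backtick and tilde.

```python
def ocr_b_g0(variant="jis"):
    """Return a byte -> character mapping for an OCR-B G0 set.

    Hypothetical helper: "jis" follows the ISO-IR-92 description,
    "iso" the ISO 2033 description (based on the older ISO 646 IRV).
    """
    # Start from the 94 ASCII graphic characters (0x21 to 0x7E).
    table = {b: chr(b) for b in range(0x21, 0x7F)}
    # Assumption from the text: backtick and tilde are omitted in both variants.
    del table[0x60]
    del table[0x7E]
    if variant == "jis":
        table[0x5C] = "\u00a5"  # Yen sign in place of the backslash
    elif variant == "iso":
        table[0x24] = "\u00a4"  # universal currency sign in place of the dollar sign
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    return table


jis = ocr_b_g0("jis")
iso = ocr_b_g0("iso")
# Code points at which the two variants disagree.
diff = sorted(b for b in jis if jis[b] != iso[b])
print([hex(b) for b in diff])  # prints ['0x24', '0x5c']
```

Building both tables from the ASCII range keeps the two stated differences explicit, so the comparison at the end doubles as a check of the description.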

Character set for JIS X 9008 (JIS C 6257)

JIS X 9010 (JIS C 6229) also defines character sets for the JIS X 9008:1981 (formerly JIS C 6257-1981) "hand-printed" OCR font. [5] (fn. 1) These comprise a subset of the JIS X 0201 Roman set (registered as ISO-IR-94, omitting the backtick (`), lowercase letters, curly braces ({, }) and overline (‾)), [5] a subset of the JIS X 0201 kana set (registered as ISO-IR-96, omitting the East Asian style comma (、) and full stop (。), the interpunct (・) and the small kana), [6] and a set (registered as ISO-IR-95) containing only the backslash, which is assigned to the same code point as in ISO-IR-93. [7]

The JIS C 6257 font stylises the slash [5] and backslash [7] characters with a doubled appearance. The character names given are "Solidus" [5] and "Reverse Solidus", [7] matching the Unicode character names for the ASCII slash and backslash. [8] However, the Unicode Optical Character Recognition block includes an additional code point for an "OCR Double Backslash" (⑊) but none for a double (forward) slash; [9] a double slash is available elsewhere, as U+2AFD DOUBLE SOLIDUS OPERATOR.
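The Unicode character names mentioned above can be checked with Python's standard `unicodedata` module. This is a minimal sketch; the variable name `names` is illustrative, and the result depends on the Unicode database bundled with the Python build in use.

```python
import unicodedata

# Slash, backslash, the OCR double backslash, and the double solidus operator.
chars = ("/", "\\", "\u244a", "\u2afd")
names = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in chars}

for code_point, name in names.items():
    print(code_point, name)
```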

Character set for E-13B

The MICR E-13B font, showing the ISO-IR-98 character repertoire.

The ISO-IR-98 encoding defined by ISO 2033 encodes the character repertoire of the E-13B font, as used with magnetic ink character recognition. [10] Although ISO 2033 also specifies other encodings, the E-13B encoding is the one referred to as ISO_2033_1983 by Perl libintl [11] and as ISO_2033-1983 or csISO2033 by the IANA. [12] Other registered labels include iso-ir-98 (its ISO-IR registration number) and simply e13b. [12]

The digits are preserved in their ASCII locations. Letters and symbols unavailable in the E-13B font are omitted, while the specialised punctuation for bank cheques included in the E-13B font is added. The same symbols are available in Unicode in the Optical Character Recognition block.

ISO 2033:1983 E-13B set [11]

    0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0x  NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
1x  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
2x  SP
3x  0   1   2   3   4   5   6   7   8   9   ⑆   ⑇   ⑈   ⑉
4x
5x
6x
7x                                                              DEL

Unicode code points: ⑆ U+2446, ⑇ U+2447, ⑈ U+2448, ⑉ U+2449.
Redefined compared to ASCII: 0x3A to 0x3D.
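A decoder for this set can be sketched in a few lines of Python. The names `E13B_SYMBOLS` and `decode_e13b` are hypothetical, and the sketch assumes that the four cheque symbols occupy 0x3A to 0x3D, directly after the digits and in the order U+2446 to U+2449; it is an illustration rather than the libintl or IANA implementation.

```python
# Assumed layout: the four E-13B symbols follow the digits at 0x3A to 0x3D.
E13B_SYMBOLS = {
    0x3A: "\u2446",
    0x3B: "\u2447",
    0x3C: "\u2448",
    0x3D: "\u2449",
}


def decode_e13b(data: bytes) -> str:
    """Decode bytes in the E-13B set to a Unicode string (illustrative sketch)."""
    out = []
    for b in data:
        if b <= 0x20 or b == 0x7F or 0x30 <= b <= 0x39:
            # Control characters, space, DEL and digits keep their ASCII values.
            out.append(chr(b))
        elif b in E13B_SYMBOLS:
            out.append(E13B_SYMBOLS[b])
        else:
            # Every other position is unassigned in this set.
            raise ValueError(f"byte 0x{b:02X} is not assigned in this sketch")
    return "".join(out)


print(decode_e13b(b":123456789: 0012<"))  # prints ⑆123456789⑆ 0012⑈
```

Raising an error for unassigned bytes, rather than passing them through, makes it obvious when input that is really ASCII has been mislabelled as E-13B.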

References

  1. ISO/IEC JTC 1/SC 2 (1983). Information processing — Coding of machine readable characters (MICR and OCR). ISO. ISO 2033:1983.
  2. ISO/TC97/SC2 (1985-08-01). ISO-IR-91: Japanese OCR-A Graphic Character Set (PDF). ITSCJ/IPSJ. Archived from the original (PDF) on 2022-03-10.
  3. ISO/TC97/SC2 (1985-08-01). ISO-IR-92: Japanese OCR-B Basic Graphic Character Set (PDF). ITSCJ/IPSJ. Archived from the original (PDF) on 2022-03-10.
  4. ISO/TC97/SC2 (1985-08-01). ISO-IR-93: Japanese OCR-B - Additional Graphic Character Set (PDF). ITSCJ/IPSJ. Archived from the original (PDF) on 2022-03-10.
  5. ISO/TC97/SC2 (1985-08-01). ISO-IR-94: Japanese Basic Hand-printed Graphic Character Set for OCR (PDF). ITSCJ/IPSJ. Archived from the original (PDF) on 2022-03-10.
  6. ISO/TC97/SC2 (1985-08-01). ISO-IR-96: Katakana Hand-printed Graphic Character Set for OCR (PDF). ITSCJ/IPSJ. Archived from the original (PDF) on 2022-03-10.
  7. ISO/TC97/SC2 (1985-08-01). ISO-IR-95: Japanese Additional Hand-printed Graphic Character Set for OCR (PDF). ITSCJ/IPSJ. Archived from the original (PDF) on 2022-03-10.
  8. Unicode Consortium. "C0 Controls and Basic Latin" (PDF). The Unicode Standard.
  9. Unicode Consortium. "Optical Character Recognition" (PDF). The Unicode Standard.
  10. ISO/TC97/SC2 (1985-08-01). ISO-IR-98: A set of 14 graphic characters of the E13B font (PDF). ITSCJ/IPSJ. Archived from the original (PDF) on 2022-03-10.
  11. Flohr, Guido. "Conversion routines for ISO_2033_1983". libintl. Locale::RecodeData::ISO_2033_1983.
  12. "Character Sets". IANA.