OCR-B

Category Sans-serif
Classification Neo-grotesque
Designer(s) Adrian Frutiger
Date created 1968

OCR-B is a monospace font developed in 1968 by Adrian Frutiger for Monotype, following the European Computer Manufacturers Association (ECMA) standard. It was designed to facilitate optical character recognition by specific electronic devices, originally for financial and banking applications. It was accepted as the world standard in 1973.[1] It follows the ISO 1073-2:1976 (E) standard, refined in 1979 ("letterpress" design, size I). It includes all ASCII symbols and other symbols needed in banking. It is widely used for the human-readable digits in UPC/EAN barcodes.[2][citation needed] It is also used for machine-readable passports.[3] It shares that purpose with OCR-A, but it is easier for humans to read and has a less technical look than OCR-A.


History

In June 1961, the European Computer Manufacturers Association (ECMA) started standardization activities related to optical character recognition (OCR). After evaluating existing OCR designs, it decided to develop two new fonts: a stylized design with just digits, called "Class A", and a more conventional type design with broader character coverage, called "Class B". In February 1965, ECMA proposed a design for the "Class B" font to ISO, which adopted it as international standard ISO 1073-2 in October 1965.[4] The first revision contained three font sizes: I, II and III. The specification included a Letterpress design, intended for high-quality printing equipment, and a rounded-edge Constant Strokewidth design with reduced typographic quality, intended for impact printers.[5]:3

In September 1969, ECMA started work to revise its published standard. To make OCR-B more widely accepted, the shapes of some characters were slightly modified. The new revision removed font size II, which had been rarely used in practice; it deleted five character shapes; and it added a new font size IV. ECMA published the second edition of OCR-B in October 1971. [4]

In March 1976, ECMA published the third edition of its ECMA-11 specification. It added the symbols § and ¥ to OCR-B, added two types of erasure mark (█) for blacking out misprinted characters, and changed the length of the vertical bar to match ISO 1073-2.[4]

In 1993, Turkey proposed extending ISO 1073-2 to include the Turkish letters Ğğ, İı, and Şş.[6] The request was generalized to extend OCR-B with a number of Latin and Greek letters used in European languages.[7]:27 A revision of the ISO 1073-2:1976 standard was therefore started, producing three successive draft documents. The final draft would have extended OCR-B with 40 Latin and 10 Greek letters, and gave new alternate shapes for six Latin letters.[7]:26 A request to extend OCR-B with Vietnamese accents was rejected.[7]:27 Unlike previous versions of the standard, which specified glyph shapes via reference drawings, the new revision would have included the shapes in machine-readable form.[7]:26 However, industry support for testing the new font could not be secured at the time, so the revision effort was halted in 1997.[7]:IV The working group described its findings in a technical report.[7]:1

Two proposed variants for the OCR-B Euro sign

In June 1998, the European Committee for Standardization published a report on adding the Euro sign to OCR-B.[5] The report proposed both a single-stroked and a double-stroked variant of the Euro sign, leaving the choice to be settled by further testing of OCR performance.[5]:4 Testing was difficult: the theoretical design methods used when the OCR-B glyphs were originally developed could no longer be reproduced, and the technological constraints of the 1960s were no longer entirely relevant to the OCR environments of the 1990s.[8] A new test method was devised using contemporary OCR technology. The tests found no difference in OCR performance between the two Euro variants and recommended adopting the double-stroked variant, as it matches the conventional glyph shape.[8] The project did not have the funds to thoroughly test the glyph extensions of the 1993 proposal; initial results were inconclusive.[8]

Availability

Microsoft Office ships a version of Letterpress OCR-B produced by Monotype that covers the Windows-1252 character set.[9] Many vendors, including Adobe, still sell their own versions of OCR-A and OCR-B.
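Because the Office font's coverage is given as Windows-1252, one rough way to check whether a piece of text can be set in it is to try encoding the text with Python's built-in `cp1252` codec. This is a sketch that assumes the font's glyph set matches the Windows-1252 repertoire exactly (a real font may contain more or fewer glyphs), and `covered_by_windows_1252` is a hypothetical helper name. It also shows that the Euro sign falls inside that repertoire, while Turkish letters from the 1993 proposal, such as Ğ and Ş, do not.

```python
def covered_by_windows_1252(text: str) -> bool:
    """Return True if every character in text is in the Windows-1252 repertoire.

    Assumption: the font's glyph coverage is exactly Windows-1252, so a
    successful encode is treated as "this text can be set in the font".
    """
    try:
        text.encode("cp1252")  # raises UnicodeEncodeError for unmapped characters
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for sample in ["OCR-B 0123456789", "\u20ac", "\u011e", "\u015e"]:
        print(repr(sample), covered_by_windows_1252(sample))
```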

The TeX typesetting system has a public-domain Constant Strokewidth OCR-B font defined in METAFONT. It was created by Norbert Schwarz in 1995 and updated in 2010, and it has a setting for square stroke ends.[10] The definition has also been translated to METATYPE1, so the version with rounded stroke ends is also available in TrueType and OpenType.[11]

A version of Constant Strokewidth OCR-B by Matthew Anderson has extended character coverage. It is available under CC-BY 4.0. [12]

MS-DOS OCR-B encoding

The MS-DOS OCR-B encoding is code page 877. The grave, acute, circumflex (at 0x9B), tilde, diaeresis, and cedilla can be added over letters (or, in the case of the cedilla, under them) to form accented letters.
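The combining of a letter with one of these diacritics can be modeled in Unicode terms. The Python sketch below is a hypothetical illustration, not part of code page 877 or any real printer driver: it assumes each listed diacritic corresponds to a Unicode combining mark (for example, the cedilla to U+0327) and uses NFC normalization from the standard `unicodedata` module to obtain the precomposed letter where Unicode has one. The helper name `overstrike` is invented for illustration, and the byte-level mechanism a printer would use to place the mark is not modeled.

```python
import unicodedata

# Assumed correspondence between the spacing diacritics listed for
# code page 877 and Unicode combining marks.
SPACING_TO_COMBINING = {
    "`": "\u0300",       # grave
    "\u00b4": "\u0301",  # acute (´)
    "^": "\u0302",       # circumflex
    "~": "\u0303",       # tilde
    "\u00a8": "\u0308",  # diaeresis (¨)
    "\u00b8": "\u0327",  # cedilla (¸), placed under the letter
}


def overstrike(letter: str, accent: str) -> str:
    """Combine a base letter with a spacing diacritic (hypothetical helper)."""
    combining = SPACING_TO_COMBINING[accent]
    # NFC composes letter + combining mark into one precomposed character
    # when Unicode defines one; otherwise the two code points are kept.
    return unicodedata.normalize("NFC", letter + combining)


if __name__ == "__main__":
    for letter, accent in [("e", "\u00b4"), ("c", "\u00b8"), ("n", "~"), ("o", "^")]:
        print(letter, accent, "->", overstrike(letter, accent))
```

Letters with no precomposed form (such as q with an acute accent) come back as a base letter followed by the combining mark, which renderers still display as a single accented character.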

MS-DOS OCR-B [13]
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x
1x [a]
2x SP ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ [b]
8x ü ä å Ä Å
9x æ Æ ö Ö Ü ^ £ ¥
Ax Ñ ø Ø ˍ (U+02CD)
Bx IJ ij
Cx ¤
Dx
Ex ß ´
Fx § ¸ ¨

Characters not in Unicode:


References

  1. Frutiger, Adrian. Type. Sign. Symbol. ABC Verlag, Zurich, 1980. p. 50
  2. "GS1 Human Readable Interpretation (HRI) Implementation Guideline" (PDF). GS1 AISBL. 2018. p. 13. Retrieved 2018-09-27.
  3. Doc 9303: Machine Readable Travel Documents, Part 3: Specifications Common to all MRTDs (PDF) (Eighth ed.). International Civil Aviation Organization. 2015. p. 25. ISBN 978-92-9249-792-7. Retrieved 2016-03-03.
  4. "Standard ECMA-11 for the Alphanumeric Character Set OCR-B for Optical Recognition" (PDF). European Computer Manufacturers Association. March 1976. Section "Brief History".
  5. "Draft Report on the Euro Glyph in OCR-B" (PDF). June 28, 1998.
  6. Karl Ivar Larsson (August 8, 2000). "Notes on transfer of responsibility for OCR-B standards".
  7. "Proposal for Type 3 Technical Report, TR 15907, Information technology — Revision of OCR-B standard (ISO 1073/II-1976)" (PDF). September 28, 1998.
  8. Karsson, Kent Ivar (June 28, 1998), Report to TC304 on OCR-B situation, Unicode Technical Committee, Unicode Consortium, UTC Document L2/01-259
  9. "OCRB font family - Typography". 30 March 2022.
  10. "CTAN: /Tex-archive/Fonts/Ocr-b".
  11. "OCR a and OCR B".
  12. "OCR-B". wehtt.am. Archived from the original on 28 March 2019. Retrieved 11 January 2022.
  13. "Code Page 877" (PDF). Archived from the original (PDF) on 2013-01-21.