Category | Sans-serif |
---|---|
Classification | Neo-grotesque |
Designer(s) | Adrian Frutiger |
Date created | 1968 |
Sample |
OCR-B is a monospace font developed in 1968 by Adrian Frutiger for Monotype by following the European Computer Manufacturer's Association standard. Its function was to facilitate the optical character recognition operations by specific electronic devices, originally for financial and bank-oriented uses. It was accepted as the world standard in 1973. [1] It follows the ISO 1073-2:1976 (E) standard, refined in 1979 ("letterpress" design, size I). It includes all ASCII symbols, and other symbols needed in the bank environment. It is widely used for the human readable digits in UPC/EAN barcodes. [2] [ citation needed ] It is also used for machine-readable passports. [3] It shares that purpose with OCR-A, but it is easier for the human eye and brain to read and it has a less technical look than OCR-A.
In June 1961, the European Computer Manufacturers Association (ECMA) started standardization activities related to Optical Character Recognition (OCR). After evaluating existing OCR designs, it was decided to develop two new fonts: A stylized design with just digits, called “Class A”; and a more conventional type design with broader character coverage, called “Class B”. In February 1965, ECMA proposed a design for the “Class B” font to ISO, who adopted it as international standard ISO 1073-2 in October 1965. [4] The first revision contained three font sizes: I, II and III. The specification included a Letterpress design, intended for high-quality printing equipment; and a rounded-edge Constant Strokewidth design for impact printers [5] : 3 with reduced typographic quality.
In September 1969, ECMA started work to revise its published standard. To make OCR-B more widely accepted, the shapes of some characters were slightly modified. The new revision removed font size II, which had been rarely used in practice; it deleted five character shapes; and it added a new font size IV. ECMA published the second edition of OCR-B in October 1971. [4]
In March 1976, ECMA published a third revision of its ECMA-11 specification. It added the symbols § and ¥ to OCR-B; two types of erasure marks (█) for blackening out mis-printed characters were added; and the length of the Vertical bar was changed to match ISO 1073-2. [4]
In 1993, Turkey proposed extending ISO 1073-2 to include the Turkish letters Ğğ, İı, and Şş. [6] The request was generalized to extend OCR-B with a number of Latin and Greek letters used in European languages. [7] : 27 A revision of the ISO 1073-2:1976 standard was therefore started, producing three successive draft documents. The final draft would have extended OCR-B with 40 Latin and 10 Greek letters; for six Latin letters, the draft gave new alternate shapes. [7] : 26 A request to extend OCR-B with Vietnamese accents was rejected. [7] : 27 Other than previous versions of the standard, which specified glyph shapes via reference drawings, the new revision would have included the shapes in machine-readable form. [7] : 26 However, industry support for testing the new font could not be secured at the time, so the revision effort was halted in 1997. [7] : IV The working group described their findings in a technical report. [7] : 1
In June 1998, the European Committee for Standardization published a report for adding the Euro sign to OCR-B. [5] The report proposed both a single-stroked and a double-stroked variant of the Euro sign, leaving the decision to further testing of OCR performance. [5] : 4 Testing was difficult: the theoretical design methods used when the OCR-B glyphs were originally developed could no longer be reproduced, and the technological constraints of the 1960s were also not entirely relevant anymore in the OCR environments of the 1990s. [8] A new test method was devised, using present-time OCR technology. The tests found no difference in OCR performance between the two Euro variants, and recommended the adoption of the double-stroked variant as it matches the conventional glyph shape. [8] The project did not have funds to thoroughly test the glyph extensions of the 1993 proposal; initial results were inconclusive. [8]
Microsoft Office ships a version of Letterpress OCR-B produced by Monotype. It covers Windows-1252. [9] Many vendors, including Adobe, still sell their versions of OCR-A and OCR-B.
The TeX typesetting system has a public domain Constant Strokewidth OCR-B font in METAFONT definition form. It was created by Norbert Swartz in 1995 and updated in 2010. It has a setting for square stroke ends. [10] The definition has also been translated to METATYPE1, so the rounded version is available in TrueType and OpenType too. [11]
A version of Constant Strokewidth OCR-B by Matthew Anderson has extended character coverage. It is available under CC-BY 4.0. [12]
The MS-DOS OCR-B encoding is code page 877. Note that the grave, acute, circumflex (at 0x9B), tilde, diaeresis, and cedilla can be added over (in the case of the cedilla, under) letters to form accented letters. Note that the alternative Latin small latter m (not in Unicode), euro sign (€), and vertical bar (|) were later added to the code page, but it is unknown where they are located.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
0x | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1x | [a] | |||||||||||||||
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | [b] | } | ~ | [c] |
8x | ü | ä | å | Ä | Å | |||||||||||
9x | æ | Æ | ö | Ö | Ü | ^ | £ | ¥ | ||||||||
Ax | Ñ | ø | Ø | ˍ 02CD | ||||||||||||
Bx | IJ | ij | ||||||||||||||
Cx | ¤ | |||||||||||||||
Dx | ||||||||||||||||
Ex | ß | ´ | ||||||||||||||
Fx | § | ¸ | ¨ |
Characters not in Unicode:
ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts, excluding the abandoned ISO/IEC 8859-12. The ISO working group maintaining this series of standards has been disbanded.
Blackboard bold is a style of writing bold symbols on a blackboard by doubling certain strokes, commonly used in mathematical lectures, and the derived style of typeface used in printed mathematical texts. The style is most commonly used to represent the number sets , (integers), , , and .
A typeface is a design of letters, numbers and other symbols, to be used in printing or for electronic display. Most typefaces include variations in size, weight, slope, width, and so on. Each of these variations of the typeface is a font.
A cedilla, or cedille, is a hook or tail added under certain letters to indicate that their pronunciation is modified. In Catalan, French, and Portuguese it is used only under the letter ⟨c⟩, and the entire letter is called, respectively, c trencada, c cédille, and c cedilhado. It is used to mark vowel nasalization in many languages of Sub-Saharan Africa, including Vute from Cameroon.
Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the Han characters of the so-called CJK languages into a single set of unified characters. Han characters are a feature shared in common by written Chinese (hanzi), Japanese (kanji), Korean (hanja) and Vietnamese.
OpenType is a format for scalable computer fonts. Derived from TrueType, it retains TrueType's basic structure but adds many intricate data structures for describing typographic behavior. OpenType is a registered trademark of Microsoft Corporation.
Magnetic ink character recognition code, known in short as MICR code, is a character recognition technology used mainly by the banking industry to streamline the processing and clearance of cheques and other documents. MICR encoding, called the MICR line, is at the bottom of cheques and other vouchers and typically includes the document-type indicator, bank code, bank account number, cheque number, cheque amount, and a control indicator. The format for the bank code and bank account number is country-specific.
In writing and typography, a ligature occurs where two or more graphemes or letters are joined to form a single glyph. Examples are the characters ⟨æ⟩ and ⟨œ⟩ used in English and French, in which the letters ⟨a⟩ and ⟨e⟩ are joined for the first ligature and the letters ⟨o⟩ and ⟨e⟩ are joined for the second ligature. For stylistic and legibility reasons, ⟨f⟩ and ⟨i⟩ are often merged to create ⟨fi⟩ ; the same is true of ⟨s⟩ and ⟨t⟩ to create ⟨st⟩. The common ampersand, ⟨&⟩, developed from a ligature in which the handwritten Latin letters ⟨e⟩ and ⟨t⟩ were combined.
The Romanian alphabet is a variant of the Latin alphabet used for writing the Romanian language. It is a modification of the classical Latin alphabet and consists of 31 letters, five of which have been modified from their Latin originals for the phonetic requirements of the language.
Segoe is a typeface, or family of fonts, that is best known for its use by Microsoft. The company uses Segoe in its online and printed marketing materials, including recent logos for a number of products. Additionally, the Segoe UI font sub-family is used by numerous Microsoft applications, and may be installed by applications. It was adopted as Microsoft's default operating system font, and is also used on Outlook.com, Microsoft's web-based email service. On August 23, 2012, Microsoft unveiled its new corporate logo typeset in Segoe, replacing the logo it had used for the previous 25 years.
A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even those fonts which only include glyphs for a single writing system, or even only support the basic Latin alphabet. Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts", although as the maximum number of glyphs that can be defined in a TrueType font is restricted to 65,535, it is not possible for a single font to provide individual glyphs for all defined Unicode characters. This article lists some widely used Unicode fonts that support a comparatively large number and broad range of Unicode characters.
The Unicode Consortium and the ISO/IEC JTC 1/SC 2/WG 2 jointly collaborate on the list of the characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set, is an international standard to map characters, discrete symbols used in natural language, mathematics, music, and other domains, to unique machine-readable data values. By creating this mapping, the UCS enables computer software vendors to interoperate, and transmit—interchange—UCS-encoded text strings from one to another. Because it is a universal map, it can be used to represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, which can result in the same sequence of codes having multiple interpretations depending on the character encoding in use, resulting in mojibake if the wrong one is chosen.
OCR-A is a font issued in 1966 and first implemented in 1968. A special font was needed in the early days of computer optical character recognition, when there was a need for a font that could be recognized not only by the computers of that day, but also by humans. OCR-A uses simple, thick strokes to form recognizable characters. The font is monospaced (fixed-width), with the printer required to place glyphs 0.254 cm apart, and the reader required to accept any spacing between 0.2286 cm and 0.4572 cm.
Unicode input is method to add a specific Unicode character to a computer file; it is a common way to input characters not directly supported by a physical keyboard. Characters can be entered either by selecting them from a display, by typing a certain sequence of keys on a physical keyboard, or by drawing the symbol by hand on touch-sensitive screen. In contrast to ASCII's 96 element character set, Unicode encodes hundreds of thousands of graphemes (characters) from almost all of the world's written languages and many other signs and symbols besides.
Extended ASCII is a repertoire of character encodings that include the original 96 ASCII character set, plus up to 128 additional characters. There is no formal definition of "extended ASCII", and even use of the term is sometimes criticized, because it can be mistakenly interpreted to mean that the American National Standards Institute (ANSI) had updated its ANSI X3.4-1986 standard to include more characters, or that the term identifies a single unambiguous encoding, neither of which is the case.
KPS 9566 is a North Korean standard specifying a character encoding for the Chosŏn'gŭl (Hangul) writing system used for the Korean language. The edition of 1997 specified an ISO 2022-compliant 94×94 two-byte coded character set. Subsequent editions have added additional encoded characters outside of the 94×94 plane, in a manner comparable to UHC or GBK.
Runic is a Unicode block containing runic characters. It was introduced in Unicode 3.0 (1999), with eight additional characters introduced in Unicode 7.0 (2014). The original encoding of runes in UCS was based on the recommendations of the "ISO Runes Project" submitted in 1997.
Optical Character Recognition is a Unicode block containing signal characters for OCR and MICR standards.
The ISO 2033:1983 standard defines character sets for use with Optical Character Recognition or Magnetic Ink Character Recognition systems. The Japanese standard JIS X 9010:1984 is closely related.