Timeline of optical character recognition

Last updated August 26, 2024

This is a timeline of optical character recognition.

Overview

Time period	Summary
1870–1931	The earliest ideas of optical character recognition (OCR) are conceived. Fournier d'Albe's Optophone and Tauschek's Reading Machine are developed as devices to help the blind read.^[1]
1931–1954	The first OCR tools are invented and applied in industry; some can interpret Morse code or read text aloud. The Intelligent Machines Research Corporation is the first company founded to sell such tools.
1954–1974	The Optacon, the first portable OCR device, is developed. Similar devices are used to digitise Reader's Digest coupons and postal addresses. Special typefaces are designed to facilitate scanning.^[1]^[2]^[3]
1974–2000	Scanners come into widespread use for reading price tags and passports.^[4] Companies such as Caere Corporation, ABBYY, and Kurzweil Computer Products Inc. are founded. Kurzweil Computer Products develops the first omni-font OCR software, capable of reading text printed in virtually any font.^[5]
2000–2016	OCR software is made available online for free, through products like Adobe Acrobat, WebOCR, and Google Drive.^[6]^[7]

Timeline

Year	Event type	Technology	Details
1870	Invention		American inventor Charles R. Carey invents the retina scanner, an image transmission system using a mosaic of photocells, considered the first OCR invention in the world.^[1]
1885	Invention	Image scanner	Paul Nipkow invents the Nipkow disk, an image-scanning device that later proves a major breakthrough for both modern television and reading machines.^[8]
1900	Invention		Russian scientist Tyurin envisions the first OCR machine to serve as an aid to the visually handicapped, but never manages to develop it.^[1]
1912	Product	Text-to-speech	Edmund Fournier d'Albe develops the Optophone, a handheld scanner that, when moved across a printed page, produces tones corresponding to specific letters or characters so that a blind person can interpret the text.^[9]^[10]
1916	Patent		American engineer John B. Flowers patents the "One-Eyed Machine Stenographer", a machine capable of reading and typing text. It works by superimposing all the letters to find, for each one, a point that distinguishes it from the others.^[11]
1921	Invention	Text-to-tactile sensations	Italian professor Ciro Codelupi envisions the "Reading machine for the blind", capable of transforming luminous sensations into tactile sensations.^[12]
1929	Invention		Austrian engineer Gustav Tauschek creates the "Reading Machine", considered the first OCR device, which uses a photosensor to detect when the words it scans match a template stored in its memory.^[13]
1931	Patent	Text-to-telegraph	Israeli physicist and inventor Emanuel Goldberg is granted a patent for his "Statistical machine" (U.S. Patent 1,838,389), which is later acquired by IBM. The machine was described as capable of reading characters and converting them into standard telegraph code.^[1]
1938	Invention		MIT professor Vannevar Bush develops the Microfilm Rapid Selector, a machine similar to but simpler than Goldberg's statistical machine, and 40 times faster.^[14]
1949	Application		Engineers at the Radio Corporation of America (RCA) start a project to help the blind on behalf of the U.S. Department of Veterans Affairs, using the first text-to-speech techniques.^[15]
1951	Invention	Text & Morse-to-speech	American cryptanalyst David H. Shepard and Harvey Cook Jr. build "Gismo", a machine able to read text aloud letter by letter and to interpret Morse code (U.S. Patent 2,663,758).
1952	Company		The Intelligent Machines Research Corporation is founded by David H. Shepard and William Lawless Jr. to commercialise Gismo (later renamed the "Analysing Reader").^[16]
1954	Application		American magazine Reader's Digest becomes the first business to install an OCR reader, used to convert typewritten sales reports into punched cards.^[1]
1962	Invention	Portability	Stanford professor John Linvill develops the Optacon, the first portable reading device for the blind.^[17]
1965	Application		Reader's Digest expands its use of OCR to digitise the serial numbers of coupons, using an RCA 501 computer.^[citation needed]
1965	Invention		American inventor Jacob Rabinow develops an OCR machine to sort mail for the U.S. Post Office.^[3]
1966	Invention	Handwriting scanner	The IBM Rochester lab develops the IBM 1287, the first scanner capable of reading handwritten numbers.^[18]
1966	Patent		Linvill is granted the patent for the Optacon, described as a "Reading aid for the blind" (U.S. Patent 3,229,387).
1968	Invention	Typefaces	American Type Founders and Swiss designer Adrian Frutiger introduce OCR-A and OCR-B respectively, typefaces designed to facilitate OCR.^[2]^[19]
1969			The U.S. Army implements what may have been one of the first major applications of OCR technology, converting its manual allotment program to a centralized system running on IBM System/360 computers. This involved purchasing IBM Selectric typewriters with the Times Roman 12 typeface for all of its finance offices around the world. The system allowed all military personnel to allot portions of their pay through automated payroll deductions, for example to pay bills or send money to savings, which eliminated monthly processing. Its success paved the way for the other military services to follow and eventually led to a fully automated pay system years later.^[citation needed]
1971	Application	Postal scanner	Canadian postal operator Canada Post starts using OCR systems to read the name and address on envelopes and to print barcodes on them in ultraviolet ink (U.S. Patent 5,420,403).^[20]
1974	Company	Omni-font	American inventor Ray Kurzweil creates Kurzweil Computer Products Inc., which develops the first omni-font OCR software, able to recognize text printed in virtually any font.^[4]
1976	Company		Dallas company Recognition Equipment Inc. is founded to read credit card receipts from gasoline purchases (U.S. Patent 4,027,141).^[8]
1977	Company	Commercialisation	Robert Noyce founds the Caere Corporation (now Nuance Communications), and introduces the first commercial handheld OCR reader.^[21]
1978	Product		Kurzweil Computer Products begins selling a commercial version of the OCR computer program, called the "Kurzweil Reading Machine".^[5]
1980	Acquisition		Kurzweil's company is sold to Xerox, which later renames it ScanSoft (now merged with Nuance Communications).^[8]
1984	Product	Passport scanner	Caere Corporation develops the first passport scanner for the U.S. State Department.^[22]
1987	Application	Price tag scanner	American retailers Sears, Kmart and J.C. Penney start using OCR to scan price tags.^[20]
1989	Company		Russian OCR company ABBYY is founded by David Yang and begins selling products that simplify converting paper documents into digital data.^[23]
1992	Invention		Russian company OKRUS develops the first program capable of recognizing Cyrillic text.^[1]
2000	Application	Online service	OCR technology is made available online as a service (WebOCR), in a cloud computing environment, as well as in mobile applications like real-time translation of foreign-language signs on a smartphone.^[24]
2005	Application	Software	Hewlett-Packard and the University of Nevada, Las Vegas release Tesseract as a free, cross-platform OCR engine.
2008	Application		Adobe Acrobat starts including support for OCR on any PDF file.^[7]
2011	Application	Word-frequency lookup	Google Ngram Viewer is developed to chart the frequencies of words in sources printed from 1500 to 2008.^[25]^[26]
2013	Application		The MNIST database is created to train machine learning models in pattern recognition.^[27]
2015	Application	Open access	Google makes free OCR available for files stored in Google Drive, supporting over 200 languages.^[6]

References

[1] Schantz, H. F. (1982). The History of OCR: Optical Character Recognition. Recognition Technologies Users Association.
[2] Frutiger, Adrian. Type. Sign. Symbol. ABC Verlag, Zurich, 1980. p. 50.
[3] "Optical character recognition - History". ABBYY Technology. Retrieved 18 September 2016.
[4] Hauger, J. Scott. Reading Machines for the Blind (PDF). Blacksburg, Virginia: Faculty of the Virginia Polytechnic Institute and State University, April 1995, pp. I–II, 11–13.
[5] "Kurzweil Computer Products". www.kurzweiltech.com. Retrieved 18 September 2016.
[6] "Paper to Digital in 200+ languages". 6 May 2015. Retrieved 18 September 2016.
[7] "Press Room". Adobe Systems. 14 July 2009. Retrieved 4 December 2010.
[8] "The History of OCR". Data Processing Magazine. 12: 46. 1970.
[9] Fournier, E. E. "The Type-Reading Optophone, Our Surplus, Our Ships, and Europe's Need, and more" (PDF). Scientific American, vol. 123, no. 19. New York: Scientific American Publishing Co., 6 November 1920, pp. 463–465.
[10] Fournier d'Albe, E. E. (1 July 1914). "On a Type-Reading Optophone". Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences. 90 (619): 373–375. Bibcode:1914RSPSA..90..373D. doi:10.1098/rspa.1914.0061. ISSN 1364-5021.
[11] "La macchina che legge e che scrive" [The machine that reads and writes] (PDF). La scienza per tutti, year XXIII, no. 11. Milan: Casa Editrice Sonzogno, 1 June 1916, p. 166 (in Italian).
[12] "Macchina per leggere pei ciechi" [Reading machine for the blind] (PDF). La scienza per tutti, year XXVIII, no. 2. Milan: Casa Editrice Sonzogno, 15 January 1921, p. 20 (in Italian).
[13] "History of Computers and Computing, Birth of the modern computer, The bases of digital computers, OCR". history-computer.com. Retrieved 9 September 2016.
[14] Buckland, Michael Keeble (2006). Emanuel Goldberg and His Knowledge Machine: Information, Invention, and Political Forces. Greenwood Publishing Group. ISBN 9780313313325.
[15] "Reading Machine Speaks Out Loud". Popular Science, February 1949.
[16] Martin, Douglas (11 December 2007). "David H. Shepard, 84, Dies; Optical Reader Inventor". The New York Times. Retrieved 5 June 2010.
[17] "The Reading Machine That Hasn't Been Built Yet". AccessWorld. Retrieved 18 September 2016.
[18] "Rochester chronology". IBM. 23 January 2003. Archived from the original on 28 March 2008. Retrieved 18 September 2016.
[19] "OCR-A Std | Typekit". typekit.com. Retrieved 18 September 2016.
[20] "Overview of OCR and Its Applications" (PDF). Understanding Optical Character Recognition. Retrieved 18 September 2016.
[21] "History of Caere Corporation". FundingUniverse. www.fundinguniverse.com. Retrieved 23 September 2016.
[22] Jacobson, Gary. "No grudges, Bill Moore says, but he still seeks justice". Dallas News. Retrieved 18 September 2016.
[23] "Mixergy interview: How A Bulletin Board Post Changed Everything – with David Yang". Retrieved 22 August 2013.
[24] "Understanding Optical Character Recognition" (PDF). Bar Code & Data Acquisition. Retrieved 18 September 2016.
[25] "Google Ngram Database Tracks Popularity Of 500 Billion Words". Huffington Post, 17 December 2010. Webpage: HP8150.
[26] "Culturomics, Ngrams and new power tools for Science". 10 August 2011. Retrieved 18 September 2016.
[27] "MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges". yann.lecun.com. Retrieved 18 September 2016.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
