This comparison of optical character recognition software includes:
Name | Founded year | Latest stable version | Latest release year | License | Online | Windows | Mac OS X | Linux | BSD | Android | iOS | Programming language | SDK? | Languages | Fonts | Output Formats | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ABBYY FineReader | 1989 | 16 | 2022 | Proprietary | Yes | Yes | Yes | No | Yes | Yes | Yes | C/C++ | Yes | 192 [1] | All fonts | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2 [2] | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac. [3] |
AIDA | 2016 | 13.0 | 2024 | Proprietary | Yes | Yes | Yes | Yes | Yes | Yes | Yes | | No | All languages using Latin alphabet | Machine and handprinted text, Latin alphabet | DOCX, XLSX, PPTX, TXT, CSV, PDF, JSON, XML | AIDA is able to learn how to extract any value from any document, with a single click on a single document. [4] |
AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | VBScript | ? | ? | ? | | Works with structured, semi-structured, and unstructured documents. |
Asprise OCR SDK | 1998 | 15 | 2015 | Proprietary | Yes | Yes | Yes | Yes | Yes | ? | ? | Java, C#, VB.NET, C/C++/Delphi | Yes | 20+ [5] | ? | Plain text, searchable PDF, XML [6] | Java, C#, VB.NET, C/C++/Delphi SDKs for OCR and barcode recognition on Windows, Linux, Mac OS X and Unix. [7] |
CuneiForm | 1996 | 1.1 | 2011 | BSD variant | No | Yes | Yes | Yes | Yes | ? | ? | C/C++ | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT [8] | Enterprise-class system, can save text formatting and recognizes complicated tables of any structure |
E-aksharayan | 2010 | Yes | No | Yes | No | ? | ? | 14 | RTF, TXT, BRL | ||||||||
GOCR | 2000 | 0.52 [9] | 2018 | GPL | Yes [10] | Yes | Yes | Yes | Yes | ? | ? | C | ? | 20+ | ? | ||
Google Drive OCR or Google Cloud Vision | 2015 | | | Proprietary | Yes | Browser | Browser | Browser | Unknown | ? | ? | Unknown | Yes | 200+ | All fonts | Text | Google blog post [11] [12] |
Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | ? | ? | | Uses OmniPage [citation needed] |
Microsoft Office OneNote 2007 | 2011 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | ? | ? | ||
OCRFeeder | 2009-03 | 0.8.5 | 2022 | GPL | No | No | No | Yes | No | ? | ? | Python | ? | ? | ? | | Features a full user interface and a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad. |
Ocrad | ? | 0.29 [13] | 2024 | GPL | Yes | No | Yes | Yes | Yes | ? | ? | C++ | Yes | Latin alphabet | ? | | Command line |
OCRopus | 2007 | 1.3.3 | 2017 | Apache | No | No | Yes | Yes | Yes | ? | ? | Python | ? | All languages using Latin script (other languages can be trained) | Normal Latin script and Fraktur (other scripts can be trained) | TXT, hOCR [14], PDF [15] | Pluggable framework under active development, used for Google Books |
OmniPage | 1970s | 19.2 | 2015 | Proprietary | Yes | Yes | Yes | Yes | No | ? | ? | C/C++, C# [16] | Yes | 125 [17] | Machine and handprinted fonts | DOC/DOCX, XLS/XLSX, PPTX, RTF, PDF, PDF/A, Searchable PDF, HTML, Text, XML, ePUB, MP3 | Product of Nuance Communications |
Puma.NET | ? | ? | 2009 | BSD | No | Yes | No | No | No | ? | ? | C# | Yes | 28 | Any printed font | | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps the Puma COM server and provides a simplified API for .NET applications. |
ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | ? | ? | | Scans, captures and classifies business documents such as invoices, forms and purchase orders, integrated with business processes. |
Scantron | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | ? | ? | | For working with localized interfaces, corresponding language support is required. |
SmartScore | 1991 | 10.5.8 | 2015 | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? | ? | ? | | For musical scores |
Tesseract | 1985 | 5.4.1 | 2024 | Apache | No | Yes | Yes | Yes | Yes | ? | ? | C++, C | Yes | 100+ [18] | Any printed font | Text, ALTO, hOCR [19], PDF, others with different user interfaces [20] or the API | Created by Hewlett-Packard; under further development by Google [21] |
A 2016 analysis of the accuracy and reliability of the OCR packages Google Docs OCR, Tesseract, ABBYY FineReader, and Transym, using a dataset including 1227 images from 15 different categories, concluded that Google Docs OCR and ABBYY FineReader performed better than the others. [22]
Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.
This is a comparison of both historical and current web browsers based on developer, engine, platform(s), releases, license, and cost.
DocuShare® is a content management system developed by Xerox Corporation. It uses open standards and allows for managing content, integrating it with other business systems, and creating customized and packaged software applications.
OmniPage is an optical character recognition (OCR) application available from Kofax Incorporated.
This article compares computer software tools used to compare files of various types. The file types handled by individual file comparison apps vary but may include text, symbols, images, audio, or video. This category of software tool is often called "file comparison" or "diff tool"; the two terms are effectively equivalent, though "diff" is more commonly associated with the Unix diff utility.
PaperPort is commercial document management software published by Tungsten Automation, used for working with scanned documents. It uses a built-in optical character recognition to create files in searchable Portable Document Format (PDF); text in these files is indexed and can be searched for with appropriate software, such as Microsoft's Windows Search. Earlier versions of PaperPort used OmniPage to provide this function. It provides image editing tools for these files.
Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005 and development was sponsored by Google in 2006.
ABBYY FineReader PDF is an optical character recognition (OCR) application developed by ABBYY. First released in 1993, the program runs on Microsoft Windows and Apple macOS. Since v15, the Windows version can also edit PDF files.
Ocrad is an optical character recognition program and part of the GNU Project. It is free software licensed under the GNU GPL.
OCRopus is a free document analysis and optical character recognition (OCR) system released under the Apache License v2.0 with a very modular design using command-line interfaces.
hOCR is an open standard of data representation for formatted text obtained from optical character recognition (OCR). The definition encodes text, style, layout information, recognition confidence metrics and other information using Extensible Markup Language (XML) in the form of Hypertext Markup Language (HTML) or XHTML.
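As an illustration of the kind of information hOCR carries, the following minimal sketch reads word text, bounding boxes and confidence values from an hOCR fragment using only the Python standard library. It assumes the conventions commonly seen in hOCR output: words are `span` elements with class `ocrx_word`, and the `title` attribute holds semicolon-separated properties such as `bbox x0 y0 x1 y1` and `x_wconf N`. The names `HocrWordParser`, `parse_hocr_words` and `SAMPLE` are hypothetical and chosen only for this example; real engine output may carry additional properties that this sketch ignores.

```python
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect text, bounding box and confidence for each hOCR word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word currently being read, if any
        self._depth = 0       # nesting depth of spans inside that word

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._current is not None:
            # A span nested inside a word; remember it so its end tag
            # does not close the word early.
            self._depth += 1
            return
        attrs = dict(attrs)
        if "ocrx_word" not in (attrs.get("class") or "").split():
            return
        self._current = {"text": "", "bbox": None, "conf": None}
        # The title attribute is assumed to hold ';'-separated properties.
        for prop in (attrs.get("title") or "").split(";"):
            parts = prop.split()
            if len(parts) == 5 and parts[0] == "bbox":
                self._current["bbox"] = tuple(int(p) for p in parts[1:])
            elif len(parts) == 2 and parts[0] == "x_wconf":
                self._current["conf"] = float(parts[1])

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag != "span" or self._current is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        self._current["text"] = self._current["text"].strip()
        self.words.append(self._current)
        self._current = None


def parse_hocr_words(hocr):
    """Return a list of dicts with keys 'text', 'bbox' and 'conf'."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Hand-written sample fragment (not produced by any particular engine).
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 640 480'>
  <span class='ocr_line' title='bbox 10 20 300 50'>
    <span class='ocrx_word' title='bbox 10 20 120 50; x_wconf 96'>Optical</span>
    <span class='ocrx_word' title='bbox 130 20 300 50; x_wconf 88'>character</span>
  </span>
</div>
"""

if __name__ == "__main__":
    for word in parse_hocr_words(SAMPLE):
        print(word["text"], word["bbox"], word["conf"])
```

Because the layout and confidence data sit in ordinary HTML attributes, a general-purpose HTML parser is enough to read them; no OCR-specific library is needed for this step.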
OCRFeeder is an optical character recognition suite for GNOME, which also supports virtually any command-line OCR engine, such as CuneiForm, GOCR, Ocrad and Tesseract. It converts paper documents to digital document files and can serve to make them accessible to visually impaired users.
Audiveris is an open source tool for optical music recognition (OMR).
Asprise OCR is a commercial optical character recognition and barcode recognition SDK library that provides an API to recognize text as well as barcodes from images and output in formats like plain text, XML and searchable PDF.
Indic OCR refers to the process of converting text images written in Indic scripts into e-text using Optical character recognition (OCR) techniques. Broadly, it can also refer to the OCR systems of Brahmic scripts for languages of South Asia and Southeast Asia, not just the scripts of the Indian subcontinent, which are all written in an abugida-based writing system.
Scene text is text that appears in an image captured by a camera in an outdoor environment.
ABBYY is an American technology company specializing in AI-powered document processing and automation, data capture, process mining and optical character recognition (OCR). It was founded in the USSR and operated in Russia for nine years before moving to the United States. Primarily focused on software as a service model, the company serves clients worldwide. One of ABBYY's best-known products is ABBYY FineReader, an optical character recognition (OCR) computer program.