Developer(s) | ABBYY |
---|---|
Initial release | July 1993 |
Stable release | 16.0.13.4766 [1] / 10 November 2022 |
Operating system | Windows, macOS, Linux |
Type | OCR |
License | Commercial proprietary software (Retail or volume licensing) |
Website | pdf |
ABBYY FineReader PDF is an optical character recognition (OCR) application developed by ABBYY, [2] [3] with support for PDF file editing since v15. The program runs under Microsoft Windows 7 or later, and (without PDF editing) Apple macOS 10.12 Sierra or later. The first version was released in 1993. [2]
The program converts image documents (photos, scans, PDF files) and screen captures into editable file formats, including Microsoft Word, Microsoft Excel, Microsoft PowerPoint, Rich Text Format, HTML, PDF/A, searchable PDF, CSV and plain-text (txt) files. [4] From version 11, files can also be saved in the DjVu format. Version 15 supports recognition of text in 192 languages and has a built-in spell check for 48 of them.
FineReader recognizes new characters by: training characters so that they are added to the recognition alphabet; selecting additional characters from a list and adding them to the alphabet of a selected language (for example, adding certain Icelandic characters to a German alphabet for a German text describing Iceland); and adding domain-specific vocabulary to FineReader's built-in lexicon. [5] The program also allows users to compare documents, add annotations and comments, and schedule batch processing. [6] [7]
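The idea of extending a language's alphabet can be illustrated with a minimal sketch. It is not FineReader's implementation or API: the names `GERMAN_ALPHABET`, `ICELANDIC_EXTRAS`, `unsupported_chars` and `extend_alphabet` are hypothetical, and the alphabet is modelled simply as a set of characters.

```python
import string

# Hypothetical model: an alphabet is just a set of characters.
GERMAN_ALPHABET = set(string.ascii_letters) | set("äöüÄÖÜß")
# A few Icelandic letters that do not occur in German.
ICELANDIC_EXTRAS = set("ðÐþÞæÆ")


def unsupported_chars(text, alphabet):
    """Return the letters in `text` that the alphabet does not cover."""
    return {ch for ch in text if ch.isalpha() and ch not in alphabet}


def extend_alphabet(alphabet, extras):
    """Return a new alphabet that also contains the extra characters."""
    return set(alphabet) | set(extras)


place_name = "Seyðisfjörður"  # Icelandic town name inside a German text
missing_before = unsupported_chars(place_name, GERMAN_ALPHABET)
extended = extend_alphabet(GERMAN_ALPHABET, ICELANDIC_EXTRAS)
missing_after = unsupported_chars(place_name, extended)
```

In this sketch the letter "ð" is reported as missing under the plain German alphabet and is covered once the Icelandic characters are added.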
As of 2015, there were more than 20 million users of ABBYY FineReader worldwide. [8] [2] [9] ABBYY also licenses its FineReader optical character recognition technology to companies including Fujitsu, Panasonic, Xerox, Plustek and Samsung. [10] [11]
In February 2020, version 15 of the software was rated "Highest-quality OCR on the market" by PC Magazine. [12]
Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images and other information needed to display it. PDF has its roots in "The Camelot Project" initiated by Adobe co-founder John Warnock in 1991. PDF was standardized as ISO 32000 in 2008. The latest edition, ISO 32000-2:2020, was published in December 2020.
Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.
Adobe Acrobat is a family of application software and Web services developed by Adobe Inc. to view, create, manipulate, print and manage Portable Document Format (PDF) files.
Microsoft OneNote is note-taking software developed by Microsoft. It is available as part of the Microsoft 365 suite and since 2014 has been free on all platforms outside the suite. OneNote is designed for free-form information gathering and multi-user collaboration. It gathers users' notes, drawings, screen clippings, and audio commentaries. Notes can be shared with other OneNote users over the Internet or a network.
capella is a musical notation program or scorewriter developed by the German company Capella Software AG. It runs on Microsoft Windows or under corresponding emulators on other operating systems, such as Wine on Linux and others on Apple Macintosh. capella must be activated after a 30-day trial period. The publisher writes the name in lower-case letters only. The program was initially created by Hartmut Ring, and is now maintained and developed by Bernd Jungmann.
Evernote is a note-taking and task-management application developed by the Evernote Corporation. It is intended for archiving and creating notes with embedded photos, audio, and saved web content. Notes are stored in virtual "notebooks" and can be tagged, annotated, edited, searched, and exported.
PaperPort is commercial document management software published by Kofax, used for working with scanned documents. It uses a built-in optical character recognition to create files in searchable Portable Document Format (PDF); text in these files is indexed and can be searched for with appropriate software, such as Microsoft's Windows Search. Earlier versions of PaperPort used OmniPage to provide this function. It provides image editing tools for these files.
Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005 and development has been sponsored by Google since 2006.
CuneiForm Cognitive OpenOCR is a freely distributed open-source OCR system developed by Russian software company Cognitive Technologies.
OCR-A is a font issued in 1966 and first implemented in 1968. In the early days of computer optical character recognition, a font was needed that could be recognized not only by the computers of that day but also by humans. OCR-A uses simple, thick strokes to form recognizable characters. The font is monospaced (fixed-width), with the printer required to place glyphs 0.254 cm apart, and the reader required to accept any spacing between 0.2286 cm and 0.4572 cm.
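The spacing figures above are easier to read in inches, since 1 inch is 2.54 cm. The following minimal sketch converts them and checks whether a given pitch falls inside the range a reader must accept; the constant and function names are illustrative and not taken from any standard.

```python
CM_PER_INCH = 2.54

# Values quoted in the text above.
NOMINAL_PITCH_CM = 0.254   # spacing the printer must use
MIN_PITCH_CM = 0.2286      # smallest spacing the reader must accept
MAX_PITCH_CM = 0.4572      # largest spacing the reader must accept


def cm_to_inch(cm):
    """Convert centimetres to inches."""
    return cm / CM_PER_INCH


def pitch_accepted(pitch_cm):
    """True if a glyph spacing lies within the reader's required range."""
    return MIN_PITCH_CM <= pitch_cm <= MAX_PITCH_CM


nominal_in = cm_to_inch(NOMINAL_PITCH_CM)   # about 0.10 inch
min_in = cm_to_inch(MIN_PITCH_CM)           # about 0.09 inch
max_in = cm_to_inch(MAX_PITCH_CM)           # about 0.18 inch
chars_per_inch = 1 / nominal_in             # about 10 characters per inch
```

The nominal spacing therefore corresponds to roughly ten characters per inch, and the accepted range runs from about 0.09 to 0.18 inch.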
hOCR is an open standard of data representation for formatted text obtained from optical character recognition (OCR). The definition encodes text, style, layout information, recognition confidence metrics and other information using Extensible Markup Language (XML) in the form of Hypertext Markup Language (HTML) or XHTML.
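As an illustration of how this information is embedded in HTML, the following minimal sketch reads word text, bounding boxes and confidence values from a small hand-written hOCR fragment using only the Python standard library. The `ocrx_word` class and the `bbox` and `x_wconf` properties follow common hOCR usage; the sample markup and the helper names `HocrWordParser` and `parse_hocr_words` are made up for this example, and the parser assumes that word spans contain no nested spans.

```python
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect text, bounding box and confidence of each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word span currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._current = {"text": "", "bbox": None, "conf": None}
            # The title attribute holds "name value..." pairs split by ';'.
            for prop in (attrs.get("title") or "").split(";"):
                parts = prop.split()
                if len(parts) == 5 and parts[0] == "bbox":
                    self._current["bbox"] = tuple(int(p) for p in parts[1:])
                elif len(parts) == 2 and parts[0] == "x_wconf":
                    self._current["conf"] = float(parts[1])

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Simplification: assumes no spans are nested inside a word span.
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


def parse_hocr_words(markup):
    """Return a list of word dictionaries found in an hOCR string."""
    parser = HocrWordParser()
    parser.feed(markup)
    parser.close()
    return parser.words


# Hand-written sample fragment, not the output of any particular engine.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 800 600'>
 <span class='ocr_line' title='bbox 36 92 210 116'>
  <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
  <span class='ocrx_word' title='bbox 110 92 210 116; x_wconf 72'>world</span>
 </span>
</div>
"""

words = parse_hocr_words(SAMPLE)
```

Each entry in `words` is a dictionary with the keys `text`, `bbox` (left, top, right, bottom in pixels) and `conf`.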
This comparison of optical character recognition software includes:
Document Capture Software refers to applications that provide the ability and feature set to automate the process of scanning paper documents or importing electronic documents, often for the purposes of feeding advanced document classification and data collection processes. Most scanning hardware, both scanners and copiers, provides the basic ability to scan to any number of image file formats, including: PDF, TIFF, JPG, BMP, etc. This basic functionality is augmented by document capture software, which can add efficiency and standardization to the process.
OCRFeeder is an optical character recognition suite for GNOME, which also supports virtually any command-line OCR engine, such as CuneiForm, GOCR, Ocrad and Tesseract. It converts paper documents to digital document files and can serve to make them accessible to visually impaired users.
Microsoft Office shared tools are software components that are included in all Microsoft Office products.
Solid PDF Tools is a document reconstruction software product which allows users to convert PDFs into editable documents and create PDFs from a variety of file sources. The same technology used in the software's Solid Framework SDK is licensed by Adobe for Acrobat X.
Asprise OCR is a commercial optical character recognition and barcode recognition SDK library that provides an API to recognize text as well as barcodes from images and output the results in formats such as plain text, XML and searchable PDF.
Project Naptha is a browser extension software for Google Chrome that allows users to highlight, copy, edit and translate text from within images. It was created by developer Kevin Kwok, and released in April 2014 as a Chrome add-on. This software was first made available only on Google Chrome, downloadable from the Chrome Web Store. It was then made available on Mozilla Firefox, downloadable from the Mozilla Firefox add-ons repository but was soon removed. The reason behind the removal remains unknown.
ABBYY is a US-based company that develops solutions in the fields of intelligent document processing, data capture, process intelligence and optical character recognition (OCR). The company serves clients worldwide. One of ABBYY's best-known products is ABBYY FineReader, an OCR application.