CuneiForm (software)

CuneiForm
Original author(s): Cognitive Technologies
Developer(s): Cognitive Technologies
Initial release (source): April 2, 2008 [1]
Stable release: 1.1 / April 19, 2011
Written in: C and C++
Operating system: Cross-platform
Type: Optical character recognition
License: Freeware/BSD licenses
Website: launchpad.net/cuneiform-linux

CuneiForm Cognitive OpenOCR is a freely distributed open-source OCR system developed by Russian software company Cognitive Technologies.


CuneiForm OCR was developed by Cognitive Technologies as a commercial product in 1993. The system was bundled with the most popular scanners, MFPs and software in Russia and the rest of the world, including products from Corel (Corel Draw), Hewlett-Packard, Epson, Xerox, Samsung, Brother, Mustek, OKI, Canon and Olivetti.
In 2008 Cognitive Technologies released the program's source code.

Features

CuneiForm converts electronic copies of paper documents and image files into an editable form while preserving the structure and fonts of the original document, in automatic or semi-automatic mode. The system includes two components: one for processing single documents and one for batch processing of electronic documents.

The list of languages supported by the system:

In addition, the system supports mixed Russian and English text. Recognition of other language mixtures is supported only in a branch developed by Andrei Borovsky in 2009. [2] Training the system to recognize other languages is difficult because each language depends on a dat file whose structure and development method the developers have not disclosed.

History

1993 - Cognitive Technologies signed an OEM contract with Corel, under the terms of which Cognitive's recognition library was embedded in the Corel Draw 3.0 (and later versions) package, popular in the publishing industry.

1994 - Contract with Hewlett-Packard to equip all scanners imported into Russia with CuneiForm OCR. This was HP's first contract with a Russian software company.

1995 - Contract with the Japanese corporation Epson to supply its scanners with CuneiForm OCR. [3] An OEM contract was also signed with Brother Corporation, the world's largest manufacturer of fax machines, laser printers, scanners and other office equipment. Under the agreement, the new roller scanner Brother IC-150 was shipped worldwide with Cognitive software for scanning and recognition.

1996 - OEM agreement with Samsung Information Systems America, one of the world's largest manufacturers of monitors, fax machines, laser printers, MFPs and other office equipment. Under the agreement, the new multifunction device Samsung OFFICE MASTER OML-8630A was to be equipped worldwide with the Cognitive CuneiForm LE optical character recognition system.

Adaptive Recognition - a method that combines two types of printed character recognition algorithms: multifont and omnifont. For each input document the system generates an internal font from the well-printed characters, dynamically adjusting (adapting) to the specific input symbols. The method thus combines the versatility and technological efficiency of the omnifont approach with the high accuracy of font-specific recognition, which markedly improves the recognition rate.
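The two-pass idea behind adaptive recognition can be illustrated with a minimal Python sketch. It is not CuneiForm's implementation: the feature-vector glyphs, the `toy_omnifont` classifier, the 0.9 confidence threshold and the nearest-template matching are all assumptions chosen for illustration.

```python
from collections import defaultdict


def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adaptive_recognize(glyphs, omnifont_classify, threshold=0.9):
    """Two-pass recognition with a per-document font (illustrative only)."""
    # Pass 1: font-independent (omnifont) recognition of every glyph.
    first_pass = [omnifont_classify(g) for g in glyphs]

    # Build the internal document font from confidently recognized glyphs:
    # one averaged template per character label.
    samples = defaultdict(list)
    for glyph, (label, confidence) in zip(glyphs, first_pass):
        if confidence >= threshold:
            samples[label].append(glyph)
    font = {
        label: [sum(column) / len(column) for column in zip(*group)]
        for label, group in samples.items()
    }

    # Pass 2: re-classify uncertain glyphs against the document font.
    result = []
    for glyph, (label, confidence) in zip(glyphs, first_pass):
        if confidence < threshold and font:
            label = min(font, key=lambda k: distance(font[k], glyph))
        result.append(label)
    return result


def toy_omnifont(glyph):
    """Hypothetical omnifont classifier returning (label, confidence)."""
    if glyph[0] > 0.8:
        return ("a", 0.95)
    if glyph[0] < 0.2:
        return ("b", 0.95)
    return ("a", 0.5)  # uncertain guess


glyphs = [[0.9, 0.1], [0.1, 0.9], [0.45, 0.8]]
labels = adaptive_recognize(glyphs, toy_omnifont)
```

In this toy run the third glyph is first guessed as "a" with low confidence, then reassigned to "b" because it lies closer to the "b" template learned from the same document.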

1997 - First use of neural network-based technologies in CuneiForm. The neural network character recognition algorithms work as follows: the character image to be recognized (the pattern) is reduced to a standard size (normalized). The luminance values of the normalized pattern serve as the inputs of the neural network. The number of network outputs equals the number of recognizable characters. The recognition result is the character that corresponds to the maximum value of the network's output vector.
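The recognition steps described for 1997 (normalize the pattern, feed luminance values to the network, take the largest output) can be sketched in Python as follows. This is a simplified illustration, not CuneiForm's network: the single linear layer, the 4x4 normalized size, the two-character alphabet and the hand-built (untrained) weights are all assumptions.

```python
def normalize(pattern, size):
    """Resample a 2D luminance grid to size x size (nearest neighbour)."""
    rows, cols = len(pattern), len(pattern[0])
    return [
        [pattern[r * rows // size][c * cols // size] for c in range(size)]
        for r in range(size)
    ]


def flatten(grid):
    """Turn a 2D grid into the flat input vector of the network."""
    return [value for row in grid for value in row]


def forward(inputs, weights, biases):
    """One linear layer: one output score per recognizable character."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def recognize(pattern, alphabet, weights, biases, size):
    """Return the character whose output score is the maximum."""
    scores = forward(flatten(normalize(pattern, size)), weights, biases)
    best = max(range(len(scores)), key=scores.__getitem__)
    return alphabet[best]


# Hypothetical two-character alphabet with weights derived from templates
# instead of training, so that the example stays self-contained.
SIZE = 4
ALPHABET = ["|", "-"]
TEMPLATES = [
    [[0, 1, 1, 0]] * 4,                                        # vertical bar
    [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]],  # horizontal bar
]
WEIGHTS = [[v - 0.5 for v in flatten(t)] for t in TEMPLATES]
BIASES = [0.0, 0.0]

# 8x8 input images: luminance 1 marks ink, 0 marks background.
vertical = [[0, 0, 1, 1, 1, 1, 0, 0] for _ in range(8)]
horizontal = [[1] * 8 if 2 <= r <= 5 else [0] * 8 for r in range(8)]

first = recognize(vertical, ALPHABET, WEIGHTS, BIASES, SIZE)
second = recognize(horizontal, ALPHABET, WEIGHTS, BIASES, SIZE)
```

The output vector has exactly one entry per character in `ALPHABET`, mirroring the statement that the number of outputs equals the number of recognized characters; a trained multi-layer network would replace the hand-built `WEIGHTS` in practice.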

1999

2001 - OEM contract with Canon to equip its scanners and multifunction devices for Eastern Europe with Cognitive Technologies' CuneiForm OCR software.

Development prospects

References

  1. "Cognitive Technologies открыла код OCR Cuneiform" [Cognitive Technologies has opened the source code of OCR Cuneiform] (in Russian). Archived from the original on 2009-11-06. Retrieved 2008-04-02.
  2. "~anb-symmetrica/Cuneiform-linux/Cuneiform-multilang : Revision 400".
  3. PCworld
  4. Cuneiform-Qt
  5. Cuneiform Linux 0.9.0 is released