Optical braille recognition

Image of a page showing both the raised braille characters, and the recessed characters on the other side of the page.

Optical braille recognition is technology that captures and processes images of braille characters and converts them into natural-language text. It is used to make braille documents readable by people who cannot read braille, and to aid in the preservation and reproduction of such documents.

History

In 1984, a group of researchers at the Delft University of Technology designed a braille reading tablet in which a reading head with photosensitive cells was moved along a set of rulers to capture braille text line by line.[1]

In 1988, a group of French researchers at the Lille University of Science and Technology developed a system, called Lectobraille, that converted braille documents into plain text. The system photographed the braille text with a low-resolution CCD camera and extracted the dots using spatial filtering, median filtering, erosion, and dilation. The braille characters were then converted to natural language using adaptive recognition.[2] The Lectobraille technique had an error rate of 1% and required an average processing time of seven seconds per line.[1]

In 1993, a group of researchers from the Katholieke Universiteit Leuven developed a system to recognize braille that had been scanned with a commercially available scanner.[1] The system, however, was unable to handle deformities in the braille grid, so well-formed braille documents were required.[3]

In 1999, a group at the Hong Kong Polytechnic University implemented an optical braille recognition technique that used edge detection to translate braille into English or Chinese text.[4]

In 2001, Murray and Dias created a handheld recognition system that scanned small sections of a document at a time.[5] Because only a small area was scanned at once, grid deformation was less of an issue, and a simpler, more efficient algorithm could be employed.[3]

In 2003, Morgavi and Morando designed a system to recognize braille characters using artificial neural networks. This system was noted for its ability to handle image degradation more successfully than other approaches.[3]
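The dot-extraction stage of a pipeline like Lectobraille's can be sketched as a sequence of standard image-processing operations: median filtering to suppress speckle noise, thresholding, then erosion followed by dilation (a morphological opening) to remove blobs smaller than a braille dot. The synthetic image, threshold, and structuring elements below are illustrative assumptions, not the published parameters:

```python
import numpy as np
from scipy import ndimage

# Synthetic scan: one braille-dot-sized blob plus a single-pixel noise speck.
img = np.zeros((40, 40))
img[10:14, 10:14] = 1.0                                 # dot-sized blob
img[25, 30] = 1.0                                       # noise speck
img += 0.05 * np.random.default_rng(0).standard_normal(img.shape)

smoothed = ndimage.median_filter(img, size=3)           # suppress speckle noise
binary = smoothed > 0.5                                 # threshold to dot candidates
# Opening (erosion then dilation) removes any remaining specks
# smaller than the structuring element while keeping dot-sized blobs.
opened = ndimage.binary_dilation(ndimage.binary_erosion(binary))

labels, n_dots = ndimage.label(opened)                  # count surviving blobs
```

With these settings the noise speck is eliminated and the single dot-sized blob survives (`n_dots == 1`); a real system would follow this with grid fitting and character classification.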

Challenges

Many of the challenges in processing braille text arise from the nature of braille documents. Braille is generally embossed on solid-colored paper with no ink, so there is little contrast between the raised characters and the background. In addition, imperfections in the page can appear as spurious features in a scan or photograph of the page.

Many documents are printed inter-point, meaning they are embossed on both sides of the page. As a result, the depressions produced by the braille on one side appear in between the protruding dots of the other side.[6]

Techniques

Some optical braille recognition techniques use oblique lighting and a camera so that the depressions and protrusions of the braille cast revealing shadows. Others make use of commercially available document scanners.[6]
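Once dot positions have been recovered from an image, grouping them into cells and mapping them to characters is largely geometric. The sketch below assumes dot centres have already been detected and uses illustrative grid dimensions (the pitch and cell spacing are assumptions, not a braille standard); the mapping of dot positions to Unicode braille patterns in the U+2800 block is standard:

```python
# Illustrative cell geometry in pixels (assumed values, not a standard):
DOT_PITCH = 10   # spacing between adjacent dots within a cell
CELL_W = 25      # horizontal distance between cell origins
CELL_H = 40      # vertical distance between rows of cells

# Bit masks of the six standard braille dots in the Unicode U+2800 block:
# dots 1-3 are the left column top to bottom, dots 4-6 the right column.
DOT_BITS = {(0, 0): 0x01, (0, 1): 0x02, (0, 2): 0x04,
            (1, 0): 0x08, (1, 1): 0x10, (1, 2): 0x20}

def decode(dot_centres):
    """Map detected (x, y) dot centres to {(row, col): braille character}."""
    cells = {}
    for x, y in dot_centres:
        col, dx = divmod(x, CELL_W)       # which cell, and offset within it
        row, dy = divmod(y, CELL_H)
        pos = (round(dx / DOT_PITCH), round(dy / DOT_PITCH))  # (0|1, 0..2)
        if pos in DOT_BITS:               # ignore points outside the 2x3 grid
            cells[(row, col)] = cells.get((row, col), 0) | DOT_BITS[pos]
    return {rc: chr(0x2800 + bits) for rc, bits in cells.items()}
```

For example, dots at (0, 0), (0, 10), and (0, 20) — dots 1, 2, and 3 of one cell — decode to ⠇, the letter "l" in Grade 1 English braille. A real system would also have to estimate the grid pitch and orientation from the data rather than assume them.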

References

  1. Mennens, Jan; Tichelen, Luc van; François, Guido; Engelen, Jan J. (December 1994). "Optical recognition of Braille using standard equipment". IEEE Transactions on Rehabilitation Engineering. 2 (4): 207–212. doi:10.1109/86.340878. ISSN 1063-6528.
  2. Dubus, J.P.; Benjelloun, M.; Devlaminck, V.; Wauquier, F.; Altmayer, P. (November 1988). "Image processing techniques to perform an autonomous system to translate relief braille into black-ink, called: Lectobraille". Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Vol. 4. pp. 1584–1585. doi:10.1109/IEMBS.1988.94726. ISBN 978-0-7803-0785-8. S2CID 57767398.
  3. Wong, Lisa; Abdulla, Waleed; Hussman, Stephan (August 2004). "A software algorithm prototype for optical recognition of embossed Braille". Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004). pp. 586–589. doi:10.1109/ICPR.2004.1334316. ISBN 978-0-7695-2128-2. ISSN 1051-4651. S2CID 16350406.
  4. Ng, C.; Ng, V.; Lau, Y. (September 1999). "Regular feature extraction for recognition of Braille". Proceedings of the Third International Conference on Computational Intelligence and Multimedia Applications (ICCIMA '99). pp. 302–306. doi:10.1109/ICCIMA.1999.798547. ISBN 978-0-7695-0300-4. S2CID 58353838.
  5. Murray, I.; Dias, T. (2001). "A portable device for optically recognizing Braille. I. Hardware development". The Seventh Australian and New Zealand Intelligent Information Systems Conference. pp. 302–306. doi:10.1109/ANZIIS.2001.974063. ISBN 978-1-74052-061-4. S2CID 195859595.
  6. Antonacopoulos, A.; Bridson, D. (2004). "A Robust Braille Recognition System". Document Analysis Systems VI. Lecture Notes in Computer Science. pp. 533–545. doi:10.1007/978-3-540-28640-0_50. ISBN 978-3-540-23060-1. ISSN 0302-9743.