Sayre's paradox

Last updated

Sayre's paradox is a dilemma encountered in the design of automated handwriting recognition systems. A standard statement of the paradox is that a cursively written word cannot be recognized without being segmented and cannot be segmented without being recognized. [1] [2] The paradox was first articulated in a 1973 publication by Kenneth M. Sayre, after whom it was named. [3]

Contents

Nature of the problem

It is relatively easy to design automated systems capable of recognizing words inscribed in a printed format. Such words are segmented into letters by the very act of writing them on the page. Given templates matching typical letter shapes in a given language, individual letters can be identified with a high degree of probability. In cases of ambiguity, probable letter sequences can be compared with a selection of properly spelled words in that language (called a lexicon). [4] If necessary, syntactic features of the language can be applied to render a generally accurate identification of the words in question. [5] Printed-character recognition systems of this sort are commonly used in processing standardized government forms, in sorting mail by zip code, and so forth.

In cursive writing, however, letters comprising a given word typically flow sequentially without gaps between them. Unlike a sequence of printed letters, cursively connected letters are not segmented in advance. Here is where Sayre's Paradox comes into play. Unless the word is already segmented into letters, template-matching techniques like those described above cannot be applied. That is, segmentation is a prerequisite for word recognition. But there are no reliable techniques for segmenting a word into letters unless the word itself has been identified. Word recognition requires letter segmentation, and letter segmentation requires word recognition. There is no way a cursive writing recognition system employing standard template-matching techniques can do both simultaneously.

Advantages to be gained by use of automated cursive writing recognition systems include routing mail with handwritten addresses, reading handwritten bank checks, and automated digitalization of hand-written documents. [1] These are practical incentives for finding ways of circumventing Sayre's Paradox.

Avoiding the paradox

One way of ameliorating the adverse effects of the paradox is to normalize the word inscriptions to be recognized. Normalization amounts to eliminating idiosyncrasies in the penmanship of the writer, such as unusual slope of the letters and unusual slant of the cursive line. [4] This procedure can increase the probability of a correct match with a letter template, resulting in an incremental improvement in the success rate of the system. Since improvement of this sort still depends on accurate segmentation, however, it remains subject to the limitations of Sayre's Paradox. Researchers have come to realize that the only way to circumvent the paradox is by use of procedures that do not rely on accurate segmentation. [1]

Directions of current research

Segmentation is accurate to the extent that it matches distinctions among letters in the actual inscriptions presented to the system for recognition (the input data). This is sometimes referred to as “explicit segmentation”. [4] “Implicit segmentation,” by contrast, is division of the cursive line into more parts than the number of actual letters in the cursive line itself. Processing these “implicit parts” to achieve eventual word identification requires specific statistical procedures involving hidden Markov models (HMM).

A Markov model is a statistical representation of a random process, which is to say a process in which future states are independent of states occurring before the present. In such a process, a given state is dependent only on the conditional probability of its following the state immediately before it. An example is a series of outcomes from successive casts of a die. An HMM is a Markov model, individual states of which are not fully known. Conditional probabilities between states are still determinate, but the identities of individual states are not fully disclosed.

Recognition proceeds by matching HMMs of words to be recognized with previously prepared HMMs of words in the lexicon. The best match in a given case is taken to indicate the identity of the handwritten word in question. As with systems based on explicit segmentation, automated recognition systems based on implicit segmentation are judged more or less successful according to the percentage of correct identifications they accomplish.

Instead of explicit segmentation techniques, most automated handwriting recognition systems today employ implicit segmentation in conjunction with HMM-based matching procedures. [1] The constraints epitomized by Sayre's Paradox are largely responsible for this shift in approach.

Related Research Articles

<span class="mw-page-title-main">Optical character recognition</span> Computer recognition of visual text

Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.

<span class="mw-page-title-main">Penmanship</span> Technique of writing with the hand

Penmanship is the technique of writing with the hand using a writing instrument. Today, this is most commonly done with a pen, or pencil, but throughout history has included many different implements. The various generic and formal historical styles of writing are called "hands" while an individual's style of penmanship is referred to as "handwriting".

Although people in many parts of the world share common alphabets and numeral systems, styles of handwritten letterforms vary between individuals, and sometimes also vary systematically between regions.

A hidden Markov model (HMM) is a Markov model in which the observations are dependent on a latent Markov process. An HMM requires that there be an observable process whose outcomes depend on the outcomes of in a known way. Since cannot be observed directly, the goal is to learn about state of by observing By definition of being a Markov model, an HMM has an additional requirement that the outcome of at time must be "influenced" exclusively by the outcome of at and that the outcomes of and at must be conditionally independent of at given at time Estimation of the parameters in an HMM can be performed using maximum likelihood. For linear chain HMMs, the Baum–Welch algorithm can be used to estimate the parameters.

Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent patterns. PR has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power.

<span class="mw-page-title-main">Handwriting recognition</span> Ability of a computer to receive and interpret intelligible handwritten input

Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most possible words.

<span class="mw-page-title-main">Handwriting</span> Writing created by a person with a writing implement

Handwriting is the personal and unique style of writing with a writing instrument, such as a pen or pencil in the hand. Handwriting includes both block and cursive styles and is separate from generic and formal handwriting script/style, calligraphy or typeface. Because each person's handwriting is unique and different, it can be used to verify a document's writer. The deterioration of a person's handwriting is also a symptom or result of several different diseases. The inability to produce clear and coherent handwriting is also known as dysgraphia.

<span class="mw-page-title-main">Cursive</span> Style of penmanship

Cursive is any style of penmanship in which characters are written joined in a flowing manner, generally for the purpose of making writing faster, in contrast to block letters. It varies in functionality and modern-day usage across languages and regions; being used both publicly in artistic and formal documents as well as in private communication. Formal cursive is generally joined, but casual cursive is a combination of joins and pen lifts. The writing style can be further divided as "looped", "italic", or "connected".

Within the Markov model in probability theory, Markovian discrimination is a spam filtering method used in CRM114 and other spam filters to model the statistical behaviors of spam and non-spam, more accurately than can be done by using simple Bayesian methods. It is based on the theory of Markov chains by Andrey Markov. While a simple Bayesian model of written text contains only the dictionary of legal words and their relative probabilities, a Markovian model adds to this the relative transition probabilities -a Bayesian filter works on single words alone, while a Markovian filter works on phrases or entire sentences.

Intelligent character recognition (ICR) is used to extract handwritten text from images. It is a more sophisticated type of OCR technology that recognizes different handwriting styles and fonts to intelligently interpret data on forms and physical documents.

Speech segmentation is the process of identifying the boundaries between words, syllables, or phonemes in spoken natural languages. The term applies both to the mental processes used by humans, and to artificial processes of natural language processing.

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental processes used by humans when reading text, and to artificial processes implemented in computers, which are the subject of natural language processing. The problem is non-trivial, because while some written languages have explicit word boundary markers, such as the word spaces of written English and the distinctive initial, medial and final letter shapes of Arabic, such signals are sometimes ambiguous and not present in all written languages.

<span class="mw-page-title-main">Russian cursive</span> Handwritten form of Russian Cyrillic

Russian cursive is a variant of the Russian alphabet used for writing by hand. It is typically referred to as (ру́сский) рукопи́сный шрифт (rússky) rukopísny shrift, "(Russian) handwritten font". It is the handwritten form of the modern Russian Cyrillic script, used instead of the block letters seen in printed material. In addition, Russian italics for lowercase letters are often based on Russian cursive. Most handwritten Russian, especially in personal letters and schoolwork, uses the cursive alphabet. In Russian schools most children are taught from first grade how to write with this script.

<span class="mw-page-title-main">Pandemonium architecture</span>

Pandemonium architecture is a theory in cognitive science that describes how visual images are processed by the brain. It has applications in artificial intelligence and pattern recognition. The theory was developed by the artificial intelligence pioneer Oliver Selfridge in 1959. It describes the process of object recognition as a hierarchical system of detection and association by a metaphorical set of "demons" sending signals to each other. This model is now recognized as the basis of visual perception in cognitive science.

Intelligent Word Recognition, or IWR, is the recognition of unconstrained handwritten words. IWR recognizes entire handwritten words or phrases instead of character-by-character, like its predecessor, optical character recognition (OCR). IWR technology matches handwritten or printed words to a user-defined dictionary, significantly reducing character errors encountered in typical character-based recognition engines.

This is a software system for forensic comparison of handwriting. It was developed at CEDAR, the Center of Excellence for Document Analysis and Recognition at the University at Buffalo. CEDAR-FOX has capabilities for interaction with the questioned document examiner to go through processing steps such as extracting regions of interest from a scanned document, determining lines and words of text, recognize textual elements. The final goal is to compare two samples of writing to determine the log-likelihood ratio under the prosecution and defense hypotheses. It can also be used to compare signature samples. The software, which is protected by a United States Patent can be licensed from Cedartech, Inc.

Contextual image classification, a topic of pattern recognition in computer vision, is an approach of classification based on contextual information in images. "Contextual" means this approach is focusing on the relationship of the nearby pixels, which is also called neighbourhood. The goal of this approach is to classify the images by using the contextual information.

<span class="mw-page-title-main">Signature recognition</span>

Signature recognition is an example of behavioral biometrics that identifies a person based on their handwriting. It can be operated in two different ways:

Time-series segmentation is a method of time-series analysis in which an input time-series is divided into a sequence of discrete segments in order to reveal the underlying properties of its source. A typical application of time-series segmentation is in speaker diarization, in which an audio signal is partitioned into several pieces according to who is speaking at what times. Algorithms based on change-point detection include sliding windows, bottom-up, and top-down methods. Probabilistic methods based on hidden Markov models have also proved useful in solving this problem.

Kenneth M. Sayre was an American philosopher who spent most of his career at the University of Notre Dame (ND). His early career was devoted mainly to philosophic applications of artificial intelligence, cybernetics, and information theory. Later on his main interests shifted to Plato, philosophy of mind, and environmental philosophy. His retirement in 2014 was marked by publication of a history of ND's Philosophy Department, Adventures in Philosophy at Notre Dame.

References

  1. 1 2 3 4 Vinciarelli, Alessandro (April 2003). Offline Cursive Handwriting: From Word To Text Recognition (PhD). IDIAP.
  2. Fischer, Andreas; Frinken, Volkmar; Bunke, Horst (2013). "Chapter 17 - Hidden Markov Models for Off-Line Cursive Handwriting Recognition". In Rao, C.R.; Govindaraju, Venu (eds.). Handbook of Statistics. Elsevier. pp. 421–442. doi:10.1016/B978-0-444-53859-8.00017-5. ISBN   9780444538598. ISSN   0169-7161.
  3. Sayre, Kenneth M. (1973). "Machine recognition of handwritten words: A project report". Pattern Recognition. 5 (3). Pergamon Press: 213–228. Bibcode:1973PatRe...5..213S. doi:10.1016/0031-3203(73)90044-7. ISSN   0031-3203.
  4. 1 2 3 Vinciarelli, Alessandro (July 2002). "A survey on off-line Cursive Word Recognition". Pattern Recognition. 35 (7): 1433–1446. Bibcode:2002PatRe..35.1433V. doi:10.1016/S0031-3203(01)00129-7. ISSN   0031-3203.
  5. Maroneze, André O.; Coüasnon, Bertrand; Lemaitre, Aurélie (2011-01-24). Agam, Gady; Viard-Gaudin, Christian (eds.). Introduction of statistical information in a syntactic analyzer for document image recognition. Document Recognition and Retrieval XVIII. Vol. 7874. SPIE. pp. 28–38. doi:10.1117/12.873393.