CEDAR-FOX is a software system for forensic comparison of handwriting. It was developed at CEDAR, the Center of Excellence for Document Analysis and Recognition at the University at Buffalo. [1] [2] [3] CEDAR-FOX lets the questioned document examiner step interactively through processing stages such as extracting regions of interest from a scanned document, determining lines and words of text, and recognizing textual elements. The final goal is to compare two samples of writing and determine the log-likelihood ratio under the prosecution and defense hypotheses. It can also be used to compare signature samples. The software, which is protected by a United States patent [4], can be licensed from Cedartech, Inc.
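The log-likelihood ratio mentioned above can be illustrated with a minimal sketch. It assumes that each feature yields a distance between the two samples, that distances are modeled with Gaussian densities under the same-writer (prosecution) and different-writer (defense) hypotheses, and that features are independent. The function names, the Gaussian model, and the parameter values are illustrative assumptions, not the models actually used in CEDAR-FOX.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log of the normal density N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def log_likelihood_ratio(distances, same_params, diff_params):
    """Sum per-feature log-likelihood ratios (naive independence assumption).

    distances   -- one distance per feature between the two samples
    same_params -- (mean, std) of each distance under the same-writer hypothesis
    diff_params -- (mean, std) of each distance under the different-writer hypothesis
    """
    llr = 0.0
    for d, (mean_s, std_s), (mean_d, std_d) in zip(distances, same_params, diff_params):
        llr += gaussian_log_pdf(d, mean_s, std_s) - gaussian_log_pdf(d, mean_d, std_d)
    return llr


if __name__ == "__main__":
    # Hypothetical parameters for two features (made up for illustration).
    same = [(0.2, 0.1), (0.3, 0.1)]
    diff = [(0.6, 0.1), (0.7, 0.1)]
    print(log_likelihood_ratio([0.25, 0.35], same, diff))
```

A positive value favors the same-writer hypothesis and a negative value favors the different-writer hypothesis; the magnitude indicates the strength of the support.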
Writer verification is the task of determining whether two handwritten samples were written by the same writer. It is used in questioned document examination. Using a set of metrics, CedarFox can attach a measure of confidence to the conclusion that two documents were written by the same individual or by different individuals. CedarFox allows you to select either the entire document or a specific region of a document for the comparison. The comparison is based on macro features (which measure global characteristics such as slant and connectivity), micro features (which are based on individual character shapes), and style features (e.g., shapes of character pairs, or bigrams). Two modes of writer verification are available: (i) a questioned document is compared against a single known document (this comparison is based on statistics of how much variation a person's writing can have), and (ii) a questioned document is compared against multiple known documents, in which case the system learns the writer's habits from the known documents. At least four known documents must be available to use this mode. The task of identifying the user is split into two parts,
CEDAR-FOX performs a variety of operations on documents to make them ready for comparison. These include thresholding, line removal, line segmentation, word segmentation and transcript mapping.
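Two of these steps, thresholding and line segmentation, can be sketched in a few lines. The example below assumes a grayscale image stored as a list of rows with pixel values from 0 (black) to 255 (white), a fixed global threshold, and a horizontal projection profile for finding text lines. These are common textbook techniques chosen for illustration; the source does not describe the algorithms CEDAR-FOX actually uses, and the function names are hypothetical.

```python
def binarize(gray, threshold=128):
    """Global thresholding: 1 marks ink (dark) pixels, 0 marks background."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def segment_lines(binary, min_ink=1):
    """Find text lines as runs of consecutive rows containing ink.

    Returns a list of (first_row, last_row) pairs, both inclusive.
    A row counts as part of a line when it has at least min_ink ink pixels.
    """
    lines = []
    start = None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # the current line just ended
            start = None
    if start is not None:                  # line runs to the bottom edge
        lines.append((start, len(binary) - 1))
    return lines


if __name__ == "__main__":
    page = [
        [255, 255, 255, 255],
        [255,  10,  20, 255],
        [255,  30, 255, 255],
        [255, 255, 255, 255],
        [ 40, 255, 255,  50],
    ]
    print(segment_lines(binarize(page)))
```

Word segmentation can follow the same idea by applying a vertical projection profile within each detected line and splitting at sufficiently wide gaps.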
CedarFox has user interfaces for scanning documents directly, for entering results directly into spreadsheets, and for printing intermediate results. Database access is also available for storing document metadata.
Many options are available with CEDAR-FOX for document comparison. The four major verification models used are
CedarFox has several modalities for searching handwritten documents for the presence of keywords. Word spotting allows the user to select a word image as a query, which is used to find similar word images in a specified document. Another type of search allows the user to type in a word; all words in the document(s) are then ranked by how likely each is to match the query.
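Query-by-image word spotting can be sketched as a nearest-neighbor ranking. The example assumes that every word image has already been reduced to a fixed-length numeric feature vector and that similarity is measured by Euclidean distance. The feature values, the word identifiers, and the choice of distance are illustrative assumptions, not a description of CedarFox's matching method.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_word_images(query_features, candidates):
    """Rank candidate word images from most to least similar to the query.

    candidates maps a word-image identifier to its feature vector.
    Returns the identifiers ordered by increasing distance to the query.
    """
    return sorted(candidates, key=lambda name: euclidean(query_features, candidates[name]))


if __name__ == "__main__":
    # Hypothetical feature vectors for three word images in a document.
    document_words = {
        "word_17": [0.9, 0.1, 0.4],
        "word_03": [0.2, 0.8, 0.5],
        "word_42": [0.8, 0.2, 0.5],
    }
    query = [0.85, 0.15, 0.45]
    print(rank_word_images(query, document_words))
```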
CedarFox has automatic character recognition capability. Word recognition with a pre-specified lexicon is also built-in. The user can also manually input character identities if the highest character recognition accuracy is desired for the purpose of writer verification/identification.
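Word recognition against a pre-specified lexicon can be illustrated by correcting a noisy character-level reading to the closest lexicon entry. The sketch below assumes the recognizer outputs a plain character string and uses Levenshtein edit distance to pick the best match. This is a generic post-processing technique shown for illustration, not CedarFox's recognizer, and the lexicon is made up.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def recognize_with_lexicon(raw_reading, lexicon):
    """Return the lexicon word closest to the raw character reading."""
    return min(lexicon, key=lambda word: edit_distance(raw_reading, word))


if __name__ == "__main__":
    lexicon = ["buffalo", "boston", "albany"]
    # "bvffalo" stands in for a reading with one misrecognized character.
    print(recognize_with_lexicon("bvffalo", lexicon))
```

Restricting the output to lexicon entries trades coverage for accuracy: words outside the lexicon can never be returned, which is why manual entry of character identities remains useful when accuracy matters most.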
Word gap comparison and comparison with Palmer metrics are supported.
A signature is a handwritten depiction of someone's name, nickname, or even a simple "X" or other mark that a person writes on documents as a proof of identity and intent. The writer of a signature is a signatory or signer. Similar to a handwritten signature, a signature work describes the work as readily identifying its creator. A signature may be confused with an autograph, which is chiefly an artistic signature. This can lead to confusion when people have both an autograph and signature and as such some people in the public eye keep their signatures private whilst fully publishing their autograph.
A text editor is a type of computer program that edits plain text. Such programs are sometimes known as "notepad" software. Text editors are provided with operating systems and software development packages, and can be used to change files such as configuration files, documentation files and programming language source code.
Visual Basic for Applications (VBA) is an implementation of Microsoft's event-driven programming language Visual Basic 6.0 built into most desktop Microsoft Office applications. Although based on pre-.NET Visual Basic, which is no longer supported or updated by Microsoft, the VBA implementation in Office continues to be updated to support new Office features. VBA is used for professional and end-user development due to its perceived ease-of-use, Office's vast installed userbase, and extensive legacy in business.
WordStar is a word processor application for microcomputers. It was published by MicroPro International and originally written for the CP/M-80 operating system, with later editions added for MS-DOS and other 16-bit PC OSes. Rob Barnaby was the sole author of the early versions of the program.
Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.
Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most plausible words.
Handwriting is the personal and unique style of writing with a writing instrument, such as a pen or pencil in the hand. Handwriting includes both block and cursive styles and is separate from generic and formal handwriting script/style, calligraphy or typeface. Because each person's handwriting is unique and different, it can be used to verify a document's writer. The deterioration of a person's handwriting is also a symptom or result of several different diseases. The inability to produce clear and coherent handwriting is also known as dysgraphia.
In forensic science, questioned document examination (QDE) is the examination of documents potentially disputed in a court of law. Its primary purpose is to provide evidence about a suspicious or questionable document using scientific processes and methods. Evidence might include alterations, the chain of possession, damage to the document, forgery, origin, authenticity, or other questions that come up when a document is challenged in court.
In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
In computer vision or natural language processing, document layout analysis is the process of identifying and categorizing the regions of interest in the scanned image of a text document. A reading system requires the segmentation of text zones from non-textual ones and the arrangement in their correct reading order. Detection and labeling of the different zones as text body, illustrations, math symbols, and tables embedded in a document is called geometric layout analysis. But text zones play different logical roles inside the document and this kind of semantic labeling is the scope of the logical layout analysis.
Microsoft Office XP is an office suite which was officially revealed in July 2000 by Microsoft for the Windows operating system. Office XP was released to manufacturing on March 5, 2001, and was later made available to retail on May 31, 2001. A Mac OS X equivalent, Microsoft Office v. X was released on November 19, 2001.
Intelligent character recognition (ICR) is used to extract handwritten text from images. It is a more sophisticated type of OCR technology that recognizes different handwriting styles and fonts to intelligently interpret data on forms and physical documents.
A text entry interface or text entry device is an interface that is used to enter text information in an electronic device. A commonly used device is a mechanical computer keyboard. Most laptop computers have an integrated mechanical keyboard, and desktop computers are usually operated primarily using a keyboard and mouse. Devices such as smartphones and tablets mean that interfaces such as virtual keyboards and voice recognition are becoming more popular as text entry systems.
Windows Speech Recognition (WSR) is speech recognition developed by Microsoft for Windows Vista that enables voice commands to control the desktop user interface, dictate text in electronic documents and email, navigate websites, perform keyboard shortcuts, and operate the mouse cursor. It supports custom macros to perform additional or supplementary tasks.
Intelligent Word Recognition, or IWR, is the recognition of unconstrained handwritten words. IWR recognizes entire handwritten words or phrases instead of character-by-character, like its predecessor, optical character recognition (OCR). IWR technology matches handwritten or printed words to a user-defined dictionary, significantly reducing character errors encountered in typical character-based recognition engines.
Time delay neural network (TDNN) is a multilayer artificial neural network architecture whose purpose is to 1) classify patterns with shift-invariance, and 2) model context at each layer of the network.
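The shift-invariance property can be shown with a minimal sketch of a single time-delay unit. It assumes a one-dimensional input sequence, one set of weights shared across all time positions, a ReLU activation, and max pooling over time. The function names and weight values are illustrative; a real TDNN stacks several such layers with learned weights, each layer seeing a wider temporal context.

```python
def time_delay_layer(sequence, weights, bias=0.0):
    """Apply the same weights to every window of consecutive frames.

    Each output combines the current frame with its delayed neighbors,
    which is how the layer models local temporal context. ReLU is applied.
    """
    width = len(weights)
    outputs = []
    for t in range(len(sequence) - width + 1):
        window = sequence[t:t + width]
        activation = sum(w * x for w, x in zip(weights, window)) + bias
        outputs.append(max(0.0, activation))
    return outputs


def pooled_response(sequence, weights):
    """Max over time: the response does not depend on where the pattern occurs."""
    return max(time_delay_layer(sequence, weights))


if __name__ == "__main__":
    weights = [1, 2, 1]                    # hypothetical, not learned
    early = [0, 0, 1, 2, 1, 0, 0, 0]       # pattern near the start
    late = [0, 0, 0, 0, 1, 2, 1, 0]        # same pattern, shifted later
    print(pooled_response(early, weights), pooled_response(late, weights))
```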
Sargur Narasimhamurthy Srihari was an Indian and American computer scientist and educator who made contributions to the field of pattern recognition. The principal impact of his work has been in handwritten address reading systems and in computer forensics. He was a SUNY Distinguished Professor in the School of Engineering and Applied Sciences at the University at Buffalo, Buffalo, New York, USA.
MovAlyzeR is a software package for handwriting movement analysis for research and professional applications. Handwriting movements are recorded using a digitizing tablet connected to a computer. MovAlyzeR is used in many different fields ranging from research in kinesiology, psychology, education, geriatrics, neurology, psychiatry, occupational therapy, forensic document examination or questioned document examination, computer science, to educational demonstrations or student projects in these fields.
The Center of Excellence for Document Analysis and Recognition (CEDAR) is a research laboratory at the University at Buffalo, State University of New York. The center was established with funding from the United States Postal Service and the National Institute of Justice. CEDAR was formalized by the United States Postal Service by Postmaster General Anthony Frank in 1991. The primary goal of CEDAR was to conduct research and development of software useful for automating postal sorting equipment. Work at CEDAR, with Sargur Srihari as principal investigator, led to the first handwritten address interpretation system in the world. CEDAR-FOX, the first system for automatic comparison of handwriting for the purpose of forensic analysis, was developed at CEDAR.
Sayre's paradox is a dilemma encountered in the design of automated handwriting recognition systems. A standard statement of the paradox is that a cursively written word cannot be recognized without being segmented and cannot be segmented without being recognized. The paradox was first articulated in a 1973 publication by Kenneth M. Sayre, after whom it was named.