The Center of Excellence for Document Analysis and Recognition (CEDAR) is a research laboratory at the University at Buffalo, State University of New York. The center was established with funding from the United States Postal Service [1] and the National Institute of Justice [2]. CEDAR was formalized by the United States Postal Service under Postmaster General Anthony Frank in 1991. The primary goal of CEDAR was to research and develop software for automating postal sorting equipment. Work at CEDAR, with Sargur Srihari as principal investigator, led to the first handwritten address interpretation system in the world [3]. CEDAR-FOX, the first system for automatic comparison of handwriting for the purpose of forensic analysis, was also developed at CEDAR.
The State University of New York is a system of public institutions of higher education in New York, United States. It is the largest comprehensive system of universities, colleges, and community colleges in the United States, with a total enrollment of 606,232 students, plus 1.1 million adult education students, spanning 64 campuses across the state. Led by Chancellor Kristina M. Johnson, the SUNY system has 88,000 faculty members and some 7,660 degree and certificate programs overall and a $10.7 billion budget.
The United States Postal Service is an independent agency of the executive branch of the United States federal government responsible for providing postal service in the United States, including its insular areas and associated states. It is one of the few government agencies explicitly authorized by the United States Constitution.
The National Institute of Justice (NIJ) is the research, development and evaluation agency of the United States Department of Justice. NIJ, along with the Bureau of Justice Statistics (BJS), Bureau of Justice Assistance (BJA), Office of Juvenile Justice and Delinquency Prevention (OJJDP), Office for Victims of Crime (OVC), and other program offices, makes up the Office of Justice Programs (OJP) branch of the Department of Justice.
HandWritten Address Interpretation (HWAI) is a software system developed at CEDAR. It was first deployed by the United States Postal Service through its contractor Lockheed-Martin in Tampa, Florida, during the December holiday season of 1997. Initially 10% of the handwritten mail was successfully sorted. Given the large volume of mail that the US Postal Service processes and the cost of the labor involved, even this rate was enough for the project to be considered a success. The key to the success was a heuristic discovered by researchers Sargur Srihari and Jonathan Hull: the street number and ZIP code consist only of numerals and are therefore relatively easy to recognize, and once recognized they can be used to constrain the possible street names. Subsequent improvements to HWAI led to a 45% sort rate with a 2% error rate. Today more than 95% of the handwritten mail is sorted automatically. Versions of HWAI were developed for Australia Post and UK Royal Mail.
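The heuristic can be illustrated with a minimal Python sketch. It is not the HWAI implementation: the directory contents, the function names, and the use of difflib string similarity as a stand-in for handwritten word recognition are all assumptions made for illustration. The point it shows is that the recognized numerals reduce the street name problem from open-ended recognition to a choice among a few candidates.

```python
from difflib import SequenceMatcher

# Hypothetical toy directory (illustrative data only):
# (ZIP code, street number) -> street names that exist at that combination.
DIRECTORY = {
    ("14228", "520"): ["LEE ENTRANCE", "LEE ROAD"],
    ("14228", "12"): ["MAPLE ROAD", "MILLERSPORT HIGHWAY"],
}


def candidate_streets(zip_code, street_number):
    """Return the street names consistent with the recognized numerals."""
    return DIRECTORY.get((zip_code, street_number), [])


def best_street(zip_code, street_number, noisy_street):
    """Pick the candidate closest to a noisy reading of the street name.

    String similarity stands in for a real word recognizer here; that
    substitution is an assumption of this sketch.
    """
    candidates = candidate_streets(zip_code, street_number)
    if not candidates:
        return None  # numerals did not match any known address
    reading = noisy_street.upper()
    return max(
        candidates,
        key=lambda name: SequenceMatcher(None, name, reading).ratio(),
    )
```

With only two candidates left after the numeric lookup, a poor reading of the street name such as "Mapel Rd" is still enough to choose between them.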
Sargur Narasimhamurthy Srihari is an American computer scientist and educator who has made contributions to the field of pattern recognition. The principal impact of his work has been in handwritten address reading systems and in computer forensics. He is a SUNY Distinguished Professor in the School of Engineering and Applied Sciences at the University at Buffalo, Buffalo, New York, USA.
The Australian Postal Corporation, operating as Australia Post, is the government-owned corporation that provides postal services in Australia. The head office of Australia Post is located at 111 Bourke Street, Melbourne, which also serves as a post office.
Royal Mail is a postal service and courier company in the United Kingdom, originally established in 1516. The company's subsidiary, Royal Mail Group Limited, operates the brands Royal Mail (letters) and Parcelforce Worldwide (parcels). General Logistics Systems, an international logistics company, is a wholly owned subsidiary of Royal Mail Group.
Optical character recognition or optical character reader, often abbreviated as OCR, is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo or from subtitle text superimposed on an image.
The mail or post is a system for physically transporting postcards, letters, and parcels. A postal service can be private or public, though many governments place restrictions on private systems. Since the mid-19th century, national postal systems have generally been established as government monopolies, with a fee on the article prepaid. Proof of payment is often in the form of adhesive postage stamps, but postage meters are also used for bulk mailing. Modern private postal systems are typically distinguished from national postal agencies by the names "courier" or "delivery service".
Postal codes used in the United Kingdom are known as postcodes. They are alphanumeric and were adopted nationally between 11 October 1959 and 1974, having been devised by the General Post Office. A full postcode is known as a "postcode unit" and designates an area with a number of addresses or a single major delivery point.
Handwriting recognition (HWR) is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available.
Geocoding is the computational process of transforming a physical address description to a location on the Earth's surface. Reverse geocoding, on the other hand, converts geographic coordinates to a description of a location, usually the name of a place or an addressable location. Geocoding relies on a computer representation of address points, the street / road network, together with postal and administrative boundaries.
A "Postal Address" is a delivery address as defined by Irish Standard - I.S. EN 14142-1:2011, as operated by the Universal Service Provider, An Post. Their addressing guides comply with the Universal Postal Union’s (UPU) addressing guidelines.
The Coding Accuracy Support System (CASS) enables the United States Postal Service (USPS) to evaluate the accuracy of software that corrects and matches street addresses. CASS certification is offered to all mailers, service bureaus, and software vendors that would like the USPS to evaluate the quality of their address-matching software and improve the accuracy of their ZIP+4, carrier route, and five-digit coding.
A multiline optical-character reader, or MLOCR, is a type of mail sorting machine that uses optical character recognition (OCR) technology to determine how to route mail through the postal system.
Remote Bar Coding System (RBCS), also called Remote Video Encoding (RVE), is a method used by the United States Postal Service to encode the address of letter-sized mailpieces that are not decipherable by a Multiline Optical Character Reader (MLOCR). When an MLOCR does not recognize a valid address on a letter, it sends an image of the mailpiece to a central RBCS (RVE) site, where more sophisticated optical character recognition software is able to interpret many handwritten addresses using neural net and fuzzy logic algorithms. If this does not succeed, human operators visually examine the image and enter the address. In both cases, the data is sent back to the originating mail facility, where each mailpiece is automatically matched with its data by means of a unique fluorescent barcode printed on the back during the initial MLOCR attempt, and then receives a POSTNET barcode representing the full address.
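The escalation described above, from MLOCR to remote recognition software to a human operator, can be sketched as a chain of fallback stages. This is a minimal illustration, not the USPS system: the function name, the stage signature, and the representation of a mailpiece image as bytes are assumptions.

```python
from typing import Callable, List, Optional, Tuple

# A stage takes a mailpiece image and returns an address, or None on failure.
# This signature is an assumption of the sketch.
Recognizer = Callable[[bytes], Optional[str]]


def resolve_address(
    image: bytes, stages: List[Recognizer]
) -> Tuple[Optional[str], int]:
    """Try each stage in order and stop at the first one that succeeds.

    Returns the address and the index of the stage that produced it.
    If every stage fails, returns (None, len(stages)).
    """
    for index, stage in enumerate(stages):
        address = stage(image)
        if address is not None:
            return address, index
    return None, len(stages)
```

A caller would pass the stages in order of increasing cost, for example the local reader first, the remote recognizer second, and manual keying last, so that expensive human effort is spent only on images that the automatic stages could not resolve.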
In computer science, intelligent character recognition (ICR) is an advanced optical character recognition (OCR) system or, more specifically, a handwriting recognition system that allows fonts and different styles of handwriting to be learned by a computer during processing to improve accuracy and recognition levels.
Noisy text analytics is a process of information extraction whose goal is to automatically extract structured or semistructured information from noisy unstructured text data. While text analytics is a growing and mature field that has great value because of the huge amounts of data being produced, processing of noisy text is gaining in importance because many common applications produce noisy text data. Noisy unstructured text data is found in informal settings such as online chat, text messages, e-mails, message boards, newsgroups, blogs, wikis and web pages. Text produced by processing spontaneous speech with automatic speech recognition, or printed or handwritten text with optical character recognition, also contains processing noise. Text produced under such circumstances is typically highly noisy, containing spelling errors, abbreviations, non-standard words, false starts, repetitions, missing punctuation, missing letter case information, pause-filling words such as “um” and “uh”, and other texting and speech disfluencies. Such text can be seen in large amounts in contact centers, chat rooms, optical character recognition (OCR) of text documents, short message service (SMS) text, etc. Documents written in historical language can also be considered noisy with respect to today's knowledge about the language. Such text contains useful historical, religious, and ancient medical knowledge. The nature of the noisy text produced in all these contexts warrants moving beyond traditional text analysis techniques.
Postcodes are used in Australia to more efficiently sort and route mail within the Australian postal system. Postcodes in Australia have four digits and are placed at the end of the Australian address. Postcodes were introduced in Australia in 1967 by the Postmaster-General's Department and are now managed by Australia Post, and are published in booklets available from post offices or online from the Australia Post website.
First used by postal services to expedite and automate mail processing, mail sorting systems are now also used by corporations and other mailers to presort mail prior to delivery in order to earn discounts on postage. In the United States, for example, presort discounts can reduce the cost of First-Class Mail from $0.42 to as low as $0.324. Many companies also use mail sorters to handle incoming mail such as checks, orders and correspondence.
Automated ECG interpretation is the use of artificial intelligence and pattern recognition software and knowledge bases to carry out automatically the interpretation, test reporting, and computer-aided diagnosis of electrocardiogram tracings obtained usually from a patient.
CEDAR-FOX is a software system for forensic comparison of handwriting. It was developed at CEDAR, the Center of Excellence for Document Analysis and Recognition at the University at Buffalo. CEDAR-FOX lets the questioned document examiner step through processing stages such as extracting regions of interest from a scanned document, determining lines and words of text, and recognizing textual elements. The final goal is to compare two samples of writing to determine the log-likelihood ratio under the prosecution and defense hypotheses. It can also be used to compare signature samples. The software, which is protected by a United States patent, can be licensed from Cedartech, Inc.
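A log-likelihood ratio of this kind can be illustrated with a short Python sketch. The sketch assumes that the prosecution hypothesis is "same writer" and the defense hypothesis is "different writers", that the two samples have already been reduced to a single distance score, and that the score follows a Gaussian distribution under each hypothesis. The parameter values are invented for illustration and do not describe the models used in CEDAR-FOX.

```python
import math


def log_gaussian_pdf(x, mean, std):
    """Log of the Gaussian density, computed directly to avoid underflow."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def log_likelihood_ratio(distance, same=(0.2, 0.1), different=(0.6, 0.2)):
    """LLR = log p(distance | same writer) - log p(distance | different writers).

    `same` and `different` are hypothetical (mean, std) pairs for the
    distance between two samples under each hypothesis.
    """
    return log_gaussian_pdf(distance, *same) - log_gaussian_pdf(distance, *different)
```

A positive value favors the same-writer hypothesis, a negative value favors the different-writers hypothesis, and the magnitude indicates how strongly the evidence points either way.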
A mail sack or mailsack is a mail bag used to carry large quantities of mail.
The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.
Sayre’s Paradox is a dilemma encountered in the design of automated handwriting recognition systems. A standard statement of the paradox is that a cursively written word cannot be recognized without being segmented and cannot be segmented without being recognized. The paradox was first articulated in a 1973 publication by Kenneth M. Sayre, after whom it was named.