The International Conference on Document Analysis and Recognition (ICDAR) is an international academic conference which is now held annually, each time in a different country. Its topics include character and symbol recognition, printed and handwritten text recognition, graphics analysis and recognition, document analysis and understanding, historical documents and digital libraries, document-based forensics, and camera- and video-based scene text analysis. [1]
ICDAR was held every second year from 1991 to 2023, after which it became an annual conference. The host country changes each time, and the conference has so far taken place on five different continents. [2] An attempt is made to rotate around the world between Europe-Africa, Asia-Australia and the Americas, depending on the availability of hosts.
Year | Country | City | Website |
---|---|---|---|
1991 | France | Saint-Malo | |
1993 | Japan | Tsukuba, Ibaraki | |
1995 | Canada | Montreal | |
1997 | Germany | Ulm | |
1999 | India | Bangalore | |
2001 | United States | Seattle | |
2003 | United Kingdom | Edinburgh | |
2005 | South Korea | Seoul | |
2007 | Brazil | Curitiba | |
2009 | Spain | Barcelona | |
2011 | China | Beijing | |
2013 | United States | Washington, D.C. | https://iapr.org/archives/icdar2013/ |
2015 | France | Nancy | https://iapr.org/archives/icdar2015/ |
2017 | Japan | Kyoto | https://iapr.org/archives/icdar2017/ |
2019 | Australia | Sydney | https://iapr.org/archives/icdar2019/ |
2021 | Switzerland | Lausanne | https://iapr.org/archives/icdar2021/ |
2023 | United States | San Jose, CA | |
2024 | Greece | Athens | https://icdar2024.net/ |
2025 | China | Wuhan | https://www.icdar2025.com/ |
2026 | Austria | Vienna | https://icdar2026.org/ |
2027 | Malaysia | Kuala Lumpur | |
Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.
Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "offline" from a piece of paper by optical scanning or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "online", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most plausible words.
In computer vision and natural language processing, document layout analysis is the process of identifying and categorizing the regions of interest in the scanned image of a text document. A reading system requires the segmentation of text zones from non-textual ones and their arrangement in the correct reading order. Detecting and labeling the different zones embedded in a document as text body, illustrations, math symbols, or tables is called geometric layout analysis. Text zones, however, also play different logical roles inside the document (e.g., title, caption, footnote), and this kind of semantic labeling is the scope of logical layout analysis.
Document processing is a field of research and a set of production processes aimed at making an analog document digital. Document processing does not simply aim to photograph or scan a document to obtain a digital image, but also to make it digitally intelligible. This includes extracting the structure of the document or the layout and then the content, which can take the form of text or images. The process can involve traditional computer vision algorithms, convolutional neural networks or manual labor. The problems addressed are related to semantic segmentation, object detection, optical character recognition (OCR), handwritten text recognition (HTR) and, more broadly, transcription, whether automatic or not. The term can also include the phase of digitizing the document using a scanner and the phase of interpreting the document, for example using natural language processing (NLP) or image classification technologies. It is applied in many industrial and scientific fields for the optimization of administrative processes, mail processing and the digitization of analog archives and historical documents.
In handwriting research, the concept of a stroke is used in various ways. In engineering and computer science, there is a tendency to use the term stroke for a single connected component of ink or a complete pen-down trace. Thus, such a stroke may be a complete character or a part of a character. Under this definition, however, a complete word written as connected cursive script would also have to be called a stroke, which conflicts with the suggested unitary nature of a stroke as a relatively simple shape.
Optical music recognition (OMR) is a field of research that investigates how to computationally read musical notation in documents. The goal of OMR is to teach the computer to read and interpret sheet music and produce a machine-readable version of the written music score. Once captured digitally, the music can be saved in commonly used file formats, e.g. MIDI and MusicXML. In the past it has, misleadingly, also been called "music optical character recognition". Due to significant differences, this term should no longer be used.
Transfer learning (TL) is a technique in machine learning (ML) in which knowledge learned from a task is re-used in order to boost performance on a related task. For example, for image classification, knowledge gained while learning to recognize cars could be applied when trying to recognize trucks. This topic is related to the psychological literature on transfer of learning, although practical ties between the two fields are limited. Reusing/transferring information from previously learned tasks to new tasks has the potential to significantly improve learning efficiency.
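The idea of reusing knowledge from one task on a related task can be illustrated with a toy sketch (illustrative only; the linear "feature extractor", the data and the dimensions are all made up for this example): a feature map fitted on a source task is frozen, and only a small head is trained on a handful of target examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Source task: learn a linear "feature extractor" W by least squares
# from plenty of source-task data.
X_src = rng.normal(size=(200, 10))
W_true = rng.normal(size=(10, 3))
Y_src = X_src @ W_true + 0.01 * rng.normal(size=(200, 3))
W_learned, *_ = np.linalg.lstsq(X_src, Y_src, rcond=None)

# Target task: freeze W_learned, reuse it as the feature extractor,
# and fit only a small linear head on just 20 target examples.
X_tgt = rng.normal(size=(20, 10))
y_tgt = (X_tgt @ W_true)[:, 0] + 0.01 * rng.normal(size=20)
feats = X_tgt @ W_learned               # transferred features
head, *_ = np.linalg.lstsq(feats, y_tgt, rcond=None)
pred = feats @ head                     # accurate despite little target data
```

Because the target task shares structure with the source task, the transferred features let the head fit well from far fewer examples than training from scratch would need.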
Sentiment analysis is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, also more difficult data domains can be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly.
OCRopus is a free document analysis and optical character recognition (OCR) system released under the Apache License v2.0 with a very modular design using command-line interfaces.
FADO is a European image-archiving system that was set up to help combat illegal immigration and organised crime. It was established by a Joint Action of the Council of the European Union enacted in 1998.
Sargur Narasimhamurthy Srihari was an Indian and American computer scientist and educator who made contributions to the field of pattern recognition. The principal impact of his work has been in handwritten address reading systems and in computer forensics. He was a SUNY Distinguished Professor in the School of Engineering and Applied Sciences at the University at Buffalo, Buffalo, New York, USA.
Sharjah International Book Fair (SIBF) is a major literary and publishing event held annually in Sharjah, United Arab Emirates. It is one of the world's largest book fairs, hosting 2,033 exhibitors from 109 countries in 2023 and featuring over 1,700 cultural activities spanning the 12-day event.
Text, Speech and Dialogue (TSD) is an annual conference involving topics on natural language processing and computational linguistics. The meeting is held every September alternating in Brno and Plzeň, Czech Republic.
Handwritten biometric recognition is the process of identifying the author of a given text from the handwriting style. Handwritten biometric recognition belongs to behavioural biometric systems because it is based on something that the user has learned to do.
In natural language processing, Entity Linking, also referred to as named-entity disambiguation (NED), named-entity recognition and disambiguation (NERD) or named-entity normalization (NEN), is the task of assigning a unique identity to entities mentioned in text. For example, given the sentence "Paris is the capital of France", the idea is first to identify "Paris" and "France" as named entities, and then to determine that "Paris" refers to the city of Paris rather than to Paris Hilton or any other entity that could be referred to as "Paris", and that "France" refers to the country. The Entity Linking task is composed of three subtasks. First, named entity recognition extracts the named entities from the text. Second, for each named entity, candidate generation retrieves candidate entities from a knowledge base; the main challenge here is ensuring that the correct entity appears in the candidate set. Third, disambiguation chooses the correct entity from the candidate set.
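The three subtasks can be sketched as a toy pipeline (illustrative only; the knowledge base, the trivial dictionary-based recognizer and the word-overlap scorer are all made up for this example, whereas real systems use statistical NER models and large knowledge bases):

```python
# Tiny made-up knowledge base: surface form -> candidate entities.
KB = {
    "Paris": ["Paris (city in France)", "Paris Hilton", "Paris (mythology)"],
    "France": ["France (country)", "France (given name)"],
}

def recognize(text):
    # 1. Named entity recognition (here: trivially, tokens found in the KB).
    return [tok.strip(".,") for tok in text.split() if tok.strip(".,") in KB]

def generate_candidates(mention):
    # 2. Candidate generation from the knowledge base.
    return KB.get(mention, [])

def disambiguate(mention, context):
    # 3. Disambiguation (here: naive context-word overlap score).
    def score(cand):
        return sum(w.lower() in cand.lower() for w in context.split())
    return max(generate_candidates(mention), key=score)

sentence = "Paris is the capital of France"
links = {m: disambiguate(m, sentence) for m in recognize(sentence)}
# "Paris" links to the city, not to Paris Hilton, because the context
# word "France" overlaps with the candidate "Paris (city in France)".
```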
The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.
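The grayscale effect mentioned above can be illustrated with a small sketch (illustrative only; plain block averaging stands in here for the actual anti-aliased resampling applied to the NIST images): downsampling a purely black-and-white image produces pixels with intermediate gray values wherever a block straddles an edge.

```python
import numpy as np

# A purely black-and-white 28x28 image with a crude white "stroke".
bw = np.zeros((28, 28), dtype=float)
bw[5:23, 9:17] = 1.0

# Average non-overlapping 2x2 blocks down to 14x14. Blocks that
# straddle the stroke boundary average to 0.25 or 0.5: gray levels
# appear even though the input contained only 0.0 and 1.0.
small = bw.reshape(14, 2, 14, 2).mean(axis=(1, 3))
```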
A convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns features by itself via filter optimization. This type of deep learning network has been applied to process and make predictions from many different types of data, including text, images and audio. Convolution-based networks are the de-facto standard in deep learning-based approaches to computer vision and image processing, and have only recently been replaced, in some cases, by newer deep learning architectures such as the transformer. Vanishing and exploding gradients, seen during backpropagation in earlier neural networks, are mitigated by using regularized weights over fewer connections. For example, each neuron in a fully connected layer would require 10,000 weights to process an image sized 100 × 100 pixels. A convolutional layer applying cascaded 5 × 5 kernels, by contrast, needs only 25 shared weights per kernel, each kernel sweeping across 5 × 5 tiles of the image. Higher-layer features are extracted from wider context windows, compared to lower-layer features.
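The weight counts in the example above can be checked with a few lines of arithmetic (the image and kernel sizes follow the figures in the text):

```python
# Fully connected: every neuron connects to every input pixel.
image_h, image_w = 100, 100
fc_weights_per_neuron = image_h * image_w   # 10,000 weights per neuron

# Convolutional: one 5x5 kernel is shared across all tile positions,
# so the weight count is independent of the image size.
kernel_h, kernel_w = 5, 5
conv_weights_per_kernel = kernel_h * kernel_w  # 25 shared weights

print(fc_weights_per_neuron)    # 10000
print(conv_weights_per_kernel)  # 25
```

The 400-fold reduction comes entirely from weight sharing: the same 25 values are reused at every position of the image.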
The International Association for Pattern Recognition (IAPR) is an international association of organizations dedicated to computer vision or pattern recognition. It only admits one organization per country or territory. Individual people participate in the IAPR through their own country or territory's organization.
Emotion recognition is the process of identifying human emotion. People vary widely in their accuracy at recognizing the emotions of others. Use of technology to help people with emotion recognition is a relatively nascent research area. Generally, the technology works best if it uses multiple modalities in context. To date, the most work has been conducted on automating the recognition of facial expressions from video, spoken expressions from audio, written expressions from text, and physiology as measured by wearables.
Scene text is text that appears in an image captured by a camera in an outdoor environment.