Document AI

Document AI, also known as Document Intelligence, refers to a field of technology that employs machine learning (ML) techniques, such as natural language processing (NLP), to develop computer models capable of analyzing documents in a manner akin to human review.[1]

Through NLP, computer systems are able to understand relationships and contextual nuances in document contents, which facilitates the extraction of information and insights. Additionally, this technology enables the categorization and organization of the documents themselves.[2]

The applications of Document AI extend to processing and parsing a variety of semi-structured documents, such as forms, tables, receipts, invoices, tax forms, contracts, loan agreements, and financial reports.

Key Features

Machine learning is utilized in Document AI to extract information from both digital and printed documents. This technology recognizes text, characters, and images in various languages, aiding in the extraction of insights from unstructured documents. The use of this technology can improve the speed and quality of decision-making in document analysis. Additionally, the automation of data extraction and validation can contribute to increased efficiency in document analysis processes.
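
As an illustration, the following is a minimal sketch in Python of such an extraction pipeline. It assumes the open-source pytesseract (Tesseract OCR) and Pillow libraries are installed; the file name "invoice.png" and the field patterns are hypothetical and would differ for other document types.

    # Minimal sketch: OCR a scanned invoice, then pull two fields with regular expressions.
    # Assumes pytesseract and Pillow are installed; "invoice.png" and the
    # patterns below are illustrative only.
    import re
    import pytesseract
    from PIL import Image

    text = pytesseract.image_to_string(Image.open("invoice.png"))

    invoice_no = re.search(r"Invoice\s*(?:No\.?|Number)[:\s]*(\S+)", text, re.IGNORECASE)
    total = re.search(r"Total[:\s]*\$?([\d,]+\.\d{2})", text, re.IGNORECASE)

    print("Invoice number:", invoice_no.group(1) if invoice_no else "not found")
    print("Total amount:", total.group(1) if total else "not found")

A production system would additionally validate the extracted values, for example checking that the total parses as a monetary amount, before writing them to a database.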

Example

[Image: a formal letter]

A business letter contains information in the form of text, as well as other types of information, such as the position of the text on the page. For instance, a typical letter contains two addresses before the body of the text. The address at the very top (sometimes aligned to the right) is the sender's address. This is normally followed by the date of the letter, often together with the place of writing. After this, the receiver's address is listed.

The distinction between the sender's address and the receiver's address is conveyed solely by the position of each address on the page, i.e. there is no textual indication such as "Sender:" in front of the addresses.
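
The sketch below illustrates, in Python, how such a purely positional cue might be exploited once an OCR step has produced text blocks with page coordinates. The coordinates and the simple "topmost block is the sender" rule are illustrative assumptions, not a general-purpose layout model.

    # Minimal sketch: distinguish sender and receiver addresses purely by position.
    # Each block is (text, x, y), with y measured from the top of the page;
    # the coordinates and the "topmost block is the sender" rule are assumptions
    # made for illustration only.
    blocks = [
        ("Jane Smith\n12 High Street\nLondon", 400, 60),   # near the top, aligned right: sender
        ("Mr. John Doe\n34 Elm Road\nBristol", 80, 220),   # further down the page: receiver
    ]

    by_vertical_position = sorted(blocks, key=lambda b: b[2])
    sender = by_vertical_position[0][0]
    receiver = by_vertical_position[1][0]

    print("Sender address:\n" + sender)
    print("Receiver address:\n" + receiver)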

Data dimensions & ML architecture

Data is typically divided into spatial data and time-series data: the former includes things like images, maps, and graphs, while the latter includes, for example, stock prices or voice recordings. Document AI combines text data, which has a time dimension, with other types of data, such as the position of an address in a business letter, which is spatial.

Historically in machine learning, spatial data was analyzed using convolutional neural networks and temporal data using recurrent neural networks. With the advent of the dimension-type-agnostic transformer architecture, these two types of dimension can be combined more easily; Document AI is an example of this.
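
The following schematic sketch, written with PyTorch, shows one common way of combining the two: each token's word identity and its 2-D page coordinates are embedded separately and summed before being fed to a transformer encoder, loosely in the spirit of layout-aware models such as LayoutLM. The vocabulary size, coordinate range, embedding dimension, and toy input values are illustrative assumptions, not the parameters of any particular model.

    # Schematic sketch (PyTorch): combining word tokens with 2-D layout positions
    # before a transformer encoder. All sizes and the toy inputs are illustrative.
    import torch
    import torch.nn as nn

    vocab_size, coord_size, d_model = 30000, 1000, 128

    token_emb = nn.Embedding(vocab_size, d_model)  # the word itself (sequential/text signal)
    x_emb = nn.Embedding(coord_size, d_model)      # horizontal position on the page (spatial signal)
    y_emb = nn.Embedding(coord_size, d_model)      # vertical position on the page (spatial signal)
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
    )

    # One toy "document": three tokens with (token id, x, y) as an OCR step might produce them.
    tokens = torch.tensor([[17, 523, 98]])
    xs = torch.tensor([[40, 40, 700]])
    ys = torch.tensor([[50, 70, 50]])

    inputs = token_emb(tokens) + x_emb(xs) + y_emb(ys)  # text and layout share one representation
    contextual = encoder(inputs)
    print(contextual.shape)  # torch.Size([1, 3, 128])

Because the position embeddings are simply added to the token embeddings, the encoder sees text and layout through a single sequence of vectors.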

Common Uses

Related Research Articles

Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the forms of decisions. Understanding in this context means the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.

Electronic data interchange (EDI) is the concept of businesses electronically communicating information that was traditionally communicated on paper, such as purchase orders, advance ship notices, and invoices. Technical standards for EDI exist to facilitate parties transacting such instruments without having to make special arrangements.

Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.

Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most plausible words.

Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by devising patterns and trends by means such as statistical pattern learning. According to Hotho et al. (2005) we can distinguish between three different perspectives of text mining: information extraction, data mining, and a knowledge discovery in databases (KDD) process. Text mining usually involves the process of structuring the input text, deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling.

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. Typically, this involves processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video/documents could be seen as information extraction.

Document processing is a field of research and a set of production processes aimed at making an analog document digital. Document processing does not simply aim to photograph or scan a document to obtain a digital image, but also to make it digitally intelligible. This includes extracting the structure of the document or the layout and then the content, which can take the form of text or images. The process can involve traditional computer vision algorithms, convolutional neural networks or manual labor. The problems addressed are related to semantic segmentation, object detection, optical character recognition (OCR), handwritten text recognition (HTR) and, more broadly, transcription, whether automatic or not. The term can also include the phase of digitizing the document using a scanner and the phase of interpreting the document, for example using natural language processing (NLP) or image classification technologies. It is applied in many industrial and scientific fields for the optimization of administrative processes, mail processing and the digitization of analog archives and historical documents.

Disease Informatics (also infectious disease informatics) studies the knowledge production, sharing, modeling, and management of infectious diseases. It became a more studied field as a by-product of the rapid increases in the amount of biomedical and clinical data widely available, and to meet the demands for useful data analyses of such data.

Document clustering is the application of cluster analysis to textual documents. It has applications in automatic document organization, topic extraction and fast information retrieval or filtering.
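
As a minimal illustration, the sketch below clusters a handful of toy documents with scikit-learn using TF-IDF features and k-means; the example documents and the choice of two clusters are assumptions made purely for demonstration.

    # Minimal sketch of document clustering: TF-IDF features followed by k-means.
    # The toy documents and the choice of two clusters are illustrative only.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    docs = [
        "invoice total amount due payment",
        "payment received invoice balance",
        "patient diagnosis treatment hospital",
        "hospital discharge summary treatment",
    ]

    features = TfidfVectorizer().fit_transform(docs)
    labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(features)
    print(labels)  # e.g. [0 0 1 1]: billing documents in one cluster, medical documents in the other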

Forms processing is a process by which one can capture information entered into data fields and convert it into an electronic format. This can be done manually or automatically, but the general process is that hard copy data is filled out by humans and then "captured" from their respective fields and entered into a database or other electronic format.

Error-driven learning is a type of reinforcement learning method. This method tweaks a model’s parameters based on the difference between the proposed and actual results. These models stand out as they depend on environmental feedback instead of explicit labels or categories. They are based on the idea that language acquisition involves the minimization of the prediction error (MPSE). By leveraging these prediction errors, the models consistently refine expectations and decrease computational complexity. Typically, these algorithms are operated by the GeneRec algorithm.

Knowledge extraction is the creation of knowledge from structured and unstructured sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL, the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge or the generation of a schema based on the source data.

The following outline is provided as an overview of and topical guide to natural-language processing:

In natural language processing (NLP), a text graph is a graph representation of a text item. It is typically created as a preprocessing step to support NLP tasks such as text condensation, term disambiguation (topic-based), text summarization, relation extraction and textual entailment.

In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that the words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.
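
A minimal sketch of this idea is shown below, using hand-picked three-dimensional vectors and cosine similarity; real embeddings have hundreds of dimensions and are learned from large corpora rather than written by hand.

    # Minimal sketch: toy word embeddings compared with cosine similarity.
    # The three-dimensional vectors are hand-picked for illustration only.
    import numpy as np

    emb = {
        "invoice": np.array([0.9, 0.1, 0.0]),
        "receipt": np.array([0.8, 0.2, 0.1]),
        "giraffe": np.array([0.0, 0.1, 0.9]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(emb["invoice"], emb["receipt"]))  # similar meaning -> similarity near 1
    print(cosine(emb["invoice"], emb["giraffe"]))  # unrelated -> similarity near 0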

This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence, its sub-disciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.

Resume parsing, also known as CV parsing, resume extraction, or CV extraction, allows for the automated storage and analysis of resume data. The resume is imported into parsing software and the information is extracted so that it can be sorted and searched.

Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems.

Artificial intelligence is used by many different businesses and organizations. It is widely used in the financial sector, especially by accounting firms, to help detect fraud.

References

  1. Cui, Lei; Xu, Yiheng; Lv, Tengchao; Wei, Furu (2021). "Document AI: Benchmarks, Models and Applications". arXiv:2111.08609 [cs.CL].
  2. "Why Digitizing Documents has been Accelerated by COVID-19 Pandemic". eWEEK. 15 January 2021. Retrieved 2021-02-11.
  3. Bodenbender, Mario; Kurzrock, Björn-Martin; Müller, Philipp Maximilian (April 2019). "Broad application of artificial intelligence for document classification, information extraction and predictive analytics in real estate". Journal of General Management. 44 (3): 170–179. doi:10.1177/0306307018823113. ISSN 0306-3070.