Smart data capture (SDC), also known as 'intelligent data capture' or 'automated data capture', describes the branch of technology concerned with using computer vision techniques like optical character recognition (OCR), barcode scanning, object recognition and other similar technologies to extract and process information from semi-structured and unstructured data sources. IDC characterizes smart data capture as an integrated hardware, software, and connectivity strategy to help organizations enable the capture of data in an efficient, repeatable, scalable, and future-proof way. [1] Data is captured visually from barcodes, text, IDs and other objects - often from many sources simultaneously - before being converted and prepared for digital use, typically by artificial intelligence-powered software. [2] An important feature of SDC is that it focuses not only on capturing data more efficiently but also on delivering easy-to-access, actionable insights at the instant of data collection to both frontline and desk-based workers, which aids decision-making and makes capture a two-way process.
Smart data capture automates and accelerates capture, applying insights in real time and automating processes based on extracted input. Smart data capture is designed to be repeatable and scalable to reduce low-level manual tasks and eliminate human error. To achieve this goal, smart data capture solutions are often made available using specialist software installed on commodity hardware such as smartphones. [3] However, some solutions may rely on specialized hardware such as dedicated scanning devices, wearables [4] or shop floor robots. [5]
Optical character recognition applications are typically concerned with the actual data capture process; they are intended to faithfully reproduce text, words, letters and symbols from a printed document. Smart data capture is multimodal, [6] capable of extracting data from a wider range of semi-structured and unstructured sources, going beyond basic text recognition to offer a wider scope of applications. By extending functionality to provide actionable insights at the point of capture, SDC is also a two-way process (capture-display), while OCR is more commonly one-way (capture only), primarily used for data input. [7]
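The capture-display distinction described above can be illustrated with a minimal sketch. It assumes a scanner has already decoded a barcode into a string; the `Capture` type, the `INVENTORY` table and the `insight_for` function are hypothetical names introduced for illustration only and do not correspond to any particular product's API.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    """A decoded scan result (hypothetical structure, for illustration)."""
    symbology: str  # e.g. "EAN-13", as reported by an assumed scanner
    value: str      # the decoded payload


# Hypothetical back-end data that the captured value is matched against.
INVENTORY = {
    "4006381333931": {"name": "Marker pen", "stock": 3, "reorder_at": 5},
    "5901234123457": {"name": "Notebook", "stock": 40, "reorder_at": 10},
}


def insight_for(capture: Capture, inventory: dict) -> str:
    """Turn a captured value into a message shown at the point of capture."""
    item = inventory.get(capture.value)
    if item is None:
        # Capture succeeded, but the value is not known to the back end.
        return f"Unknown item {capture.value}: flag for manual review"
    if item["stock"] <= item["reorder_at"]:
        return f"{item['name']}: low stock ({item['stock']} left), reorder"
    return f"{item['name']}: stock OK ({item['stock']} left)"


if __name__ == "__main__":
    scan = Capture(symbology="EAN-13", value="4006381333931")
    print(insight_for(scan, INVENTORY))
```

A capture-only (one-way) system would stop once `Capture.value` has been stored; in this sketch the second step, returning a message to the person scanning, stands in for the "display" half of the process.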
Smart data capture solutions typically have two parts:
Smart data capture can be applied to almost any industry and application that requires visual information capture and interpretation. This may include:
Historically, PriceWaterhouseCoopers described smart data capture as a combination of robotic process automation and intelligent character recognition. [13] This description is no longer sufficient because it is focused purely on text-based capture systems (automated OCR).
Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.
Machine vision is the technology and methods used to provide imaging-based automatic inspection and analysis for such applications as automatic inspection, process control, and robot guidance, usually in industry. Machine vision refers to many technologies, software and hardware products, integrated systems, actions, methods and expertise. Machine vision as a systems engineering discipline can be considered distinct from computer vision, a form of computer science. It attempts to integrate existing technologies in new ways and apply them to solve real world problems. The term is the prevalent one for these functions in industrial automation environments but is also used for these functions in other environments, such as vehicle guidance.
An image scanner is a device that optically scans images, printed text, handwriting, or an object and converts it to a digital image. The most common type of scanner used in offices and in the home is the flatbed scanner, where the document is placed on a glass window for scanning. A sheetfed scanner, which moves the page across an image sensor using a series of rollers, may be used to scan one document at a time or multiple, as in an automatic document feeder. A handheld scanner is a portable version of an image scanner that can be used on any flat surface. Scans are usually downloaded to the computer that the scanner is connected to, although some scanners are able to store scans on standalone flash media.
Automatic identification and data capture (AIDC) refers to the methods of automatically identifying objects, collecting data about them, and entering them directly into computer systems, without human involvement. Technologies typically considered as part of AIDC include QR codes, bar codes, radio frequency identification (RFID), biometrics, magnetic stripes, optical character recognition (OCR), smart cards, and voice recognition. AIDC is also commonly referred to as "Automatic Identification", "Auto-ID" and "Automatic Data Capture".
A smart camera is a machine vision system which, in addition to image capture circuitry, is capable of extracting application-specific information from the captured images, along with generating event descriptions or making decisions that are used in an intelligent and automated system. A smart camera is a self-contained, standalone vision system with built-in image sensor in the housing of an industrial video camera. It is also known as an intelligent camera, a (smart) vision sensor, an intelligent vision sensor, a smart optical sensor, an intelligent optical sensor, a smart visual sensor, or an intelligent visual sensor.
Enterprise content management (ECM) extends the concept of content management by adding a timeline for each content item and, possibly, enforcing processes for its creation, approval, and distribution. Systems using ECM generally provide a secure repository for managed items, analog or digital. They also include one or more methods for importing content to manage new items, and several presentation methods to make items available for use. ECM content may be protected by digital rights management (DRM), but this is not required. ECM is distinguished from general content management by its cognizance of the processes and procedures of the enterprise for which it is created.
A multiline optical-character reader, or MLOCR, is a type of mail sorting machine that uses optical character recognition (OCR) technology to determine how to route mail through the postal system.
Intelligent character recognition (ICR) is used to extract handwritten text from images. It is a more sophisticated type of OCR technology that recognizes different handwriting styles and fonts to intelligently interpret data on forms and physical documents.
TeleForm is a forms processing application originally developed by Cardiff Software and now owned by OpenText.
Forms processing is a process by which one can capture information entered into data fields and convert it into an electronic format. This can be done manually or automatically, but the general process is that hard copy data is filled out by humans and then "captured" from their respective fields and entered into a database or other electronic format.
Document capture software refers to applications that provide the ability and feature set to automate the process of scanning paper documents or importing electronic documents, often for the purposes of feeding advanced document classification and data collection processes. Most scanning hardware, both scanners and copiers, provides the basic ability to scan to any number of image file formats, including: PDF, TIFF, JPG, BMP, etc. This basic functionality is augmented by document capture software, which can add efficiency and standardization to the process.
Enterprise forms automation is a company-wide computer system or set of systems for managing, distributing, completing, and processing paper-based forms, applications, surveys, contracts, and other documents. It plays a vital role in the concept of a paperless office.
Digital mailroom is the automation of incoming mail processes. Using document scanning and document capture technologies, companies can digitise incoming mail and automate the classification and distribution of mail within the organization. Both paper and electronic mail (email) can be managed through the same process allowing companies to standardize their internal mail distribution procedures and adhere to company compliance policies.
Datacap, a privately owned company, manufactures and sells computer software and services. Datacap's first product, Paper Keyboard, was a "forms processing" product and shipped in 1989. In August 2010, IBM announced that it had acquired Datacap for an undisclosed amount.
Gurpreet Singh Lehal is a professor in the Computer Science Department, Punjabi University, Patiala and Director of the Advanced Centre for Technical Development of Punjabi Language, Literature and Culture. He is noted for his work in the application of computer technology in the use of the Punjabi language in both the Gurmukhi and Shahmukhi scripts.
Scan-Optics LLC, founded in 1968, is an enterprise content management services company and optical character recognition (OCR) and image scanner manufacturer headquartered in Manchester, Connecticut.
UiPath Inc. is a global software company that makes robotic process automation (RPA) software. It was founded in Bucharest, Romania, by Daniel Dines and Marius Tîrcă. Its headquarters are in New York City. The company's software monitors user activity to automate repetitive front and back office tasks, including those performed using other business software such as customer relationship management or enterprise resource planning (ERP) software.
Intelligent automation (IA), or alternatively intelligent process automation, is a software term that refers to a combination of artificial intelligence (AI) and robotic process automation (RPA). Companies use intelligent automation to cut costs and streamline tasks by using artificial-intelligence-powered robotic software to reduce repetitive tasks. As it accumulates data, the system learns in an effort to improve its efficiency. Intelligent automation applications include, but are not limited to, pattern analysis, data assembly, and classification. The term is similar to hyperautomation, a concept identified by research group Gartner as being one of the top technology trends of 2020.
Scandit AG, commonly referred to as Scandit, is a Swiss technology company that provides smart data capture software. Their technology allows any smart device equipped with a camera to scan barcodes, IDs and text and to perform additional functions using augmented reality and advanced analytics.
ABBYY is an American technology company specializing in document processing, data capture, process mining and optical character recognition (OCR). Primarily focused on a software-as-a-service model, the company serves clients worldwide. One of ABBYY's best-known products is ABBYY FineReader, an optical character recognition (OCR) computer program.