Data fusion

Fusion of the data from two sources (dimensions #1 and #2) can yield a classifier superior to any classifier based on dimension #1 or dimension #2 alone.

Data fusion is the process of integrating multiple data sources to produce more consistent, accurate, and useful information than that provided by any individual data source.


Data fusion processes are often categorized as low, intermediate, or high, depending on the processing stage at which fusion takes place. [1] Low-level data fusion combines several sources of raw data to produce new raw data. The expectation is that fused data is more informative and synthetic than the original inputs.

Sensor fusion, for example, is also known as (multi-sensor) data fusion and is a subset of information fusion.

The concept of data fusion has origins in the evolved capacity of humans and animals to incorporate information from multiple senses to improve their ability to survive. For example, a combination of sight, touch, smell, and taste may indicate whether a substance is edible. [2]

The JDL/DFIG model

The Joint Directors of Laboratories (JDL) / Data Fusion Information Group (DFIG) model.

In the mid-1980s, the Joint Directors of Laboratories formed the Data Fusion Subpanel (which later became known as the Data Fusion Group). With the advent of the World Wide Web, data fusion came to include data, sensor, and information fusion. The JDL/DFIG introduced a model that divided the data fusion process into levels. Currently, the six levels of the Data Fusion Information Group (DFIG) model are:

Level 0: Source Preprocessing
Level 1: Object Assessment
Level 2: Situation Assessment
Level 3: Impact Assessment
Level 4: Process Refinement
Level 5: User Refinement

Although the JDL Model (Level 1–4) is still in use today, it is often criticized for its implication that the levels necessarily happen in order and also for its lack of adequate representation of the potential for a human-in-the-loop. The DFIG model (Level 0–5) explored the implications of situation awareness, user refinement, and mission management. [3] Despite these shortcomings, the JDL/DFIG models are useful for visualizing the data fusion process, facilitating discussion and common understanding, [4] and important for systems-level information fusion design. [3] [5]

Geospatial applications

In the geospatial (GIS) domain, data fusion is often synonymous with data integration. In these applications, there is often a need to combine diverse data sets into a unified (fused) data set which includes all of the data points and time steps from the input data sets. The fused data set is different from a simple combined superset in that the points in the fused data set contain attributes and metadata which might not have been included for these points in the original data set.

A simplified example of this process is shown below, where data set α is fused with data set β to form the fused data set δ. Data points in set α have spatial coordinates X and Y and attributes A1 and A2. Data points in set β have spatial coordinates X and Y and attributes B1 and B2. The fused data set contains all points and attributes.

Input Data Set α
Point   X    Y    A1   A2
α1      10   10   M    N
α2      10   30   M    N
α3      30   10   M    N
α4      30   30   M    N

Input Data Set β
Point   X    Y    B1   B2
β1      20   20   Q    R
β2      20   40   Q    R
β3      40   20   Q    R
β4      40   40   Q    R

Fused Data Set δ
Point   X    Y    A1   A2   B1   B2
δ1      10   10   M    N    Q?   R?
δ2      10   30   M    N    Q?   R?
δ3      30   10   M    N    Q?   R?
δ4      30   30   M    N    Q?   R?
δ5      20   20   M?   N?   Q    R
δ6      20   40   M?   N?   Q    R
δ7      40   20   M?   N?   Q    R
δ8      40   40   M?   N?   Q    R

In a simple case where all attributes are uniform across the entire analysis domain, the unknown attributes M?, N?, Q?, and R? may simply be assigned the uniform values M, N, Q, and R. In a real application, attributes are not uniform, and some type of interpolation is usually required to properly assign attributes to the data points in the fused set.
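As a rough illustration of this fusion step, the following Python sketch (hypothetical, assuming NumPy and SciPy are available) forms the fused set δ as the union of the α and β points and fills each point's missing attributes from the nearest point that carries them; nearest-neighbour assignment stands in here for whatever interpolation a real application would justify.

    import numpy as np
    from scipy.interpolate import NearestNDInterpolator

    # Point coordinates for the two input sets from the table above.
    alpha_xy = np.array([[10, 10], [10, 30], [30, 10], [30, 30]], dtype=float)
    beta_xy = np.array([[20, 20], [20, 40], [40, 20], [40, 40]], dtype=float)
    alpha_attrs = {"A1": ["M"] * 4, "A2": ["N"] * 4}  # known only for α points
    beta_attrs = {"B1": ["Q"] * 4, "B2": ["R"] * 4}   # known only for β points

    # The fused set δ is the union of all points from both inputs.
    fused_xy = np.vstack([alpha_xy, beta_xy])

    def fill(known_xy, values, query_xy):
        """Assign an attribute at each query point from the nearest point carrying it."""
        interp = NearestNDInterpolator(known_xy, np.arange(len(values)))
        return [values[int(i)] for i in interp(query_xy)]

    # Every δ point ends up with all four attributes (A1, A2, B1, B2).
    for name, vals in alpha_attrs.items():
        print(name, fill(alpha_xy, vals, fused_xy))
    for name, vals in beta_attrs.items():
        print(name, fill(beta_xy, vals, fused_xy))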

Visualization of fused data sets for rock lobster tracks in the Tasman Sea. Image generated using Eonfusion software by Myriax Pty. Ltd.

In a much more complicated application, marine animal researchers use data fusion to combine animal tracking data with bathymetric, meteorological, sea surface temperature (SST), and animal habitat data to examine and understand habitat utilization and animal behavior in reaction to external forces such as weather or water temperature. Each of these data sets exhibits a different spatial grid and sampling rate, so a simple combination would likely create erroneous assumptions and taint the results of the analysis. Through the use of data fusion, however, all data and attributes are brought together into a single view in which a more complete picture of the environment is created. This enables scientists to identify key locations and times and form new insights into the interactions between the environment and animal behavior.

In the figure above, rock lobsters are studied off the coast of Tasmania. Hugh Pederson of the University of Tasmania used data fusion software to fuse southern rock lobster tracking data (color-coded yellow for day and black for night) with bathymetry and habitat data to create a unique 4D picture of rock lobster behavior.

Data integration

In applications outside of the geospatial domain, the terms data integration and data fusion are used somewhat differently. In areas such as business intelligence, for example, data integration describes the combining of data, whereas data fusion is integration followed by reduction or replacement. Data integration might be viewed as set combination wherein the larger set is retained, whereas fusion is a set reduction technique with improved confidence.
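As a toy illustration of that distinction (hypothetical station names and readings; averaging is just one possible reduction rule), consider the following Python sketch:

    from collections import defaultdict

    # Two sources report overlapping temperature readings for the same stations.
    readings = [
        {"station": "A", "temp": 21.4},
        {"station": "A", "temp": 21.8},  # a second observation of station A
        {"station": "B", "temp": 19.0},
    ]

    # Integration as set combination: the larger, combined set is retained.
    integrated = list(readings)

    # Fusion as set reduction: duplicate records for the same station collapse
    # to a single, higher-confidence record (here, the mean of the readings).
    groups = defaultdict(list)
    for r in readings:
        groups[r["station"]].append(r["temp"])
    fused = [{"station": s, "temp": sum(t) / len(t)} for s, t in groups.items()]

    print(len(integrated), len(fused))  # 3 integrated records, 2 fused records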

Application areas

From multiple traffic sensing modalities

Data from different sensing technologies can be combined in intelligent ways to determine the traffic state accurately. A data fusion-based approach that utilizes roadside-collected acoustic, image, and sensor data has been shown to combine the advantages of the different individual methods. [6]

Decision fusion

In many cases, geographically dispersed sensors are severely energy- and bandwidth-limited. Therefore, the raw data concerning a certain phenomenon are often summarized in a few bits from each sensor. When inferring on a binary event (i.e., deciding between hypotheses $H_0$ and $H_1$), in the extreme case only binary decisions are sent from sensors to a Decision Fusion Center (DFC) and combined in order to obtain improved classification performance. [7] [8] [9]
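As a concrete sketch of one classical fusion rule for this setting (the Chair-Varshney log-likelihood test; a hypothetical minimal implementation assuming each sensor's detection and false-alarm probabilities are known):

    import numpy as np

    def chair_varshney_fuse(decisions, pd, pfa, prior_h1=0.5):
        """Fuse binary sensor decisions at a decision fusion center.

        decisions: 0/1 local decisions, one per sensor
        pd:  per-sensor detection probabilities P(u_i = 1 | H1)
        pfa: per-sensor false-alarm probabilities P(u_i = 1 | H0)
        Returns 1 if the summed log-likelihood ratio favors H1, else 0.
        """
        decisions = np.asarray(decisions)
        pd, pfa = np.asarray(pd), np.asarray(pfa)
        # Each sensor's decision contributes one log-likelihood-ratio term.
        llr = np.where(decisions == 1,
                       np.log(pd / pfa),
                       np.log((1 - pd) / (1 - pfa)))
        threshold = np.log((1 - prior_h1) / prior_h1)
        return int(llr.sum() > threshold)

    # Three sensors of varying quality vote on the same binary event.
    print(chair_varshney_fuse([1, 0, 1], pd=[0.9, 0.7, 0.8], pfa=[0.1, 0.2, 0.05]))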

For enhanced contextual awareness

With a multitude of built-in sensors, including motion, environmental, and position sensors, a modern mobile device typically gives mobile applications access to a range of sensory data that can be leveraged to enhance contextual awareness. Signal processing and data fusion techniques such as feature generation, feasibility study, and principal component analysis (PCA) applied to such sensory data can greatly improve the rate at which the motion and contextually relevant status of the device is correctly classified. [10] Many context-enhanced information fusion techniques are described by Snidaro et al. [11] [12]
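For instance, a minimal sketch of the PCA step (hypothetical data, assuming scikit-learn) that compresses windowed motion-sensor features before a context classifier sees them:

    import numpy as np
    from sklearn.decomposition import PCA

    # Stand-in for windowed accelerometer/gyroscope features, e.g. the mean
    # and variance of each axis over a short time window.
    rng = np.random.default_rng(1)
    n_windows, n_features = 200, 12
    features = rng.normal(size=(n_windows, n_features))

    # Project onto the leading principal components; a downstream classifier
    # of the device's motion/context then works with 3 inputs instead of 12.
    pca = PCA(n_components=3)
    reduced = pca.fit_transform(features)
    print(reduced.shape, pca.explained_variance_ratio_)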

Statistical methods for data fusion

Bayesian auto-regressive Gaussian processes

Gaussian processes are a popular machine learning model. If an auto-regressive relationship between the data sources is assumed, and each data source is modeled as a Gaussian process, the fusion problem constitutes a non-linear Bayesian regression problem. [13]

See also: Multifidelity simulation
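A minimal sketch of this idea for two sources of differing fidelity, assuming scikit-learn: the accurate but scarce source is modeled as a scaled version of the cheap source plus a Gaussian-process discrepancy (a Kennedy-O'Hagan-style scheme). The least-squares estimate of the scale rho below is a deliberate simplification of the full Bayesian treatment.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    # Hypothetical sources: cheap and biased vs. accurate and scarce.
    def f_low(x):
        return 0.5 * np.sin(8 * x[:, 0]) + 0.3 * x[:, 0]

    def f_high(x):
        return np.sin(8 * x[:, 0]) + 0.2 * x[:, 0]

    rng = np.random.default_rng(0)
    x_lo = rng.uniform(0, 1, 40)[:, None]   # many low-fidelity samples
    x_hi = rng.uniform(0, 1, 6)[:, None]    # few high-fidelity samples

    # First GP: model the low-fidelity source on its own.
    gp_low = GaussianProcessRegressor(ConstantKernel() * RBF()).fit(x_lo, f_low(x_lo))

    # Auto-regressive step: f_high(x) ~ rho * gp_low(x) + delta(x),
    # with the discrepancy delta(x) modeled by a second GP on the residuals.
    mu_lo = gp_low.predict(x_hi)
    rho = np.polyfit(mu_lo, f_high(x_hi), 1)[0]
    gp_delta = GaussianProcessRegressor(ConstantKernel() * RBF()).fit(
        x_hi, f_high(x_hi) - rho * mu_lo)

    # Fused prediction combines both sources.
    x_test = np.linspace(0, 1, 5)[:, None]
    fused = rho * gp_low.predict(x_test) + gp_delta.predict(x_test)
    print(fused)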

Semiparametric estimation

Many data fusion methods assume common conditional distributions across several data sources. [14] Recently, methods have been developed to enable efficient estimation within the resulting semiparametric model. [15]
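As a toy illustration of the common-conditional-distribution assumption (hypothetical data), the two sources below sample the covariate over different ranges but share the same conditional law of Y given X, so pooling them yields a valid, lower-variance estimate of the shared regression slope:

    import numpy as np

    rng = np.random.default_rng(2)
    # Source 1 samples X in [0, 1]; source 2 samples X in [2, 3].
    # Both share the conditional law Y | X:  Y = 2 X + noise.
    x1 = rng.uniform(0, 1, 100); y1 = 2 * x1 + rng.normal(0, 0.1, 100)
    x2 = rng.uniform(2, 3, 100); y2 = 2 * x2 + rng.normal(0, 0.1, 100)

    # Fused estimate of the shared slope from the pooled samples.
    x, y = np.concatenate([x1, x2]), np.concatenate([y1, y2])
    print(np.polyfit(x, y, 1)[0])  # close to the true slope of 2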

Related Research Articles

<span class="mw-page-title-main">Image registration</span> Mapping of data into a single system

Image registration is the process of transforming different sets of data into one coordinate system. Data may be multiple photographs, data from different sensors, times, depths, or viewpoints. It is used in computer vision, medical imaging, military automatic target recognition, and compiling and analyzing images and data from satellites. Registration is necessary in order to be able to compare or integrate the data obtained from these different measurements.

<span class="mw-page-title-main">Simultaneous localization and mapping</span> Computational navigational technique used by robots and autonomous vehicles

Simultaneous localization and mapping (SLAM) is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it. While this initially appears to be a chicken or the egg problem, there are several algorithms known to solve it in, at least approximately, tractable time for certain environments. Popular approximate solution methods include the particle filter, extended Kalman filter, covariance intersection, and GraphSLAM. SLAM algorithms are based on concepts in computational geometry and computer vision, and are used in robot navigation, robotic mapping and odometry for virtual reality or augmented reality.

Wireless sensor networks (WSNs) refer to networks of spatially dispersed and dedicated sensors that monitor and record the physical conditions of the environment and forward the collected data to a central location. WSNs can measure environmental conditions such as temperature, sound, pollution levels, humidity and wind.

Prognostics is an engineering discipline focused on predicting the time at which a system or a component will no longer perform its intended function. This lack of performance is most often a failure beyond which the system can no longer be used to meet desired performance. The predicted time then becomes the remaining useful life (RUL), which is an important concept in decision making for contingency mitigation. Prognostics predicts the future performance of a component by assessing the extent of deviation or degradation of a system from its expected normal operating conditions. The science of prognostics is based on the analysis of failure modes, detection of early signs of wear and aging, and fault conditions. An effective prognostics solution is implemented when there is sound knowledge of the failure mechanisms that are likely to cause the degradations leading to eventual failures in the system. It is therefore necessary to have initial information on the possible failures in a product. Such knowledge is important to identify the system parameters that are to be monitored. A potential use of prognostics is in condition-based maintenance. The discipline that links studies of failure mechanisms to system lifecycle management is often referred to as prognostics and health management (PHM), sometimes also system health management (SHM) or, in transportation applications, vehicle health management (VHM) or engine health management (EHM). Technical approaches to building models in prognostics can be categorized broadly into data-driven approaches, model-based approaches, and hybrid approaches.

<span class="mw-page-title-main">Sensor fusion</span> Combining of sensor data from disparate sources

Sensor fusion is the process of combining sensor data or data derived from disparate sources such that the resulting information has less uncertainty than would be possible when these sources were used individually. For instance, one could potentially obtain a more accurate location estimate of an indoor object by combining multiple data sources such as video cameras and WiFi localization signals. The term uncertainty reduction in this case can mean more accurate, more complete, or more dependable, or refer to the result of an emerging view, such as stereoscopic vision.

Multimodal interaction provides the user with multiple modes of interacting with a system. A multimodal interface provides several distinct tools for input and output of data.

Video quality is a characteristic of a video passed through a video transmission or processing system that describes perceived video degradation. Video processing systems may introduce some amount of distortion or artifacts in the video signal that negatively impact the user's perception of the system. For many stakeholders in video production and distribution, ensuring video quality is an important task.

Information integration (II) is the merging of information from heterogeneous sources with differing conceptual, contextual and typographical representations. It is used in data mining and consolidation of data from unstructured or semi-structured resources. Typically, information integration refers to textual representations of knowledge but is sometimes applied to rich-media content. Information fusion, which is a related term, involves the combination of information into a new set of information towards reducing redundancy and uncertainty.

The image fusion process is defined as gathering all the important information from multiple images, and their inclusion into fewer images, usually a single one. This single image is more informative and accurate than any single source image, and it consists of all the necessary information. The purpose of image fusion is not only to reduce the amount of data but also to construct images that are more appropriate and understandable for the human and machine perception. In computer vision, multisensor image fusion is the process of combining relevant information from two or more images into a single image. The resulting image will be more informative than any of the input images.

Requirements traceability is a sub-discipline of requirements management within software development and systems engineering. Traceability as a general term is defined by the IEEE Systems and Software Engineering Vocabulary as (1) the degree to which a relationship can be established between two or more products of the development process, especially products having a predecessor-successor or primary-subordinate relationship to one another; (2) the identification and documentation of derivation paths (upward) and allocation or flowdown paths (downward) of work products in the work product hierarchy; (3) the degree to which each element in a software development product establishes its reason for existing; and (4) discernible association among two or more logical entities, such as requirements, system elements, verifications, or tasks.

<span class="mw-page-title-main">SASS-C</span>

For air traffic control, SASS-C is an acronym for "Surveillance Analysis Support System for ATC-Centre". SASS-C Service is part of Eurocontrol Communications, navigation and surveillance.

<span class="mw-page-title-main">Indoor positioning system</span> Network of devices used to wirelessly locate objects inside a building

An indoor positioning system (IPS) is a network of devices used to locate people or objects where GPS and other satellite technologies lack precision or fail entirely, such as inside multistory buildings, airports, alleys, parking garages, and underground locations.

The joint probabilistic data-association filter (JPDAF) is a statistical approach to the problem of plot association in a target tracking algorithm. Like the probabilistic data association filter (PDAF), rather than choosing the most likely assignment of measurements to a target, the JPDAF takes an expected value, which is the minimum mean square error (MMSE) estimate for the state of each target. At each time, it maintains its estimate of the target state as the mean and covariance matrix of a multivariate normal distribution. However, unlike the PDAF, which is only meant for tracking a single target in the presence of false alarms and missed detections, the JPDAF can handle multiple-target tracking scenarios.

The Semantic Sensor Web (SSW) is a marriage of sensor web and semantic Web technologies. The encoding of sensor descriptions and sensor observation data with Semantic Web languages enables more expressive representation, advanced access, and formal analysis of sensor resources. The SSW annotates sensor data with spatial, temporal, and thematic semantic metadata. This technique builds on current standardization efforts within the Open Geospatial Consortium's Sensor Web Enablement (SWE) and extends them with Semantic Web technologies to provide enhanced descriptions and access to sensor data.

Covariance intersection (CI) is an algorithm for combining two or more estimates of state variables in a Kalman filter when the correlation between them is unknown.
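A minimal sketch of covariance intersection (hypothetical, with a fixed weight w; practical implementations choose w to minimize, e.g., the trace or determinant of the fused covariance):

    import numpy as np

    def covariance_intersection(mu1, P1, mu2, P2, w=0.5):
        """Fuse two estimates whose cross-correlation is unknown."""
        P1_inv, P2_inv = np.linalg.inv(P1), np.linalg.inv(P2)
        # Convex combination of the information (inverse-covariance) matrices.
        P = np.linalg.inv(w * P1_inv + (1 - w) * P2_inv)
        mu = P @ (w * P1_inv @ mu1 + (1 - w) * P2_inv @ mu2)
        return mu, P

    # Two estimates of the same 2-D state with different uncertainties.
    mu, P = covariance_intersection(
        np.array([1.0, 2.0]), np.diag([0.5, 1.0]),
        np.array([1.2, 1.8]), np.diag([1.0, 0.4]))
    print(mu)
    print(P)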

A resilient control system is one that maintains state awareness and an accepted level of operational normalcy in response to disturbances, including threats of an unexpected and malicious nature.

Adaptive collaborative control is the decision-making approach used in hybrid models consisting of finite-state machines with functional models as subcomponents to simulate behavior of systems formed through the partnerships of multiple agents for the execution of tasks and the development of work products. The term “collaborative control” originated from work developed in the late 1990s and early 2000s by Fong, Thorpe, and Baur (1999). According to Fong et al., in order for robots to function in collaborative control, they must be self-reliant, aware, and adaptive. In the literature, the adjective “adaptive” is not always shown but is noted in the official sense, as it is an important element of collaborative control. The adaptation of traditional applications of control theory in teleoperations sought initially to reduce the sovereignty of “humans as controllers/robots as tools” and had humans and robots working as peers, collaborating to perform tasks and to achieve common goals. Early implementations of adaptive collaborative control centered on vehicle teleoperation. Recent uses of adaptive collaborative control cover training, analysis, and engineering applications in teleoperations between humans and multiple robots, multiple robots collaborating among themselves, unmanned vehicle control, and fault-tolerant controller design.

Fusion adaptive resonance theory (fusion ART) is a generalization of self-organizing neural networks known as the original Adaptive Resonance Theory models for learning recognition categories across multiple pattern channels. There is a separate stream of work on fusion ARTMAP, that extends fuzzy ARTMAP consisting of two fuzzy ART modules connected by an inter-ART map field to an extended architecture consisting of multiple ART modules.

<span class="mw-page-title-main">Pose tracking</span>

In virtual reality (VR) and augmented reality (AR), a pose tracking system detects the precise pose of head-mounted displays, controllers, other objects or body parts within Euclidean space. Pose tracking is often referred to as 6DOF tracking, for the six degrees of freedom in which the pose is often tracked.

<span class="mw-page-title-main">Image destriping</span> Image processing task

Image destriping is the process of removing stripes or streaks from images and videos without disrupting the original image/video. These artifacts plague a range of fields in scientific imaging including atomic force microscopy, light sheet fluorescence microscopy, and planetary satellite imaging.

References

  1. Klein, Lawrence A. (2004). Sensor and Data Fusion: A Tool for Information Assessment and Decision Making. SPIE Press. p. 51. ISBN 978-0-8194-5435-5.
  2. Hall, David L.; Llinas, James (1997). "An introduction to multisensor data fusion". Proceedings of the IEEE. 85 (1): 6–23. doi:10.1109/5.554205. ISSN 0018-9219.
  3. Blasch, Erik P.; Bossé, Éloi; Lambert, Dale A. (2012). High-Level Information Fusion Management and System Design. Norwood, MA: Artech House Publishers. ISBN 978-1-6080-7151-7.
  4. Liggins, Martin E.; Hall, David L.; Llinas, James (2008). Multisensor Data Fusion, Second Edition: Theory and Practice. CRC Press. ISBN 978-1-4200-5308-1.
  5. Blasch, E.; Steinberg, A.; Das, S.; Llinas, J.; Chong, C.-Y.; Kessler, O.; Waltz, E.; White, F. (2013). "Revisiting the JDL model for information exploitation". International Conference on Information Fusion.
  6. Joshi, V.; Rajamani, N.; Takayuki, K.; Prathapaneni; Subramaniam, L. V. (2013). "Information Fusion Based Learning for Frugal Traffic State Sensing". Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence.
  7. Ciuonzo, D.; Papa, G.; Romano, G.; Salvo Rossi, P.; Willett, P. (2013). "One-Bit Decentralized Detection With a Rao Test for Multisensor Fusion". IEEE Signal Processing Letters. 20 (9): 861–864. arXiv:1306.6141. Bibcode:2013ISPL...20..861C. doi:10.1109/LSP.2013.2271847. ISSN 1070-9908. S2CID 6315906.
  8. Ciuonzo, D.; Salvo Rossi, P. (2014). "Decision Fusion With Unknown Sensor Detection Probability". IEEE Signal Processing Letters. 21 (2): 208–212. arXiv:1312.2227. Bibcode:2014ISPL...21..208C. doi:10.1109/LSP.2013.2295054. ISSN 1070-9908. S2CID 8761982.
  9. Ciuonzo, D.; De Maio, A.; Salvo Rossi, P. (2015). "A Systematic Framework for Composite Hypothesis Testing of Independent Bernoulli Trials". IEEE Signal Processing Letters. 22 (9): 1249–1253. Bibcode:2015ISPL...22.1249C. doi:10.1109/LSP.2015.2395811. ISSN 1070-9908. S2CID 15503268.
  10. Guiry, John J.; van de Ven, Pepijn; Nelson, John (2014). "Multi-Sensor Fusion for Enhanced Contextual Awareness of Everyday Activities with Ubiquitous Devices". Sensors. 14 (3): 5687–5701. Bibcode:2014Senso..14.5687G. doi:10.3390/s140305687. PMC 4004015. PMID 24662406.
  11. Snidaro, Lauro; et al. (2016). Context-Enhanced Information Fusion: Boosting Real-World Performance with Domain Knowledge. Switzerland: Springer. ISBN 978-3-319-28971-7.
  12. Haghighat, Mohammad; Abdel-Mottaleb, Mohamed; Alhalabi, Wadee (2016). "Discriminant Correlation Analysis: Real-Time Feature Level Fusion for Multimodal Biometric Recognition". IEEE Transactions on Information Forensics and Security. 11 (9): 1984–1996. doi:10.1109/TIFS.2016.2569061. S2CID 15624506.
  13. Ranftl, Sascha; Melito, Gian Marco; Badeli, Vahid; Reinbacher-Köstinger, Alice; Ellermann, Katrin; von der Linden, Wolfgang (2019). "Bayesian Uncertainty Quantification with Multi-Fidelity Data and Gaussian Processes for Impedance Cardiography of Aortic Dissection". Entropy. 22 (1): 58. Bibcode:2019Entrp..22...58R. doi:10.3390/e22010058. ISSN 1099-4300. PMC 7516489. PMID 33285833.
  14. Bareinboim, Elias; Pearl, Judea (2016). "Causal inference and the data-fusion problem". Proceedings of the National Academy of Sciences. 113 (27): 7345–7352. doi:10.1073/pnas.1510507113. ISSN 0027-8424. PMC 4941504. PMID 27382148.
  15. Li, Sijia; Luedtke, Alex (2023). "Efficient estimation under data fusion". Biometrika. 110 (4): 1041–1054. doi:10.1093/biomet/asad007. ISSN 0006-3444. PMC 10653189. PMID 37982010.
