Interactive visual analysis

Interactive Visual Analysis (IVA) is a set of techniques that combine the computational power of computers with the perceptual and cognitive capabilities of humans in order to extract knowledge from large and complex datasets. The techniques rely heavily on user interaction and the human visual system, and sit at the intersection of visual analytics and big data. IVA is a branch of data visualization. It is well suited to analyzing high-dimensional data with a large number of data points, where simple graphing and non-interactive techniques give an insufficient understanding of the information. [1]

These techniques involve looking at datasets through different, correlated views and iteratively selecting and examining features the user finds interesting. The objective of IVA is to gain knowledge which is not readily apparent from a dataset, typically in tabular form. This can involve generating, testing or verifying hypotheses, or simply exploring the dataset to look for correlations between different variables.

History

Focus+context visualization and its related techniques date back to the 1970s. [2] An early attempt at combining these techniques for Interactive Visual Analysis was made in 2000 with the WEAVE visualization system for cardiac simulation. [3] SimVis appeared in 2003, [4] and several PhD projects have explored the concept since then, notably those of Helmut Doleisch in 2004, [5] Johannes Kehrer in 2011 [6] and Zoltan Konyha in 2013. [7] ComVis, which is used in the visualization community, appeared in 2008. [8]

Basics

The objective of Interactive Visual Analysis is to discover information in data which is not readily apparent. The goal is to move from the data itself to the information contained in the data, ultimately uncovering knowledge which was not apparent from looking at the raw numbers.

The most basic form of IVA is to use coordinated multiple views [9] that display different columns of the dataset. At least two views are required for IVA. The views are usually common information visualization tools such as histograms, scatterplots or parallel coordinates, but volume-rendered views can also be used if they suit the data. [6] Typically, one view displays the independent variables of the dataset (e.g. time or spatial location), while the others display the dependent variables (e.g. temperature, pressure or population density) in relation to each other. If the views are linked, the user can select data points in one view and have the corresponding data points automatically highlighted in the other views. This technique, which intuitively allows exploration of higher-dimensional properties of the data, is known as linking and brushing. [10] [11]
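Linking and brushing can be sketched in a few lines of Python. The sketch assumes a row-oriented, in-memory table and a simple one-dimensional range brush; the names `View`, `brush_range` and `link` are hypothetical and do not come from any particular IVA package. Rendering is left out: each view only reports which of its points should be drawn highlighted.

```python
from dataclasses import dataclass, field

# Hypothetical toy dataset: one dict per data point (row), one key per column.
ROWS = [
    {"time": 0, "temperature": 290.0, "pressure": 101.0},
    {"time": 1, "temperature": 310.0, "pressure": 99.5},
    {"time": 2, "temperature": 355.0, "pressure": 97.0},
    {"time": 3, "temperature": 360.0, "pressure": 96.5},
]


@dataclass
class View:
    """A 2D view (e.g. a scatterplot) of two columns of the dataset."""
    x_col: str
    y_col: str
    highlighted: set = field(default_factory=set)

    def points(self, rows):
        """Return (x, y, is_highlighted) triples, ready for drawing."""
        return [(r[self.x_col], r[self.y_col], i in self.highlighted)
                for i, r in enumerate(rows)]


def brush_range(rows, column, lo, hi):
    """Brushing: select the row indices whose `column` value lies in [lo, hi]."""
    return {i for i, r in enumerate(rows) if lo <= r[column] <= hi}


def link(views, selection):
    """Linking: propagate one selection to every view."""
    for view in views:
        view.highlighted = set(selection)


views = [View("time", "temperature"), View("temperature", "pressure")]
selection = brush_range(ROWS, "temperature", 350.0, 400.0)  # brush in view 0
link(views, selection)
print(views[1].points(ROWS))  # rows 2 and 3 are highlighted in view 1
```

Because the selection is stored as row indices rather than screen coordinates, any number of views over any columns can share it.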

The selection made in one of the views does not have to be binary. Software packages for IVA can support a gradual “degree of interest” [5] [6] [12] in the selection, where data points are highlighted progressively more strongly as the interest moves from low to high. This gives the search for information an inherent “focus+context” [13] aspect. For instance, when examining a tumor in a magnetic resonance imaging (MRI) dataset, the tissue surrounding the tumor may also be of some interest to the operator.
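A gradual degree of interest (DOI) can be modelled as a function that maps each data value to a number between 0 and 1. The sketch below assumes a linear (trapezoidal) falloff around a core range; this is one simple choice for illustration, not necessarily the function used by the cited packages, and `smooth_brush` and its parameters are hypothetical.

```python
def smooth_brush(value, lo, hi, fuzz):
    """Degree of interest (DOI) in [0, 1] for a smooth 1D range brush.

    Values inside the core range [lo, hi] get full interest (1.0); interest
    then falls off linearly to 0.0 over a margin of width `fuzz` on each side.
    """
    if lo <= value <= hi:
        return 1.0
    if fuzz <= 0:
        return 0.0  # degenerate case: an ordinary binary brush
    distance = lo - value if value < lo else value - hi
    return max(0.0, 1.0 - distance / fuzz)


# Hypothetical voxel intensities from a scan; the core range marks the tumor.
intensities = [10.0, 48.0, 50.0, 60.0, 75.0, 90.0]
doi = [smooth_brush(v, lo=50.0, hi=70.0, fuzz=8.0) for v in intensities]
print(doi)  # [0.0, 0.75, 1.0, 1.0, 0.375, 0.0]
```

A renderer could then use the DOI, for example, as opacity, so that tissue just outside the core range stays visible as context.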

The IVA loop

Interactive Visual Analysis is an iterative process. Discoveries made after brushing of the data and looking at the linked views can be used as a starting point for repeating the process, leading to a form of information drill-down. As an example, consider the analysis of data from a simulation of a combustion engine. The user brushes a histogram of temperature distribution, and discovers that one specific part of one cylinder has dangerously high temperatures. This information can be used to formulate the hypothesis that all cylinders have a problem with heat dissipation. This could be verified by brushing the same region in all other cylinders and seeing in the temperature histogram that these cylinders also have higher temperatures than expected. [14]

Data model

The data source for IVA is usually tabular data, where the data is represented in columns and rows. The data variables can be divided into two categories: independent and dependent variables. The independent variables represent the domain of the observed values, such as time and space. The dependent variables represent the data being observed, for instance temperature, pressure or height. [14]
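This data model can be sketched as a column-oriented table in which every column is tagged with its role. The class and field names here are hypothetical and the weather values are made up; real IVA tools use their own data formats.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    INDEPENDENT = "independent"  # domain of the observations, e.g. time, space
    DEPENDENT = "dependent"      # observed values, e.g. temperature, pressure


@dataclass
class Column:
    name: str
    role: Role
    values: list


@dataclass
class Table:
    columns: list

    def __post_init__(self):
        # Tabular data: every column must hold exactly one value per row.
        lengths = {len(c.values) for c in self.columns}
        if len(lengths) > 1:
            raise ValueError(f"columns differ in length: {sorted(lengths)}")

    def names(self, role):
        """Names of all columns that play the given role."""
        return [c.name for c in self.columns if c.role is role]

    def row(self, i):
        """Row i as a {column name: value} dict."""
        return {c.name: c.values[i] for c in self.columns}


weather = Table([
    Column("time", Role.INDEPENDENT, [0, 1, 2]),
    Column("station", Role.INDEPENDENT, ["A", "A", "B"]),
    Column("temperature", Role.DEPENDENT, [12.5, 14.0, 9.5]),
    Column("pressure", Role.DEPENDENT, [1012.0, 1009.0, 1015.0]),
])
print(weather.names(Role.INDEPENDENT))  # ['time', 'station']
print(weather.row(2))
```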

IVA can help the user uncover information and knowledge in low-dimensional datasets as well as in datasets with a very large number of dimensions. [2]

Levels of IVA

The IVA tools can be divided into several levels of complexity. Each level provides the user with different interaction tools for analyzing the data. For most uses the first level is sufficient, and it is also the level that gives the user the fastest response to interaction. The higher levels make it possible to uncover more subtle relationships in the data. However, they require more knowledge of the tools, and the interaction has a longer response time. [1]

Base level

The simplest form of IVA is the base level, which consists of brushing and linking. Here the user sets up several views showing different variables of the dataset and marks an interesting area in one of them. The data points corresponding to the selection are then marked automatically in the other views. A lot of information can be derived at this level of IVA. For datasets where the relationships between the variables are reasonably simple, this technique is usually sufficient for the user to reach the required level of understanding. [7]

Second level

Brushing and linking with logical combination of brushes is a more advanced form of IVA. It lets the user mark several areas in one or more views and combine them with the logical operators AND, OR and NOT, making it possible to explore deeper into the dataset and reveal more hidden information. [7] A simple example is the analysis of weather data, where the analyst might want to find regions that have both warm temperatures and low precipitation.
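If binary brushes are represented as sets of selected row indices, the logical operators map directly onto set operations. The sketch below assumes that representation and uses hypothetical toy data for the weather example. For smooth (degree-of-interest) brushes, one option is the fuzzy-logic analogue (minimum for AND, maximum for OR), though individual tools may combine them differently.

```python
# Hypothetical toy weather table: one row per region.
WEATHER = [
    {"region": "A", "temperature": 28.0, "precipitation": 10.0},
    {"region": "B", "temperature": 30.0, "precipitation": 120.0},
    {"region": "C", "temperature": 12.0, "precipitation": 5.0},
    {"region": "D", "temperature": 26.0, "precipitation": 20.0},
]
ALL_ROWS = frozenset(range(len(WEATHER)))


def brush(rows, column, lo, hi):
    """A binary range brush, returned as the set of selected row indices."""
    return frozenset(i for i, r in enumerate(rows) if lo <= r[column] <= hi)


warm = brush(WEATHER, "temperature", 25.0, 40.0)   # rows 0, 1, 3
dry = brush(WEATHER, "precipitation", 0.0, 25.0)   # rows 0, 2, 3

warm_and_dry = warm & dry     # AND: intersection
warm_or_dry = warm | dry      # OR: union
not_warm = ALL_ROWS - warm    # NOT: complement within the whole table

print(sorted(WEATHER[i]["region"] for i in warm_and_dry))  # ['A', 'D']
```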

Third level

The logical combination of selections may not be sufficient to uncover meaningful information from the dataset. Several techniques are available that make hidden relationships in the data more apparent. One of these is attribute derivation, which lets the user derive additional attributes from the data, such as derivatives, clustering information or other statistical properties. In principle, the operator can perform any set of calculations on the raw data. The derived attributes can then be linked and brushed like any other attribute. [7]
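As an example of attribute derivation, the sketch below derives the rate of change of temperature over time from the raw samples using finite differences, then brushes the derived column. The choice of a temporal derivative, the function name `derive_rate` and the data are assumptions made for illustration.

```python
def derive_rate(times, values):
    """Derived attribute: finite-difference rate of change of `values`.

    Central differences in the interior, one-sided differences at the ends.
    Assumes at least two samples and strictly increasing `times`.
    """
    n = len(values)
    if n < 2 or len(times) != n:
        raise ValueError("need two or more samples with matching times")
    rates = []
    for i in range(n):
        prev, nxt = max(i - 1, 0), min(i + 1, n - 1)
        rates.append((values[nxt] - values[prev]) / (times[nxt] - times[prev]))
    return rates


times = [0.0, 1.0, 2.0, 3.0]
temperature = [290.0, 300.0, 320.0, 350.0]
rate = derive_rate(times, temperature)
print(rate)  # [10.0, 15.0, 25.0, 30.0]

# The derived column can now be brushed like any other attribute.
rapid_heating = {i for i, r in enumerate(rate) if r >= 20.0}
print(sorted(rapid_heating))  # [2, 3]
```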

The second tool in level three of IVA is advanced brushing, such as angular brushing, similarity brushing or percentile brushing. These brushing tools select data points in a more sophisticated way than plain "point and click" selection. Advanced brushing gives a faster response than attribute derivation, but has a steeper learning curve and requires a deeper understanding of the dataset. [7]
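Percentile brushing selects points by their rank within the data rather than by raw value, so the same brush (say, "the top 10 percent") stays meaningful regardless of units or scale. The sketch assumes that a point's percentile rank is the share of values strictly below it; tools may define it differently, for example in how ties are handled. The function names are hypothetical, and angular and similarity brushing are not shown.

```python
from bisect import bisect_left


def percentile_ranks(values):
    """Percentile rank of each value: the percentage of values strictly below it."""
    ordered = sorted(values)
    n = len(ordered)
    return [100.0 * bisect_left(ordered, v) / n for v in values]


def percentile_brush(values, p_lo, p_hi):
    """Select the indices whose percentile rank lies in [p_lo, p_hi]."""
    ranks = percentile_ranks(values)
    return {i for i, r in enumerate(ranks) if p_lo <= r <= p_hi}


pressure = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0]
print(sorted(percentile_brush(pressure, 90.0, 100.0)))  # [9]: the largest value
print(sorted(percentile_brush(pressure, 0.0, 20.0)))    # [1, 3, 5]: values 1.0, 2.0, 3.0
```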

Fourth level

The fourth level of IVA is specific to each dataset and varies depending on the dataset and the purpose of the analysis. Any calculated attribute that is specific to the data under consideration belongs to this category. An example from the analysis of flow data would be the detection and categorization of vortices or other structures present in the flow. This means that fourth-level IVA techniques must be individually tailored to the specific application. After higher-order features have been detected, the calculated attributes are connected to the original dataset and subjected to the normal technique of linking and brushing. [1]

Patterns of IVA

The "linking and brushing" (selection) concept of IVA can be applied between different types of variables in the dataset. Which pattern to use depends on which aspect of the correlations in the dataset is of interest. [1] [15]

Feature localization

Brushing data points from the set of dependent variables (e.g. temperature) and seeing where among the independent variables (e.g. space or time) these data points show up is called "feature localization". With feature localization, the user can easily identify where features are located in the dataset. Examples from a meteorological dataset would be which regions have a warm climate or which times of the year have a lot of precipitation. [1] [15]
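Feature localization can be sketched as brushing a dependent column and collecting the distinct independent-variable values of the selected rows, which is what a linked spatial or temporal view would highlight. The observations and the function name `localize` are hypothetical.

```python
# Hypothetical meteorological observations (independent: region, month).
OBS = [
    {"region": "North", "month": 1, "temperature": -5.0, "precipitation": 40.0},
    {"region": "North", "month": 7, "temperature": 18.0, "precipitation": 30.0},
    {"region": "South", "month": 1, "temperature": 12.0, "precipitation": 90.0},
    {"region": "South", "month": 7, "temperature": 29.0, "precipitation": 15.0},
]


def localize(rows, dependent, lo, hi, independent):
    """Feature localization: brush a dependent column with [lo, hi] and
    return the distinct independent-variable values of the selected rows."""
    return sorted({tuple(r[c] for c in independent)
                   for r in rows if lo <= r[dependent] <= hi})


# Which regions have a warm climate?
print(localize(OBS, "temperature", 25.0, float("inf"), ["region"]))   # [('South',)]
# Which times of the year have a lot of precipitation?
print(localize(OBS, "precipitation", 60.0, float("inf"), ["month"]))  # [(1,)]
```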

Local investigation

If independent variables are brushed and we look at the corresponding data in a view of dependent variables, this is termed "local investigation". This makes it possible to investigate the characteristics of, for example, a specific region or a specific time. In the case of meteorological data, we could for instance discover the temperature distribution during the winter months. [1] [15]
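Local investigation reverses the direction of the brush: the selection is made on an independent variable (here, the winter months) and the linked view shows the distribution of a dependent variable. The sketch stands in for that linked histogram with a simple binned count; the data, the fixed-width binning and the function name are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical monthly mean temperatures (°C) for a single region.
MONTHLY = [{"month": m, "temperature": t} for m, t in zip(
    range(1, 13),
    [-3.0, -1.0, 4.0, 9.0, 14.0, 18.0, 21.0, 20.0, 15.0, 9.0, 3.0, -2.0])]


def local_investigation(rows, independent, selected, dependent, bin_width):
    """Brush an independent column (rows whose value is in `selected`) and
    return a histogram {bin start: count} of a dependent column."""
    values = [r[dependent] for r in rows if r[independent] in selected]
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


winter = {12, 1, 2}
print(local_investigation(MONTHLY, "month", winter, "temperature", 2.0))
```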

Multivariate analysis

Brushing dependent variables and watching the connection to other dependent variables is called multivariate analysis. This could, for example, be used to find out whether high temperatures are correlated with pressure, by brushing high temperatures and watching a linked view of the pressure distribution.

Since each of the linked views usually has two or more dimensions, multivariate analysis can implicitly uncover higher-dimensional features of the data which would not be readily apparent from e.g. a simple scatterplot. [1] [15]
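In a real IVA tool the analyst would inspect the linked pressure view directly. The sketch below replaces that view with a simple numerical summary, comparing the mean pressure of points inside and outside a temperature brush; the function name, the summary statistic and the data are assumptions made for illustration. A difference between the two means hints at a relationship but does not by itself establish a correlation.

```python
from statistics import mean

# Hypothetical simulation output: two dependent variables per data point.
SAMPLES = [
    {"temperature": 290.0, "pressure": 101.0},
    {"temperature": 300.0, "pressure": 100.0},
    {"temperature": 350.0, "pressure": 97.0},
    {"temperature": 360.0, "pressure": 96.0},
]


def multivariate_brush(rows, brushed, lo, hi, other):
    """Brush one dependent column and summarise another dependent column
    for the selected rows versus the rest (a stand-in for a linked view)."""
    inside = [r[other] for r in rows if lo <= r[brushed] <= hi]
    outside = [r[other] for r in rows if not lo <= r[brushed] <= hi]
    return {
        "n_selected": len(inside),
        "selected_mean": mean(inside) if inside else None,
        "unselected_mean": mean(outside) if outside else None,
    }


print(multivariate_brush(SAMPLES, "temperature", 340.0, 400.0, "pressure"))
# {'n_selected': 2, 'selected_mean': 96.5, 'unselected_mean': 100.5}
```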

References

  1. Oeltze, Steffen, Helmut Doleisch, Helwig Hauser, and Gunther Weber. "Interactive Visual Analysis of Scientific Data." Presentation at IEEE VisWeek 2012, Seattle (WA), USA.
  2. Hauser, Helwig. "Generalizing focus+context visualization." Scientific Visualization: The Visual Extraction of Knowledge from Data. Springer Berlin Heidelberg, 2006. 305-327.
  3. Gresh, Donna L., et al. "WEAVE: A system for visually linking 3-D and statistical visualizations, applied to cardiac simulation and measurement data." Proceedings of the conference on Visualization '00. IEEE Computer Society Press, 2000.
  4. Doleisch, Helmut, Martin Gasser, and Helwig Hauser. "Interactive feature specification for focus+context visualization of complex simulation data." Proceedings of the symposium on Data visualisation 2003. Eurographics Association, 2003.
  5. Doleisch, Helmut. Visual analysis of complex simulation data using multiple heterogenous views. 2004.
  6. Kehrer, Johannes. Interactive visual analysis of multi-faceted scientific data. PhD dissertation, Department of Informatics, University of Bergen, Norway, 2011.
  7. Konyha, Zoltán, et al. "Interactive visual analysis of families of curves using data aggregation and derivation." Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies. ACM, 2012.
  8. Matkovic, Krešimir, et al. "ComVis: A coordinated multiple views system for prototyping new visualization technology." Information Visualisation, 2008. IV'08. 12th International Conference. IEEE, 2008.
  9. Roberts, Jonathan C. "State of the art: Coordinated & multiple views in exploratory visualization." Coordinated and Multiple Views in Exploratory Visualization, 2007. CMV'07. Fifth International Conference on. IEEE, 2007.
  10. Martin, Allen R., and Matthew O. Ward. "High dimensional brushing for interactive exploration of multivariate data." Proceedings of the 6th Conference on Visualization '95. IEEE Computer Society, 1995.
  11. Keim, Daniel A. "Information visualization and visual data mining." IEEE Transactions on Visualization and Computer Graphics 8.1 (2002): 1-8.
  12. Doleisch, Helmut, and Helwig Hauser. "Smooth brushing for focus+context visualization of simulation data in 3D." Journal of WSCG 10.1 (2002): 147-154.
  13. Lamping, John, Ramana Rao, and Peter Pirolli. "A focus+context technique based on hyperbolic geometry for visualizing large hierarchies." Proceedings of the SIGCHI conference on Human factors in computing systems. ACM Press/Addison-Wesley Publishing Co., 1995.
  14. Konyha, Zoltan, et al. "Interactive visual analysis of families of function graphs." IEEE Transactions on Visualization and Computer Graphics 12.6 (2006): 1373-1385.
  15. Oeltze, Steffen, et al. "Interactive visual analysis of perfusion data." IEEE Transactions on Visualization and Computer Graphics 13.6 (2007): 1392-1399.