Parallel coordinates

Figure: Parallel coordinates (ParCorFisherIris.png).
Figure: Parallel coordinate plot of the flea data in GGobi (Ggobi-flea2.png).

Parallel coordinates are a common way of visualizing and analyzing high-dimensional datasets, that is, data with multiple variables (attributes).

To plot, or visualize, a set of points in n-dimensional space, n parallel axes lines are drawn over the background, typically vertically oriented and equally spaced. A point in n-dimensional space is represented as a single polyline with n vertices placed on the parallel axes; vertices correspond to each coordinate of the n-dimensional point.
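The construction described above can be sketched in a few lines. This is a minimal illustration, assuming vertical axes placed at x = 0, 1, 2, ... (the function name `polyline_vertices` and the `spacing` parameter are hypothetical, chosen only for this example).

```python
def polyline_vertices(point, spacing=1.0):
    """Return the (x, y) vertices of the polyline for one n-dimensional point.

    Axis i is assumed to be the vertical line x = i * spacing; the i-th
    coordinate of the point becomes the height of the vertex on that axis.
    """
    return [(i * spacing, value) for i, value in enumerate(point)]


# A 4-dimensional point becomes a polyline with 4 vertices, one per axis.
vertices = polyline_vertices([5.1, 3.5, 1.4, 0.2])
print(vertices)
```

Drawing one such polyline per data row, on shared axes, gives the parallel coordinates plot.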

This visualization is similar to time series visualization, except that parallel coordinates are applied to data whose axes do not correspond to points in time. Because the axes then have no natural order, different axis arrangements can be of interest, including moving axes horizontally or inverting them.

History

The concept of parallel coordinates is often said to have originated in 1885 with the French mathematician Philbert Maurice d'Ocagne. [1] D'Ocagne sought a way to calculate mathematical functions graphically using alignment diagrams called nomograms, which use parallel axes with different scales. For example, a three-variable equation could be solved using three parallel axes: the known values are marked on their scales, a line is drawn between them, and the unknown is read from the third scale at the point where the line intersects it.

The use of Parallel Coordinates as a visualization technique to show data is also often said to have originated earlier with Henry Gannett in work preceding the Statistical Atlas of the United States for the 1890 Census, for example his "General Summary, Showing the Rank of States, by Ratios, 1880", [2] that shows the rank of 10 measures (population, occupations, wealth, manufacturing, agriculture, and so forth) on parallel axes connected by lines for each state.

However, both d'Ocagne and Gannett were far preceded in this by André-Michel Guerry. In Plate IV, "Influence de l'Age", of his 1833 Essai sur la Statistique Morale de la France, [3] he showed rankings of crimes against persons by age along parallel axes, connecting the same crime across age groups. [4]

Parallel coordinates were popularised again by Alfred Inselberg [5] in 1985, having been systematically developed as a coordinate system from 1977 onward. Important applications include collision avoidance algorithms for air traffic control (1987, three US patents), data mining (US patent), computer vision (US patent), optimization, process control and, more recently, intrusion detection.

Higher dimensions

In the plane with an XY Cartesian coordinate system, adding more dimensions in parallel coordinates (often abbreviated ||-coords, PCP, or PC) involves adding more axes. The value of parallel coordinates is that certain geometrical properties in high dimensions transform into easily seen 2D patterns. For example, a set of points on a line in n-space transforms to a set of polylines in parallel coordinates all intersecting at n − 1 points. For n = 2 this yields a point-line duality, which explains why the mathematical foundations of parallel coordinates are developed in projective rather than Euclidean space.

A pair of lines intersects at a unique point, which has two coordinates and can therefore correspond to a unique line, which is also specified by two parameters (or two points). By contrast, more than two points are required to specify a curve, and a pair of curves may not have a unique intersection. Hence, by using curves in parallel coordinates instead of lines, the point-line duality is lost, together with all the other properties of projective geometry and the known higher-dimensional patterns corresponding to (hyper)planes, curves, several smooth (hyper)surfaces, proximities, convexity and, recently, non-orientability. [6]

The goal is to map n-dimensional relations into 2D patterns. Hence, parallel coordinates is not a point-to-point mapping but rather a mapping from an nD subset to a 2D subset, and there is no loss of information. Note that even a point in nD is not mapped to a point in 2D but to a polygonal line, which is a subset of 2D.
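The point-line duality for n = 2 can be checked numerically. The sketch below assumes the first axis is the vertical line at horizontal position 0 and the second at position d; under that assumption, elementary algebra shows that every point on the Cartesian line x2 = m·x1 + b (with m ≠ 1) is drawn as a line passing through the single point (d / (1 − m), b / (1 − m)). The function names are hypothetical.

```python
def dual_point(m, b, d=1.0):
    """Point shared by all lines that represent points on x2 = m*x1 + b.

    Assumes axes at horizontal positions 0 and d. For m == 1 the lines are
    parallel, so the shared point is an ideal point at infinity.
    """
    if m == 1:
        raise ValueError("m == 1: lines are parallel, no finite dual point")
    return (d / (1 - m), b / (1 - m))


def line_height(x1, x2, t, d=1.0):
    """Height at horizontal position t of the line through (0, x1) and (d, x2)."""
    return x1 + (x2 - x1) * t / d


m, b = -2.0, 3.0
t_star, y_star = dual_point(m, b)

# Every point on the Cartesian line yields a line through (t_star, y_star).
heights = [line_height(x1, m * x1 + b, t_star) for x1 in (-1.0, 0.0, 0.5, 2.0)]
print(t_star, y_star, heights)
```

For negative slopes the shared point lies between the two axes; for other slopes it lies outside them, which is why extending the segments to full lines (and working projectively) matters.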

Statistical considerations

Figure: Representative sample for parallel coordinates.

When used for statistical data visualisation there are three important considerations: the order, the rotation, and the scaling of the axes.

The order of the axes is critical for finding features, and in typical data analysis many reorderings will need to be tried. Some authors have come up with ordering heuristics which may create illuminating orderings. [7]
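As a rough illustration of what an ordering heuristic might look like, the sketch below tries every axis order and keeps the one whose neighbouring axes are most strongly correlated in absolute value. This scoring rule is an assumption made for the example, not the method of the cited work, and exhaustive search is only feasible for a handful of axes. All names are hypothetical.

```python
from itertools import permutations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def order_score(columns, order):
    """Sum of |correlation| over pairs of neighbouring axes in this order."""
    return sum(
        abs(pearson(columns[a], columns[b])) for a, b in zip(order, order[1:])
    )


def best_order(columns):
    """Exhaustively search all axis orders for the highest score."""
    return max(permutations(columns), key=lambda o: order_score(columns, o))


# Made-up data: four variables observed on five items.
data = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 5],
    "c": [5, 4, 3, 2, 1],
    "d": [1, 3, 2, 5, 4],
}
order = best_order(data)
print(order, order_score(data, order))
```

In practice a greedy or interactive approach would replace the exhaustive search once the number of axes grows.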

A rotation of the axes corresponds to a translation in parallel coordinates: if the lines intersect outside the pair of parallel axes, a rotation can move the intersection point between them. The simplest example of this is rotating an axis by 180 degrees. [8]

Scaling is necessary because the plot is based on interpolation (linear combination) of consecutive pairs of variables. [8] The variables must therefore be on a common scale, and there are many scaling methods to consider as part of the data preparation process, some of which can reveal more informative views.
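One widely used option is min-max scaling, which maps each variable to the interval [0, 1] independently. The sketch below is only one of the many possible scaling methods; the column names and values are made up for illustration.

```python
def min_max_scale(values):
    """Linearly rescale a sequence of numbers to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no spread; place it mid-axis.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two variables measured in very different units (hypothetical data).
raw = {
    "length_mm": [120, 150, 180],
    "mass_kg": [0.5, 2.0, 1.25],
}
scaled = {name: min_max_scale(column) for name, column in raw.items()}
print(scaled)
```

After scaling, the top and bottom of every axis correspond to that variable's maximum and minimum, so slopes between neighbouring axes become comparable.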

A smooth parallel coordinate plot is achieved with splines. [9] In the smooth plot, every observation is mapped into a parametric line (or curve), which is smooth, continuous on the axes, and orthogonal to each parallel axis. This design emphasizes the quantization level for each data attribute. [8]
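One simple way to obtain a curve with these properties is cubic Hermite interpolation with zero slope at both axes (the "smoothstep" polynomial). This choice is an assumption made for illustration and is not necessarily the spline used in the cited work. The sketch samples the curve between two neighbouring vertical axes placed at x = 0 and x = 1.

```python
def smoothstep(t):
    """Cubic with value 0 at t=0, value 1 at t=1 and zero slope at both ends."""
    return t * t * (3 - 2 * t)


def smooth_segment(y0, y1, samples=20):
    """Sample a smooth curve from (0, y0) on one axis to (1, y1) on the next.

    The tangent is horizontal at both ends, so the curve meets each vertical
    axis at a right angle and consecutive segments join without a kink.
    """
    points = []
    for k in range(samples + 1):
        t = k / samples
        points.append((t, y0 + (y1 - y0) * smoothstep(t)))
    return points


curve = smooth_segment(1.0, 3.0)
print(curve[0], curve[len(curve) // 2], curve[-1])
```

Chaining one such segment per pair of neighbouring axes gives a smooth replacement for the usual polyline.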

Reading

Inselberg (1997) gave a full review of how to read relational patterns in parallel coordinates visually. [10] When most lines between two parallel axes are roughly parallel to each other, this suggests a positive relationship between the two dimensions. When lines cross in a kind of superposition of X-shapes, the relationship is negative. When lines cross randomly, with no consistent pattern, there is no particular relationship.
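These visual cues can be approximated by counting crossings. Two line segments between neighbouring axes cross exactly when the two items appear in opposite order on the two axes, so the fraction of crossing pairs is a rough numeric stand-in for what the eye sees. The sketch below is an illustrative assumption, not a procedure taken from the cited review; it assumes both axes are oriented the same way.

```python
from itertools import combinations


def crossing_fraction(left, right):
    """Fraction of pairs of segments that cross between two adjacent axes.

    left[i] and right[i] are the heights of item i on the two axes.
    Near 0: mostly parallel lines (positive relationship).
    Near 1: mostly X-shaped crossings (negative relationship).
    Around 0.5: no particular relationship.
    """
    pairs = list(combinations(range(len(left)), 2))
    crossings = sum(
        1 for i, j in pairs if (left[i] - left[j]) * (right[i] - right[j]) < 0
    )
    return crossings / len(pairs)


x = [1, 2, 3, 4]
print(crossing_fraction(x, [10, 20, 30, 40]))  # same order on both axes
print(crossing_fraction(x, [40, 30, 20, 10]))  # reversed order
```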

Limitations

In parallel coordinates, each axis can have at most two neighboring axes (one on the left and one on the right). For an n-dimensional data set, at most n − 1 relationships can be shown at a time without altering the approach. In time series visualization there is a natural predecessor and successor, so in this special case a preferred arrangement exists. However, when the axes do not have a unique order, finding a good axis arrangement requires experimentation and feature engineering. To explore more relationships, axes may be reordered or restructured.
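The gap between the relationships shown and the relationships that exist can be made concrete by counting. The sketch below, with hypothetical axis names, compares the n − 1 neighbouring pairs in one ordering with all n(n − 1)/2 pairs of variables.

```python
from math import comb


def adjacent_pairs(order):
    """Pairs of variables drawn next to each other in a given axis order."""
    return list(zip(order, order[1:]))


axes = ["a", "b", "c", "d", "e"]
shown = adjacent_pairs(axes)
total = comb(len(axes), 2)
print(f"{len(shown)} of {total} pairwise relationships are visible")
```

With five axes only four of the ten pairs are visible in any single ordering, and the shortfall grows quickly as axes are added.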

One approach arranges the axes in 3-dimensional space (still in parallel, forming a lattice graph). An axis can then have more than two neighbors, arranged in a circle around the central attribute, and the arrangement problem can be improved by using a minimum spanning tree. [11] A prototype of this visualization is available as an extension to the data mining software ELKI. However, the visualization is harder to interpret and interact with than a linear order.
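To show how a minimum spanning tree can give an axis more than two neighbors, the sketch below runs Prim's algorithm on a small table of pairwise dissimilarities between axes. The dissimilarity values and names are invented for the example, and this is not the ELKI implementation; the cited work should be consulted for how the weights are actually chosen.

```python
def minimum_spanning_tree(names, dist):
    """Prim's algorithm on a complete graph of axes.

    dist maps an unordered pair of names, stored in either order, to a
    dissimilarity. Returns the tree as a list of (parent, child) edges.
    """
    def d(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    in_tree = [names[0]]
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge leaving the current tree.
        a, b = min(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda edge: d(*edge),
        )
        edges.append((a, b))
        in_tree.append(b)
    return edges


names = ["w", "x", "y", "z"]
dist = {
    ("w", "x"): 0.1, ("w", "y"): 0.9, ("w", "z"): 0.9,
    ("x", "y"): 0.2, ("x", "z"): 0.3, ("y", "z"): 0.9,
}
tree = minimum_spanning_tree(names, dist)
print(tree)
```

Here axis `x` ends up connected to three other axes, which a linear left-to-right arrangement could not express.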

Software

While there are a large number of papers about parallel coordinates, only a few notable software packages are publicly available for converting databases into parallel coordinates graphics. [12] Notable packages include ELKI, GGobi, Mondrian, Orange and ROOT. Libraries include Protovis.js, and D3.js provides basic examples. D3.Parcoords.js, a D3-based library dedicated specifically to creating parallel coordinates graphics, has also been published. The Python data structure and analysis library pandas implements parallel coordinates plotting using the plotting library matplotlib. [13]
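As a brief usage sketch for the pandas route, the example below calls `pandas.plotting.parallel_coordinates` on a tiny hand-typed table. The column names and values are illustrative, and both pandas and matplotlib are assumed to be installed.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use

import pandas as pd
from pandas.plotting import parallel_coordinates

# Illustrative measurements; the "species" column labels each row's class.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 7.0, 6.3],
        "sepal_width": [3.5, 3.2, 3.3],
        "petal_length": [1.4, 4.7, 6.0],
        "petal_width": [0.2, 1.4, 2.5],
        "species": ["setosa", "versicolor", "virginica"],
    }
)

# One polyline per row, colored by class; returns a matplotlib Axes.
ax = parallel_coordinates(df, class_column="species")
ax.set_ylabel("measurement")
```

The returned Axes can be styled further or saved through `ax.figure.savefig(...)`. Note that this function draws the raw values on a shared vertical scale, so variables in different units generally need rescaling beforehand.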

References

  1. Ocagne, M. (1885). Coordonnées Parallèles et Axiales: Méthode de transformation géométrique et procédé nouveau de calcul graphique déduits de la considération des coordonnées parallèles. Gauthier-Villars. https://archive.org/details/coordonnesparal00ocaggoog
  2. Gannett, Henry. "General Summary Showing the Rank of States by Ratios 1880".
  3. Guerry, A.-M. (1833). Essai sur la Statistique Morale de la France. Paris: Crochard.
  4. Friendly, M. (2022). The life and works of André-Michel Guerry, revisited. Sociological Spectrum, 42(4-6), 233–259. https://doi.org/10.1080/02732173.2022.2078450
  5. Inselberg, Alfred (1985). "The Plane with Parallel Coordinates". Visual Computer. 1 (4): 69–91. doi:10.1007/BF01898350. S2CID 15933827.
  6. Inselberg, Alfred (2009). Parallel Coordinates: VISUAL Multidimensional Geometry and its Applications. Springer. ISBN 978-0387215075.
  7. Yang, Jing; Peng, Wei; Ward, Matthew O.; Rundensteiner, Elke A. (2003). "Interactive Hierarchical Dimension Ordering Spacing and Filtering for Exploration of High Dimensional Datasets" (PDF). IEEE Symposium on Information Visualization (INFOVIS 2003): 3–4.
  8. Moustafa, Rida; Wegman, Edward J. (2006). "Multivariate continuous data – Parallel Coordinates". In Unwin, A.; Theus, M.; Hofmann, H. (eds.). Graphics of Large Datasets: Visualizing a Million. Springer. pp. 143–156. ISBN 978-0387329062.
  9. Moustafa, Rida; Wegman, Edward J. (2002). "On Some Generalizations of Parallel Coordinate Plots" (PDF). Seeing a Million, A Data Visualization Workshop, Rain Am Lech (Nr.), Germany. Archived from the original (PDF) on 2013-12-24.
  10. Inselberg, A. (1997), "Multidimensional detective", Information Visualization, 1997. Proceedings., IEEE Symposium on, pp. 100–107, CiteSeerX 10.1.1.457.3745, doi:10.1109/INFVIS.1997.636793, ISBN 0-8186-8189-6, S2CID 1823293
  11. Elke Achtert, Hans-Peter Kriegel, Erich Schubert, Arthur Zimek (2013). "Interactive data mining with 3D-parallel-coordinate-trees". Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. New York City, NY. pp. 1009–1012. doi:10.1145/2463676.2463696. ISBN 9781450320375. S2CID 14850709.
  12. Kosara, Robert (2010). "Parallel Coordinates".
  13. Parallel Coordinates in Pandas
