Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a data set. MDS is used to translate distances between each pair of objects in a set into a configuration of points mapped into an abstract Cartesian space. [1]
More technically, MDS refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. It is a form of non-linear dimensionality reduction.
Given a distance matrix with the distances between each pair of objects in a set, and a chosen number of dimensions, N, an MDS algorithm places each object into N-dimensional space (a lower-dimensional representation) such that the between-object distances are preserved as well as possible. For N = 1, 2, and 3, the resulting points can be visualized on a scatter plot. [2]
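For example, a minimal sketch of this workflow using scikit-learn's MDS estimator (assuming NumPy, SciPy and scikit-learn are installed; the random toy objects are placeholders):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Toy "objects": ten points in 5 dimensions whose pairwise distances we want
# to reproduce in a 2-dimensional map.
rng = np.random.default_rng(0)
objects = rng.normal(size=(10, 5))
D = squareform(pdist(objects))                    # distance matrix between each pair

# Embed into N = 2 dimensions, preserving the distances as well as possible.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)                     # (10, 2) array, ready for a scatter plot
print(coords.shape)
```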
Core theoretical contributions to MDS were made by James O. Ramsay of McGill University, who is also regarded as the founder of functional data analysis. [3]
MDS algorithms fall into a taxonomy, depending on the meaning of the input matrix:
Classical multidimensional scaling is also known as Principal Coordinates Analysis (PCoA), Torgerson scaling or Torgerson–Gower scaling. It takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain, [2] which is given by
$$\operatorname{Strain}_D(x_1,x_2,\dots,x_n) = \left(\frac{\sum_{i,j}\bigl(b_{ij} - \langle x_i, x_j\rangle\bigr)^2}{\sum_{i,j} b_{ij}^2}\right)^{1/2},$$
where $x_i$ denote vectors in N-dimensional space, $\langle x_i, x_j\rangle$ denotes the scalar product between $x_i$ and $x_j$, and $b_{ij}$ are the elements of the matrix $B$ defined on step 2 of the following algorithm, which are computed from the distances.
1. Set up the squared proximity matrix $D^{(2)} = [d_{ij}^2]$.
2. Apply double centering: $B = -\tfrac{1}{2} J D^{(2)} J$ using the centering matrix $J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$, where $n$ is the number of objects, $I$ is the $n\times n$ identity matrix, and $\mathbf{1}$ is an $n\times 1$ column vector of ones.
3. Determine the $m$ largest eigenvalues $\lambda_1,\dots,\lambda_m$ and corresponding eigenvectors $e_1,\dots,e_m$ of $B$ (where $m$ is the number of dimensions desired for the output).
4. Set $X = E_m \Lambda_m^{1/2}$, where $E_m$ is the matrix of the $m$ eigenvectors and $\Lambda_m$ is the diagonal matrix of the $m$ eigenvalues of $B$.
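As a concrete sketch of these steps (assuming NumPy; the helper name cmdscale is ours, not a library function):

```python
import numpy as np

def cmdscale(D, m=2):
    """Classical (Torgerson) MDS sketch. D is an (n, n) matrix of distances."""
    n = D.shape[0]
    D2 = D ** 2                                   # step 1: squared proximities
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix J
    B = -0.5 * J @ D2 @ J                         # step 2: double centering
    evals, evecs = np.linalg.eigh(B)              # step 3: eigendecomposition of B
    idx = np.argsort(evals)[::-1][:m]             # indices of the m largest eigenvalues
    L = np.diag(np.sqrt(np.maximum(evals[idx], 0.0)))
    return evecs[:, idx] @ L                      # step 4: X = E_m * Lambda_m^{1/2}
```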
Metric multidimensional scaling is a superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input matrices of known distances with weights and so on. A useful loss function in this context is called stress, which is often minimized using a procedure called stress majorization. Metric MDS minimizes the cost function called "stress", which is a residual sum of squares:
$$\operatorname{Stress}_D(x_1,x_2,\dots,x_n) = \sqrt{\sum_{i\neq j = 1,\dots,n}\bigl(d_{ij} - \|x_i - x_j\|\bigr)^2}.$$
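Evaluating this cost for a candidate configuration is straightforward (a small sketch assuming NumPy/SciPy; stress_D is an illustrative helper name):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def stress_D(X, D):
    """Raw stress: residual sum of squares between the target distances D
    and the pairwise distances of the configuration X, summed over i != j."""
    d = squareform(pdist(X))            # ||x_i - x_j|| for every pair
    return np.sqrt(np.sum((D - d) ** 2))
```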
Metric scaling uses a power transformation with a user-controlled exponent $p$: $d_{ij}^p$ and $-d_{ij}^{2p}$ for distance. In classical scaling $p = 1$. Non-metric scaling is defined by the use of isotonic regression to nonparametrically estimate a transformation of the dissimilarities.
In contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional space.
Let $\delta_{ij}$ be the dissimilarity between points $i$ and $j$. Let $d_{ij} = \|x_i - x_j\|$ be the Euclidean distance between embedded points $x_i$ and $x_j$.
Now, for each choice of the embedded points $x_i$ and monotonically increasing function $f$, define the "stress" function:
$$S(x_1,\dots,x_n; f) = \sqrt{\frac{\sum_{i<j}\bigl(f(\delta_{ij}) - d_{ij}\bigr)^2}{\sum_{i<j} d_{ij}^2}}.$$
The factor of $\sum_{i<j} d_{ij}^2$ in the denominator is necessary to prevent a "collapse": if we instead defined $S = \sqrt{\sum_{i<j}\bigl(f(\delta_{ij}) - d_{ij}\bigr)^2}$, the stress could be trivially minimized by setting $f \equiv 0$ and collapsing every point to the same location.
A few variants of this cost function exist. MDS programs automatically minimize stress in order to obtain the MDS solution.
The core of a non-metric MDS algorithm is a twofold optimization process. First the optimal monotonic transformation of the proximities has to be found. Secondly, the points of a configuration have to be optimally arranged, so that their distances match the scaled proximities as closely as possible.
NMDS needs to optimize these two objectives simultaneously. This is usually done iteratively: starting from a random configuration, the algorithm alternates between computing the current inter-point distances, finding the optimal monotonic transformation of the proximities (for example by isotonic regression), and moving the points so that their distances better match the transformed proximities, until the stress falls below a chosen threshold. A minimal sketch of this loop is given below.
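The following is a rough sketch of that alternating loop, assuming NumPy, SciPy and scikit-learn are available; the function nmds_sketch, its parameters, and the fixed step size are illustrative choices, and the stress normalization and disparity rescaling used by production implementations are omitted.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.isotonic import IsotonicRegression

def nmds_sketch(delta, n_components=2, n_iter=300, lr=0.05, seed=0):
    """delta: (n, n) symmetric matrix of dissimilarities delta_ij."""
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    X = rng.normal(size=(n, n_components))       # random starting configuration
    iu = np.triu_indices(n, k=1)
    d_hat = delta[iu]                            # observed dissimilarities (i < j)
    for _ in range(n_iter):
        d = pdist(X)                             # current embedded distances (i < j)
        # Optimal monotone transform of the proximities via isotonic regression.
        disp = IsotonicRegression().fit_transform(d_hat, d)
        # Move each point along the (negative) gradient of the raw stress
        # sum_{i<j} (d_ij - disp_ij)^2 with respect to x_i.
        D = squareform(d)
        Disp = squareform(disp)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(D > 0, (D - Disp) / D, 0.0)
        grad = (ratio[:, :, None] * (X[:, None, :] - X[None, :, :])).sum(axis=1)
        X -= lr * grad
    return X
```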
Louis Guttman's smallest space analysis (SSA) is an example of a non-metric MDS procedure.
Generalized multidimensional scaling (GMDS) is an extension of metric multidimensional scaling in which the target space is an arbitrary smooth non-Euclidean space. In cases where the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another. [5]
The data to be analyzed is a collection of $M$ objects (colors, faces, stocks, . . .) on which a distance function is defined, $d_{i,j} :=$ the distance between the $i$-th and $j$-th objects.
These distances are the entries of the dissimilarity matrix
$$D := \begin{pmatrix} d_{1,1} & d_{1,2} & \cdots & d_{1,M} \\ d_{2,1} & d_{2,2} & \cdots & d_{2,M} \\ \vdots & \vdots & & \vdots \\ d_{M,1} & d_{M,2} & \cdots & d_{M,M} \end{pmatrix}.$$
The goal of MDS is, given $D$, to find $M$ vectors $x_1,\dots,x_M \in \mathbb{R}^N$ such that
$$\|x_i - x_j\| \approx d_{i,j} \quad \text{for all } i,j \in \{1,\dots,M\},$$
where $\|\cdot\|$ is a vector norm. In classical MDS, this norm is the Euclidean distance, but, in a broader sense, it may be a metric or arbitrary distance function. [6] For example, when dealing with mixed-type data that contain numerical as well as categorical descriptors, Gower's distance is a common alternative.
In other words, MDS attempts to find a mapping from the $M$ objects into $\mathbb{R}^N$ such that distances are preserved. If the dimension $N$ is chosen to be 2 or 3, we may plot the vectors $x_i$ to obtain a visualization of the similarities between the $M$ objects. Note that the vectors $x_i$ are not unique: with the Euclidean distance, they may be arbitrarily translated, rotated, and reflected, since these transformations do not change the pairwise distances $\|x_i - x_j\|$.
(Note: The symbol $\mathbb{R}$ indicates the set of real numbers, and the notation $\mathbb{R}^N$ refers to the Cartesian product of $N$ copies of $\mathbb{R}$, which is an $N$-dimensional vector space over the field of the real numbers.)
There are various approaches to determining the vectors $x_i$. Usually, MDS is formulated as an optimization problem, where $(x_1,\dots,x_M)$ is found as a minimizer of some cost function, for example,
$$\min_{x_1,\dots,x_M} \sum_{i<j}\bigl(\|x_i - x_j\| - d_{i,j}\bigr)^2.$$
A solution may then be found by numerical optimization techniques. For some particularly chosen cost functions, minimizers can be stated analytically in terms of matrix eigendecompositions. [2]
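A small sketch of the numerical route (assuming SciPy; mds_by_optimization and its defaults are illustrative) minimizes the least-squares cost above with a general-purpose optimizer:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def mds_by_optimization(D, n_components=2, seed=0):
    """D: (M, M) symmetric matrix of target distances d_ij."""
    M = D.shape[0]
    target = D[np.triu_indices(M, k=1)]          # d_ij for i < j

    def cost(flat_x):
        X = flat_x.reshape(M, n_components)
        return np.sum((pdist(X) - target) ** 2)  # sum_{i<j} (||x_i - x_j|| - d_ij)^2

    x0 = np.random.default_rng(seed).normal(size=M * n_components)
    res = minimize(cost, x0, method="L-BFGS-B")
    return res.x.reshape(M, n_components)
```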
There are several steps in conducting MDS research: formulating the problem, obtaining the input data, running the MDS program, deciding on the number of dimensions, mapping and interpreting the results, and testing them for reliability and validity.
In vector calculus, the gradient of a scalar-valued differentiable function $f$ of several variables is the vector field $\nabla f$ whose value at a point $p$ gives the direction and the rate of fastest increase. The gradient transforms like a vector under change of basis of the space of variables of $f$. If the gradient of a function is non-zero at a point $p$, the direction of the gradient is the direction in which the function increases most quickly from $p$, and the magnitude of the gradient is the rate of increase in that direction, the greatest absolute directional derivative. Further, a point where the gradient is the zero vector is known as a stationary point. The gradient thus plays a fundamental role in optimization theory, where it is used to minimize a function by gradient descent. In coordinate-free terms, the gradient of a function $f(\mathbf{r})$ may be defined by
$$df = \nabla f \cdot d\mathbf{r},$$
where $df$ is the total infinitesimal change in $f$ for an infinitesimal displacement $d\mathbf{r}$.
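As a toy illustration of gradient descent (assuming NumPy; the quadratic function, step size, and iteration count are arbitrary choices):

```python
import numpy as np

# Minimize f(x, y) = (x - 3)^2 + 2*(y + 1)^2 by repeatedly stepping against its gradient.
def grad_f(p):
    x, y = p
    return np.array([2.0 * (x - 3.0), 4.0 * (y + 1.0)])   # analytic gradient of f

p = np.array([0.0, 0.0])
for _ in range(200):
    p -= 0.1 * grad_f(p)           # move in the direction of fastest decrease
print(p)                           # approaches the minimizer (3, -1)
```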
In mathematics, and more specifically in linear algebra, a linear map is a mapping between two vector spaces that preserves the operations of vector addition and scalar multiplication. The same names and the same definition are also used for the more general case of modules over a ring; see Module homomorphism.
In differential geometry, a Riemannian manifold is a geometric space on which many geometric notions such as distance, angles, length, volume, and curvature are defined. Euclidean space, the $n$-sphere, hyperbolic space, and smooth surfaces in three-dimensional space, such as ellipsoids and paraboloids, are all examples of Riemannian manifolds. Riemannian manifolds are named after the German mathematician Bernhard Riemann, who first conceptualized them.
In the mathematical field of differential geometry, a metric tensor is an additional structure on a manifold M that allows defining distances and angles, just as the inner product on a Euclidean space allows defining distances and angles there. More precisely, a metric tensor at a point p of M is a bilinear form defined on the tangent space at p, and a metric field on M consists of a metric tensor at each point p of M that varies smoothly with p.
In differential geometry, the Ricci curvature tensor, named after Gregorio Ricci-Curbastro, is a geometric object which is determined by a choice of Riemannian or pseudo-Riemannian metric on a manifold. It can be considered, broadly, as a measure of the degree to which the geometry of a given metric tensor differs locally from that of ordinary Euclidean space or pseudo-Euclidean space.
Nonlinear dimensionality reduction, also known as manifold learning, is any of various related techniques that aim to project high-dimensional data, potentially existing across non-linear manifolds which cannot be adequately captured by linear decomposition methods, onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.
Distance geometry is the branch of mathematics concerned with characterizing and studying sets of points based only on given values of the distances between pairs of points. More abstractly, it is the study of semimetric spaces and the isometric transformations between them. In this view, it can be considered as a subject within general topology.
In linear algebra, the Gram matrix of a set of vectors $v_1,\dots,v_n$ in an inner product space is the Hermitian matrix of inner products, whose entries are given by $G_{ij} = \langle v_i, v_j\rangle$. If the vectors $v_1,\dots,v_n$ are the columns of a matrix $X$, then the Gram matrix is $X^\dagger X$ in the general case that the vector coordinates are complex numbers, which simplifies to $X^\top X$ for the case that the vector coordinates are real numbers.
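For instance (a small NumPy sketch with arbitrary example vectors):

```python
import numpy as np

# Two real vectors v1 = (1, 2, 0) and v2 = (0, 1, 3), stored as the columns of X.
X = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])
G = X.T @ X                        # Gram matrix: G[i, j] = <v_i, v_j>
print(G)                           # [[ 5.  2.] [ 2. 10.]]
```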
In mathematics, the discrete Laplace operator is an analog of the continuous Laplace operator, defined so that it has meaning on a graph or a discrete grid. For the case of a finite-dimensional graph, the discrete Laplace operator is more commonly called the Laplacian matrix.
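For a finite graph, the Laplacian matrix is the degree matrix minus the adjacency matrix (a minimal NumPy sketch; the example path graph is arbitrary):

```python
import numpy as np

# Path graph 0 -- 1 -- 2, given by its adjacency matrix A.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
D = np.diag(A.sum(axis=1))         # degree matrix
L = D - A                          # graph Laplacian (discrete Laplace operator)
print(L)                           # [[ 1 -1  0] [-1  2 -1] [ 0 -1  1]]
```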
In statistics, a rank correlation is any of several statistics that measure an ordinal association — the relationship between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is the assignment of the ordering labels "first", "second", "third", etc. to different observations of a particular variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them. For example, two common nonparametric methods of significance that use rank correlation are the Mann–Whitney U test and the Wilcoxon signed-rank test.
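For example, Spearman's rank correlation between two rankings can be computed with SciPy (assuming scipy.stats is available; the example rankings are made up):

```python
from scipy.stats import spearmanr

judge_a = [1, 2, 3, 4, 5]          # rankings assigned by two judges
judge_b = [2, 1, 4, 3, 5]
rho, p_value = spearmanr(judge_a, judge_b)
print(rho)                         # 0.8 here; 1.0 would mean identical rankings
```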
In mathematics, a Euclidean distance matrix is an n×n matrix representing the spacing of a set of n points in Euclidean space. For points $x_1, x_2, \dots, x_n$ in $k$-dimensional space $\mathbb{R}^k$, the elements of their Euclidean distance matrix $A$ are given by squares of distances between them. That is
$$A = (a_{ij}), \qquad a_{ij} = d_{ij}^2 = \|x_i - x_j\|^2,$$
where $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^k$.
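A quick numerical check (assuming NumPy/SciPy; the three points are arbitrary):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three points in R^2; their Euclidean distance matrix holds the squared distances.
x = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
A = squareform(pdist(x)) ** 2
print(A)                           # [[ 0.  9. 16.] [ 9.  0. 25.] [16. 25.  0.]]
```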
Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of $n$ $m$-dimensional data items, a configuration $X$ of $n$ points in $r(\ll m)$-dimensional space is sought that minimizes the so-called stress function $\sigma(X)$. Usually $r$ is 2 or 3, i.e. the $n \times r$ matrix $X$ lists points in 2- or 3-dimensional Euclidean space so that the result may be visualised. The function $\sigma$ is a cost or loss function that measures the squared differences between ideal ($m$-dimensional) distances and actual distances in $r$-dimensional space. It is defined as:
$$\sigma(X) = \sum_{i<j\le n} w_{ij}\bigl(d_{ij}(X) - \delta_{ij}\bigr)^2,$$
where $w_{ij}\ge 0$ is a weight for the measurement between a pair of points $(i,j)$, $d_{ij}(X)$ is the Euclidean distance between points $i$ and $j$ in the configuration $X$, and $\delta_{ij}$ is the ideal distance between the points in the $m$-dimensional data space.
The intrinsic dimension for a data set can be thought of as the number of variables needed in a minimal representation of the data. Similarly, in signal processing of multidimensional signals, the intrinsic dimension of the signal describes how many variables are needed to generate a good approximation of the signal.
Sammon mapping or Sammon projection is an algorithm that maps a high-dimensional space to a space of lower dimensionality by trying to preserve the structure of inter-point distances in high-dimensional space in the lower-dimension projection.
Isomap is a nonlinear dimensionality reduction method. It is one of several widely used low-dimensional embedding methods. Isomap is used for computing a quasi-isometric, low-dimensional embedding of a set of high-dimensional data points. The algorithm provides a simple method for estimating the intrinsic geometry of a data manifold based on a rough estimate of each data point’s neighbors on the manifold. Isomap is highly efficient and generally applicable to a broad range of data sources and dimensionalities.
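A minimal usage sketch with scikit-learn's Isomap estimator (assuming scikit-learn is installed; the swiss-roll data set and the neighborhood size are illustrative):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Sample points from a 3-D "swiss roll" manifold and unroll them into 2-D.
X, _ = make_swiss_roll(n_samples=1000, random_state=0)
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)             # (1000, 2)
```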
In applied mathematics, topological data analysis (TDA) is an approach to the analysis of datasets using techniques from topology. Extraction of information from datasets that are high-dimensional, incomplete and noisy is generally challenging. TDA provides a general framework to analyze such data in a manner that is insensitive to the particular metric chosen and provides dimensionality reduction and robustness to noise. Beyond this, it inherits functoriality, a fundamental concept of modern mathematics, from its topological nature, which allows it to adapt to new mathematical tools.
Krippendorff's alpha coefficient, named after academic Klaus Krippendorff, is a statistical measure of the agreement achieved when coding a set of units of analysis. Since the 1970s, alpha has been used in content analysis where textual units are categorized by trained readers, in counseling and survey research where experts code open-ended interview data into analyzable terms, in psychological testing where alternative tests of the same phenomena need to be compared, or in observational studies where unstructured happenings are recorded for subsequent analysis.
In data mining and machine learning, the k q-flats algorithm is an iterative method that aims to partition m observations into k clusters where each cluster is close to a q-flat, where q is a given integer.
Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, in recommendation systems, visual identity tracking, face verification, and speaker verification.
Abstract. Multidimensional scaling methods are now a common statistical tool in psychophysics and sensory analysis. The development of these methods is charted, from the original research of Torgerson (metric scaling), Shepard and Kruskal (non-metric scaling) through individual differences scaling and the maximum likelihood methods proposed by Ramsay.