Multidimensional scaling

An example of classical multidimensional scaling applied to voting patterns in the United States House of Representatives. Each red dot represents one Republican member of the House, and each blue dot one Democrat.

Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a data set. MDS is used to translate distances between each pair of objects in a set into a configuration of points mapped into an abstract Cartesian space. [1]


More technically, MDS refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. It is a form of non-linear dimensionality reduction.

Given a distance matrix with the distances between each pair of objects in a set, and a chosen number of dimensions, N, an MDS algorithm places each object into N-dimensional space (a lower-dimensional representation) such that the between-object distances are preserved as well as possible. For N = 1, 2, and 3, the resulting points can be visualized on a scatter plot. [2]
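As a concrete illustration, a precomputed dissimilarity matrix can be embedded this way with scikit-learn's MDS estimator. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the four-object distance matrix is hypothetical:

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical 4x4 symmetric dissimilarity matrix (zeros on the diagonal).
D = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.5, 2.5],
              [2.0, 1.5, 0.0, 1.0],
              [3.0, 2.5, 1.0, 0.0]])

# Embed the four objects into N = 2 dimensions, preserving pairwise
# distances as well as possible; the result can be drawn as a scatter plot.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
X = mds.fit_transform(D)   # shape (4, 2): one 2-D point per object
print(X)
```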

Core theoretical contributions to MDS were made by James O. Ramsay of McGill University, who is also regarded as the founder of functional data analysis. [3]

Types

MDS algorithms fall into a taxonomy, depending on the meaning of the input matrix:

Classical multidimensional scaling

It is also known as Principal Coordinates Analysis (PCoA), Torgerson scaling or Torgerson–Gower scaling. It takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain, [2] which is given by

$$\text{Strain}_D(x_1, x_2, \ldots, x_n) = \left( \frac{\sum_{i,j} \bigl( b_{ij} - \langle x_i, x_j \rangle \bigr)^2}{\sum_{i,j} b_{ij}^2} \right)^{1/2},$$

where $x_i$ denote vectors in N-dimensional space, $\langle x_i, x_j \rangle$ denotes the scalar product between $x_i$ and $x_j$, and $b_{ij}$ are the elements of the matrix $B$ defined on step 2 of the following algorithm, which are computed from the distances.

Steps of a Classical MDS algorithm:
Classical MDS uses the fact that the coordinate matrix $X$ can be derived by eigenvalue decomposition from $B = XX'$. The matrix $B$ can be computed from the proximity matrix $D$ by using double centering. [4]
  1. Set up the squared proximity matrix $D^{(2)} = [d_{ij}^2]$.
  2. Apply double centering: $B = -\tfrac{1}{2} J D^{(2)} J$ using the centering matrix $J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}'$, where $n$ is the number of objects, $I$ is the $n \times n$ identity matrix, and $\mathbf{1}\mathbf{1}'$ is an $n \times n$ matrix of all ones.
  3. Determine the $m$ largest eigenvalues $\lambda_1, \ldots, \lambda_m$ and corresponding eigenvectors $e_1, \ldots, e_m$ of $B$ (where $m$ is the number of dimensions desired for the output).
  4. Now, $X = E_m \Lambda_m^{1/2}$, where $E_m$ is the matrix of $m$ eigenvectors and $\Lambda_m$ is the diagonal matrix of $m$ eigenvalues of $B$.
Classical MDS assumes metric distances, so it is not applicable for direct dissimilarity ratings.
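The following NumPy sketch implements the four steps above under the usual assumptions (a symmetric distance matrix with a zero diagonal); the function name `classical_mds` is illustrative, not a standard API:

```python
import numpy as np

def classical_mds(D, m=2):
    """Classical (Torgerson) MDS: embed a distance matrix D into m dimensions."""
    n = D.shape[0]
    D2 = D ** 2                                  # step 1: squared proximity matrix
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix J = I - (1/n) 11'
    B = -0.5 * J @ D2 @ J                        # step 2: double centering
    eigvals, eigvecs = np.linalg.eigh(B)         # eigh: B is symmetric
    idx = np.argsort(eigvals)[::-1][:m]          # step 3: m largest eigenvalues
    L = np.diag(np.sqrt(np.maximum(eigvals[idx], 0)))  # clamp tiny negatives
    return eigvecs[:, idx] @ L                   # step 4: X = E_m Lambda_m^(1/2)

# Distances between four points on a line; the recovered configuration is
# essentially one-dimensional (second coordinate near zero).
D = np.abs(np.subtract.outer([0.0, 1.0, 3.0, 6.0], [0.0, 1.0, 3.0, 6.0]))
print(classical_mds(D, m=2))
```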

Metric multidimensional scaling (mMDS)

It is a superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input matrices of known distances with weights and so on. A useful loss function in this context is called stress, which is often minimized using a procedure called stress majorization. Metric MDS minimizes the cost function called "stress", which is a residual sum of squares:

$$\text{Stress}_D(x_1, x_2, \ldots, x_n) = \left( \sum_{i \neq j} \bigl( d_{ij} - \lVert x_i - x_j \rVert \bigr)^2 \right)^{1/2}.$$

Metric scaling uses a power transformation with a user-controlled exponent $p$: $d_{ij}^p$ and $-d_{ij}^p$ for distance. In classical scaling $p = 1$. Non-metric scaling is defined by the use of isotonic regression to nonparametrically estimate a transformation of the dissimilarities.
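As an illustration, scikit-learn exposes a SMACOF-based stress-majorization routine; the call below is a sketch assuming a recent scikit-learn version, reusing the hypothetical distance matrix from the earlier example:

```python
import numpy as np
from sklearn.manifold import smacof

# Hypothetical symmetric dissimilarity matrix.
D = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.5, 2.5],
              [2.0, 1.5, 0.0, 1.0],
              [3.0, 2.5, 1.0, 0.0]])

# Minimize metric stress by majorization (SMACOF); returns the embedding
# and the final stress value.
X, stress = smacof(D, n_components=2, metric=True, random_state=0)
print(X, stress)
```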

Non-metric multidimensional scaling (NMDS)

In contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional space.

Let $d_{ij}$ be the dissimilarity between points $i, j$. Let $\hat d_{ij} = \lVert x_i - x_j \rVert$ be the Euclidean distance between embedded points $x_i, x_j$.

Now, for each choice of the embedded points $x_i$ and a monotonically increasing function $f$, define the "stress" function:

$$S(x_1, \ldots, x_n; f) = \left( \frac{\sum_{i<j} \bigl( f(d_{ij}) - \hat d_{ij} \bigr)^2}{\sum_{i<j} \hat d_{ij}^2} \right)^{1/2}.$$

The factor of $\sum_{i<j} \hat d_{ij}^2$ in the denominator is necessary to prevent a "collapse": if we instead defined $S = \bigl( \sum_{i<j} ( f(d_{ij}) - \hat d_{ij} )^2 \bigr)^{1/2}$, the stress could be trivially minimized by setting $f \equiv 0$ and collapsing every point to the same point.

A few variants of this cost function exist. MDS programs automatically minimize stress in order to obtain the MDS solution.

The core of a non-metric MDS algorithm is a twofold optimization process. First, the optimal monotonic transformation of the proximities has to be found. Second, the points of a configuration have to be optimally arranged, so that their distances match the scaled proximities as closely as possible.

NMDS needs to optimize two objectives simultaneously. This is usually done iteratively:

  1. Initialize $x_i$ randomly, e.g. by sampling from a normal distribution.
  2. Do until a stopping criterion is met (for example, $S < \varepsilon$):
    1. Solve for $f$ by isotonic regression.
    2. Solve for $x_1, \ldots, x_n$ by gradient descent or other methods.
  3. Return $x_i$ and $f$.
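A minimal sketch of this alternation, assuming NumPy and scikit-learn are available; the function name `nmds`, the learning rate, and the fixed iteration count are illustrative choices, not part of any standard specification:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def nmds(delta, n_components=2, n_iter=200, lr=0.05, seed=0):
    """Toy non-metric MDS: alternate isotonic regression and gradient steps."""
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    X = rng.normal(size=(n, n_components))        # 1. random initialization
    iu = np.triu_indices(n, k=1)                  # each unordered pair once
    for _ in range(n_iter):
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        # 2a. Isotonic regression: fit the embedded distances as a monotone
        # function of the input dissimilarities, giving disparities f(d_ij).
        dhat = IsotonicRegression().fit_transform(delta[iu], D[iu])
        Dhat = np.zeros_like(D)
        Dhat[iu] = dhat
        Dhat = Dhat + Dhat.T
        # 2b. One gradient step on sum_ij (D_ij - Dhat_ij)^2 with respect to X.
        W = np.divide(D - Dhat, D, out=np.zeros_like(D), where=D > 0)
        grad = 2 * (W.sum(axis=1, keepdims=True) * X - W @ X)
        X -= lr * grad
    return X                                      # 3. return the configuration

delta = np.array([[0.0, 1.0, 2.0, 3.0],
                  [1.0, 0.0, 1.5, 2.5],
                  [2.0, 1.5, 0.0, 1.0],
                  [3.0, 2.5, 1.0, 0.0]])
print(nmds(delta))
```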

Louis Guttman's smallest space analysis (SSA) is an example of a non-metric MDS procedure.

Generalized multidimensional scaling (GMDS)

An extension of metric multidimensional scaling, in which the target space is an arbitrary smooth non-Euclidean space. In cases where the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another. [5]

Details

The data to be analyzed is a collection of $M$ objects (colors, faces, stocks, . . .) on which a distance function is defined,

$$d_{i,j} := \text{distance between the } i\text{-th and } j\text{-th objects.}$$

These distances are the entries of the dissimilarity matrix

$$D := \begin{pmatrix} d_{1,1} & d_{1,2} & \cdots & d_{1,M} \\ d_{2,1} & d_{2,2} & \cdots & d_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ d_{M,1} & d_{M,2} & \cdots & d_{M,M} \end{pmatrix}.$$

The goal of MDS is, given $D$, to find $M$ vectors $x_1, \ldots, x_M \in \mathbb{R}^N$ such that

$$\lVert x_i - x_j \rVert \approx d_{i,j} \quad \text{for all } i, j \in \{1, \ldots, M\},$$

where $\lVert \cdot \rVert$ is a vector norm. In classical MDS, this norm is the Euclidean distance, but, in a broader sense, it may be a metric or arbitrary distance function. [6]

In other words, MDS attempts to find a mapping from the $M$ objects into $\mathbb{R}^N$ such that distances are preserved. If the dimension $N$ is chosen to be 2 or 3, we may plot the vectors $x_i$ to obtain a visualization of the similarities between the $M$ objects. Note that the vectors $x_i$ are not unique: with the Euclidean distance, they may be arbitrarily translated, rotated, and reflected, since these transformations do not change the pairwise distances $\lVert x_i - x_j \rVert$.

(Note: The symbol $\mathbb{R}$ indicates the set of real numbers, and the notation $\mathbb{R}^N$ refers to the Cartesian product of $N$ copies of $\mathbb{R}$, which is an $N$-dimensional vector space over the field of the real numbers.)

There are various approaches to determining the vectors $x_i$. Usually, MDS is formulated as an optimization problem, where $(x_1, \ldots, x_M)$ is found as a minimizer of some cost function, for example,

$$\min_{x_1, \ldots, x_M} \sum_{i<j} \bigl( \lVert x_i - x_j \rVert - d_{i,j} \bigr)^2.$$
A solution may then be found by numerical optimization techniques. For some particularly chosen cost functions, minimizers can be stated analytically in terms of matrix eigendecompositions. [2]
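For example, this cost function can be handed directly to a general-purpose optimizer; the sketch below uses SciPy's L-BFGS-B and assumes the same hypothetical distance matrix as in the earlier examples:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def mds_by_optimization(D, N=2, seed=0):
    """Embed D into R^N by numerically minimizing the squared distance error."""
    M = D.shape[0]
    target = D[np.triu_indices(M, k=1)]    # pairwise distances, in pdist order

    def cost(flat_x):
        X = flat_x.reshape(M, N)
        return np.sum((pdist(X) - target) ** 2)

    x0 = np.random.default_rng(seed).normal(size=M * N)
    res = minimize(cost, x0, method="L-BFGS-B")
    return res.x.reshape(M, N)

D = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.5, 2.5],
              [2.0, 1.5, 0.0, 1.0],
              [3.0, 2.5, 1.0, 0.0]])
print(mds_by_optimization(D))
```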

Procedure

There are several steps in conducting MDS research:

  1. Formulating the problem – What variables do you want to compare? How many variables do you want to compare? What purpose is the study to be used for?
  2. Obtaining input data – For example: respondents are asked a series of questions. For each product pair, they are asked to rate similarity (usually on a 7-point Likert scale from very similar to very dissimilar). The first question could be for Coke/Pepsi, the next for Coke/Hires rootbeer, the next for Pepsi/Dr Pepper, the next for Dr Pepper/Hires rootbeer, etc. The number of questions is a function of the number of brands and can be calculated as $Q = N(N-1)/2$, where Q is the number of questions and N is the number of brands. This approach is referred to as the "Perception data: direct approach". There are two other approaches. There is the "Perception data: derived approach", in which products are decomposed into attributes that are rated on a semantic differential scale. The other is the "Preference data approach", in which respondents are asked their preference rather than similarity.
  3. Running the MDS statistical program – Software for running the procedure is available in many statistical software packages. Often there is a choice between Metric MDS (which deals with interval or ratio level data), and Nonmetric MDS [7] (which deals with ordinal data).
  4. Decide number of dimensions – The researcher must decide on the number of dimensions they want the computer to create. Interpretability of the MDS solution is often important, and lower dimensional solutions will typically be easier to interpret and visualize. However, dimension selection is also an issue of balancing underfitting and overfitting. Lower dimensional solutions may underfit by leaving out important dimensions of the dissimilarity data. Higher dimensional solutions may overfit to noise in the dissimilarity measurements. Model selection tools like AIC, BIC, Bayes factors, or cross-validation can thus be useful to select the dimensionality that balances underfitting and overfitting.
  5. Mapping the results and defining the dimensions – The statistical program (or a related module) will map the results. The map will plot each product (usually in two-dimensional space). The proximity of products to each other indicates either how similar they are or how preferred they are, depending on which approach was used. How the dimensions of the embedding actually correspond to dimensions of system behavior, however, is not necessarily obvious. Here, a subjective judgment about the correspondence can be made (see perceptual mapping).
  6. Test the results for reliability and validity – Compute R-squared to determine what proportion of variance of the scaled data can be accounted for by the MDS procedure. An R-squared of 0.6 is considered the minimum acceptable level, [citation needed] while 0.8 is considered good for metric scaling and 0.9 for non-metric scaling. Other possible tests are Kruskal's stress (see the sketch after this list), split-data tests, data stability tests (i.e., eliminating one brand), and test–retest reliability.
  7. Report the results comprehensively – Along with the mapping, at least the distance measure (e.g., Sorenson index, Jaccard index) and reliability (e.g., stress value) should be given. It is also very advisable to report the algorithm (e.g., Kruskal, Mather), which is often defined by the program used (sometimes replacing the algorithm report); whether a start configuration was given or chosen at random; the number of runs; the assessment of dimensionality; the Monte Carlo method results; the number of iterations; the assessment of stability; and the proportional variance of each axis (r-squared).
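As an illustration of step 6, Kruskal's Stress-1 can be computed directly from the input dissimilarities and the embedded distances; this minimal sketch assumes a metric solution, so the disparities are taken to be the raw dissimilarities themselves:

```python
import numpy as np
from scipy.spatial.distance import pdist

def kruskal_stress1(D, X):
    """Kruskal's Stress-1: sqrt(sum (d_ij - dhat_ij)^2 / sum dhat_ij^2)."""
    d = D[np.triu_indices(D.shape[0], k=1)]   # input dissimilarities
    dhat = pdist(X)                           # distances in the embedding
    return np.sqrt(np.sum((d - dhat) ** 2) / np.sum(dhat ** 2))

D = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.5, 2.5],
              [2.0, 1.5, 0.0, 1.0],
              [3.0, 2.5, 1.0, 0.0]])
X = np.random.default_rng(0).normal(size=(4, 2))  # stand-in embedding
print(kruskal_stress1(D, X))
```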



References

  1. Mead, A. (1992). "Review of the Development of Multidimensional Scaling Methods". Journal of the Royal Statistical Society, Series D (The Statistician). 41 (1): 27–39. JSTOR 2348634.
  2. Borg, I.; Groenen, P. (2005). Modern Multidimensional Scaling: Theory and Applications (2nd ed.). New York: Springer-Verlag. pp. 207–212. ISBN 978-0-387-94845-4.
  3. Genest, Christian; Nešlehová, Johanna G.; Ramsay, James O. (2014). "A Conversation with James O. Ramsay". International Statistical Review / Revue Internationale de Statistique. 82 (2): 161–183. JSTOR 43299752.
  4. Wickelmaier, Florian (2003). "An Introduction to MDS". Sound Quality Research Unit, Aalborg University, Denmark. p. 46.
  5. Bronstein, A. M.; Bronstein, M. M.; Kimmel, R. (January 2006). "Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching". Proc. Natl. Acad. Sci. U.S.A. 103 (5): 1168–72. Bibcode:2006PNAS..103.1168B. doi:10.1073/pnas.0508601103. PMC 1360551. PMID 16432211.
  6. Kruskal, J. B.; Wish, M. (1978). Multidimensional Scaling. Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-011. Beverly Hills and London: Sage Publications.
  7. Kruskal, J. B. (1964). "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis". Psychometrika. 29 (1): 1–27. doi:10.1007/BF02289565. S2CID 48165675.
  8. de Leeuw, Jan; Mair, Patrick (2009). "Multidimensional Scaling Using Majorization: SMACOF in R". Journal of Statistical Software. 31 (3). doi:10.18637/jss.v031.i03. ISSN 1548-7660.
