Isomap

Isomap on the "Swiss roll" data set. (A) Two points on the Swiss roll and their geodesic curve. (B) The KNN graph (with K = 7 and N = 2000) allows a graph geodesic (red) that approximates the smooth geodesic. (C) The Swiss roll "unrolled", showing the graph geodesic (red) and the smooth geodesic (blue). Replication of Figure 3 of . Isomap on Swiss roll.png
Isomap on the “Swiss roll” data set. (A) Two points on the Swiss roll and their geodesic curve. (B) The KNN graph (with K = 7 and N = 2000) allows a graph geodesic (red) that approximates the smooth geodesic. (C) The Swiss roll "unrolled", showing the graph geodesic (red) and the smooth geodesic (blue). Replication of Figure 3 of .

Isomap is a nonlinear dimensionality reduction method, one of several widely used low-dimensional embedding methods.[1] It computes a quasi-isometric, low-dimensional embedding of a set of high-dimensional data points. The algorithm provides a simple method for estimating the intrinsic geometry of a data manifold based on a rough estimate of each data point's neighbors on the manifold. Isomap is computationally efficient and applicable to a broad range of data sources and dimensionalities.


Introduction

Isomap is one representative of isometric mapping methods, and extends metric multidimensional scaling (MDS) by incorporating the geodesic distances imposed by a weighted graph. To be specific, the classical scaling of metric MDS performs low-dimensional embedding based on the pairwise distances between data points, which are generally measured using straight-line Euclidean distance. Isomap instead uses the geodesic distances induced by a neighborhood graph within the classical scaling, so that the manifold structure is incorporated in the resulting embedding. Isomap defines the geodesic distance to be the sum of edge weights along the shortest path between two nodes (computed using Dijkstra's algorithm, for example). The top n eigenvectors of the (doubly centered) geodesic distance matrix then represent the coordinates in the new n-dimensional Euclidean space.

Algorithm

A very high-level description of the Isomap algorithm is given below, followed by a code sketch.

  1. Construct the neighborhood graph: connect each point to its k nearest Euclidean neighbors (or to all points within some fixed radius ε), and weight each edge by the Euclidean distance between its endpoints.
  2. Compute the shortest-path distance between every pair of nodes in the graph (using Dijkstra's algorithm or the Floyd–Warshall algorithm, for example); these graph geodesics approximate the geodesic distances on the manifold.
  3. Apply classical multidimensional scaling to the matrix of graph distances to obtain the n-dimensional embedding.
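The following is a minimal sketch of these three steps in Python using NumPy, SciPy, and scikit-learn; the function name isomap is illustrative, and the sketch assumes the neighborhood graph is connected (see "Possible issues" below).

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap(X, n_neighbors=7, n_components=2):
    # Step 1: k-nearest-neighbor graph, edges weighted by Euclidean distance.
    graph = kneighbors_graph(X, n_neighbors, mode="distance")
    # Step 2: graph geodesics via Dijkstra's algorithm; entries are
    # infinite if the graph is disconnected, which breaks step 3.
    D = shortest_path(graph, method="D", directed=False)
    # Step 3: classical MDS on the geodesic distance matrix.
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    K = -0.5 * H @ (D ** 2) @ H                     # doubly centered kernel
    eigvals, eigvecs = np.linalg.eigh(K)
    top = np.argsort(eigvals)[::-1][:n_components]  # largest eigenpairs
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
```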

Extensions of ISOMAP

Several extensions address the computational cost of Isomap and its reliance on an isometric embedding. Landmark Isomap (L-Isomap) computes geodesic distances only to a small set of landmark points and then embeds the remaining points relative to the landmarks, which substantially reduces the cost of the shortest-path and eigendecomposition steps; Conformal Isomap (C-Isomap) rescales the graph edge weights so that conformal (angle-preserving but not distance-preserving) mappings can also be recovered.[2] Parallel Transport Unfolding replaces the graph geodesics with parallel-transport-based approximations, making the unfolding robust to irregular sampling.[3]
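The core of L-Isomap is Landmark MDS, which triangulates non-landmark points from their distances to the landmarks. Below is a sketch in Python; the function and argument names are illustrative, and it assumes the squared geodesic distances to the landmarks have already been computed by the shortest-path step above.

```python
import numpy as np

def landmark_mds(D_ll_sq, D_all_sq, n_components=2):
    """Sketch of Landmark MDS, the embedding step of L-Isomap.

    D_ll_sq  : (m, m) squared geodesic distances among the m landmarks.
    D_all_sq : (n, m) squared geodesic distances from all n points
               to the landmarks.
    """
    m = D_ll_sq.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    B = -0.5 * H @ D_ll_sq @ H                 # centered landmark kernel
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    lam = np.maximum(eigvals[top], 1e-12)      # guard against round-off
    V = eigvecs[:, top]                        # (m, n_components)
    # Distance-based triangulation: embed each point using the
    # pseudoinverse transpose of the landmark coordinate matrix.
    L_pinv_T = V / np.sqrt(lam)                # (m, n_components)
    delta_mean = D_ll_sq.mean(axis=0)          # mean squared-distance column
    return -0.5 * (D_all_sq - delta_mean) @ L_pinv_T
```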

Possible issues

The connectivity of each data point in the neighborhood graph is defined by its nearest k Euclidean neighbors in the high-dimensional space. This step is vulnerable to "short-circuit errors" if k is too large with respect to the manifold structure or if noise in the data moves points slightly off the manifold.[4] Even a single short-circuit error can alter many entries in the geodesic distance matrix, which in turn can lead to a drastically different (and incorrect) low-dimensional embedding. Conversely, if k is too small, the neighborhood graph may become too sparse (or even disconnected) to approximate geodesic paths accurately. Improvements have been made to the algorithm to make it work better for sparse and noisy data sets.[5]
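A simple diagnostic for the "too sparse" failure mode is to check the connectivity of the neighborhood graph as k varies. The helper below is illustrative, not part of the original algorithm; note that it guards only against disconnection, not against short-circuit edges, which require separate scrutiny.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import kneighbors_graph

def smallest_connecting_k(X, k_max=20):
    """Return the smallest k whose kNN graph is connected, or None."""
    for k in range(1, k_max + 1):
        graph = kneighbors_graph(X, k, mode="distance")
        n_comp, _ = connected_components(graph, directed=False)
        if n_comp == 1:
            return k   # first k yielding a single connected component
    return None
```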

Relationship with other methods

Following the connection between the classical scaling and PCA, metric MDS can be interpreted as kernel PCA. In a similar manner, the geodesic distance matrix in Isomap can be viewed as a kernel matrix. The doubly centered geodesic distance matrix K in Isomap is of the form

K = -\frac{1}{2} H D^2 H,

where D^2 = [D_{ij}^2] is the elementwise square of the geodesic distance matrix D = [D_{ij}], and H is the centering matrix, given by

H = I_N - \frac{1}{N} \mathbf{1}_N \mathbf{1}_N^T, \qquad \mathbf{1}_N = [1, \dots, 1]^T \in \mathbb{R}^N.

However, the kernel matrix K is not always positive semidefinite. The main idea of kernel Isomap is to make this K a Mercer kernel matrix (that is, positive semidefinite) using a constant-shifting method, in order to relate it to kernel PCA so that the generalization property naturally emerges.[6]
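As a rough illustration of the idea (not the exact constant derived in the kernel Isomap paper [6]), a symmetric kernel matrix can always be made positive semidefinite by shifting its spectrum by its most negative eigenvalue:

```python
import numpy as np

def psd_shift(K):
    """Illustrative spectrum shift: subtract the most negative eigenvalue
    so that all eigenvalues of the symmetric matrix K become nonnegative.
    The kernel Isomap method [6] derives a specific additive constant;
    this simpler shift is for illustration only."""
    lam_min = np.linalg.eigvalsh(K).min()
    if lam_min < 0:
        K = K - lam_min * np.eye(K.shape[0])
    return K
```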


Related Research Articles

<span class="mw-page-title-main">Principal component analysis</span> Method of data analysis

Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in data analysis, visualization and data preprocessing. It works by linearly transforming the data onto a new coordinate system such that the directions capturing the largest variation in the data can be easily identified. Many studies use the first two principal components to plot the data in two dimensions and to visually identify clusters of closely related data points. Principal component analysis has applications in many fields such as population genetics, microbiome studies, and atmospheric science.

<span class="mw-page-title-main">Isometry</span> Distance-preserving mathematical transformation

In mathematics, an isometry is a distance-preserving transformation between metric spaces, usually assumed to be bijective. The word isometry is derived from the Ancient Greek: ἴσος isos meaning "equal", and μέτρον metron meaning "measure". If the transformation is from a metric space to itself, it is a kind of geometric transformation known as a motion.

<span class="mw-page-title-main">Nonlinear dimensionality reduction</span> Summary of algorithms for nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

<span class="mw-page-title-main">Multidimensional scaling</span> Set of related ordination techniques used in information visualization

Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of objects or individuals" into a configuration of points mapped into an abstract Cartesian space.

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.

In mathematics, computer science and especially graph theory, a distance matrix is a square matrix containing the distances, taken pairwise, between the elements of a set. Depending upon the application involved, the distance being used to define this matrix may or may not be a metric. If there are N elements, this matrix will have size N×N. In graph-theoretic applications, the elements are more often referred to as points, nodes or vertices.

Maximum Variance Unfolding (MVU), also known as Semidefinite Embedding (SDE), is an algorithm in computer science that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data.

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression. In both cases, the input consists of the k closest training examples in a data set. The output depends on whether k-NN is used for classification or regression: in classification, the output is the class most common among the point's k nearest neighbors; in regression, it is the average of the values of the k nearest neighbors.

Medoids are representative objects of a data set or a cluster within a data set whose sum of dissimilarities to all the objects in the cluster is minimal. Medoids are similar in concept to means or centroids, but medoids are always restricted to be members of the data set. Medoids are most commonly used on data when a mean or centroid cannot be defined, such as graphs. They are also used in contexts where the centroid is not representative of the dataset, such as images, 3-D trajectories, and gene expression. Medoids are also of interest when a representative must be found using a distance other than squared Euclidean distance.

In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map; in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. Although the feature map may be infinite-dimensional, by the representer theorem only a finite-dimensional kernel matrix computed from the data is required. Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel processing.

In the field of multivariate statistics, kernel principal component analysis (kernel PCA) is an extension of principal component analysis (PCA) using techniques of kernel methods. Using a kernel, the originally linear operations of PCA are performed in a reproducing kernel Hilbert space.

<span class="mw-page-title-main">Euclidean shortest path</span> Problem of computing shortest paths around geometric obstacles

The Euclidean shortest path problem is a problem in computational geometry: given a set of polyhedral obstacles in a Euclidean space, and two points, find the shortest path between the points that does not intersect any of the obstacles.

<span class="mw-page-title-main">Spectral clustering</span> Clustering methods

In multivariate statistics, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset.

In mathematics, the Johnson–Lindenstrauss lemma is a result named after William B. Johnson and Joram Lindenstrauss concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. In the classical proof of the lemma, the embedding is a random orthogonal projection.

Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such high-dimensional spaces of data are often encountered in areas such as medicine, where DNA microarray technology can produce many measurements at once, and the clustering of text documents, where, if a word-frequency vector is used, the number of dimensions equals the size of the vocabulary.

<span class="mw-page-title-main">Elastic map</span>

Elastic maps provide a tool for nonlinear dimensionality reduction. By construction, they are a system of elastic springs embedded in the data space. This system approximates a low-dimensional manifold. The elastic coefficients of this system allow switching from completely unstructured k-means clustering to estimators located close to linear PCA manifolds. With intermediate values of the elasticity coefficients, the system effectively approximates non-linear principal manifolds. This approach is based on a mechanical analogy between principal manifolds, which pass through "the middle" of the data distribution, and elastic membranes and plates. The method was developed by A.N. Gorban, A.Y. Zinovyev and A.A. Pitenko in 1996–1998.

Manifold alignment is a class of machine learning algorithms that produce projections between sets of data, given that the original data sets lie on a common manifold. The concept was first introduced as such by Ham, Lee, and Saul in 2003, adding a manifold constraint to the general problem of correlating sets of high-dimensional vectors.

<span class="mw-page-title-main">Diffusion map</span>

Diffusion maps is a dimensionality reduction or feature extraction algorithm introduced by Coifman and Lafon which computes a family of embeddings of a data set into Euclidean space whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data. The Euclidean distance between points in the embedded space is equal to the "diffusion distance" between probability distributions centered at those points. Unlike linear dimensionality reduction methods such as principal component analysis (PCA), diffusion maps belong to the family of nonlinear dimensionality reduction methods that focus on discovering the underlying manifold from which the data have been sampled. By integrating local similarities at different scales, diffusion maps give a global description of the data set. Compared with other methods, the diffusion map algorithm is robust to noise perturbation and computationally inexpensive.

t-distributed stochastic neighbor embedding

t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is based on Stochastic Neighbor Embedding, originally developed by Geoffrey Hinton and Sam Roweis; Laurens van der Maaten proposed the t-distributed variant. It is a nonlinear dimensionality reduction technique for embedding high-dimensional data for visualization in a low-dimensional space of two or three dimensions. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high probability.

<span class="mw-page-title-main">Outline of machine learning</span> Overview of and topical guide to machine learning

The following outline is provided as an overview of and topical guide to machine learning:

References

  1. Tenenbaum, Joshua B.; de Silva, Vin; Langford, John C. (22 December 2000). "A Global Geometric Framework for Nonlinear Dimensionality Reduction". Science. 290 (5500): 2319–2323. doi:10.1126/science.290.5500.2319.
  2. de Silva, Vin; Tenenbaum, Joshua (2002). "Global Versus Local Methods in Nonlinear Dimensionality Reduction". Advances in Neural Information Processing Systems. 15. MIT Press.
  3. Budninskiy, Max; Yin, Gloria; Feng, Leman; Tong, Yiying; Desbrun, Mathieu (2019). "Parallel Transport Unfolding: A Connection-Based Manifold Learning Approach". SIAM Journal on Applied Algebra and Geometry. 3 (2): 266–291. arXiv:1806.09039. doi:10.1137/18M1196133. ISSN 2470-6566.
  4. Balasubramanian, M.; Schwartz, E. L. (4 January 2002). "The Isomap Algorithm and Topological Stability". Science. 295 (5552): 7.
  5. Saxena, A.; Gupta, A.; Mukerjee, A. (2004). "Non-linear dimensionality reduction by locally linear Isomaps". Lecture Notes in Computer Science. 3316: 1038–1043.
  6. Choi, H.; Choi, S. (2007). "Robust Kernel Isomap". Pattern Recognition. 40 (3): 853–862.