Semidefinite embedding

Maximum Variance Unfolding (MVU), also known as Semidefinite Embedding (SDE), is an algorithm in computer science that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data. [1] [2] [3]

It is motivated by the observation that kernel Principal Component Analysis (kPCA) does not reduce the data dimensionality, [4] as it leverages the kernel trick to non-linearly map the original data into an inner-product space.

Algorithm

MVU creates a mapping from the high-dimensional input vectors to a low-dimensional Euclidean vector space in the following steps: [5]

  1. A neighbourhood graph is created. Each input is connected with its k-nearest input vectors (according to the Euclidean distance metric), and all k-nearest neighbours are connected with each other. If the data is sampled well enough, the resulting graph is a discrete approximation of the underlying manifold.
  2. The neighbourhood graph is "unfolded" with the help of semidefinite programming. Instead of learning the output vectors directly, the semidefinite program aims to find an inner-product matrix that maximizes the pairwise distances between any two inputs that are not connected in the neighbourhood graph, while preserving the nearest-neighbour distances.
  3. The low-dimensional embedding is finally obtained by application of multidimensional scaling on the learned inner product matrix.
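Step 1 above, constructing the neighbourhood graph, can be sketched as follows. This is a minimal illustration using NumPy, not the authors' reference implementation; the helper name `knn_graph` is our own.

```python
import numpy as np

def knn_graph(X, k):
    """Build a symmetric k-nearest-neighbour adjacency matrix.

    X : (N, D) array of input vectors.
    Returns an (N, N) boolean matrix A with A[i, j] = True iff j is
    among the k nearest neighbours of i (or vice versa), symmetrized
    so the isometry constraints apply in both directions.
    """
    N = X.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.zeros((N, N), dtype=bool)
    for i in range(N):
        # argsort puts i itself first (distance 0); take the next k.
        nearest = np.argsort(d2[i])[1:k + 1]
        A[i, nearest] = True
    return A | A.T  # symmetrize

# Example: 5 points on a line; connect each to its 2 nearest neighbours.
X = np.arange(5, dtype=float).reshape(-1, 1)
A = knn_graph(X, 2)
```

A fully connected neighbourhood graph is needed for the bound derived below; in practice k is chosen large enough that the graph is connected.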

The steps of applying semidefinite programming followed by a linear dimensionality reduction step to recover a low-dimensional embedding into a Euclidean space were first proposed by Linial, London, and Rabinovich. [6]

Optimization formulation

Let $x_1, \dots, x_N \in \mathbb{R}^D$ be the original input vectors and $y_1, \dots, y_N \in \mathbb{R}^d$ be the embedding. If $x_i, x_j$ are two neighbors, then the local isometry constraint that needs to be satisfied is: [7] [8] [9]

$$|x_i - x_j|^2 = |y_i - y_j|^2$$

Let $G$ and $K$ be the Gram matrices of $x_1, \dots, x_N$ and $y_1, \dots, y_N$ (i.e.: $G_{ij} = x_i \cdot x_j$ and $K_{ij} = y_i \cdot y_j$). We can express the above constraint for every pair of neighbor points $x_i, x_j$ in terms of $G$ and $K$: [10] [11]

$$G_{ii} + G_{jj} - G_{ij} - G_{ji} = K_{ii} + K_{jj} - K_{ij} - K_{ji}$$
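As a numerical sanity check (our own illustration, not taken from the cited papers): squared distances between points can be read off a Gram matrix, which is why the isometry constraint can be imposed on $K$ alone, without reference to the vectors $y_i$.

```python
import numpy as np

# Squared distances are recoverable from a Gram matrix:
#   |x_i - x_j|^2 = G_ii + G_jj - 2 G_ij.
# This is the identity behind expressing the isometry constraint
# in terms of K rather than the embedding vectors themselves.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))    # 6 arbitrary points in R^3
G = X @ X.T                        # Gram matrix, G_ij = <x_i, x_j>

i, j = 1, 4
lhs = np.sum((X[i] - X[j]) ** 2)
rhs = G[i, i] + G[j, j] - 2 * G[i, j]
```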

In addition, we also want to constrain the embedding to be centered at the origin: [12] [13] [14]

$$0 = \left|\sum_i y_i\right|^2 = \sum_{i,j} y_i \cdot y_j = \sum_{i,j} K_{ij}$$

As described above, while the distances between neighboring points are preserved, the algorithm aims to maximize the pairwise distance between every pair of points. The objective function to be maximized is: [15] [16] [17]

$$T(Y) = \frac{1}{2N} \sum_{i,j} |y_i - y_j|^2$$

Intuitively, maximizing the function above is equivalent to pulling the points as far away from each other as possible and therefore "unfolding" the manifold. The local isometry constraint prevents the objective function from diverging (going to infinity). [18]

Let $\tau = \max_{i,j} \{\eta_{ij} |x_i - x_j|^2\}$, where $\eta_{ij} = 1$ if points $i$ and $j$ are neighbors and $\eta_{ij} = 0$ otherwise.

Since the graph is connected and has $N$ points, the distance between any two points satisfies $|y_i - y_j|^2 \le N\tau$. We can then bound the objective function as follows: [19] [20]

$$T(Y) = \frac{1}{2N} \sum_{i,j} |y_i - y_j|^2 \le \frac{1}{2N} \sum_{i,j} N\tau = \frac{N^2 \tau}{2}$$

The objective function can be rewritten purely in terms of the Gram matrix: [21] [22] [23]

$$T(Y) = \frac{1}{2N} \sum_{i,j} |y_i - y_j|^2 = \frac{1}{2N} \sum_{i,j} (K_{ii} + K_{jj} - K_{ij} - K_{ji}) = \frac{1}{2N} \left( 2N\,\mathrm{Tr}(K) - 2 \sum_{i,j} K_{ij} \right) = \mathrm{Tr}(K)$$

where the last equality uses the centering constraint $\sum_{i,j} K_{ij} = 0$.
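The identity $T(Y) = \mathrm{Tr}(K)$ for centered embeddings can be checked numerically; the following is our own sketch, assuming only NumPy.

```python
import numpy as np

# For centered points (sum_i y_i = 0), the pairwise-distance objective
#   T(Y) = (1/2N) * sum_ij |y_i - y_j|^2
# equals Tr(K), where K = Y Y^T is the Gram matrix of the embedding.
rng = np.random.default_rng(1)
Y = rng.standard_normal((8, 2))
Y -= Y.mean(axis=0)                # enforce the centering constraint
K = Y @ Y.T

N = Y.shape[0]
d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
T = d2.sum() / (2 * N)
```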

Finally, the optimization can be formulated as: [24] [25] [26]

Maximize $\mathrm{Tr}(K)$ subject to $K \succeq 0$, $\sum_{i,j} K_{ij} = 0$, and $G_{ii} + G_{jj} - G_{ij} - G_{ji} = K_{ii} + K_{jj} - K_{ij} - K_{ji}$ for all $i, j$ with $\eta_{ij} = 1$.

After the Gram matrix $K$ is learned by semidefinite programming, the output $Y$ can be obtained via Cholesky decomposition.

In particular, the Gram matrix can be written as $K_{ij} = \sum_{\alpha=1}^{N} \lambda_\alpha V_{\alpha i} V_{\alpha j}$, where $V_{\alpha i}$ is the $i$-th element of the eigenvector $V_\alpha$ of the eigenvalue $\lambda_\alpha$. [27] [28]

It follows that the $\alpha$-th element of the output $y_i$ is $\sqrt{\lambda_\alpha}\, V_{\alpha i}$. [29] [30]
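The eigendecomposition step above can be sketched as follows; this is a minimal NumPy illustration under the assumption that $K$ is (numerically) positive semidefinite, and the helper name `embedding_from_gram` is our own.

```python
import numpy as np

def embedding_from_gram(K, d):
    """Recover a d-dimensional embedding Y from a PSD Gram matrix K,
    so that Y @ Y.T approximates K (exactly, if rank(K) <= d).
    Column alpha of Y is sqrt(lambda_alpha) * V_alpha.
    """
    # eigh returns eigenvalues in ascending order; take the top d.
    lam, V = np.linalg.eigh(K)
    lam, V = lam[::-1][:d], V[:, ::-1][:, :d]
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative round-off
    return V * np.sqrt(lam)

# Example: build a rank-2 Gram matrix and recover a 2-D embedding.
rng = np.random.default_rng(2)
Y0 = rng.standard_normal((7, 2))
K = Y0 @ Y0.T
Y = embedding_from_gram(K, 2)
```

The recovered `Y` reproduces `K` exactly here because `K` has rank 2; in MVU, taking the top $d$ eigenvalues of the learned Gram matrix gives the low-dimensional embedding.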

Notes

  1. Weinberger, Sha and Saul 2004a
  2. Weinberger and Saul 2004b
  3. Weinberger and Saul 2006
  4. Lawrence 2012, page 1612
  5. Weinberger, Sha and Saul 2004a, page 7
  6. Linial, London and Rabinovich 1995
  7. Weinberger, Sha and Saul 2004a, page 3, equation 8
  8. Weinberger and Saul 2004b, page 3, equation 2
  9. Weinberger and Saul 2006, page 4, equation 2
  10. Weinberger, Sha and Saul 2004a, page 3, equation 9
  11. Weinberger and Saul 2004b, page 3, equation 3
  12. Weinberger, Sha and Saul 2004a, page 3, equation 6
  13. Weinberger and Saul 2004b, page 3, equation 5
  14. Weinberger and Saul 2006, page 5, equation 8
  15. Weinberger, Sha and Saul 2004a, page 4, equation 10
  16. Weinberger and Saul 2004b, page 4, equation 6
  17. Weinberger and Saul 2006, page 5, equation 4
  18. Weinberger and Saul 2004b, page 4, equation 7
  19. Weinberger and Saul 2004b, page 4, equation 8
  20. Weinberger and Saul 2006, page 5, equation 6
  21. Weinberger, Sha and Saul 2004a, page 4, equation 11
  22. Weinberger and Saul 2004b, page 4, equation 9
  23. Weinberger and Saul 2006, page 6, equations 10 to 13
  24. Weinberger, Sha and Saul 2004a, page 4, section 3.3
  25. Weinberger and Saul 2004b, page 4, equation 9
  26. Weinberger and Saul 2006, page 6, equations 10 to 13
  27. Weinberger and Saul 2004b, page 4, equation 10
  28. Weinberger and Saul 2006, page 7, equation 14
  29. Weinberger and Saul 2004b, page 4, equation 11
  30. Weinberger and Saul 2006, page 7, equation 15

