Sinkhorn's theorem

Sinkhorn's theorem states that every square matrix with positive entries can be written in a certain standard form: it can be scaled by two diagonal matrices with positive diagonals into a doubly stochastic matrix.

Theorem

If A is an n×n matrix with strictly positive elements, then there exist diagonal matrices D1 and D2 with strictly positive diagonal elements such that D1AD2 is doubly stochastic. The matrices D1 and D2 are unique modulo multiplying the first matrix by a positive number and dividing the second one by the same number. [1] [2]
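
For example, if

A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad D_1 = \begin{pmatrix} 1 & 0 \\ 0 & \tfrac{1}{\sqrt{2}} \end{pmatrix}, \qquad D_2 = \begin{pmatrix} 2 - \sqrt{2} & 0 \\ 0 & \sqrt{2} - 1 \end{pmatrix},

then

D_1 A D_2 = \begin{pmatrix} 2 - \sqrt{2} & \sqrt{2} - 1 \\ \sqrt{2} - 1 & 2 - \sqrt{2} \end{pmatrix},

whose rows and columns each sum to (2 - \sqrt{2}) + (\sqrt{2} - 1) = 1.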

Sinkhorn–Knopp algorithm

A simple iterative method to approach the doubly stochastic matrix is to alternately rescale all rows and all columns of A to sum to 1. Sinkhorn and Knopp presented this algorithm and analyzed its convergence. [3] It is essentially the same as the iterative proportional fitting algorithm, well known in survey statistics.
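
The iteration can be written down directly; the following is a minimal sketch in Python with NumPy (the function name, tolerance, and iteration cap are illustrative choices, not from the sources above). The accumulated scalings u and v converge to the diagonals of D1 and D2 from the theorem, up to the stated normalization freedom.

import numpy as np

def sinkhorn_knopp(A, tol=1e-9, max_iter=10000):
    # Scale a square matrix with strictly positive entries to a doubly
    # stochastic matrix by alternately normalizing rows and columns.
    # Returns (M, u, v) with M = diag(u) @ A @ diag(v).
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    u = np.ones(n)
    v = np.ones(n)
    for _ in range(max_iter):
        u = 1.0 / (A @ v)    # rows of diag(u) A diag(v) now sum to 1
        v = 1.0 / (A.T @ u)  # columns now sum to 1 (rows drift slightly)
        if np.max(np.abs(u * (A @ v) - 1.0)) < tol:
            break            # row sums are back within tolerance
    return np.diag(u) @ A @ np.diag(v), u, v

M, u, v = sinkhorn_knopp([[1.0, 1.0], [1.0, 2.0]])
print(M.sum(axis=0), M.sum(axis=1))  # both approach [1, 1]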

Analogues and extensions

The following analogue for unitary matrices is also true: for every unitary matrix U there exist two diagonal unitary matrices L and R such that LUR has each of its columns and rows summing to 1. [4]
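
For example, for the 2×2 Hadamard-type unitary one may take

U = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad L = \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}, \qquad R = \begin{pmatrix} e^{i\pi/4} & 0 \\ 0 & e^{-i\pi/4} \end{pmatrix},

which give

L U R = \frac{1}{2} \begin{pmatrix} 1+i & 1-i \\ 1-i & 1+i \end{pmatrix},

a unitary matrix each of whose rows and columns sums to 1.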

The following extension to maps between matrices is also true (see Theorem 5 [5] and also Theorem 4.7 [6]): given a Kraus operator representation of the quantum operation Φ mapping one density matrix into another,

S \mapsto \Phi(S) = \sum_i B_i S B_i^*,

that is trace preserving,

\sum_i B_i^* B_i = I,

and, in addition, whose range is in the interior of the positive definite cone (strict positivity), there exist scalings x_j, for j \in \{0, 1\}, that are positive definite so that the rescaled Kraus operator

S \mapsto \hat{\Phi}(S) = x_1 \Phi(x_0 S x_0) x_1

is doubly stochastic. In other words, it is such that both

\hat{\Phi}(I) = x_1 \Phi(x_0 I x_0) x_1 = I,

as well as, for the adjoint,

\hat{\Phi}^*(I) = x_0 \Phi^*(x_1 I x_1) x_0 = I,

where I denotes the identity operator.
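
The scalings can be computed by an operator analogue of the Sinkhorn–Knopp iteration: alternately conjugate the Kraus operators on the left so the map becomes unital, and on the right so it becomes trace preserving. The following is a minimal NumPy sketch (function names and tolerances are illustrative; convergence relies on the strict-positivity assumption above):

import numpy as np

def inv_sqrtm(X):
    # Inverse square root of a Hermitian positive definite matrix.
    w, V = np.linalg.eigh(X)
    return (V / np.sqrt(w)) @ V.conj().T

def operator_sinkhorn(kraus, tol=1e-10, max_iter=1000):
    # Rescale Kraus operators B_i -> L B_i R until the channel is
    # doubly stochastic: sum_i B_i B_i^* = sum_i B_i^* B_i = I.
    B = [np.asarray(b, dtype=complex) for b in kraus]
    n = B[0].shape[0]
    for _ in range(max_iter):
        L = inv_sqrtm(sum(b @ b.conj().T for b in B))  # unital step
        B = [L @ b for b in B]
        R = inv_sqrtm(sum(b.conj().T @ b for b in B))  # trace-preserving step
        B = [b @ R for b in B]
        if np.linalg.norm(sum(b @ b.conj().T for b in B) - np.eye(n)) < tol:
            break
    return B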

Applications

In the 2010s Sinkhorn's theorem came to be used to find solutions of entropy-regularised optimal transport problems. [7] This has been of interest in machine learning because such "Sinkhorn distances" can be used to evaluate the difference between data distributions and permutations. [8] [9] [10] This can improve the training of machine learning models in situations where maximum likelihood training may not be the best method.
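
In the entropy-regularised problem the optimal transport plan has the form diag(u) K diag(v), where K is the elementwise exponential of the negated, rescaled cost matrix, so it can be found with exactly the alternating normalizations above. A minimal NumPy sketch along the lines of [7] (the names, regularisation value, and fixed iteration count are illustrative):

import numpy as np

def sinkhorn_transport(a, b, C, eps=0.1, n_iter=500):
    # Entropy-regularised optimal transport between histograms a and b
    # with cost matrix C: minimize <P, C> - eps * H(P) subject to
    # P @ 1 = a and P.T @ 1 = b.
    a, b, C = (np.asarray(x, dtype=float) for x in (a, b, C))
    K = np.exp(-C / eps)             # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)              # match row marginal a
        v = b / (K.T @ u)            # match column marginal b
    P = u[:, None] * K * v[None, :]  # transport plan
    return P, np.sum(P * C)          # plan and its transport cost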

Related Research Articles

In mathematics, particularly linear algebra and functional analysis, a spectral theorem is a result about when a linear operator or matrix can be diagonalized. This is extremely useful because computations involving a diagonalizable matrix can often be reduced to much simpler computations involving the corresponding diagonal matrix. The concept of diagonalization is relatively straightforward for operators on finite-dimensional vector spaces but requires some modification for operators on infinite-dimensional spaces. In general, the spectral theorem identifies a class of linear operators that can be modeled by multiplication operators, which are as simple as one can hope to find. In more abstract language, the spectral theorem is a statement about commutative C*-algebras. See also spectral theory for a historical perspective.

In linear algebra, an invertible complex square matrix U is unitary if its conjugate transpose U^* is also its inverse, that is, if

U^* U = U U^* = I,

where I is the identity matrix.

In mathematics, particularly in linear algebra, a skew-symmetric matrix is a square matrix whose transpose equals its negative. That is, it satisfies the condition

A^\mathsf{T} = -A.

Quantum decoherence is the loss of quantum coherence, the process in which a system's behaviour changes from that which can be explained by quantum mechanics to that which can be explained by classical mechanics. In quantum mechanics, particles such as electrons are described by a wave function, a mathematical representation of the quantum state of a system; a probabilistic interpretation of the wave function is used to explain various quantum effects. As long as there exists a definite phase relation between different states, the system is said to be coherent. A definite phase relationship is necessary to perform quantum computing on quantum information encoded in quantum states. Coherence is preserved under the laws of quantum physics.

In linear algebra, a tridiagonal matrix is a band matrix that has nonzero elements only on the main diagonal, the subdiagonal/lower diagonal, and the supradiagonal/upper diagonal. For example, the following matrix is tridiagonal:

\begin{pmatrix} 1 & 4 & 0 & 0 \\ 3 & 4 & 1 & 0 \\ 0 & 2 & 3 & 4 \\ 0 & 0 & 1 & 3 \end{pmatrix}.

In quantum mechanics, a quantum operation is a mathematical formalism used to describe a broad class of transformations that a quantum mechanical system can undergo. This was first discussed as a general stochastic transformation for a density matrix by George Sudarshan. The quantum operation formalism describes not only unitary time evolution or symmetry transformations of isolated systems, but also the effects of measurement and transient interactions with an environment. In the context of quantum computation, a quantum operation is called a quantum channel.

In linear algebra, the Gram matrix of a set of vectors v_1, \dots, v_n in an inner product space is the Hermitian matrix of inner products, whose entries are given by G_{ij} = \langle v_i, v_j \rangle. If the vectors v_1, \dots, v_n are the columns of a matrix X, then the Gram matrix is X^\dagger X in the general case that the vector coordinates are complex numbers, which simplifies to X^\mathsf{T} X for the case that the vector coordinates are real numbers.

In mathematics, the polar decomposition of a square real or complex matrix A is a factorization of the form A = UP, where U is a unitary matrix (orthogonal, in the real case) and P is a positive semi-definite Hermitian matrix (symmetric, in the real case), both square and of the same size.

In probability theory and mathematical physics, a random matrix is a matrix-valued random variable—that is, a matrix in which some or all elements are random variables. Many important properties of physical systems can be represented mathematically as matrix problems. For example, the thermal conductivity of a lattice can be computed from the dynamical matrix of the particle-particle interactions within the lattice.

In mathematics, especially in probability and combinatorics, a doubly stochastic matrix (also called a bistochastic matrix) is a square matrix X = (x_{ij}) of nonnegative real numbers, each of whose rows and columns sums to 1, i.e.,

\sum_i x_{ij} = \sum_j x_{ij} = 1.

In mathematics, the square root of a matrix extends the notion of square root from numbers to matrices. A matrix B is said to be a square root of A if the matrix product BB is equal to A.

In mathematics, Stinespring's dilation theorem, also called Stinespring's factorization theorem, named after W. Forrest Stinespring, is a result from operator theory that represents any completely positive map on a C*-algebra as a composition of two completely positive maps each of which has a special form:

  1. A *-representation of A on some auxiliary Hilbert space K followed by
  2. An operator map of the form T ↦ V*TV.

In mathematics, Choi's theorem on completely positive maps is a result that classifies completely positive maps between finite-dimensional (matrix) C*-algebras. An infinite-dimensional algebraic generalization of Choi's theorem is known as Belavkin's "Radon–Nikodym" theorem for completely positive maps.

In mathematics, a unistochastic matrix is a doubly stochastic matrix whose entries are the squares of the absolute values of the entries of some unitary matrix.

In mathematics, an orthostochastic matrix is a doubly stochastic matrix whose entries are the squares of the absolute values of the entries of some orthogonal matrix.

In mathematics, a matrix is a rectangular array or table of numbers, symbols, or expressions, arranged in rows and columns, which is used to represent a mathematical object or a property of such an object.

In mathematics, particularly linear algebra, the Schur–Horn theorem, named after Issai Schur and Alfred Horn, characterizes the diagonal of a Hermitian matrix with given eigenvalues. It has inspired investigations and substantial generalizations in the setting of symplectic geometry. Important generalizations include Kostant's convexity theorem, the Atiyah–Guillemin–Sternberg convexity theorem, and the Kirwan convexity theorem.

In quantum information theory and quantum optics, the Schrödinger–HJW theorem is a result about the realization of a mixed state of a quantum system as an ensemble of pure quantum states and the relation between the corresponding purifications of the density operators. The theorem is named after physicists and mathematicians Erwin Schrödinger, Lane P. Hughston, Richard Jozsa and William Wootters. The result was also found independently by Nicolas Gisin, and by Nicolas Hadjisavvas building upon work by Ed Jaynes, while a significant part of it was likewise independently discovered by N. David Mermin. Thanks to its complicated history, it is also known by various other names such as the GHJW theorem, the HJW theorem, and the purification theorem.

Birkhoff's algorithm is an algorithm for decomposing a bistochastic matrix into a convex combination of permutation matrices. It was published by Garrett Birkhoff in 1946. It has many applications. One such application is for the problem of fair random assignment: given a randomized allocation of items, Birkhoff's algorithm can decompose it into a lottery on deterministic allocations.
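
A minimal NumPy/SciPy sketch of the peeling loop (the names are illustrative; linear_sum_assignment is used here only to find a permutation inside the positive support, which the Birkhoff–von Neumann theorem guarantees exists):

import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_decomposition(M, tol=1e-9):
    # Decompose a doubly stochastic matrix M into a convex combination
    # sum_k lam_k P_k of permutation matrices P_k.
    M = np.array(M, dtype=float)
    terms = []
    while M.max() > tol:
        # Zero cost on positive entries, unit cost elsewhere: a zero-cost
        # assignment is a permutation supported on the positive entries.
        cost = np.where(M > tol, 0.0, 1.0)
        rows, cols = linear_sum_assignment(cost)
        lam = M[rows, cols].min()  # largest removable weight
        P = np.zeros_like(M)
        P[rows, cols] = 1.0
        terms.append((lam, P))
        M = M - lam * P            # peel off lam * P and repeat
    return terms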

References

  1. Sinkhorn, Richard (1964). "A relationship between arbitrary positive matrices and doubly stochastic matrices". Ann. Math. Statist. 35, 876–879. doi:10.1214/aoms/1177703591.
  2. Marshall, A. W.; Olkin, I. (1967). "Scaling of matrices to achieve specified row and column sums". Numerische Mathematik. 12 (1): 83–90. doi:10.1007/BF02170999.
  3. Sinkhorn, Richard; Knopp, Paul (1967). "Concerning nonnegative matrices and doubly stochastic matrices". Pacific J. Math. 21, 343–348.
  4. Idel, Martin; Wolf, Michael M. (2015). "Sinkhorn normal form for unitary matrices". Linear Algebra and Its Applications. 471: 76–84. arXiv:1408.5728. doi:10.1016/j.laa.2014.12.031. S2CID 119175915.
  5. Georgiou, Tryphon; Pavon, Michele (2015). "Positive contraction mappings for classical and quantum Schrödinger systems". Journal of Mathematical Physics. 56 (3): 033301. arXiv:1405.6650. Bibcode:2015JMP....56c3301G. doi:10.1063/1.4915289. S2CID 119707158.
  6. Gurvits, Leonid (2004). "Classical complexity and quantum entanglement". Journal of Computer and System Sciences. 69 (3): 448–484. doi:10.1016/j.jcss.2004.06.003.
  7. Cuturi, Marco (2013). "Sinkhorn distances: Lightspeed computation of optimal transport". Advances in Neural Information Processing Systems. pp. 2292–2300.
  8. Mensch, Arthur; Blondel, Mathieu; Peyré, Gabriel (2019). "Geometric losses for distributional learning". Proc. ICML 2019. arXiv:1905.06005.
  9. Mena, Gonzalo; Belanger, David; Munoz, Gonzalo; Snoek, Jasper (2017). "Sinkhorn networks: Using optimal transport techniques to learn permutations". NIPS Workshop on Optimal Transport and Machine Learning.
  10. Kogkalidis, Konstantinos; Moortgat, Michael; Moot, Richard (2020). "Neural Proof Nets". Proceedings of the 24th Conference on Computational Natural Language Learning.