Doubly stochastic matrix

Last updated

In mathematics, especially in probability and combinatorics, a doubly stochastic matrix (also called bistochastic matrix) is a square matrix of nonnegative real numbers, each of whose rows and columns sums to 1, [1] i.e.,

Contents

Thus, a doubly stochastic matrix is both left stochastic and right stochastic. [1] [2]

Indeed, any matrix that is both left and right stochastic must be square: if every row sums to 1 then the sum of all entries in the matrix must be equal to the number of rows, and since the same holds for columns, the number of rows and columns must be equal. [1]

Birkhoff polytope

The class of doubly stochastic matrices is a convex polytope known as the Birkhoff polytope. Using the matrix entries as Cartesian coordinates, it lies in an -dimensional affine subspace of -dimensional Euclidean space defined by independent linear constraints specifying that the row and column sums all equal 1. (There are constraints rather than because one of these constraints is dependent, as the sum of the row sums must equal the sum of the column sums.) Moreover, the entries are all constrained to be non-negative and less than or equal to 1.

Birkhoff–von Neumann theorem

The Birkhoff–von Neumann theorem (often known simply as Birkhoff's theorem [3] [4] [5] ) states that the polytope is the convex hull of the set of permutation matrices, and furthermore that the vertices of are precisely the permutation matrices. In other words, if is a doubly stochastic matrix, then there exist and permutation matrices such that

(Such a decomposition of X is known as a 'convex combination'.) A proof of the theorem based on Hall's marriage theorem is given below.

This representation is known as the Birkhoff–von Neumann decomposition, and may not be unique. It is often described as a real-valued generalization of Kőnig's theorem, where the correspondence is established through adjacency matrices of graphs.

Other properties

Proof of the Birkhoff–von Neumann theorem

Let X be a doubly stochastic matrix. Then we will show that there exists a permutation matrix P such that xij  0 whenever pij  0. Thus if we let λ be the smallest xij corresponding to a non-zero pij, the difference X  λP will be a scalar multiple of a doubly stochastic matrix and will have at least one more zero cell than X. Accordingly we may successively reduce the number of non-zero cells in X by removing scalar multiples of permutation matrices until we arrive at the zero matrix, at which point we will have constructed a convex combination of permutation matrices equal to the original X. [3]

For instance if then , , and .

Proof: Construct a bipartite graph in which the rows of X are listed in one part and the columns in the other, and in which row i is connected to column j iff xij  0. Let A be any set of rows, and define A' as the set of columns joined to rows in A in the graph. We want to express the sizes |A| and |A'| of the two sets in terms of the xij.

For every i in A, the sum over j in A' of xij is 1, since all columns j for which xij  0 are included in A', and X is doubly stochastic; hence |A| is the sum over all i  A, j  A' of xij.

Meanwhile |A'| is the sum over all i (whether or not in A) and all j in A' of xij ; and this is the corresponding sum in which the i are limited to rows in A. Hence |A'|  |A|.

It follows that the conditions of Hall's marriage theorem are satisfied, and that we can therefore find a set of edges in the graph which join each row in X to exactly one (distinct) column. These edges define a permutation matrix whose non-zero cells correspond to non-zero cells in X.

Generalisations

There is a simple generalisation to matrices with more columns and rows such that the i th row sum is equal to ri (a positive integer), the column sums are equal to 1, and all cells are non-negative (the sum of the row sums being equal to the number of columns). Any matrix in this form can be expressed as a convex combination of matrices in the same form made up of 0s and 1s. The proof is to replace the i th row of the original matrix by ri separate rows, each equal to the original row divided by ri ; to apply Birkhoff's theorem to the resulting square matrix; and at the end to additively recombine the ri rows into a single i th row.

In the same way it is possible to replicate columns as well as rows, but the result of recombination is not necessarily limited to 0s and 1s. A different generalisation (with a significantly harder proof) has been put forward by R. M. Caron et al. [4]

See also

Related Research Articles

In mathematics, the determinant is a scalar value that is a function of the entries of a square matrix. The determinant of a matrix A is commonly denoted det(A), det A, or |A|. Its value characterizes some properties of the matrix and the linear map represented by the matrix. In particular, the determinant is nonzero if and only if the matrix is invertible and the linear map represented by the matrix is an isomorphism. The determinant of a product of matrices is the product of their determinants.

In linear algebra, an orthogonal matrix, or orthonormal matrix, is a real square matrix whose columns and rows are orthonormal vectors.

<span class="mw-page-title-main">Cayley–Hamilton theorem</span> Every square matrix over a commutative ring satisfies its own characteristic equation

In linear algebra, the Cayley–Hamilton theorem states that every square matrix over a commutative ring satisfies its own characteristic equation.

In mechanics and geometry, the 3D rotation group, often denoted SO(3), is the group of all rotations about the origin of three-dimensional Euclidean space under the operation of composition.

In linear algebra, the permanent of a square matrix is a function of the matrix similar to the determinant. The permanent, as well as the determinant, is a polynomial in the entries of the matrix. Both are special cases of a more general function of a matrix called the immanant.

In mathematics, particularly in matrix theory, a permutation matrix is a square binary matrix that has exactly one entry of 1 in each row and each column with all other entries 0. Such a matrix P, say of size n × n, can represent a permutation of n elements. Pre-multiplying an n-row matrix M by such a permutation matrix, forming PM, results in permuting the rows of M, while post-multiplying an n-column matrix M, forming MP, permutes the columns of M.

In mathematics, specifically linear algebra, the Cauchy–Binet formula, named after Augustin-Louis Cauchy and Jacques Philippe Marie Binet, is an identity for the determinant of the product of two rectangular matrices of transpose shapes. It generalizes the statement that the determinant of a product of square matrices is equal to the product of their determinants. The formula is valid for matrices with the entries from any commutative ring.

In mathematics, the determinant of an m×m skew-symmetric matrix can always be written as the square of a polynomial in the matrix entries, a polynomial with integer coefficients that only depends on m. When m is odd, the polynomial is zero. When m is even, it is a nonzero polynomial of degree m/2, and is unique up to multiplication by ±1. The convention on skew-symmetric tridiagonal matrices, given below in the examples, then determines one specific polynomial, called the Pfaffian polynomial. The value of this polynomial, when applied to the entries of a skew-symmetric matrix, is called the Pfaffian of that matrix. The term Pfaffian was introduced by Cayley (1852), who indirectly named them after Johann Friedrich Pfaff.

In mathematics, Muirhead's inequality, named after Robert Franklin Muirhead, also known as the "bunching" method, generalizes the inequality of arithmetic and geometric means.

In matrix theory, the Perron–Frobenius theorem, proved by Oskar Perron (1907) and Georg Frobenius (1912), asserts that a real square matrix with positive entries has a unique eigenvalue of largest magnitude and that eigenvalue is real. The corresponding eigenvector can be chosen to have strictly positive components, and also asserts a similar statement for certain classes of nonnegative matrices. This theorem has important applications to probability theory ; to the theory of dynamical systems ; to economics ; to demography ; to social networks ; to Internet search engines (PageRank); and even to ranking of American football teams. The first to discuss the ordering of players within tournaments using Perron–Frobenius eigenvectors is Edmund Landau.

In mathematics, the Gershgorin circle theorem may be used to bound the spectrum of a square matrix. It was first published by the Soviet mathematician Semyon Aronovich Gershgorin in 1931. Gershgorin's name has been transliterated in several different ways, including Geršgorin, Gerschgorin, Gershgorin, Hershhorn, and Hirschhorn.

In linear algebra, an idempotent matrix is a matrix which, when multiplied by itself, yields itself. That is, the matrix is idempotent if and only if . For this product to be defined, must necessarily be a square matrix. Viewed this way, idempotent matrices are idempotent elements of matrix rings.

The Birkhoff polytopeBn is the convex polytope in RN whose points are the doubly stochastic matrices, i.e., the n × n matrices whose entries are non-negative real numbers and whose rows and columns each add up to 1. It is named after Garrett Birkhoff.

In mathematics, a unistochastic matrix is a doubly stochastic matrix whose entries are the squares of the absolute values of the entries of some unitary matrix.

In mathematics, the Robinson–Schensted–Knuth correspondence, also referred to as the RSK correspondence or RSK algorithm, is a combinatorial bijection between matrices A with non-negative integer entries and pairs (P,Q) of semistandard Young tableaux of equal shape, whose size equals the sum of the entries of A. More precisely the weight of P is given by the column sums of A, and the weight of Q by its row sums. It is a generalization of the Robinson–Schensted correspondence, in the sense that taking A to be a permutation matrix, the pair (P,Q) will be the pair of standard tableaux associated to the permutation under the Robinson–Schensted correspondence.

Sinkhorn's theorem states that every square matrix with positive entries can be written in a certain standard form.

Leonid Mirsky was a Russian-British mathematician who worked in number theory, linear algebra, and combinatorics. Mirsky's theorem is named after him.

In mathematics, Manin matrices, named after Yuri Manin who introduced them around 1987–88, are a class of matrices with elements in a not-necessarily commutative ring, which in a certain sense behave like matrices whose elements commute. In particular there is natural definition of the determinant for them and most linear algebra theorems like Cramer's rule, Cayley–Hamilton theorem, etc. hold true for them. Any matrix with commuting elements is a Manin matrix. These matrices have applications in representation theory in particular to Capelli's identity, Yangian and quantum integrable systems.

Georgy Petrovich Egorychev is a Russian mathematician, known for the Egorychev method.

Birkhoff's algorithm is an algorithm for decomposing a bistochastic matrix into a convex combination of permutation matrices. It was published by Garrett Birkhoff in 1946. It has many applications. One such application is for the problem of fair random assignment: given a randomized allocation of items, Birkhoff's algorithm can decompose it into a lottery on deterministic allocations.

References

  1. 1 2 3 Gagniuc, Paul A. (2017). Markov Chains: From Theory to Implementation and Experimentation. USA, NJ: John Wiley & Sons. pp. 9–11. ISBN   978-1-119-38755-8.
  2. Marshal, Olkin (1979). Inequalities: Theory of Majorization and Its Applications . pp.  8. ISBN   978-0-12-473750-1.
  3. 1 2 Birkhoff's theorem, notes by Gábor Hetyei.
  4. 1 2 R. M. Caron, Xin Li, P. Mikusiński, H. Sherwood, and M. D. Taylor, Nonsquare “doubly stochastic” matrices, in: Distributions with Fixed Marginals and Related Topics, IMS Lecture Notes – Monographs Series, edited by L. Rüschendorf, B. Schweizer, and M. D. Taylor, vol. 28, pp. 65-75 (1996) | DOI:10.1214/lnms/1215452610
  5. W. B. Jurkat and H. J. Ryser, "Term Ranks and Permanents of Nonnegative Matrices" (1967).
  6. van der Waerden, B. L. (1926), "Aufgabe 45", Jber. Deutsch. Math.-Verein., 35: 117.
  7. Gyires, B. (1980), "The common source of several inequalities concerning doubly stochastic matrices", Publicationes Mathematicae Institutum Mathematicum Universitatis Debreceniensis, 27 (3–4): 291–304, MR   0604006 .
  8. Egoryčev, G. P. (1980), Reshenie problemy van-der-Vardena dlya permanentov (in Russian), Krasnoyarsk: Akad. Nauk SSSR Sibirsk. Otdel. Inst. Fiz., p. 12, MR   0602332 . Egorychev, G. P. (1981), "Proof of the van der Waerden conjecture for permanents", Akademiya Nauk SSSR (in Russian), 22 (6): 65–71, 225, MR   0638007 . Egorychev, G. P. (1981), "The solution of van der Waerden's problem for permanents", Advances in Mathematics , 42 (3): 299–305, doi: 10.1016/0001-8708(81)90044-X , MR   0642395 .
  9. Falikman, D. I. (1981), "Proof of the van der Waerden conjecture on the permanent of a doubly stochastic matrix", Akademiya Nauk Soyuza SSR (in Russian), 29 (6): 931–938, 957, MR   0625097 .
  10. Fulkerson Prize, Mathematical Optimization Society, retrieved 2012-08-19.