Schur complement

In linear algebra and the theory of matrices, the Schur complement of a block matrix is defined as follows.

Suppose p, q are nonnegative integers, and suppose A, B, C, D are respectively p × p, p × q, q × p, and q × q matrices of complex numbers. Let

M = \begin{bmatrix} A & B \\ C & D \end{bmatrix},

so that M is a (p + q) × (p + q) matrix.

If D is invertible, then the Schur complement of the block D of the matrix M is the p × p matrix defined by

M/D := A - B D^{-1} C.
If A is invertible, the Schur complement of the block A of the matrix M is the q × q matrix defined by

M/A := D - C A^{-1} B.
In the case that A or D is singular, substituting a generalized inverse for the inverses on M/A and M/D yields the generalized Schur complement.
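As a concrete numerical illustration (a minimal sketch added here; the block sizes and entries are arbitrary choices, not taken from the text), both Schur complements can be computed directly with NumPy:

```python
import numpy as np

# Arbitrary example blocks with p = q = 2, chosen only for illustration.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 0.0], [2.0, 1.0]])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
D = np.array([[5.0, 2.0], [2.0, 4.0]])

M = np.block([[A, B], [C, D]])          # the (p+q) x (p+q) block matrix

# Schur complement of the block D:  M/D = A - B D^{-1} C
M_over_D = A - B @ np.linalg.solve(D, C)

# Schur complement of the block A:  M/A = D - C A^{-1} B
M_over_A = D - C @ np.linalg.solve(A, B)

print(M_over_D)
print(M_over_A)
```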

The Schur complement is named after Issai Schur, [1] who used it to prove Schur's lemma, although it had been used previously. [2] Emilie Virginia Haynsworth was the first to call it the Schur complement. [3] The Schur complement is a key tool in the fields of numerical analysis, statistics, and matrix analysis. It is sometimes referred to as the Feshbach map, after the physicist Herman Feshbach. [4]

Background

The Schur complement arises when performing a block Gaussian elimination on the matrix M. In order to eliminate the elements below the block diagonal, one multiplies the matrix M by a block lower triangular matrix on the right as follows:

M \begin{bmatrix} I_p & 0 \\ -D^{-1}C & I_q \end{bmatrix}
= \begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} I_p & 0 \\ -D^{-1}C & I_q \end{bmatrix}
= \begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix},

where I_p denotes a p × p identity matrix. As a result, the Schur complement M/D = A - BD^{-1}C appears in the upper-left p × p block.

Continuing the elimination process beyond this point (i.e., performing a block Gauss–Jordan elimination),

\begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix}
\begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix}
= \begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix},

leads to an LDU decomposition of M, which reads

M = \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix}
\begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix}
\begin{bmatrix} I_p & 0 \\ D^{-1}C & I_q \end{bmatrix}.

Thus, the inverse of M may be expressed involving D^{-1} and the inverse of the Schur complement M/D, assuming it exists, as

M^{-1} = \begin{bmatrix} I_p & 0 \\ -D^{-1}C & I_q \end{bmatrix}
\begin{bmatrix} (M/D)^{-1} & 0 \\ 0 & D^{-1} \end{bmatrix}
\begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix}
= \begin{bmatrix} (M/D)^{-1} & -(M/D)^{-1}BD^{-1} \\ -D^{-1}C(M/D)^{-1} & D^{-1} + D^{-1}C(M/D)^{-1}BD^{-1} \end{bmatrix}.

The above relationship comes from the elimination operations that involve D^{-1} and M/D. An equivalent derivation can be done with the roles of A and D interchanged. By equating the expressions for M^{-1} obtained in these two different ways, one can establish the matrix inversion lemma, which relates the two Schur complements of M: M/D and M/A (see "Derivation from LDU decomposition" in Woodbury matrix identity § Alternative proofs).
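As a quick numerical sanity check of the block expression for M^{-1} above, one can compare it against a direct inverse. This is only a sketch using randomly generated blocks (D is shifted so that it is safely invertible); it is not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 2
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q)) + 5 * np.eye(q)   # keep D safely invertible

M = np.block([[A, B], [C, D]])
Dinv = np.linalg.inv(D)
S = A - B @ Dinv @ C                      # the Schur complement M/D
Sinv = np.linalg.inv(S)

# Block formula for M^{-1} in terms of D^{-1} and (M/D)^{-1}
Minv_blocks = np.block([
    [Sinv,             -Sinv @ B @ Dinv],
    [-Dinv @ C @ Sinv,  Dinv + Dinv @ C @ Sinv @ B @ Dinv],
])

print(np.allclose(Minv_blocks, np.linalg.inv(M)))   # expected: True (up to round-off)
```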

Properties

If p and q are both 1 (that is, A, B, C and D are all scalars), we get the familiar formula for the inverse of a 2 × 2 matrix,

M^{-1} = \frac{1}{AD - BC} \begin{bmatrix} D & -B \\ -C & A \end{bmatrix},

provided that AD − BC is non-zero.

In general, if A is invertible, then

M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}
= \begin{bmatrix} I_p & 0 \\ CA^{-1} & I_q \end{bmatrix}
\begin{bmatrix} A & 0 \\ 0 & D - CA^{-1}B \end{bmatrix}
\begin{bmatrix} I_p & A^{-1}B \\ 0 & I_q \end{bmatrix},

and the inverse of M may be written in terms of A^{-1} and the Schur complement M/A as

M^{-1} = \begin{bmatrix} A^{-1} + A^{-1}B(M/A)^{-1}CA^{-1} & -A^{-1}B(M/A)^{-1} \\ -(M/A)^{-1}CA^{-1} & (M/A)^{-1} \end{bmatrix}

whenever this inverse exists.

When A, respectively D, is invertible, the determinant of M is given by

\det(M) = \det(A)\,\det(D - CA^{-1}B), respectively
\det(M) = \det(D)\,\det(A - BD^{-1}C),

which generalizes the determinant formula for 2 × 2 matrices.
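The determinant identity can likewise be checked numerically; the sketch below (random blocks shifted to be invertible, an assumption made only for this example) verifies det(M) = det(A) det(M/A) = det(D) det(M/D):

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 3, 3
A = rng.standard_normal((p, p)) + 4 * np.eye(p)
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q)) + 4 * np.eye(q)

M = np.block([[A, B], [C, D]])
M_over_A = D - C @ np.linalg.solve(A, B)   # Schur complement of A
M_over_D = A - B @ np.linalg.solve(D, C)   # Schur complement of D

print(np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(M_over_A)))  # True
print(np.isclose(np.linalg.det(M), np.linalg.det(D) * np.linalg.det(M_over_D)))  # True
```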

Application to solving linear equations

The Schur complement arises naturally in solving a system of linear equations such as [7]

A x + B y = u,
C x + D y = v,

where x, u are p-dimensional column vectors, y, v are q-dimensional column vectors, and A, B, C, D are as above.

Assuming that the submatrix A is invertible, we can eliminate x from the equations as follows:

x = A^{-1}(u - B y).
Substituting this expression into the second equation yields

(D - C A^{-1} B)\, y = v - C A^{-1} u.
We refer to this as the reduced equation obtained by eliminating x from the original equation. The matrix appearing in the reduced equation is called the Schur complement of the first block A in M:

S := D - C A^{-1} B.

Solving the reduced equation, we obtain

y = S^{-1}(v - C A^{-1} u).
Substituting this into the first equation yields

x = (A^{-1} + A^{-1} B S^{-1} C A^{-1})\, u - A^{-1} B S^{-1} v.
We can express the above two equations as:

\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} A^{-1} + A^{-1}BS^{-1}CA^{-1} & -A^{-1}BS^{-1} \\ -S^{-1}CA^{-1} & S^{-1} \end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}.
Therefore, a formulation for the inverse of a block matrix is:

\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}
= \begin{bmatrix} A^{-1} + A^{-1}BS^{-1}CA^{-1} & -A^{-1}BS^{-1} \\ -S^{-1}CA^{-1} & S^{-1} \end{bmatrix}
= \begin{bmatrix} I_p & -A^{-1}B \\ 0 & I_q \end{bmatrix}
\begin{bmatrix} A^{-1} & 0 \\ 0 & S^{-1} \end{bmatrix}
\begin{bmatrix} I_p & 0 \\ -CA^{-1} & I_q \end{bmatrix}.

In particular, we see that the Schur complement S is the inverse of the 2,2 block entry of the inverse of M.

In practice, one needs A to be well-conditioned in order for this algorithm to be numerically accurate.
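The elimination procedure described above can be written out directly. The following sketch (random, diagonally shifted blocks; an illustrative assumption rather than anything prescribed by the text) solves the block system via the reduced Schur-complement equation and compares the result with a direct solve of the full system:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 4, 3
A = rng.standard_normal((p, p)) + 5 * np.eye(p)   # assumed invertible / well-conditioned
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q)) + 5 * np.eye(q)
u = rng.standard_normal(p)
v = rng.standard_normal(q)

# Reduced equation: S y = v - C A^{-1} u, with S = D - C A^{-1} B
S = D - C @ np.linalg.solve(A, B)
y = np.linalg.solve(S, v - C @ np.linalg.solve(A, u))
x = np.linalg.solve(A, u - B @ y)                  # back-substitute into A x + B y = u

# Compare with solving the full (p+q) x (p+q) system directly
M = np.block([[A, B], [C, D]])
xy = np.linalg.solve(M, np.concatenate([u, v]))
print(np.allclose(np.concatenate([x, y]), xy))     # expected: True (up to round-off)
```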

This method is useful in electrical engineering to reduce the dimension of a network's equations. It is especially useful when element(s) of the source vector are zero. For example, when u or v is zero, we can eliminate the associated rows of the coefficient matrix without any changes to the rest of the source vector. If v is null then the above equation for x reduces to x = (A^{-1} + A^{-1}BS^{-1}CA^{-1}) u, thus reducing the dimension of the coefficient matrix while leaving u unmodified. This is used to advantage in electrical engineering, where it is referred to as node elimination or Kron reduction.
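As a small illustration of Kron reduction, consider the graph Laplacian of a path of four nodes with unit edge weights (a hypothetical network chosen only for this sketch). Eliminating the two interior nodes by taking the Schur complement of the interior block yields the reduced Laplacian relating the two end nodes, which reproduces the familiar series conductance of 1/3 for three unit resistors:

```python
import numpy as np

# Laplacian of the path 0-1-2-3 with unit conductances (illustrative assumption).
L = np.array([
    [ 1.0, -1.0,  0.0,  0.0],
    [-1.0,  2.0, -1.0,  0.0],
    [ 0.0, -1.0,  2.0, -1.0],
    [ 0.0,  0.0, -1.0,  1.0],
])

boundary = [0, 3]   # nodes kept (terminals with external current injections)
interior = [1, 2]   # nodes with zero injection, to be eliminated

L_bb = L[np.ix_(boundary, boundary)]
L_bi = L[np.ix_(boundary, interior)]
L_ib = L[np.ix_(interior, boundary)]
L_ii = L[np.ix_(interior, interior)]

# Kron reduction = Schur complement of the interior block
L_red = L_bb - L_bi @ np.linalg.solve(L_ii, L_ib)
print(L_red)   # [[ 1/3, -1/3], [-1/3, 1/3]] for three unit resistances in series
```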

Applications to probability theory and statistics

Suppose the random column vectors X, Y live in R^n and R^m respectively, and the vector (X, Y) in R^{n+m} has a multivariate normal distribution whose covariance is the symmetric positive-definite matrix

\Sigma = \begin{bmatrix} A & B \\ B^\top & C \end{bmatrix},

where A ∈ R^{n×n} is the covariance matrix of X, C ∈ R^{m×m} is the covariance matrix of Y and B ∈ R^{n×m} is the covariance matrix between X and Y.

Then the conditional covariance of X given Y is the Schur complement of C in \Sigma: [8]

\operatorname{Cov}(X \mid Y) = A - B C^{-1} B^\top,
\operatorname{E}(X \mid Y) = \operatorname{E}(X) + B C^{-1} (Y - \operatorname{E}(Y)).
If we take the matrix \Sigma above to be, not a covariance of a random vector, but a sample covariance, then it may have a Wishart distribution. In that case, the Schur complement of C in \Sigma also has a Wishart distribution.
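The conditional-covariance formula can be cross-checked numerically against the precision matrix: the Schur complement of C in Σ equals the inverse of the upper-left n × n block of Σ^{-1}. The covariance below is randomly generated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2
G = rng.standard_normal((n + m, n + m))
Sigma = G @ G.T + (n + m) * np.eye(n + m)   # a symmetric positive-definite covariance

A = Sigma[:n, :n]      # Cov(X)
B = Sigma[:n, n:]      # Cov(X, Y)
C = Sigma[n:, n:]      # Cov(Y)

# Conditional covariance of X given Y: the Schur complement of C in Sigma
cond_cov = A - B @ np.linalg.solve(C, B.T)

# Cross-check via the precision matrix: Cov(X | Y) = ((Sigma^{-1})_{11})^{-1}
precision = np.linalg.inv(Sigma)
print(np.allclose(cond_cov, np.linalg.inv(precision[:n, :n])))   # expected: True
```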

Conditions for positive definiteness and semi-definiteness

Let X be a symmetric matrix of real numbers given by

X = \begin{bmatrix} A & B \\ B^\top & C \end{bmatrix}.
Then:

X is positive definite if and only if C and X/C = A - B C^{-1} B^\top are both positive definite.

X is positive definite if and only if A and X/A = C - B^\top A^{-1} B are both positive definite.

If C is positive definite, then X is positive semi-definite if and only if X/C = A - B C^{-1} B^\top is positive semi-definite.

If A is positive definite, then X is positive semi-definite if and only if X/A = C - B^\top A^{-1} B is positive semi-definite.
The first and third statements can be derived [7] by considering the minimizer of the quantity

u^\top A u + 2 v^\top B^\top u + v^\top C v

as a function of v (for fixed u).

Furthermore, since

and similarly for positive semi-definite matrices, the second (respectively fourth) statement is immediate from the first (resp. third) statement.
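These equivalences are easy to check numerically on a random positive-definite example (the construction below is an arbitrary sketch, testing definiteness via eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 2
G = rng.standard_normal((n + m, n + m))
X = G @ G.T + 1e-3 * np.eye(n + m)     # symmetric positive definite by construction

A = X[:n, :n]
B = X[:n, n:]
C = X[n:, n:]

def is_pd(S):
    """Positive definiteness of a symmetric matrix via its eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(S) > 0))

X_over_C = A - B @ np.linalg.solve(C, B.T)     # X/C
X_over_A = C - B.T @ np.linalg.solve(A, B)     # X/A

# X > 0  <=>  C > 0 and X/C > 0  <=>  A > 0 and X/A > 0
print(is_pd(X), is_pd(C) and is_pd(X_over_C), is_pd(A) and is_pd(X_over_A))
```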

There is also a necessary and sufficient condition for the positive semi-definiteness of X in terms of a generalized Schur complement. [2] Precisely,

X \succeq 0 \iff A \succeq 0,\; (I - A A^{g}) B = 0,\; C - B^\top A^{g} B \succeq 0,

where A^{g} denotes a generalized inverse of A.
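For the generalized condition, a small hand-built example (a deliberately singular A, with the Moore–Penrose pseudoinverse standing in for the generalized inverse; all values are illustrative assumptions) shows the three conditions holding for a positive semi-definite X:

```python
import numpy as np

# A positive semi-definite X whose upper-left block A is singular.
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
B = np.array([[1.0],
              [0.0]])
C = np.array([[2.0]])
X = np.block([[A, B], [B.T, C]])

Ag = np.linalg.pinv(A)    # Moore-Penrose pseudoinverse as the generalized inverse A^g

cond_psd_A = bool(np.all(np.linalg.eigvalsh(A) >= -1e-12))                 # A >= 0
cond_range = bool(np.allclose((np.eye(2) - A @ Ag) @ B, 0))                # (I - A A^g) B = 0
cond_schur = bool(np.all(np.linalg.eigvalsh(C - B.T @ Ag @ B) >= -1e-12))  # C - B^T A^g B >= 0

print(cond_psd_A and cond_range and cond_schur,          # True
      bool(np.all(np.linalg.eigvalsh(X) >= -1e-12)))     # True: X is positive semi-definite
```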


References

  1. Schur, J. (1917). "Über Potenzreihen die im Inneren des Einheitskreises beschränkt sind". J. reine u. angewandte Mathematik. 147: 205–232. doi:10.1515/crll.1917.147.205.
  2. Zhang, Fuzhen, ed. (2005). The Schur Complement and Its Applications. Numerical Methods and Algorithms. Vol. 4. Springer. doi:10.1007/b105056. ISBN 0-387-24271-6.
  3. Haynsworth, E. V., "On the Schur Complement", Basel Mathematical Notes, #BNB 20, 17 pages, June 1968.
  4. Feshbach, Herman (1958). "Unified theory of nuclear reactions". Annals of Physics. 5 (4): 357–390. doi:10.1016/0003-4916(58)90007-1.
  5. Crabtree, Douglas E.; Haynsworth, Emilie V. (1969). "An identity for the Schur complement of a matrix". Proceedings of the American Mathematical Society. 22 (2): 364–366. doi:10.1090/S0002-9939-1969-0255573-1. ISSN 0002-9939. S2CID 122868483.
  6. Devriendt, Karel (2022). "Effective resistance is more than distance: Laplacians, Simplices and the Schur complement". Linear Algebra and Its Applications. 639: 24–49. arXiv:2010.04521. doi:10.1016/j.laa.2022.01.002. S2CID 222272289.
  7. Boyd, S.; Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press. (Appendix A.5.5)
  8. von Mises, Richard (1964). "Chapter VIII.9.3". Mathematical theory of probability and statistics. Academic Press. ISBN 978-1483255385.