Sylvester equation

In mathematics, in the field of control theory, a Sylvester equation is a matrix equation of the form AX + XB = C. [1]

It is named after English mathematician James Joseph Sylvester. Given matrices A, B, and C, the problem is to find the matrices X that satisfy this equation. All matrices are assumed to have coefficients in the complex numbers. For the equation to make sense, the matrices must have appropriate sizes; for example, they could all be square matrices of the same size. More generally, A and B must be square matrices of sizes n and m respectively, and then X and C both have n rows and m columns.

A Sylvester equation has a unique solution for X exactly when there are no common eigenvalues of A and −B. More generally, the equation AX + XB = C has been considered as an equation of bounded operators on a (possibly infinite-dimensional) Banach space. In this case, the condition for the uniqueness of a solution X is almost the same: There exists a unique solution X exactly when the spectra of A and −B are disjoint. [2]
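As a concrete illustration of this condition, here is a minimal numerical check in NumPy (our sketch, not part of the original article; the example matrices and the use of a floating-point tolerance are choices made for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 4.0]])    # eigenvalues 1 and 4
B = np.array([[0.0, 1.0],
              [-6.0, -5.0]])  # -B has eigenvalues 2 and 3

# AX + XB = C is uniquely solvable for every C exactly when
# no eigenvalue of A coincides with an eigenvalue of -B.
eig_A = np.linalg.eigvals(A)
eig_negB = np.linalg.eigvals(-B)
shared = any(np.isclose(a, b) for a in eig_A for b in eig_negB)
print("unique solution for every C:", not shared)  # True here
```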

Existence and uniqueness of the solutions

Using the Kronecker product notation and the vectorization operator vec, we can rewrite Sylvester's equation in the form

(I_m ⊗ A + Bᵀ ⊗ I_n) vec X = vec C,

where A is of dimension n × n, B is of dimension m × m, X and C are of dimension n × m, and I_k is the k × k identity matrix. In this form, the equation can be seen as a linear system of dimension nm × nm. [3]
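This rewriting can be carried out directly in code. The following NumPy sketch (our illustration, not part of the original article) builds the Kronecker system and recovers X; note that vec stacks columns, which corresponds to Fortran-order flattening:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
C = rng.standard_normal((n, m))  # random A, B almost surely give a uniquely solvable system

# vec(AX + XB) = (I_m (x) A + B^T (x) I_n) vec(X), with vec stacking columns.
K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
x = np.linalg.solve(K, C.flatten(order="F"))
X = x.reshape((n, m), order="F")

print(np.linalg.norm(A @ X + X @ B - C))  # ~1e-15
```

As note [3] cautions, forming and solving this nm × nm system is costly and can be ill-conditioned, so the sketch is for illustration rather than production use.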

Theorem. Given matrices A ∈ ℂ^(n×n) and B ∈ ℂ^(m×m), the Sylvester equation AX + XB = C has a unique solution X ∈ ℂ^(n×m) for any C ∈ ℂ^(n×m) if and only if A and −B do not share any eigenvalue.

Proof. The equation AX + XB = C is a linear system with nm unknowns and the same number of equations. Hence it is uniquely solvable for any given C if and only if the homogeneous equation AX + XB = 0 admits only the trivial solution X = 0.

(i) Assume that A and −B do not share any eigenvalue. Let X be a solution to the abovementioned homogeneous equation. Then AX = X(−B), which can be lifted to A^k X = X(−B)^k for each k ≥ 0 by mathematical induction. Consequently, p(A)X = X p(−B) for any polynomial p. In particular, let p be the characteristic polynomial of A. Then p(A) = 0 due to the Cayley–Hamilton theorem; meanwhile, the spectral mapping theorem tells us σ(p(−B)) = p(σ(−B)), where σ(·) denotes the spectrum of a matrix. Since A and −B do not share any eigenvalue, p(σ(−B)) does not contain zero, and hence p(−B) is nonsingular. Thus 0 = p(A)X = X p(−B) forces X = 0, as desired. This proves the "if" part of the theorem.

(ii) Now assume that A and −B share an eigenvalue λ. Let u be a corresponding right eigenvector for A, v be a corresponding left eigenvector for −B, and X = uv*. Then X ≠ 0, and AX + XB = (Au)v* + u(v*B) = λuv* − λuv* = 0. Hence X is a nontrivial solution to the aforesaid homogeneous equation, justifying the "only if" part of the theorem. Q.E.D.
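The rank-one construction in part (ii) is easy to reproduce numerically. Below is a small NumPy sketch (our illustration, not part of the original article) in which A and −B are chosen to share the eigenvalue 2:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])   # eigenvalues 2 and 3
B = np.array([[-2.0, 0.0],
              [4.0, 5.0]])   # -B has eigenvalues 2 and -5

lam = 2.0

# Right eigenvector u of A for lam.
w, V = np.linalg.eig(A)
u = V[:, np.isclose(w, lam)][:, 0]

# Left eigenvector v of -B for lam, i.e. a right eigenvector of (-B).T.
w2, W = np.linalg.eig((-B).T)
v = W[:, np.isclose(w2, lam)][:, 0]

# X = u v* is then a nontrivial solution of the homogeneous equation.
X = np.outer(u, v.conj())
print(np.linalg.norm(A @ X + X @ B))  # ~1e-16, with X != 0
```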

As an alternative to the spectral mapping theorem, the nonsingularity of p(−B) in part (i) of the proof can also be demonstrated by Bézout's identity for coprime polynomials. Let q be the characteristic polynomial of −B. Since A and −B do not share any eigenvalue, p and q are coprime. Hence there exist polynomials f and g such that p(z)f(z) + q(z)g(z) ≡ 1. By the Cayley–Hamilton theorem, q(−B) = 0. Thus p(−B)f(−B) = I, implying that p(−B) is nonsingular.
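Both arguments hinge on p(A) vanishing while p(−B) stays invertible. A short NumPy sketch makes this concrete (illustrative only; the helper matpoly and the example matrices are ours, not the article's):

```python
import numpy as np

def matpoly(coeffs, M):
    """Horner evaluation of a polynomial (coefficients highest-degree
    first, as returned by np.poly) at a square matrix M."""
    R = np.zeros_like(M)
    I = np.eye(M.shape[0])
    for c in coeffs:
        R = R @ M + c * I
    return R

A = np.array([[1.0, 2.0],
              [0.0, 4.0]])    # eigenvalues 1 and 4
B = np.array([[0.0, 1.0],
              [-6.0, -5.0]])  # -B has eigenvalues 2 and 3, disjoint from A's

p = np.poly(A)  # coefficients of the characteristic polynomial of A
print(np.linalg.norm(matpoly(p, A)))  # ~0 by the Cayley-Hamilton theorem
print(np.linalg.det(matpoly(p, -B)))  # nonzero, so p(-B) is nonsingular
```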

The theorem remains true for real matrices, with the caveat that one considers their complex eigenvalues. The proof of the "if" part is still applicable; for the "only if" part, note that both the real part and the imaginary part of the solution X = uv* constructed in part (ii) satisfy the homogeneous equation AX + XB = 0, and they cannot both be zero.

Roth's removal rule

Given two square complex matrices A and B, of size n and m respectively, and a matrix C of size n by m, one can ask when the two square block matrices of size n + m, [A C; 0 B] and [A 0; 0 B], are similar to each other. The answer is that these two matrices are similar exactly when there exists a matrix X such that AX − XB = C. In other words, X is a solution to a Sylvester equation. This is known as Roth's removal rule. [4]

One easily checks one direction: if AX − XB = C, then

[I_n X; 0 I_m] [A C; 0 B] [I_n X; 0 I_m]⁻¹ = [A 0; 0 B].
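This identity is straightforward to verify numerically. Here is a minimal NumPy sketch (our illustration, with randomly chosen matrices) that builds X first and defines C = AX − XB so the similarity must hold:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
X = rng.standard_normal((n, m))
C = A @ X - X @ B                     # X solves AX - XB = C by construction

M1 = np.block([[A, C],
               [np.zeros((m, n)), B]])           # [A C; 0 B]
M2 = np.block([[A, np.zeros((n, m))],
               [np.zeros((m, n)), B]])           # [A 0; 0 B]
P = np.block([[np.eye(n), X],
              [np.zeros((m, n)), np.eye(m)]])    # [I X; 0 I]

print(np.linalg.norm(P @ M1 @ np.linalg.inv(P) - M2))  # ~1e-15
```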

Roth's removal rule does not generalize to infinite-dimensional bounded operators on a Banach space. [5] Nevertheless, Roth's removal rule does generalize to systems of Sylvester equations. [6]

Numerical solutions

A classical algorithm for the numerical solution of the Sylvester equation is the Bartels–Stewart algorithm, which consists of transforming A and B into Schur form by a QR algorithm, and then solving the resulting triangular system via back-substitution. This algorithm, whose computational cost is O(n³) arithmetic operations for n × n matrices, is used, among others, by LAPACK and the lyap function in GNU Octave. [7] See also the sylvester function in that language. [8] [9] In some specific image processing applications, the derived Sylvester equation has a closed-form solution. [10]
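For readers working in Python, SciPy exposes this approach as scipy.linalg.solve_sylvester, which is documented to implement a Bartels–Stewart-type method on top of LAPACK. A minimal usage sketch (our illustration, with random test matrices):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((5, 4))

X = solve_sylvester(A, B, C)              # solves A X + X B = C
print(np.linalg.norm(A @ X + X @ B - C))  # ~1e-14
```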

Notes

  1. This equation is also commonly written in the equivalent form AX − XB = C.
  2. Bhatia and Rosenthal, 1997
  3. However, rewriting the equation in this form is not advised for the numerical solution since this version is costly to solve and can be ill-conditioned.
  4. Gerrish, F.; Ward, A. G. B. (Nov 1998). "Sylvester's matrix equation and Roth's removal rule". The Mathematical Gazette. 82 (495): 423–430. doi:10.2307/3619888. JSTOR 3619888. S2CID 126229881.
  5. Bhatia and Rosenthal, 1997, p. 3
  6. Dmytryshyn, Andrii; Kågström, Bo (2015). "Coupled Sylvester-type Matrix Equations and Block Diagonalization". SIAM Journal on Matrix Analysis and Applications. 36 (2): 580–593. CiteSeerX 10.1.1.710.6894. doi:10.1137/151005907.
  7. "Function Reference: Lyap".
  8. "Functions of a Matrix (GNU Octave (version 4.4.1))".
  9. The syl command has been deprecated since GNU Octave version 4.0.
  10. Wei, Q.; Dobigeon, N.; Tourneret, J.-Y. (2015). "Fast Fusion of Multi-Band Images Based on Solving a Sylvester Equation". IEEE Transactions on Image Processing. 24 (11): 4109–4121. arXiv:1502.03121. Bibcode:2015ITIP...24.4109W. doi:10.1109/TIP.2015.2458572. PMID 26208345. S2CID 665111.
