Eigendecomposition of a matrix

Last updated

In linear algebra, eigendecomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way. When the matrix being factorized is a normal or real symmetric matrix, the decomposition is called "spectral decomposition", derived from the spectral theorem.

Contents

Fundamental theory of matrix eigenvectors and eigenvalues

A (nonzero) vector v of dimension N is an eigenvector of a square N × N matrix A if it satisfies a linear equation of the form

for some scalar λ. Then λ is called the eigenvalue corresponding to v. Geometrically speaking, the eigenvectors of A are the vectors that A merely elongates or shrinks, and the amount that they elongate/shrink by is the eigenvalue. The above equation is called the eigenvalue equation or the eigenvalue problem.

This yields an equation for the eigenvalues

We call p(λ) the characteristic polynomial, and the equation, called the characteristic equation, is an Nth-order polynomial equation in the unknown λ. This equation will have Nλ distinct solutions, where 1 ≤ NλN. The set of solutions, that is, the eigenvalues, is called the spectrum of A. [1] [2] [3]

If the field of scalars is algebraically closed, then we can factor p as

The integer ni is termed the algebraic multiplicity of eigenvalue λi. The algebraic multiplicities sum to N:

For each eigenvalue λi, we have a specific eigenvalue equation

There will be 1 ≤ mini linearly independent solutions to each eigenvalue equation. The linear combinations of the mi solutions (except the one which gives the zero vector) are the eigenvectors associated with the eigenvalue λi. The integer mi is termed the geometric multiplicity of λi. It is important to keep in mind that the algebraic multiplicity ni and geometric multiplicity mi may or may not be equal, but we always have mini. The simplest case is of course when mi = ni = 1. The total number of linearly independent eigenvectors, Nv, can be calculated by summing the geometric multiplicities

The eigenvectors can be indexed by eigenvalues, using a double index, with vij being the jth eigenvector for the ith eigenvalue. The eigenvectors can also be indexed using the simpler notation of a single index vk, with k = 1, 2, ..., Nv.

Eigendecomposition of a matrix

Let A be a square n × n matrix with n linearly independent eigenvectors qi (where i = 1, ..., n). Then A can be factorized as

where Q is the square n × n matrix whose ith column is the eigenvector qi of A, and Λ is the diagonal matrix whose diagonal elements are the corresponding eigenvalues, Λii = λi. Note that only diagonalizable matrices can be factorized in this way. For example, the defective matrix (which is a shear matrix) cannot be diagonalized.

The n eigenvectors qi are usually normalized, but they need not be. A non-normalized set of n eigenvectors, vi can also be used as the columns of Q. That can be understood by noting that the magnitude of the eigenvectors in Q gets canceled in the decomposition by the presence of Q−1. If one of the eigenvalues λi has multiple linearly independent eigenvectors (that is, the geometric multiplicity of λi is greater than 1), then these eigenvectors for this eigenvalue λi can be chosen to be mutually orthogonal; however, if two eigenvectors belong to two different eigenvalues, it may be impossible for them to be orthogonal to each other (see Example below). One special case is that if A is a normal matrix, then by the spectral theorem, it's always possible to diagonalize A in an orthonormal basis {qi}.

The decomposition can be derived from the fundamental property of eigenvectors:

The linearly independent eigenvectors qi with nonzero eigenvalues form a basis (not necessarily orthonormal) for all possible products Ax, for xCn, which is the same as the image (or range) of the corresponding matrix transformation, and also the column space of the matrix A. The number of linearly independent eigenvectors qi with nonzero eigenvalues is equal to the rank of the matrix A, and also the dimension of the image (or range) of the corresponding matrix transformation, as well as its column space.

The linearly independent eigenvectors qi with an eigenvalue of zero form a basis (which can be chosen to be orthonormal) for the null space (also known as the kernel) of the matrix transformation A.

Example

The 2 × 2 real matrix A

may be decomposed into a diagonal matrix through multiplication of a non-singular matrix B

Then

for some real diagonal matrix .

Multiplying both sides of the equation on the left by B:

The above equation can be decomposed into two simultaneous equations:

Factoring out the eigenvalues x and y:

Letting

this gives us two vector equations:

And can be represented by a single vector equation involving two solutions as eigenvalues:

where λ represents the two eigenvalues x and y, and u represents the vectors a and b.

Shifting λu to the left hand side and factoring u out

Since B is non-singular, it is essential that u is nonzero. Therefore,

Thus

giving us the solutions of the eigenvalues for the matrix A as λ = 1 or λ = 3, and the resulting diagonal matrix from the eigendecomposition of A is thus .

Putting the solutions back into the above simultaneous equations

Solving the equations, we have

Thus the matrix B required for the eigendecomposition of A is

that is:

Matrix inverse via eigendecomposition

If a matrix A can be eigendecomposed and if none of its eigenvalues are zero, then A is invertible and its inverse is given by

If is a symmetric matrix, since is formed from the eigenvectors of , is guaranteed to be an orthogonal matrix, therefore . Furthermore, because Λ is a diagonal matrix, its inverse is easy to calculate:

Practical implications

When eigendecomposition is used on a matrix of measured, real data, the inverse may be less valid when all eigenvalues are used unmodified in the form above. This is because as eigenvalues become relatively small, their contribution to the inversion is large. Those near zero or at the "noise" of the measurement system will have undue influence and could hamper solutions (detection) using the inverse. [4]

Two mitigations have been proposed: truncating small or zero eigenvalues, and extending the lowest reliable eigenvalue to those below it. See also Tikhonov regularization as a statistically motivated but biased method for rolling off eigenvalues as they become dominated by noise.

The first mitigation method is similar to a sparse sample of the original matrix, removing components that are not considered valuable. However, if the solution or detection process is near the noise level, truncating may remove components that influence the desired solution.

The second mitigation extends the eigenvalue so that lower values have much less influence over inversion, but do still contribute, such that solutions near the noise will still be found.

The reliable eigenvalue can be found by assuming that eigenvalues of extremely similar and low value are a good representation of measurement noise (which is assumed low for most systems).

If the eigenvalues are rank-sorted by value, then the reliable eigenvalue can be found by minimization of the Laplacian of the sorted eigenvalues: [5]

where the eigenvalues are subscripted with an s to denote being sorted. The position of the minimization is the lowest reliable eigenvalue. In measurement systems, the square root of this reliable eigenvalue is the average noise over the components of the system.

Functional calculus

The eigendecomposition allows for much easier computation of power series of matrices. If f(x) is given by

then we know that

Because Λ is a diagonal matrix, functions of Λ are very easy to calculate:

The off-diagonal elements of f(Λ) are zero; that is, f(Λ) is also a diagonal matrix. Therefore, calculating f(A) reduces to just calculating the function on each of the eigenvalues.

A similar technique works more generally with the holomorphic functional calculus, using

from above. Once again, we find that

Examples

which are examples for the functions . Furthermore, is the matrix exponential.

Decomposition for special matrices

Subsets of important classes of matrices Taxonomy of Complex Matrices.svg
Subsets of important classes of matrices

When A is normal or real symmetric matrix, the decomposition is called "spectral decomposition", derived from the spectral theorem.

Normal matrices

A complex-valued square matrix A is normal (meaning A*A = AA*, where A* is the conjugate transpose) if and only if it can be decomposed as

where U is a unitary matrix (meaning U* = U−1) and Λ = diag(λ1, ..., λn) is a diagonal matrix. [6] The columns u1, ..., un of U form an orthonormal basis and are eigenvectors of A with corresponding eigenvalues λ1, ..., λn.

If A is restricted to be a Hermitian matrix (A = A*), then Λ has only real valued entries. If A is restricted to a unitary matrix, then Λ takes all its values on the complex unit circle, that is, |λi| = 1.

Real symmetric matrices

As a special case, for every n × n real symmetric matrix, the eigenvalues are real and the eigenvectors can be chosen real and orthonormal. Thus a real symmetric matrix A can be decomposed as

where Q is an orthogonal matrix whose columns are the real, orthonormal eigenvectors of A, and Λ is a diagonal matrix whose entries are the eigenvalues of A. [7]

Useful facts

Useful facts regarding eigenvalues

Useful facts regarding eigenvectors

Useful facts regarding eigendecomposition

Useful facts regarding matrix inverse

Numerical computations

Numerical computation of eigenvalues

Suppose that we want to compute the eigenvalues of a given matrix. If the matrix is small, we can compute them symbolically using the characteristic polynomial. However, this is often impossible for larger matrices, in which case we must use a numerical method.

In practice, eigenvalues of large matrices are not computed using the characteristic polynomial. Computing the polynomial becomes expensive in itself, and exact (symbolic) roots of a high-degree polynomial can be difficult to compute and express: the Abel–Ruffini theorem implies that the roots of high-degree (5 or above) polynomials cannot in general be expressed simply using nth roots. Therefore, general algorithms to find eigenvectors and eigenvalues are iterative.

Iterative numerical algorithms for approximating roots of polynomials exist, such as Newton's method, but in general it is impractical to compute the characteristic polynomial and then apply these methods. One reason is that small round-off errors in the coefficients of the characteristic polynomial can lead to large errors in the eigenvalues and eigenvectors: the roots are an extremely ill-conditioned function of the coefficients. [8]

A simple and accurate iterative method is the power method: a random vector v is chosen and a sequence of unit vectors is computed as

This sequence will almost always converge to an eigenvector corresponding to the eigenvalue of greatest magnitude, provided that v has a nonzero component of this eigenvector in the eigenvector basis (and also provided that there is only one eigenvalue of greatest magnitude). This simple algorithm is useful in some practical applications; for example, Google uses it to calculate the page rank of documents in their search engine. [9] Also, the power method is the starting point for many more sophisticated algorithms. For instance, by keeping not just the last vector in the sequence, but instead looking at the span of all the vectors in the sequence, one can get a better (faster converging) approximation for the eigenvector, and this idea is the basis of Arnoldi iteration. [8] Alternatively, the important QR algorithm is also based on a subtle transformation of a power method. [8]

Numerical computation of eigenvectors

Once the eigenvalues are computed, the eigenvectors could be calculated by solving the equation

using Gaussian elimination or any other method for solving matrix equations.

However, in practical large-scale eigenvalue methods, the eigenvectors are usually computed in other ways, as a byproduct of the eigenvalue computation. In power iteration, for example, the eigenvector is actually computed before the eigenvalue (which is typically computed by the Rayleigh quotient of the eigenvector). [8] In the QR algorithm for a Hermitian matrix (or any normal matrix), the orthonormal eigenvectors are obtained as a product of the Q matrices from the steps in the algorithm. [8] (For more general matrices, the QR algorithm yields the Schur decomposition first, from which the eigenvectors can be obtained by a backsubstitution procedure. [10] ) For Hermitian matrices, the Divide-and-conquer eigenvalue algorithm is more efficient than the QR algorithm if both eigenvectors and eigenvalues are desired. [8]

Additional topics

Generalized eigenspaces

Recall that the geometric multiplicity of an eigenvalue can be described as the dimension of the associated eigenspace, the nullspace of λIA. The algebraic multiplicity can also be thought of as a dimension: it is the dimension of the associated generalized eigenspace (1st sense), which is the nullspace of the matrix (λIA)k for any sufficiently large k. That is, it is the space of generalized eigenvectors (first sense), where a generalized eigenvector is any vector which eventually becomes 0 if λIA is applied to it enough times successively. Any eigenvector is a generalized eigenvector, and so each eigenspace is contained in the associated generalized eigenspace. This provides an easy proof that the geometric multiplicity is always less than or equal to the algebraic multiplicity.

This usage should not be confused with the generalized eigenvalue problem described below.

Conjugate eigenvector

A conjugate eigenvector or coneigenvector is a vector sent after transformation to a scalar multiple of its conjugate, where the scalar is called the conjugate eigenvalue or coneigenvalue of the linear transformation. The coneigenvectors and coneigenvalues represent essentially the same information and meaning as the regular eigenvectors and eigenvalues, but arise when an alternative coordinate system is used. The corresponding equation is

For example, in coherent electromagnetic scattering theory, the linear transformation A represents the action performed by the scattering object, and the eigenvectors represent polarization states of the electromagnetic wave. In optics, the coordinate system is defined from the wave's viewpoint, known as the Forward Scattering Alignment (FSA), and gives rise to a regular eigenvalue equation, whereas in radar, the coordinate system is defined from the radar's viewpoint, known as the Back Scattering Alignment (BSA), and gives rise to a coneigenvalue equation.

Generalized eigenvalue problem

A generalized eigenvalue problem (second sense) is the problem of finding a (nonzero) vector v that obeys

where A and B are matrices. If v obeys this equation, with some λ, then we call v the generalized eigenvector of A and B (in the second sense), and λ is called the generalized eigenvalue of A and B (in the second sense) which corresponds to the generalized eigenvector v. The possible values of λ must obey the following equation

If n linearly independent vectors {v1, …, vn} can be found, such that for every i ∈ {1, …, n}, Avi = λiBvi, then we define the matrices P and D such that

Then the following equality holds

And the proof is

And since P is invertible, we multiply the equation from the right by its inverse, finishing the proof.

The set of matrices of the form AλB, where λ is a complex number, is called a pencil; the term matrix pencil can also refer to the pair (A, B) of matrices. [11]

If B is invertible, then the original problem can be written in the form

which is a standard eigenvalue problem. However, in most situations it is preferable not to perform the inversion, but rather to solve the generalized eigenvalue problem as stated originally. This is especially important if A and B are Hermitian matrices, since in this case B−1A is not generally Hermitian and important properties of the solution are no longer apparent.

If A and B are both symmetric or Hermitian, and B is also a positive-definite matrix, the eigenvalues λi are real and eigenvectors v1 and v2 with distinct eigenvalues are B-orthogonal (v1*Bv2 = 0). [12] In this case, eigenvectors can be chosen so that the matrix P defined above satisfies

or ,

and there exists a basis of generalized eigenvectors (it is not a defective problem). [11] This case is sometimes called a Hermitian definite pencil or definite pencil. [11]

See also

Notes

  1. Golub & Van Loan (1996 , p. 310)
  2. Kreyszig (1972 , p. 273)
  3. Nering (1970 , p. 270)
  4. Hayde, A. F.; Twede, D. R. (2002). Shen, Sylvia S. (ed.). "Observations on relationship between eigenvalues, instrument noise and detection performance". Imaging Spectrometry VIII. Proceedings of SPIE. 4816: 355. Bibcode:2002SPIE.4816..355H. doi:10.1117/12.453777. S2CID   120953647.
  5. Twede, D. R.; Hayden, A. F. (2004). Shen, Sylvia S; Lewis, Paul E (eds.). "Refinement and generalization of the extension method of covariance matrix inversion by regularization". Imaging Spectrometry IX. Proceedings of SPIE. 5159: 299. Bibcode:2004SPIE.5159..299T. doi:10.1117/12.506993. S2CID   123123072.
  6. Horn & Johnson (1985), p. 133, Theorem 2.5.3
  7. Horn & Johnson (1985), p. 136, Corollary 2.5.11
  8. 1 2 3 4 5 6 Trefethen, Lloyd N.; Bau, David (1997). Numerical Linear Algebra. SIAM. ISBN   978-0-89871-361-9.
  9. Ipsen, Ilse, and Rebecca M. Wills, Analysis and Computation of Google's PageRank Archived 2018-09-21 at the Wayback Machine , 7th IMACS International Symposium on Iterative Methods in Scientific Computing, Fields Institute, Toronto, Canada, 5–8 May 2005.
  10. Quarteroni, Alfio; Sacco, Riccardo; Saleri, Fausto (2000). "section 5.8.2". Numerical Mathematics. Springer. p. 15. ISBN   978-0-387-98959-4.
  11. 1 2 3 Bai, Z.; Demmel, J.; Dongarra, J.; Ruhe, A.; Van Der Vorst, H., eds. (2000). "Generalized Hermitian Eigenvalue Problems". Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. Philadelphia: SIAM. ISBN   978-0-89871-471-5. Archived from the original on 2010-08-21. Retrieved 2022-09-09.
  12. Parlett, Beresford N. (1998). The symmetric eigenvalue problem (Reprint. ed.). Philadelphia: Society for Industrial and Applied Mathematics. p. 345. doi:10.1137/1.9781611971163. ISBN   978-0-89871-402-9.

Related Research Articles

<span class="mw-page-title-main">Square matrix</span> Matrix with the same number of rows and columns

In mathematics, a square matrix is a matrix with the same number of rows and columns. An n-by-n matrix is known as a square matrix of order . Any two square matrices of the same order can be added and multiplied.

In mathematics, a complex square matrix A is normal if it commutes with its conjugate transpose A*:

Ray transfer matrix analysis is a mathematical form for performing ray tracing calculations in sufficiently simple problems which can be solved considering only paraxial rays. Each optical element is described by a 2×2 ray transfer matrix which operates on a vector describing an incoming light ray to calculate the outgoing ray. Multiplication of the successive matrices thus yields a concise ray transfer matrix describing the entire optical system. The same mathematics is also used in accelerator physics to track particles through the magnet installations of a particle accelerator, see electron optics.

In mathematics, particularly in linear algebra, a skew-symmetricmatrix is a square matrix whose transpose equals its negative. That is, it satisfies the condition

In linear algebra, a diagonal matrix is a matrix in which the entries outside the main diagonal are all zero; the term usually refers to square matrices. Elements of the main diagonal can either be zero or nonzero. An example of a 2×2 diagonal matrix is , while an example of a 3×3 diagonal matrix is. An identity matrix of any size, or any multiple of it is a diagonal matrix called scalar matrix, for example, . In geometry, a diagonal matrix may be used as a scaling matrix, since matrix multiplication with it results in changing scale (size) and possibly also shape; only a scalar matrix results in uniform change in scale.

In mathematics, a Hermitian matrix is a complex square matrix that is equal to its own conjugate transpose—that is, the element in the i-th row and j-th column is equal to the complex conjugate of the element in the j-th row and i-th column, for all indices i and j:

<span class="mw-page-title-main">Eigenfunction</span> Mathematical function of a linear operator

In mathematics, an eigenfunction of a linear operator D defined on some function space is any non-zero function in that space that, when acted upon by D, is only multiplied by some scaling factor called an eigenvalue. As an equation, this condition can be written as

In linear algebra, a square matrix  is called diagonalizable or non-defective if it is similar to a diagonal matrix. That is, if there exists an invertible matrix  and a diagonal matrix such that . This is equivalent to . This property exists for any linear map: for a finite-dimensional vector space , a linear map  is called diagonalizable if there exists an ordered basis of  consisting of eigenvectors of . These definitions are equivalent: if  has a matrix representation as above, then the column vectors of  form a basis consisting of eigenvectors of , and the diagonal entries of  are the corresponding eigenvalues of ; with respect to this eigenvector basis,  is represented by .

<span class="mw-page-title-main">Jordan normal form</span> Form of a matrix indicating its eigenvalues and their algebraic multiplicities

In linear algebra, a Jordan normal form, also known as a Jordan canonical form (JCF), is an upper triangular matrix of a particular form called a Jordan matrix representing a linear operator on a finite-dimensional vector space with respect to some basis. Such a matrix has each non-zero off-diagonal entry equal to 1, immediately above the main diagonal, and with identical diagonal entries to the left and below them.

In the mathematical discipline of linear algebra, the Schur decomposition or Schur triangulation, named after Issai Schur, is a matrix decomposition. It allows one to write an arbitrary complex square matrix as unitarily equivalent to an upper triangular matrix whose diagonal elements are the eigenvalues of the original matrix.

In numerical analysis, one of the most important problems is designing efficient and stable algorithms for finding the eigenvalues of a matrix. These eigenvalue algorithms may also find eigenvectors.

In mathematics, the matrix exponential is a matrix function on square matrices analogous to the ordinary exponential function. It is used to solve systems of linear differential equations. In the theory of Lie groups, the matrix exponential gives the exponential map between a matrix Lie algebra and the corresponding Lie group.

In applied mathematics, in particular the context of nonlinear system analysis, a phase plane is a visual display of certain characteristics of certain kinds of differential equations; a coordinate plane with axes being the values of the two state variables, say (x, y), or (q, p) etc. (any pair of variables). It is a two-dimensional case of the general n-dimensional phase space.

In linear algebra, it is often important to know which vectors have their directions unchanged by a given linear transformation. An eigenvector or characteristic vector is such a vector. Thus an eigenvector of a linear transformation is scaled by a constant factor when the linear transformation is applied to it: . The corresponding eigenvalue, characteristic value, or characteristic root is the multiplying factor .

In geometry and linear algebra, a principal axis is a certain line in a Euclidean space associated with an ellipsoid or hyperboloid, generalizing the major and minor axes of an ellipse or hyperbola. The principal axis theorem states that the principal axes are perpendicular, and gives a constructive procedure for finding them.

In mathematics, the quadratic eigenvalue problem (QEP), is to find scalar eigenvalues , left eigenvectors and right eigenvectors such that

In the mathematical field of linear algebra, an arrowhead matrix is a square matrix containing zeros in all entries except for the first row, first column, and main diagonal, these entries can be any number. In other words, the matrix has the form

A differential equation is a mathematical equation for an unknown function of one or several variables that relates the values of the function itself and its derivatives of various orders. A matrix differential equation contains more than one function stacked into vector form with a matrix relating the functions to their derivatives.

A matrix difference equation is a difference equation in which the value of a vector of variables at one point in time is related to its own value at one or more previous points in time, using matrices. The order of the equation is the maximum time gap between any two indicated values of the variable vector. For example,

In mathematics, particularly in linear algebra and applications, matrix analysis is the study of matrices and their algebraic properties. Some particular topics out of many include; operations defined on matrices, functions of matrices, and the eigenvalues of matrices.

References