Divide-and-conquer eigenvalue algorithm

Last updated June 25, 2024

Divide-and-conquer eigenvalue algorithms are a class of eigenvalue algorithms for Hermitian or real symmetric matrices that became competitive (circa the 1990s) in stability and efficiency with more traditional algorithms such as the QR algorithm. The basic concept behind these algorithms is the divide-and-conquer approach from computer science: an eigenvalue problem is divided into two problems of roughly half the size, each of which is solved recursively, and the eigenvalues of the original problem are computed from the results of these smaller problems.

Background

As with most eigenvalue algorithms for Hermitian matrices, divide-and-conquer begins with a reduction to tridiagonal form. For an $m\times m$ matrix, the standard method for this, via Householder reflections, takes ${\frac {4}{3}}m^{3}$ floating point operations, or ${\frac {8}{3}}m^{3}$ if eigenvectors are needed as well. There are other algorithms, such as the Arnoldi iteration, which may do better for certain classes of matrices; we will not consider this further here.

In certain cases, it is possible to deflate an eigenvalue problem into smaller problems. Consider a block diagonal matrix

T={\begin{bmatrix}T_{1}&0\\0&T_{2}\end{bmatrix}}.

The eigenvalues and eigenvectors of $T$ are simply those of $T_{1}$ and $T_{2}$ , and it will almost always be faster to solve these two smaller problems than to solve the original problem all at once. This technique can be used to improve the efficiency of many eigenvalue algorithms, but it has special significance to divide-and-conquer.
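As a small numerical illustration of this block diagonal property (a sketch using NumPy; the block entries are arbitrary example values, not taken from the article), the eigenvalues of the assembled matrix can be compared with the union of the eigenvalues of its blocks:

```python
import numpy as np

# Two arbitrary symmetric blocks (example values only).
T1 = np.array([[2.0, 1.0],
               [1.0, 3.0]])
T2 = np.array([[5.0, 0.5, 0.0],
               [0.5, 4.0, 0.2],
               [0.0, 0.2, 1.0]])

# Assemble T = diag(T1, T2) with zero off-diagonal blocks.
T = np.zeros((5, 5))
T[:2, :2] = T1
T[2:, 2:] = T2

# Eigenvalues of the whole matrix versus the union of the blocks' eigenvalues.
whole = np.linalg.eigvalsh(T)
union = np.sort(np.concatenate([np.linalg.eigvalsh(T1),
                                np.linalg.eigvalsh(T2)]))
blocks_match = np.allclose(whole, union)
print(blocks_match)
```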

For the rest of this article, we will assume the input to the divide-and-conquer algorithm is an $m\times m$ real symmetric tridiagonal matrix $T$ . The algorithm can be modified for Hermitian matrices.

Divide

The divide part of the divide-and-conquer algorithm comes from the realization that a tridiagonal matrix is "almost" block diagonal.

Let the submatrix $T_{1}$ , the leading diagonal block of $T$ , be $n\times n$ , so that the trailing block $T_{2}$ is $(m-n)\times (m-n)$ . $T$ is almost block diagonal regardless of how $n$ is chosen; for efficiency we typically choose $n\approx m/2$ .

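We write $T$ as a block diagonal matrix, plus a rank-1 correction:

T={\begin{bmatrix}{\hat {T}}_{1}&0\\0&{\hat {T}}_{2}\end{bmatrix}}+\beta vv^{T},

where $\beta =t_{n+1,n}$ is the off-diagonal entry coupling the two blocks and $v$ is the vector whose entries in positions $n$ and $n+1$ are 1 and whose other entries are 0.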

The only difference between $T_{1}$ and ${\hat {T}}_{1}$ is that the lower right entry $t_{nn}$ in ${\hat {T}}_{1}$ has been replaced with $t_{nn}-\beta$ and similarly, in ${\hat {T}}_{2}$ the top left entry $t_{n+1,n+1}$ has been replaced with $t_{n+1,n+1}-\beta$ .
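The splitting can be checked numerically. The sketch below assumes that $\beta$ is the off-diagonal entry $t_{n+1,n}$ and that the rank-1 term is $\beta vv^{T}$ with $v$ having ones in positions $n$ and $n+1$; the helper names `tear` and `dense` and the example entries are hypothetical, chosen only for illustration.

```python
import numpy as np

def tear(diag, off, n):
    """Split a symmetric tridiagonal matrix after row n (1 <= n < m).

    diag is the main diagonal (length m), off the off-diagonal (length m - 1).
    Returns the modified blocks as (diag1, off1), (diag2, off2), and beta.
    """
    diag = np.asarray(diag, dtype=float)
    off = np.asarray(off, dtype=float)
    beta = off[n - 1]              # couples rows n and n+1 (1-based)
    diag1 = diag[:n].copy()
    diag1[-1] -= beta              # t_nn - beta
    diag2 = diag[n:].copy()
    diag2[0] -= beta               # t_{n+1,n+1} - beta
    return (diag1, off[:n - 1]), (diag2, off[n:]), beta

def dense(diag, off):
    """Dense symmetric tridiagonal matrix from its two diagonals."""
    return np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

diag = [4.0, 3.0, 2.0, 1.0, 5.0]   # example values only
off = [1.0, 0.5, 2.0, 0.25]
n, m = 2, 5

(b1, o1), (b2, o2), beta = tear(diag, off, n)
block = np.zeros((m, m))
block[:n, :n] = dense(b1, o1)
block[n:, n:] = dense(b2, o2)

v = np.zeros(m)
v[n - 1] = v[n] = 1.0              # ones in positions n and n+1 (1-based)

T = dense(diag, off)
reconstructed = block + beta * np.outer(v, v)
```

Here `reconstructed` agrees with `T` entry by entry, which is exactly the statement that the two matrices differ from a block diagonal one by a rank-1 term.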

The remainder of the divide step is to solve for the eigenvalues (and if desired the eigenvectors) of ${\hat {T}}_{1}$ and ${\hat {T}}_{2}$ , that is to find the diagonalizations ${\hat {T}}_{1}=Q_{1}D_{1}Q_{1}^{T}$ and ${\hat {T}}_{2}=Q_{2}D_{2}Q_{2}^{T}$ . This can be accomplished with recursive calls to the divide-and-conquer algorithm, although practical implementations often switch to the QR algorithm for small enough submatrices.

Conquer

The conquer part of the algorithm is the unintuitive part. Given the diagonalizations of the submatrices, calculated above, how do we find the diagonalization of the original matrix?

First, define $z^{T}=(q_{1}^{T},q_{2}^{T})$ , where $q_{1}^{T}$ is the last row of $Q_{1}$ and $q_{2}^{T}$ is the first row of $Q_{2}$ . It is now elementary to show that

T={\begin{bmatrix}Q_{1}&\\&Q_{2}\end{bmatrix}}\left({\begin{bmatrix}D_{1}&\\&D_{2}\end{bmatrix}}+\beta zz^{T}\right){\begin{bmatrix}Q_{1}^{T}&\\&Q_{2}^{T}\end{bmatrix}}
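This identity can be verified on a small example. The following sketch assumes $\beta$ is the off-diagonal entry joining the two blocks and uses `numpy.linalg.eigh` to diagonalize the modified blocks; the matrix entries are arbitrary example values.

```python
import numpy as np

# Example symmetric tridiagonal matrix (arbitrary values) and split point n.
diag = np.array([4.0, 3.0, 2.0, 1.0])
off = np.array([1.0, 0.5, 2.0])
T = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
n = 2
beta = T[n, n - 1]                 # 0-based: couples rows n-1 and n

# Modified diagonal blocks: subtract beta from the two corner entries.
That1 = T[:n, :n].copy()
That1[-1, -1] -= beta
That2 = T[n:, n:].copy()
That2[0, 0] -= beta

D1, Q1 = np.linalg.eigh(That1)
D2, Q2 = np.linalg.eigh(That2)

# z stacks the last row of Q1 and the first row of Q2.
z = np.concatenate([Q1[-1, :], Q2[0, :]])

Q = np.zeros_like(T)
Q[:n, :n] = Q1
Q[n:, n:] = Q2
D = np.diag(np.concatenate([D1, D2]))

rebuilt = Q @ (D + beta * np.outer(z, z)) @ Q.T
identity_holds = np.allclose(rebuilt, T)
```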

The remaining task has been reduced to finding the eigenvalues of a diagonal matrix plus a rank-one correction. Before showing how to do this, let us simplify the notation. We are looking for the eigenvalues of the matrix $D+ww^{T}$ , where $D$ is diagonal with distinct entries and $w$ is any vector with nonzero entries. In this case $w={\sqrt {|\beta |}}\cdot z$ when $\beta >0$ ; when $\beta <0$ the correction is $-ww^{T}$ instead, and the derivation below carries over with the sign of the correction term reversed.

The case of a zero entry is simple: if $w_{i}$ is zero, then $(e_{i},d_{i})$ is an eigenpair of $D+ww^{T}$ (where $e_{i}$ is the $i$-th standard basis vector), since $(D+ww^{T})e_{i}=De_{i}=d_{i}e_{i}$ .
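A short numerical check of the zero-entry case, using arbitrary example values: when one entry of $w$ vanishes, the corresponding standard basis vector is an eigenvector, and the remaining eigenvalues come from the smaller problem with that index removed.

```python
import numpy as np

d = np.array([1.0, 2.0, 4.0])      # distinct diagonal entries (example values)
w = np.array([0.5, 0.0, 1.5])      # w[1] is zero
A = np.diag(d) + np.outer(w, w)

# e_i for the zero entry is an eigenvector with eigenvalue d_i.
e = np.array([0.0, 1.0, 0.0])
is_eigenpair = np.allclose(A @ e, d[1] * e)

# Remaining eigenvalues: drop index 1 and solve the 2x2 problem.
keep = [0, 2]
reduced = np.diag(d[keep]) + np.outer(w[keep], w[keep])
combined = np.sort(np.append(np.linalg.eigvalsh(reduced), d[1]))
```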

If $\lambda$ is an eigenvalue, we have:

(D+ww^{T})q=\lambda q

where $q$ is the corresponding eigenvector. Now

(D-\lambda I)q+w(w^{T}q)=0

q+(D-\lambda I)^{-1}w(w^{T}q)=0

w^{T}q+w^{T}(D-\lambda I)^{-1}w(w^{T}q)=0

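Keep in mind that $w^{T}q$ is a nonzero scalar. Neither $w$ nor $q$ is zero. If $w^{T}q$ were zero, then $(D+ww^{T})q=\lambda q$ would reduce to $Dq=\lambda q$ , making $q$ an eigenvector of $D$ . Since the diagonal entries of $D$ are distinct, $q$ would then have exactly one nonzero entry, say $q_{i}$ , so that $w^{T}q=w_{i}q_{i}\neq 0$ , a contradiction. The inverse used above also exists: if $\lambda$ were equal to some $d_{i}$ , row $i$ of $(D-\lambda I)q+w(w^{T}q)=0$ would read $w_{i}(w^{T}q)=0$ , which is impossible. Therefore, we have: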

1+w^{T}(D-\lambda I)^{-1}w=0

or written as a scalar equation,

1+\sum _{j=1}^{m}{\frac {w_{j}^{2}}{d_{j}-\lambda }}=0.

This equation is known as the secular equation. The problem has therefore been reduced to finding the roots of the rational function defined by the left-hand side of this equation.
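The connection between the secular equation and the eigenvalues can be checked numerically. The sketch below uses arbitrary example values for $d$ and $w$, evaluates the left-hand side at the eigenvalues returned by `numpy.linalg.eigvalsh`, and also tests the standard interlacing property (not stated above, but well known for positive rank-one updates) that exactly one root lies strictly between each pair of consecutive $d_{j}$ , with the largest root just above $d_{m}$ .

```python
import numpy as np

d = np.array([1.0, 3.0, 4.5, 7.0])   # distinct, ascending (example values)
w = np.array([0.8, 0.3, 1.1, 0.6])   # all entries nonzero (example values)

def secular(lam):
    """Left-hand side of the secular equation: 1 + sum_j w_j^2 / (d_j - lam)."""
    return 1.0 + np.sum(w**2 / (d - lam))

# Eigenvalues of D + w w^T computed by a dense symmetric solver.
eigs = np.linalg.eigvalsh(np.diag(d) + np.outer(w, w))
residuals = np.array([secular(lam) for lam in eigs])

# Interlacing: d_1 < lam_1 < d_2 < ... < d_m < lam_m < d_m + w^T w.
upper = np.append(d[1:], d[-1] + w @ w)
interlaced = bool(np.all((d < eigs) & (eigs < upper)))
```

The interlacing intervals are what make root finding practical: each root can be bracketed in advance, so an iterative solver never has to search across a pole.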

All general eigenvalue algorithms must be iterative, and the divide-and-conquer algorithm is no different. Solving the nonlinear secular equation requires an iterative technique, such as the Newton–Raphson method. However, each root can be found in $O(1)$ iterations, each of which requires $\Theta (m)$ flops (for an $m$ -degree rational function), making the cost of the iterative part of this algorithm $\Theta (m^{2})$ .
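Putting the divide and conquer steps together, a minimal sketch of the recursion might look as follows. It is an illustration under stated assumptions, not a production method: it uses plain bisection in place of Newton–Raphson or a specialized root finder, performs no deflation (so it assumes distinct $d_{j}$ , nonzero $z_{j}$ and $\beta \neq 0$ at every level), falls back to `numpy.linalg.eigh` for blocks of size at most 2, and forms eigenvectors from the formula $(D-\lambda I)^{-1}z$ , which is known to lose accuracy when eigenvalues are close. It works with $D+\beta zz^{T}$ directly so that either sign of $\beta$ is handled. The function names `secular_roots` and `dc_eigh` are hypothetical.

```python
import numpy as np

def secular_roots(d, z, rho, iters=100):
    """Roots of 1 + rho * sum_j z_j^2 / (d_j - lam) = 0 by bisection.

    Assumes d is ascending with distinct entries, z has no zeros, rho != 0.
    """
    f = lambda lam: 1.0 + rho * np.sum(z**2 / (d - lam))
    span = abs(rho) * (z @ z)
    inner = [(d[i], d[i + 1]) for i in range(len(d) - 1)]
    if rho > 0:
        brackets = inner + [(d[-1], d[-1] + span)]
    else:
        brackets = [(d[0] - span, d[0])] + inner
    roots = []
    for lo, hi in brackets:
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            # f is increasing inside a bracket if rho > 0, decreasing if rho < 0.
            if (f(mid) > 0) == (rho > 0):
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return np.array(roots)

def dc_eigh(diag, off):
    """Eigenvalues and eigenvectors of a symmetric tridiagonal matrix."""
    diag = np.asarray(diag, dtype=float)
    off = np.asarray(off, dtype=float)
    m = len(diag)
    if m <= 2:                         # small base case: dense solver
        T = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
        return np.linalg.eigh(T)
    n = m // 2
    beta = off[n - 1]
    d1 = diag[:n].copy(); d1[-1] -= beta
    d2 = diag[n:].copy(); d2[0] -= beta
    lam1, Q1 = dc_eigh(d1, off[:n - 1])    # divide
    lam2, Q2 = dc_eigh(d2, off[n:])
    d = np.concatenate([lam1, lam2])
    z = np.concatenate([Q1[-1, :], Q2[0, :]])
    Q = np.zeros((m, m))
    Q[:n, :n] = Q1
    Q[n:, n:] = Q2
    order = np.argsort(d)                  # sort so the brackets are ordered
    d, z, Q = d[order], z[order], Q[:, order]
    lam = secular_roots(d, z, beta)        # conquer
    U = z[:, None] / (d[:, None] - lam[None, :])   # column i: (D - lam_i I)^{-1} z
    U /= np.linalg.norm(U, axis=0)
    return lam, Q @ U

# Example with arbitrary entries.
diag = np.array([1.0, 2.5, 0.3, 4.1, 3.2, 5.7, 2.2, 6.4])
off = np.array([0.9, 1.1, 0.7, 1.3, 0.8, 1.2, 0.6])
lam, Q = dc_eigh(diag, off)
```

The final product `Q @ U` is the matrix multiplication that dominates the cost when eigenvectors are wanted; the root finding itself touches only vectors.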

Analysis

We will use the master theorem for divide-and-conquer recurrences to analyze the running time. Recall that we choose $n\approx m/2$ . We can write the recurrence relation:

T(m)=2\times T\left({\frac {m}{2}}\right)+\Theta (m^{2})

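In the notation of the master theorem, $a=b=2$ and thus $\log _{b}a=1$ . Since $\Theta (m^{2})=\Omega (m^{1+\varepsilon })$ for $\varepsilon =1$ , and the regularity condition $2(m/2)^{2}=m^{2}/2\leq cm^{2}$ holds with $c=1/2<1$ , the third case of the theorem applies and we have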

T(m)=\Theta (m^{2})
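The bound can also be seen by unrolling the recurrence directly. The sketch below assumes unit constants, that is $T(m)=2T(m/2)+m^{2}$ with $T(1)=1$ , and restricts $m$ to powers of two; under those assumptions the closed form is $T(m)=2m^{2}-m$ , so the ratio $T(m)/m^{2}$ approaches 2.

```python
def cost(m):
    """Unroll T(m) = 2 T(m/2) + m^2 with T(1) = 1 (unit constants assumed)."""
    if m == 1:
        return 1
    return 2 * cost(m // 2) + m * m

sizes = [2**k for k in range(1, 21)]
ratios = [cost(m) / m**2 for m in sizes]

# Closed form for powers of two under the same assumptions.
closed_form_ok = all(cost(m) == 2 * m * m - m for m in sizes)
```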

Above, we pointed out that reducing a Hermitian matrix to tridiagonal form takes ${\frac {4}{3}}m^{3}$ flops. This dwarfs the running time of the divide-and-conquer part, and at this point it is not clear what advantage the divide-and-conquer algorithm offers over the QR algorithm (which also takes $\Theta (m^{2})$ flops for tridiagonal matrices).

The advantage of divide-and-conquer comes when eigenvectors are needed as well. If this is the case, reduction to tridiagonal form takes ${\frac {8}{3}}m^{3}$ , but the second part of the algorithm takes $\Theta (m^{3})$ as well. For the QR algorithm with a reasonable target precision, this is $\approx 6m^{3}$ , whereas for divide-and-conquer it is $\approx {\frac {4}{3}}m^{3}$ . The reason for this improvement is that in divide-and-conquer, the $\Theta (m^{3})$ part of the algorithm (multiplying $Q$ matrices) is separate from the iteration, whereas in QR, this must occur in every iterative step. Adding the ${\frac {8}{3}}m^{3}$ flops for the reduction, the total improvement is from $\approx 9m^{3}$ to $\approx 4m^{3}$ flops.
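The totals quoted above follow from simple arithmetic on the leading coefficients of $m^{3}$ . The snippet below only restates the figures given in the text (the variable names are illustrative) and works in exact fractions:

```python
from fractions import Fraction

reduction = Fraction(8, 3)    # tridiagonal reduction when eigenvectors are needed
qr_phase = Fraction(6)        # approximate coefficient for the QR algorithm
dc_phase = Fraction(4, 3)     # approximate coefficient for divide-and-conquer

qr_total = reduction + qr_phase    # 26/3, quoted as roughly 9
dc_total = reduction + dc_phase    # exactly 4
speedup = qr_total / dc_total      # ratio of the two totals
```

Under these figures the ratio is 13/6, so divide-and-conquer needs a little less than half the flops of the QR-based approach.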

Practical use of the divide-and-conquer algorithm has shown that in most realistic eigenvalue problems, the algorithm actually does better than this. The reason is that very often the matrices $Q$ and the vectors $z$ tend to be numerically sparse, meaning that they have many entries with values smaller than the floating point precision, allowing for numerical deflation, i.e. breaking the problem into uncoupled subproblems.

Variants and implementation

The algorithm presented here is the simplest version. In many practical implementations, more complicated rank-1 corrections are used to guarantee stability; some variants even use rank-2 corrections.

There exist specialized root-finding techniques for rational functions that may do better than the Newton–Raphson method in terms of both performance and stability. These can be used to improve the iterative part of the divide-and-conquer algorithm.

The divide-and-conquer algorithm is readily parallelized, and linear algebra computing packages such as LAPACK contain high-quality parallel implementations.
