Arnoldi iteration

Last updated February 20, 2024

In numerical linear algebra, the Arnoldi iteration is an eigenvalue algorithm and an important example of an iterative method. Arnoldi finds an approximation to the eigenvalues and eigenvectors of general (possibly non-Hermitian) matrices by constructing an orthonormal basis of the Krylov subspace, which makes it particularly useful when dealing with large sparse matrices.

The Arnoldi method belongs to a class of linear algebra algorithms that give a partial result after a small number of iterations, in contrast to so-called direct methods which must complete to give any useful results (see for example, Householder transformation). The partial result in this case being the first few vectors of the basis the algorithm is building.

When applied to Hermitian matrices it reduces to the Lanczos algorithm. The Arnoldi iteration was invented by W. E. Arnoldi in 1951.^[1]

Krylov subspaces and the power iteration

An intuitive method for finding the largest (in absolute value) eigenvalue of a given m × m matrix $A$ is the power iteration: starting with an arbitrary initial vector b, calculate Ab, A²b, A³b, ... normalizing the result after every application of the matrix A.

This sequence converges to the eigenvector corresponding to the eigenvalue with the largest absolute value, $\lambda _{1}$ . However, much potentially useful computation is wasted by using only the final result, $A^{n-1}b$ . This suggests that instead, we form the so-called Krylov matrix:

K_{n}={\begin{bmatrix}b&Ab&A^{2}b&\cdots &A^{n-1}b\end{bmatrix}}.

The columns of this matrix are not in general orthogonal, but we can extract an orthogonal basis, via a method such as Gram–Schmidt orthogonalization. The resulting set of vectors is thus an orthogonal basis of the Krylov subspace , ${\mathcal {K}}_{n}$ . We may expect the vectors of this basis to span good approximations of the eigenvectors corresponding to the $n$ largest eigenvalues, for the same reason that $A^{n-1}b$ approximates the dominant eigenvector.

The Arnoldi iteration

The Arnoldi iteration uses the modified Gram–Schmidt process to produce a sequence of orthonormal vectors, q₁, q₂, q₃, ..., called the Arnoldi vectors, such that for every n, the vectors q₁, ..., q_n span the Krylov subspace ${\mathcal {K}}_{n}$ . Explicitly, the algorithm is as follows:

Start with an arbitrary vector q₁ with norm 1. Repeat fork = 2, 3, ...   q_k := Aq_k−1forjfrom 1 tok − 1     h_j,k−1 :=  q_j^*q_kq_k := q_k − h_j,k−1q_jh_k,k−1 := ‖q_k‖q_k := q_k / h_k,k−1

The j-loop projects out the component of $q_{k}$ in the directions of $q_{1},\dots ,q_{k-1}$ . This ensures the orthogonality of all the generated vectors.

The algorithm breaks down when q_k is the zero vector. This happens when the minimal polynomial of A is of degree k. In most applications of the Arnoldi iteration, including the eigenvalue algorithm below and GMRES, the algorithm has converged at this point.

Every step of the k-loop takes one matrix-vector product and approximately 4mk floating point operations.

In the programming language Python with support of the NumPy library:

importnumpyasnpdefarnoldi_iteration(A,b,n:int):"""Compute a basis of the (n + 1)-Krylov subspace of the matrix A.    This is the space spanned by the vectors {b, Ab, ..., A^n b}.    Parameters    ----------    A : array_like        An m × m array.    b : array_like        Initial vector (length m).    n : int        One less than the dimension of the Krylov subspace, or equivalently the *degree* of the Krylov space. Must be >= 1.    Returns    -------    Q : numpy.array        An m x (n + 1) array, where the columns are an orthonormal basis of the Krylov subspace.    h : numpy.array        An (n + 1) x n array. A on basis Q. It is upper Hessenberg.    """eps=1e-12h=np.zeros((n+1,n))Q=np.zeros((A.shape[0],n+1))# Normalize the input vectorQ[:,0]=b/np.linalg.norm(b,2)# Use it as the first Krylov vectorforkinrange(1,n+1):v=np.dot(A,Q[:,k-1])# Generate a new candidate vectorforjinrange(k):# Subtract the projections on previous vectorsh[j,k-1]=np.dot(Q[:,j].conj(),v)v=v-h[j,k-1]*Q[:,j]h[k,k-1]=np.linalg.norm(v,2)ifh[k,k-1]>eps:# Add the produced vector to the list, unlessQ[:,k]=v/h[k,k-1]else:# If that happens, stop iterating.returnQ,hreturnQ,h

Properties of the Arnoldi iteration

Let Q_n denote the m-by-n matrix formed by the first n Arnoldi vectors q₁, q₂, ..., q_n, and let H_n be the (upper Hessenberg) matrix formed by the numbers h_j,k computed by the algorithm:

H_{n}=Q_{n}^{*}AQ_{n}.

The orthogonalization method has to be specifically chosen such that the lower Arnoldi/Krylov components are removed from higher Krylov vectors. As $Aq_{i}$ can be expressed in terms of q₁, ..., q_i+1 by construction, they are orthogonal to q_i+2, ..., q_n,

We then have

H_{n}={\begin{bmatrix}h_{1,1}&h_{1,2}&h_{1,3}&\cdots &h_{1,n}\\h_{2,1}&h_{2,2}&h_{2,3}&\cdots &h_{2,n}\\0&h_{3,2}&h_{3,3}&\cdots &h_{3,n}\\\vdots &\ddots &\ddots &\ddots &\vdots \\0&\cdots &0&h_{n,n-1}&h_{n,n}\end{bmatrix}}.

The matrix H_n can be viewed as A in the subspace ${\mathcal {K}}_{n}$ with the Arnoldi vectors as an orthogonal basis; A is orthogonally projected onto ${\mathcal {K}}_{n}$ . The matrix H_n can be characterized by the following optimality condition. The characteristic polynomial of H_n minimizes ||p(A)q₁||₂ among all monic polynomials of degree n. This optimality problem has a unique solution if and only if the Arnoldi iteration does not break down.

The relation between the Q matrices in subsequent iterations is given by

AQ_{n}=Q_{n+1}{\tilde {H}}_{n}

where

{\tilde {H}}_{n}={\begin{bmatrix}h_{1,1}&h_{1,2}&h_{1,3}&\cdots &h_{1,n}\\h_{2,1}&h_{2,2}&h_{2,3}&\cdots &h_{2,n}\\0&h_{3,2}&h_{3,3}&\cdots &h_{3,n}\\\vdots &\ddots &\ddots &\ddots &\vdots \\\vdots &&0&h_{n,n-1}&h_{n,n}\\0&\cdots &\cdots &0&h_{n+1,n}\end{bmatrix}}

is an (n+1)-by-n matrix formed by adding an extra row to H_n.

Finding eigenvalues with the Arnoldi iteration

The idea of the Arnoldi iteration as an eigenvalue algorithm is to compute the eigenvalues in the Krylov subspace. The eigenvalues of H_n are called the Ritz eigenvalues. Since H_n is a Hessenberg matrix of modest size, its eigenvalues can be computed efficiently, for instance with the QR algorithm, or somewhat related, Francis' algorithm. Also Francis' algorithm itself can be considered to be related to power iterations, operating on nested Krylov subspace. In fact, the most basic form of Francis' algorithm appears to be to choose b to be equal to Ae₁, and extending n to the full dimension of A. Improved versions include one or more shifts, and higher powers of A may be applied in a single steps.^[2]

This is an example of the Rayleigh-Ritz method.

It is often observed in practice that some of the Ritz eigenvalues converge to eigenvalues of A. Since H_n is n-by-n, it has at most n eigenvalues, and not all eigenvalues of A can be approximated. Typically, the Ritz eigenvalues converge to the largest eigenvalues of A. To get the smallest eigenvalues of A, the inverse (operation) of A should be used instead. This can be related to the characterization of H_n as the matrix whose characteristic polynomial minimizes ||p(A)q₁|| in the following way. A good way to get p(A) small is to choose the polynomial p such that p(x) is small whenever x is an eigenvalue of A. Hence, the zeros of p (and thus the Ritz eigenvalues) will be close to the eigenvalues of A.

However, the details are not fully understood yet. This is in contrast to the case where A is Hermitian. In that situation, the Arnoldi iteration becomes the Lanczos iteration, for which the theory is more complete.

Arnoldi iteration demonstrating convergence of Ritz values (red) to the eigenvalues (black) of a 400x400 matrix, composed of uniform random values on the domain [-0.5 +0.5 Arnoldi Iteration.gif — Arnoldi iteration demonstrating convergence of Ritz values (red) to the eigenvalues (black) of a 400x400 matrix, composed of uniform random values on the domain [-0.5 +0.5

]

Restarted Arnoldi iteration

Due to practical storage consideration, common implementations of Arnoldi methods typically restart after a fixed number of iterations. One approach is the Implicitly Restarted Arnoldi Method (IRAM)^[3] by Lehoucq and Sorensen, which was popularized in the free and open source software package ARPACK. ^[4] Another approach is the Krylov-Schur Algorithm by G. W. Stewart, which is more stable and simpler to implement than IRAM.^[5]

Related Research Articles

In mathematics, particularly linear algebra and numerical analysis, the Gram–Schmidt process or Gram-Schmidt algorithm is a way of making two or more vectors perpendicular to each other.

In linear algebra, an orthogonal matrix, or orthonormal matrix, is a real square matrix whose columns and rows are orthonormal vectors.

In mathematics, matrix addition is the operation of adding two matrices by adding the corresponding entries together.

In linear algebra, a Toeplitz matrix or diagonal-constant matrix, named after Otto Toeplitz, is a matrix in which each descending diagonal from left to right is constant. For instance, the following matrix is a Toeplitz matrix:

In mathematics, a square matrix is a matrix with the same number of rows and columns. An n-by-n matrix is known as a square matrix of order $.$ Any two square matrices of the same order can be added and multiplied.

In linear algebra, a QR decomposition, also known as a QR factorization or QU factorization, is a decomposition of a matrix A into a product A = QR of an orthonormal matrix Q and an upper triangular matrix R. QR decomposition is often used to solve the linear least squares (LLS) problem and is the basis for a particular eigenvalue algorithm, the QR algorithm.

In linear algebra, a Jordan normal form, also known as a Jordan canonical form (JCF), is an upper triangular matrix of a particular form called a Jordan matrix representing a linear operator on a finite-dimensional vector space with respect to some basis. Such a matrix has each non-zero off-diagonal entry equal to 1, immediately above the main diagonal, and with identical diagonal entries to the left and below them.

In numerical linear algebra, the QR algorithm or QR iteration is an eigenvalue algorithm: that is, a procedure to calculate the eigenvalues and eigenvectors of a matrix. The QR algorithm was developed in the late 1950s by John G. F. Francis and by Vera N. Kublanovskaya, working independently. The basic idea is to perform a QR decomposition, writing the matrix as a product of an orthogonal matrix and an upper triangular matrix, multiply the factors in the reverse order, and iterate.

In linear algebra, a nilpotent matrix is a square matrix N such that

In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-definite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems.

In linear algebra, it is often important to know which vectors have their directions unchanged by a linear transformation. An eigenvector or characteristic vector is such a vector. Thus an eigenvector $of a linear transformation is scaled by a constant factor when the linear transformation is applied to it: . The corresponding eigenvalue, characteristic value, or characteristic root is the multiplying factor .$

In linear algebra, the order-rKrylov subspace generated by an n-by-n matrix A and a vector b of dimension n is the linear subspace spanned by the images of b under the first r powers of A, that is,

The Lanczos algorithm is an iterative method devised by Cornelius Lanczos that is an adaptation of power methods to find the $"most useful" eigenvalues and eigenvectors of an Hermitian matrix, where is often but not necessarily much smaller than . Although computationally efficient in principle, the method as initially formulated was not useful, due to its numerical instability.$

In numerical linear algebra, the Jacobi method is an iterative algorithm for determining the solutions of a strictly diagonally dominant system of linear equations. Each diagonal element is solved for, and an approximate value is plugged in. The process is then iterated until it converges. This algorithm is a stripped-down version of the Jacobi transformation method of matrix diagonalization. The method is named after Carl Gustav Jacob Jacobi.

In mathematics, the generalized minimal residual method (GMRES) is an iterative method for the numerical solution of an indefinite nonsymmetric system of linear equations. The method approximates the solution by the vector in a Krylov subspace with minimal residual. The Arnoldi iteration is used to find this vector.

In mathematics, power iteration is an eigenvalue algorithm: given a diagonalizable matrix $, the algorithm will produce a number, which is the greatest eigenvalue of, and a nonzero vector, which is a corresponding eigenvector of, that is, . The algorithm is also known as the Von Mises iteration .$

Numerical linear algebra, sometimes called applied linear algebra, is the study of how matrix operations can be used to create computer algorithms which efficiently and accurately provide approximate answers to questions in continuous mathematics. It is a subfield of numerical analysis, and a type of linear algebra. Computers use floating-point arithmetic and cannot exactly represent irrational data, so when a computer algorithm is applied to a matrix of data, it can sometimes increase the difference between a number stored in the computer and the true number that it is an approximation of. Numerical linear algebra uses properties of vectors and matrices to develop computer algorithms that minimize the error introduced by the computer, and is also concerned with ensuring that the algorithm is as efficient as possible.

In linear algebra, eigendecomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way. When the matrix being factorized is a normal or real symmetric matrix, the decomposition is called "spectral decomposition", derived from the spectral theorem.

In statistics and signal processing, the orthogonality principle is a necessary and sufficient condition for the optimality of a Bayesian estimator. Loosely stated, the orthogonality principle says that the error vector of the optimal estimator is orthogonal to any possible estimator. The orthogonality principle is most commonly stated for linear estimators, but more general formulations are possible. Since the principle is a necessary and sufficient condition for optimality, it can be used to find the minimum mean square error estimator.

In mathematics, a pseudoreflection is an invertible linear transformation of a finite-dimensional vector space such that it is not the identity transformation, has a finite (multiplicative) order, and fixes a hyperplane. The concept of pseudoreflection generalizes the concepts of reflection and complex reflection and is simply called reflection by some mathematicians. It plays an important role in Invariant theory of finite groups, including the Chevalley-Shephard-Todd theorem.

References

↑ Arnoldi, W. E. (1951). "The principle of minimized iterations in the solution of the matrix eigenvalue problem". Quarterly of Applied Mathematics. 9 (1): 17–29. doi: 10.1090/qam/42792 . ISSN 0033-569X.
↑ David S. Watkins. Francis' Algorithm Washington State University. Retrieved 14 December 2022
↑ R. B. Lehoucq & D. C. Sorensen (1996). "Deflation Techniques for an Implicitly Restarted Arnoldi Iteration". SIAM Journal on Matrix Analysis and Applications. 17 (4): 789–821. doi:10.1137/S0895479895281484. hdl: 1911/101832 .
↑ R. B. Lehoucq; D. C. Sorensen & C. Yang (1998). "ARPACK Users Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods". SIAM. Archived from the original on 2007-06-26. Retrieved 2007-06-30.
↑ Stewart, G. W. (2002). "A Krylov--Schur Algorithm for Large Eigenproblems". SIAM Journal on Matrix Analysis and Applications. 23 (3): 601–614. doi:10.1137/S0895479800371529. ISSN 0895-4798.

W. E. Arnoldi, "The principle of minimized iterations in the solution of the matrix eigenvalue problem," Quarterly of Applied Mathematics, volume 9, pages 17–29, 1951.
Yousef Saad, Numerical Methods for Large Eigenvalue Problems, Manchester University Press, 1992. ISBN 0-7190-3386-1.
Lloyd N. Trefethen and David Bau, III, Numerical Linear Algebra, Society for Industrial and Applied Mathematics, 1997. ISBN 0-89871-361-7.
Jaschke, Leonhard: Preconditioned Arnoldi Methods for Systems of Nonlinear Equations. (2004). ISBN 2-84976-001-3
Implementation: Matlab comes with ARPACK built-in. Both stored and implicit matrices can be analyzed through the eigs() function.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Arnoldi, W. E. (1951). "The principle of minimized iterations in the solution of the matrix eigenvalue problem". Quarterly of Applied Mathematics. 9 (1): 17–29. doi: 10.1090/qam/42792 . ISSN 0033-569X.

[2] David S. Watkins. Francis' Algorithm Washington State University. Retrieved 14 December 2022

[3] R. B. Lehoucq & D. C. Sorensen (1996). "Deflation Techniques for an Implicitly Restarted Arnoldi Iteration". SIAM Journal on Matrix Analysis and Applications. 17 (4): 789–821. doi:10.1137/S0895479895281484. hdl: 1911/101832 .

[4] R. B. Lehoucq; D. C. Sorensen & C. Yang (1998). "ARPACK Users Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods". SIAM. Archived from the original on 2007-06-26. Retrieved 2007-06-30.

[5] Stewart, G. W. (2002). "A Krylov--Schur Algorithm for Large Eigenproblems". SIAM Journal on Matrix Analysis and Applications. 23 (3): 601–614. doi:10.1137/S0895479800371529. ISSN 0895-4798.

[1]

[2]

[3]

[4]

[5]

v t e Numerical linear algebra
Key concepts	Floating point Numerical stability
Problems	System of linear equations Matrix decompositions Matrix multiplication (algorithms) Matrix splitting Sparse problems
Hardware	CPU cache TLB Cache-oblivious algorithm SIMD Multiprocessing
Software	ATLAS MATLAB Basic Linear Algebra Subprograms (BLAS) LAPACK Specialized libraries General purpose software