Generalized minimal residual method

Last updated

In mathematics, the generalized minimal residual method (GMRES) is an iterative method for the numerical solution of an indefinite nonsymmetric system of linear equations. The method approximates the solution by the vector in a Krylov subspace with minimal residual. The Arnoldi iteration is used to find this vector.

Contents

The GMRES method was developed by Yousef Saad and Martin H. Schultz in 1986. [1] It is a generalization and improvement of the MINRES method due to Paige and Saunders in 1975. [2] [3] The MINRES method requires that the matrix is symmetric, but has the advantage that it only requires handling of three vectors. GMRES is a special case of the DIIS method developed by Peter Pulay in 1980. DIIS is applicable to non-linear systems.

The method

Denote the Euclidean norm of any vector v by . Denote the (square) system of linear equations to be solved by The matrix A is assumed to be invertible of size m-by-m. Furthermore, it is assumed that b is normalized, i.e., that .

The n-th Krylov subspace for this problem is where is the initial error given an initial guess . Clearly if .

GMRES approximates the exact solution of by the vector that minimizes the Euclidean norm of the residual .

The vectors might be close to linearly dependent, so instead of this basis, the Arnoldi iteration is used to find orthonormal vectors which form a basis for . In particular, .

Therefore, the vector can be written as with , where is the m-by-n matrix formed by . In other words, finding the n-th approximation of the solution (i.e., ) is reduced to finding the vector , which is determined via minimizing the residue as described below.

The Arnoldi process also constructs , an ()-by- upper Hessenberg matrix which satisfies an equality which is used to simplify the calculation of (see § Solving the least squares problem). Note that, for symmetric matrices, a symmetric tri-diagonal matrix is actually achieved, resulting in the MINRES method.

Because columns of are orthonormal, we have where is the first vector in the standard basis of , and being the first trial residual vector (usually ). Hence, can be found by minimizing the Euclidean norm of the residual This is a linear least squares problem of size n.

This yields the GMRES method. On the -th iteration:

  1. calculate with the Arnoldi method;
  2. find the which minimizes ;
  3. compute ;
  4. repeat if the residual is not yet small enough.

At every iteration, a matrix-vector product must be computed. This costs about floating-point operations for general dense matrices of size , but the cost can decrease to for sparse matrices. In addition to the matrix-vector product, floating-point operations must be computed at the n -th iteration.

Convergence

The nth iterate minimizes the residual in the Krylov subspace . Since every subspace is contained in the next subspace, the residual does not increase. After m iterations, where m is the size of the matrix A, the Krylov space Km is the whole of Rm and hence the GMRES method arrives at the exact solution. However, the idea is that after a small number of iterations (relative to m), the vector xn is already a good approximation to the exact solution.

This does not happen in general. Indeed, a theorem of Greenbaum, Pták and Strakoš states that for every nonincreasing sequence a1, ..., am1, am = 0, one can find a matrix A such that the rn = an for all n, where rn is the residual defined above. In particular, it is possible to find a matrix for which the residual stays constant for m  1 iterations, and only drops to zero at the last iteration.

In practice, though, GMRES often performs well. This can be proven in specific situations. If the symmetric part of A, that is , is positive definite, then where and denote the smallest and largest eigenvalue of the matrix , respectively. [4]

If A is symmetric and positive definite, then we even have where denotes the condition number of A in the Euclidean norm.

In the general case, where A is not positive definite, we have where Pn denotes the set of polynomials of degree at most n with p(0) = 1, V is the matrix appearing in the spectral decomposition of A, and σ(A) is the spectrum of A. Roughly speaking, this says that fast convergence occurs when the eigenvalues of A are clustered away from the origin and A is not too far from normality. [5]

All these inequalities bound only the residuals instead of the actual error, that is, the distance between the current iterate xn and the exact solution.

Extensions of the method

Like other iterative methods, GMRES is usually combined with a preconditioning method in order to speed up convergence.

The cost of the iterations grow as O(n2), where n is the iteration number. Therefore, the method is sometimes restarted after a number, say k, of iterations, with xk as initial guess. The resulting method is called GMRES(k) or Restarted GMRES. For non-positive definite matrices, this method may suffer from stagnation in convergence as the restarted subspace is often close to the earlier subspace.

The shortcomings of GMRES and restarted GMRES are addressed by the recycling of Krylov subspace in the GCRO type methods such as GCROT and GCRODR. [6] Recycling of Krylov subspaces in GMRES can also speed up convergence when sequences of linear systems need to be solved. [7]

Comparison with other solvers

The Arnoldi iteration reduces to the Lanczos iteration for symmetric matrices. The corresponding Krylov subspace method is the minimal residual method (MinRes) of Paige and Saunders. Unlike the unsymmetric case, the MinRes method is given by a three-term recurrence relation. It can be shown that there is no Krylov subspace method for general matrices, which is given by a short recurrence relation and yet minimizes the norms of the residuals, as GMRES does.

Another class of methods builds on the unsymmetric Lanczos iteration, in particular the BiCG method. These use a three-term recurrence relation, but they do not attain the minimum residual, and hence the residual does not decrease monotonically for these methods. Convergence is not even guaranteed.

The third class is formed by methods like CGS and BiCGSTAB. These also work with a three-term recurrence relation (hence, without optimality) and they can even terminate prematurely without achieving convergence. The idea behind these methods is to choose the generating polynomials of the iteration sequence suitably.

None of these three classes is the best for all matrices; there are always examples in which one class outperforms the other. Therefore, multiple solvers are tried in practice to see which one is the best for a given problem.

Solving the least squares problem

One part of the GMRES method is to find the vector which minimizes Note that is an (n + 1)-by-n matrix, hence it gives an over-constrained linear system of n+1 equations for n unknowns.

The minimum can be computed using a QR decomposition: find an (n + 1)-by-(n + 1) orthogonal matrix Ωn and an (n + 1)-by-n upper triangular matrix such that The triangular matrix has one more row than it has columns, so its bottom row consists of zero. Hence, it can be decomposed as where is an n-by-n (thus square) triangular matrix.

The QR decomposition can be updated cheaply from one iteration to the next, because the Hessenberg matrices differ only by a row of zeros and a column: where hn+1 = (h1,n+1, ..., hn+1,n+1)T. This implies that premultiplying the Hessenberg matrix with Ωn, augmented with zeroes and a row with multiplicative identity, yields almost a triangular matrix: This would be triangular if σ is zero. To remedy this, one needs the Givens rotation where With this Givens rotation, we form Indeed, is a triangular matrix with .

Given the QR decomposition, the minimization problem is easily solved by noting that Denoting the vector by with gnRn and γnR, this is The vector y that minimizes this expression is given by Again, the vectors are easy to update. [8]

Example code

Regular GMRES (MATLAB / GNU Octave)

function[x, e] = gmres(A, b, x, max_iterations, threshold)n=length(A);m=max_iterations;% use x as the initial vectorr=b-A*x;b_norm=norm(b);error=norm(r)/b_norm;% initialize the 1D vectorssn=zeros(m,1);cs=zeros(m,1);%e1 = zeros(n, 1);e1=zeros(m+1,1);e1(1)=1;e=[error];r_norm=norm(r);Q(:,1)=r/r_norm;% Note: this is not the beta scalar in section "The method" above but% the beta scalar multiplied by e1beta=r_norm*e1;fork=1:m% run arnoldi[H(1:k+1,k),Q(:,k+1)]=arnoldi(A,Q,k);% eliminate the last element in H ith row and update the rotation matrix[H(1:k+1,k),cs(k),sn(k)]=apply_givens_rotation(H(1:k+1,k),cs,sn,k);% update the residual vectorbeta(k+1)=-sn(k)*beta(k);beta(k)=cs(k)*beta(k);error=abs(beta(k+1))/b_norm;% save the errore=[e;error];if(error<=threshold)break;endend% if threshold is not reached, k = m at this point (and not m+1) % calculate the resulty=H(1:k,1:k)\beta(1:k);x=x+Q(:,1:k)*y;end%----------------------------------------------------%%                  Arnoldi Function                  %%----------------------------------------------------%function[h, q] = arnoldi(A, Q, k)q=A*Q(:,k);% Krylov Vectorfori=1:k% Modified Gram-Schmidt, keeping the Hessenberg matrixh(i)=q'*Q(:,i);q=q-h(i)*Q(:,i);endh(k+1)=norm(q);q=q/h(k+1);end%---------------------------------------------------------------------%%                  Applying Givens Rotation to H col                  %%---------------------------------------------------------------------%function[h, cs_k, sn_k] = apply_givens_rotation(h, cs, sn, k)% apply for ith columnfori=1:k-1temp=cs(i)*h(i)+sn(i)*h(i+1);h(i+1)=-sn(i)*h(i)+cs(i)*h(i+1);h(i)=temp;end% update the next sin cos values for rotation[cs_k,sn_k]=givens_rotation(h(k),h(k+1));% eliminate H(i + 1, i)h(k)=cs_k*h(k)+sn_k*h(k+1);h(k+1)=0.0;end%%----Calculate the Givens rotation matrix----%%function[cs, sn] = givens_rotation(v1, v2)%  if (v1 == 0)%    cs = 0;%    sn = 1;%  elset=sqrt(v1^2+v2^2);%    cs = abs(v1) / t;%    sn = cs * v2 / v1;cs=v1/t;% see http://www.netlib.org/eispack/comqr.fsn=v2/t;%  endend

See also

Related Research Articles

In computational mathematics, an iterative method is a mathematical procedure that uses an initial value to generate a sequence of improving approximate solutions for a class of problems, in which the n-th approximation is derived from the previous ones.

In linear algebra, an orthogonal matrix, or orthonormal matrix, is a real square matrix whose columns and rows are orthonormal vectors.

In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero. The errors do not need to be normal for the theorem to apply, nor do they need to be independent and identically distributed.

In linear algebra, a rotation matrix is a transformation matrix that is used to perform a rotation in Euclidean space. For example, using the convention below, the matrix

In numerical linear algebra, the Arnoldi iteration is an eigenvalue algorithm and an important example of an iterative method. Arnoldi finds an approximation to the eigenvalues and eigenvectors of general matrices by constructing an orthonormal basis of the Krylov subspace, which makes it particularly useful when dealing with large sparse matrices.

<span class="mw-page-title-main">Conjugate gradient method</span> Mathematical optimization algorithm

In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-semidefinite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems.

<span class="mw-page-title-main">Real coordinate space</span> Space formed by the n-tuples of real numbers

In mathematics, the real coordinate space or real coordinate n-space, of dimension n, denoted Rn or , is the set of all ordered n-tuples of real numbers, that is the set of all sequences of n real numbers, also known as coordinate vectors. Special cases are called the real lineR1, the real coordinate planeR2, and the real coordinate three-dimensional spaceR3. With component-wise addition and scalar multiplication, it is a real vector space.

<span class="mw-page-title-main">Ordinary least squares</span> Method for estimating the unknown parameters in a linear regression model

In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the input dataset and the output of the (linear) function of the independent variable. Some sources consider OLS to be linear regression.

The Rayleigh–Ritz method is a direct numerical method of approximating eigenvalues, originated in the context of solving physical boundary value problems and named after Lord Rayleigh and Walther Ritz.

In linear algebra, the order-rKrylov subspace generated by an n-by-n matrix A and a vector b of dimension n is the linear subspace spanned by the images of b under the first r powers of A, that is,

The Lanczos algorithm is an iterative method devised by Cornelius Lanczos that is an adaptation of power methods to find the "most useful" eigenvalues and eigenvectors of an Hermitian matrix, where is often but not necessarily much smaller than . Although computationally efficient in principle, the method as initially formulated was not useful, due to its numerical instability.

In linear algebra, an idempotent matrix is a matrix which, when multiplied by itself, yields itself. That is, the matrix is idempotent if and only if . For this product to be defined, must necessarily be a square matrix. Viewed this way, idempotent matrices are idempotent elements of matrix rings.

In geometry, various formalisms exist to express a rotation in three dimensions as a mathematical transformation. In physics, this concept is applied to classical mechanics where rotational kinematics is the science of quantitative description of a purely rotational motion. The orientation of an object at a given instant is described with the same tools, as it is defined as an imaginary rotation from a reference placement in space, rather than an actually observed rotation from a previous placement in space.

In numerical linear algebra, the alternating-direction implicit (ADI) method is an iterative method used to solve Sylvester matrix equations. It is a popular method for solving the large matrix equations that arise in systems theory and control, and can be formulated to construct solutions in a memory-efficient, factored form. It is also used to numerically solve parabolic and elliptic partial differential equations, and is a classic method used for modeling heat conduction and solving the diffusion equation in two or more dimensions. It is an example of an operator splitting method.

Non-linear least squares is the form of least squares analysis used to fit a set of m observations with a model that is non-linear in n unknown parameters (m ≥ n). It is used in some forms of nonlinear regression. The basis of the method is to approximate the model by a linear one and to refine the parameters by successive iterations. There are many similarities to linear least squares, but also some significant differences. In economic theory, the non-linear least squares method is applied in (i) the probit regression, (ii) threshold regression, (iii) smooth regression, (iv) logistic link regression, (v) Box–Cox transformed regressors ().

In numerical linear algebra, the conjugate gradient method is an iterative method for numerically solving the linear system

Linear least squares (LLS) is the least squares approximation of linear functions to data. It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals. Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.

The conjugate residual method is an iterative numeric method used for solving systems of linear equations. It's a Krylov subspace method very similar to the much more popular conjugate gradient method, with similar construction and convergence properties.

In mathematics, Anderson acceleration, also called Anderson mixing, is a method for the acceleration of the convergence rate of fixed-point iterations. Introduced by Donald G. Anderson, this technique can be used to find the solution to fixed point equations often arising in the field of computational science.

<span class="mw-page-title-main">Minimal residual method</span> Computational method

The Minimal Residual Method or MINRES is a Krylov subspace method for the iterative solution of symmetric linear equation systems. It was proposed by mathematicians Christopher Conway Paige and Michael Alan Saunders in 1975.

References

  1. Saad, Youcef; Schultz, Martin H. (1986). "GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems". SIAM Journal on Scientific and Statistical Computing. 7 (3): 856–869. doi:10.1137/0907058. ISSN   0196-5204.
  2. Paige and Saunders, "Solution of Sparse Indefinite Systems of Linear Equations", SIAM J. Numer. Anal., vol 12, page 617 (1975) https://doi.org/10.1137/0712047
  3. Nifa, Naoufal (2017). Solveurs performants pour l'optimisation sous contraintes en identification de paramètres [Efficient solvers for constrained optimization in parameter identification problems] (Thesis) (in French).
  4. Eisenstat, Elman & Schultz 1983 , Thm 3.3. NB all results for GCR also hold for GMRES, cf. Saad & Schultz 1986
  5. Trefethen, Lloyd N.; Bau, David, III. (1997). Numerical Linear Algebra. Philadelphia: Society for Industrial and Applied Mathematics. Theorem 35.2. ISBN   978-0-89871-361-9.{{cite book}}: CS1 maint: multiple names: authors list (link)
  6. Amritkar, Amit; de Sturler, Eric; Świrydowicz, Katarzyna; Tafti, Danesh; Ahuja, Kapil (2015). "Recycling Krylov subspaces for CFD applications and a new hybrid recycling solver". Journal of Computational Physics. 303: 222. arXiv: 1501.03358 . Bibcode:2015JCoPh.303..222A. doi:10.1016/j.jcp.2015.09.040. S2CID   2933274.
  7. Gaul, André (2014). Recycling Krylov subspace methods for sequences of linear systems (Ph.D.). TU Berlin. doi:10.14279/depositonce-4147.
  8. Stoer, Josef; Bulirsch, Roland (2002). Introduction to Numerical Analysis. Texts in Applied Mathematics (3rd ed.). New York: Springer. §8.7.2. ISBN   978-0-387-95452-3.