MINRES

Figure: A comparison of the norm of the error and of the residual in the CG method (blue) and the MINRES method (green). The matrix used comes from a 2D boundary-value problem.

The Minimal Residual Method or MINRES is a Krylov subspace method for the iterative solution of symmetric systems of linear equations. It was proposed by mathematicians Christopher Conway Paige and Michael Alan Saunders in 1975.[1]


In contrast to the popular CG method, the MINRES method does not assume that the matrix is positive definite; only the symmetry of the matrix is required. The popular GMRES method generalizes MINRES to arbitrary matrices but requires much more memory.

GMRES vs. MINRES

The GMRES method is essentially a generalization of MINRES for arbitrary matrices. Both minimize the 2-norm of the residual and do the same calculations in exact arithmetic when the matrix is symmetric. MINRES is a short-recurrence method with a constant memory requirement, whereas GMRES requires storing the whole Krylov space, so its memory requirement is roughly proportional to the number of iterations. On the other hand, GMRES tends to suffer less from loss of orthogonality. [1] [2]

Therefore, MINRES tends to be used only when there is not enough memory for GMRES and the matrix is symmetric. Even then, sometimes other methods are preferred, particularly CG for positive-definite matrices.
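As a rough illustration of how the two solvers are called in practice, the following sketch applies both to the same symmetric indefinite system. It assumes the built-in gmres and minres routines of MATLAB (recent GNU Octave releases provide gmres, and newer versions also ship minres); the library minres used here is not the short implementation given later in this article, and the test matrix and variable names are made up for this example.

% Compare the built-in solvers on a symmetric indefinite system.
n = 100;
d = [linspace(-2, -1, n/2), linspace(1, 2, n/2)];
A = diag(d);                    % symmetric indefinite test matrix (eigenvalues in [-2,-1] and [1,2])
b = ones(n, 1);
tol = 1e-8;
maxit = n;

[x_mr, flag_mr, relres_mr, it_mr] = minres(A, b, tol, maxit);
[x_gm, flag_gm, relres_gm, it_gm] = gmres(A, b, [], tol, maxit);   % [] means unrestarted GMRES

fprintf('MINRES: flag %d, relative residual %.2e after %d iterations\n', flag_mr, relres_mr, it_mr);
fprintf('GMRES : flag %d, relative residual %.2e after %d iterations\n', flag_gm, relres_gm, it_gm(end));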

Properties of the MINRES method

The MINRES method iteratively calculates an approximate solution of a linear system of equations of the form

    A x = b,

where A \in \mathbb{R}^{n \times n} is a symmetric matrix and b \in \mathbb{R}^n a vector.

For this, the norm of the residual r(x) := b - A x is minimized in a k-dimensional Krylov subspace

    V_k = x_0 + \operatorname{span}\{ r_0, A r_0, \ldots, A^{k-1} r_0 \}.

Here x_0 \in \mathbb{R}^n is an initial value (often zero) and r_0 := b - A x_0.

More precisely, we define the approximate solutions x_k through

    x_k := \operatorname*{arg\,min}_{x \in V_k} \| b - A x \|,

where \| \cdot \| is the standard Euclidean norm on \mathbb{R}^n.

Because of the symmetry of , unlike in the GMRES method, it is possible to carry out this minimization process recursively, storing only two previous steps (short recurrence). This saves memory.
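The defining least-squares property can be checked directly on a tiny system by building the Krylov basis explicitly and solving the resulting dense least-squares problem. The following brute-force sketch (the test matrix and all variable names are made up for illustration) is not how MINRES is implemented, but it produces the same k-th iterate:

% Brute-force illustration of the MINRES optimality property:
% x_k minimizes ||b - A*x|| over x0 + span{r0, A*r0, ..., A^(k-1)*r0}.
n  = 6;
A  = diag([-3 -1 0.5 1 2 4]);   % symmetric (here diagonal) and indefinite
b  = ones(n, 1);
x0 = zeros(n, 1);
r0 = b - A * x0;

k = 3;                          % dimension of the Krylov subspace
K = zeros(n, k);                % columns span the Krylov subspace
K(:, 1) = r0;
for j = 2:k
  K(:, j) = A * K(:, j - 1);
end

% Minimize ||b - A*(x0 + K*y)|| over y via a dense least-squares solve.
y  = (A * K) \ (b - A * x0);
xk = x0 + K * y;
disp(norm(b - A * xk))          % residual norm of the k-th iterate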

MINRES algorithm

Note: The MINRES method is more complicated than the algebraically equivalent Conjugate Residual (CR) method, so the CR method is given below as a substitute. It differs from MINRES in that in MINRES the columns of a basis of the Krylov space (denoted below by p_k) are orthogonalized via the Lanczos recursion, whereas in CR it is their images under A (denoted below by s_k) that are orthogonalized. There are more efficient and preconditioned variants with fewer AXPYs; compare the article on the conjugate residual method.

First you choose x_0 \in \mathbb{R}^n arbitrarily and compute

    r_0 = b - A x_0, \quad p_0 = r_0, \quad s_0 = A p_0.

Then we iterate for k = 1, 2, \ldots in the following steps:

Compute x_k and r_k via

    \alpha_k = \frac{r_{k-1}^T s_{k-1}}{s_{k-1}^T s_{k-1}}, \quad
    x_k = x_{k-1} + \alpha_k p_{k-1}, \quad
    r_k = r_{k-1} - \alpha_k s_{k-1}.

If \| r_k \| is smaller than a specified tolerance, the algorithm is interrupted with the approximate solution x_k. Otherwise, a new descent direction p_k is calculated through

    p_k \leftarrow s_{k-1}, \quad s_k \leftarrow A s_{k-1}

and, for l = 1, 2 (only l = 1 in the first iteration, k = 1),

    \beta_{k,l} = \frac{s_k^T s_{k-l}}{s_{k-l}^T s_{k-l}}, \quad
    p_k \leftarrow p_k - \beta_{k,l} p_{k-l}, \quad
    s_k \leftarrow s_k - \beta_{k,l} s_{k-l}.

Convergence rate of the MINRES method

In the case of positive definite matrices, the convergence rate of the MINRES method can be estimated in a way similar to that of the CG method.[3] In contrast to the CG method, however, the estimation does not apply to the errors of the iterates, but to the residual. The following applies:

    \| r_k \| \le 2 \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^k \| r_0 \|,

where \kappa(A) is the condition number of the matrix A. Because A is normal, we have

    \kappa(A) = \frac{|\lambda_{\max}(A)|}{|\lambda_{\min}(A)|},

where \lambda_{\max}(A) and \lambda_{\min}(A) are the maximal and minimal eigenvalues of A, respectively.
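This estimate follows from the standard polynomial characterization of Krylov residual minimization; a short sketch (where \Pi_k denotes the polynomials of degree at most k) is:

    \| r_k \| \;=\; \min_{p \in \Pi_k,\; p(0) = 1} \| p(A)\, r_0 \|
              \;\le\; \min_{p \in \Pi_k,\; p(0) = 1} \; \max_{\lambda \in \sigma(A)} | p(\lambda) | \; \| r_0 \|,

since A is normal and therefore \| p(A) \|_2 = \max_{\lambda \in \sigma(A)} |p(\lambda)|. For positive definite A, taking p to be the shifted and scaled Chebyshev polynomial on [\lambda_{\min}(A), \lambda_{\max}(A)] gives the factor 2 ((\sqrt{\kappa(A)} - 1)/(\sqrt{\kappa(A)} + 1))^k above.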

Implementation in GNU Octave / MATLAB

function [x, r] = minres(A, b, x0, maxit, tol)
  % Conjugate-Residual-style iteration for a symmetric matrix A (see the note above).
  x = x0;
  r = b - A * x0;              % initial residual
  p0 = r;                      % current search direction
  s0 = A * p0;                 % its image under A
  p1 = p0;
  s1 = s0;
  for iter = 1:maxit
    p2 = p1; p1 = p0;          % keep the two previous directions (short recurrence)
    s2 = s1; s1 = s0;
    alpha = (r' * s1) / (s1' * s1);
    x = x + alpha * p1;
    r = r - alpha * s1;
    if (r' * r < tol^2)        % stop once the residual norm drops below tol
      break
    end
    p0 = s1;                   % new direction starts from A times the old one
    s0 = A * s1;
    beta1 = (s0' * s1) / (s1' * s1);
    p0 = p0 - beta1 * p1;
    s0 = s0 - beta1 * s1;
    if iter > 1
      beta2 = (s0' * s2) / (s2' * s2);
      p0 = p0 - beta2 * p2;
      s0 = s0 - beta2 * s2;
    end
  end
end
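A minimal usage sketch follows, assuming the function above has been saved as minres.m on the load path (where it shadows any built-in routine of the same name); the test matrix, variable names, and parameters are chosen purely for illustration. It also compares the residual after k iterations with the convergence bound from the previous section; passing tol = 0 forces exactly k iterations.

% Usage sketch for the implementation above (assumed saved as minres.m).
n  = 100;
T  = 2 * eye(n) - diag(ones(n-1, 1), 1) - diag(ones(n-1, 1), -1);  % SPD tridiagonal test matrix
b  = ones(n, 1);
x0 = zeros(n, 1);
k  = 25;                                  % number of iterations to run
[x, r] = minres(T, b, x0, k, 0);          % tol = 0, so exactly k iterations are performed

kappa = cond(T);                          % condition number of T
bound = 2 * ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))^k * norm(b - T * x0);
fprintf('||r_k|| = %.3e  <=  bound = %.3e\n', norm(r), bound);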


References

  1. Christopher C. Paige, Michael A. Saunders (1975). "Solution of sparse indefinite systems of linear equations". SIAM Journal on Numerical Analysis. 12 (4).
  2. Nifa, M. Naoufal. "Efficient solvers for constrained optimization in parameter identification problems" (PDF) (Doctoral Thesis). pp. 51–52.
  3. Sven Gross, Arnold Reusken. Numerical Methods for Two-phase Incompressible Flows. Springer. Section 5.2. ISBN 978-3-642-19685-0.