Minimal residual method

[Figure: A comparison of the norm of error and residual in the CG method (blue) and the MINRES method (green). The matrix used comes from a 2D boundary-value problem.]

The Minimal Residual Method (MINRES) is a Krylov subspace method for the iterative solution of symmetric systems of linear equations. It was proposed by mathematicians Christopher Conway Paige and Michael Alan Saunders in 1975.[1]


In contrast to the popular CG method, the MINRES method does not assume that the matrix is positive definite; only symmetry of the matrix is required.

GMRES vs. MINRES

The GMRES method is essentially a generalization of MINRES to arbitrary matrices. Both minimize the 2-norm of the residual and perform the same calculations in exact arithmetic when the matrix is symmetric. MINRES is a short-recurrence method with a constant memory requirement, whereas GMRES must store a basis of the whole Krylov subspace, so its memory requirement grows roughly in proportion to the number of iterations. On the other hand, GMRES tends to suffer less from loss of orthogonality.[1][2]
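This equivalence is easy to observe numerically. The following sketch is illustrative only (it assumes MATLAB's built-in minres and gmres; core GNU Octave ships gmres but, depending on the version, possibly not minres, in which case the implementation given later in this article can stand in; the test matrix is an arbitrary choice). In exact arithmetic the two residual histories coincide on a symmetric system; in floating point they drift apart once MINRES loses orthogonality.

n = 100;
A = diag(linspace(-1, 1, n));                    % symmetric indefinite test matrix
b = ones(n, 1);
[~, ~, ~, ~, rv1] = minres(A, b, 1e-10, n);      % short recurrence: O(1) stored vectors
[~, ~, ~, ~, rv2] = gmres(A, b, [], 1e-10, n);   % no restart: stores the whole basis
semilogy(0:numel(rv1) - 1, rv1, 0:numel(rv2) - 1, rv2);
legend('MINRES', 'GMRES');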

Properties of the MINRES method

The MINRES method iteratively computes an approximate solution of a linear system of equations of the form

$$Ax = b,$$

where $A \in \mathbb{R}^{n \times n}$ is a symmetric matrix and $b \in \mathbb{R}^n$ a vector.

For this, the norm of the residual $r(x) = b - Ax$ is minimized over a $k$-dimensional Krylov subspace

$$x \in x_0 + \operatorname{span}\{r_0, Ar_0, \ldots, A^{k-1} r_0\}.$$

Here $x_0 \in \mathbb{R}^n$ is an initial value (often zero) and $r_0 = b - Ax_0$.

More precisely, we define the approximate solutions through

$$x_k = \operatorname{argmin}_{x \in x_0 + \operatorname{span}\{r_0, Ar_0, \ldots, A^{k-1} r_0\}} \|b - Ax\|,$$

where $\|\cdot\|$ is the standard Euclidean norm on $\mathbb{R}^n$.

Because of the symmetry of $A$, unlike in the GMRES method, it is possible to carry out this minimization process recursively, storing only two previous steps (short recurrence). This saves memory.
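The defining minimization can be checked directly on a small example. The following sketch is illustrative only (the matrix, the dimensions, and the explicit monomial Krylov basis are arbitrary choices; that basis becomes ill-conditioned as $k$ grows, so it is useful only for demonstration). It builds a basis of the Krylov subspace explicitly and solves the least-squares problem with a dense method; in exact arithmetic the resulting residual norm equals the $k$-th MINRES residual norm.

n = 50; k = 10;
A = diag(linspace(-2, 3, n));    % symmetric (here diagonal and indefinite) test matrix
b = ones(n, 1);
x0 = zeros(n, 1);
r0 = b - A * x0;
V = zeros(n, k);                 % columns span the Krylov subspace K_k(A, r0)
V(:, 1) = r0;
for j = 2:k
  V(:, j) = A * V(:, j - 1);     % monomial basis: r0, A*r0, ..., A^(k-1)*r0
end
y = (A * V) \ (b - A * x0);      % dense least squares: min_y ||b - A*(x0 + V*y)||
xk = x0 + V * y;
disp(norm(b - A * xk))           % the k-th MINRES residual norm (exact arithmetic)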

MINRES algorithm

Note: The MINRES method is more complicated than the algebraically equivalent conjugate residual (CR) method, which is therefore presented below in its place. It differs from MINRES in that in MINRES the columns of a basis of the Krylov space (denoted below by $p_k$) are orthogonalized, whereas in CR their images under $A$ (denoted below by $s_k$) are orthogonalized via the Lanczos recursion. There are more efficient and preconditioned variants with fewer AXPY operations; compare the article on the conjugate residual method.

First one chooses $x_0 \in \mathbb{R}^n$ arbitrarily and computes

$$r_0 = b - Ax_0, \qquad p_0 = r_0, \qquad s_0 = Ap_0.$$

Then we iterate for $k = 1, 2, \dots$ through the following steps:

1. Compute the step length and update the iterate and the residual:
$$\alpha_k = \frac{r_{k-1}^T s_{k-1}}{s_{k-1}^T s_{k-1}}, \qquad x_k = x_{k-1} + \alpha_k p_{k-1}, \qquad r_k = r_{k-1} - \alpha_k s_{k-1}.$$
Stop if the norm of the residual $r_k$ is below a prescribed tolerance.

2. Generate the next search direction and its image:
$$p_k = s_{k-1}, \qquad s_k = A s_{k-1}.$$

3. Orthogonalize against the two previous directions:
$$\beta_1 = \frac{s_k^T s_{k-1}}{s_{k-1}^T s_{k-1}}, \qquad p_k \leftarrow p_k - \beta_1 p_{k-1}, \qquad s_k \leftarrow s_k - \beta_1 s_{k-1},$$
and, if $k > 1$,
$$\beta_2 = \frac{s_k^T s_{k-2}}{s_{k-2}^T s_{k-2}}, \qquad p_k \leftarrow p_k - \beta_2 p_{k-2}, \qquad s_k \leftarrow s_k - \beta_2 s_{k-2}.$$

Convergence rate of the MINRES method

In the case of positive definite matrices, the convergence rate of the MINRES method can be estimated in a way similar to that of the CG method.[3] In contrast to the CG method, however, the estimate does not apply to the errors of the iterates, but to the residual. The following applies:

$$\|r_k\| \le 2 \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^k \|r_0\|,$$

where $\kappa(A)$ is the condition number of the matrix $A$. Because $A$ is normal, we have

$$\kappa(A) = \frac{|\lambda_{\max}(A)|}{|\lambda_{\min}(A)|},$$

where $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ are the maximal and minimal eigenvalues of $A$, respectively.
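The bound can be checked numerically. The sketch below is an illustration, not a proof: the positive definite test matrix is an arbitrary choice, and it assumes the function from the next section has been saved as minres.m on the path. It runs a fixed number of iterations and compares the attained residual norm with the right-hand side of the estimate.

n = 200;
A = diag(linspace(1, 100, n));   % symmetric positive definite, kappa(A) = 100
b = ones(n, 1);
kappa = 100;
c = (sqrt(kappa) - 1) / (sqrt(kappa) + 1);
for k = [5 10 20 40]
  [~, r] = minres(A, b, zeros(n, 1), k, 0);   % tol = 0: run exactly k iterations
  fprintf('k = %2d: ||r_k|| = %.2e <= bound %.2e\n', k, norm(r), 2 * c^k * norm(b));
end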

Implementation in GNU Octave / MATLAB

function [x, r] = minres(A, b, x0, maxit, tol)
  % Conjugate-residual form of MINRES for symmetric A.
  % Returns the approximate solution x and its residual r = b - A*x.
  x = x0;
  r = b - A * x0;          % initial residual
  p0 = r;                  % current search direction
  s0 = A * p0;             % its image under A
  p1 = p0;
  s1 = s0;
  for iter = 1:maxit
    p2 = p1; p1 = p0;      % shift the direction history
    s2 = s1; s1 = s0;
    alpha = r' * s1 / (s1' * s1);   % step length minimizing ||r - alpha*s1||
    x = x + alpha * p1;
    r = r - alpha * s1;
    if (r' * r < tol^2)    % stop once ||r|| < tol
      break
    end
    p0 = s1;               % next direction and its image
    s0 = A * s1;
    beta1 = s0' * s1 / (s1' * s1);  % orthogonalize against the previous image
    p0 = p0 - beta1 * p1;
    s0 = s0 - beta1 * s1;
    if iter > 1            % ... and against the one before that
      beta2 = s0' * s2 / (s2' * s2);
      p0 = p0 - beta2 * p2;
      s0 = s0 - beta2 * s2;
    end
  end
end
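As a usage sketch (the test matrix, iteration limit, and tolerance are arbitrary choices; note that a file minres.m defined as above shadows MATLAB's built-in minres, which has a different argument list), the function can be applied to a symmetric indefinite system, for which CG is not applicable:

n = 100;
A = diag([linspace(-4, -1, n/2), linspace(1, 4, n/2)]);  % symmetric, indefinite
b = ones(n, 1);
[x, r] = minres(A, b, zeros(n, 1), 200, 1e-8);
disp(norm(b - A * x))   % agrees with norm(r) up to rounding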


References

  1. Christopher C. Paige, Michael A. Saunders (1975). "Solution of sparse indefinite systems of linear equations". SIAM Journal on Numerical Analysis. 12 (4): 617–629. doi:10.1137/0712047.
  2. Nifa, M. Naoufal. "Efficient solvers for constrained optimization in parameter identification problems" (PDF) (Doctoral Thesis). pp. 51–52.
  3. Sven Gross, Arnold Reusken. Numerical Methods for Two-phase Incompressible Flows. Springer, section 5.2. ISBN 978-3-642-19685-0.