Symmetric rank-one

The Symmetric Rank 1 (SR1) method is a quasi-Newton method for updating an approximation to the second derivative matrix (the Hessian) based on the gradients calculated at two points. It is a generalization of the secant method to multidimensional problems. The update maintains the symmetry of the matrix but does not guarantee that the updated matrix is positive definite.

In theory, the sequence of Hessian approximations generated by the SR1 method converges to the true Hessian under mild conditions; in practice, the approximate Hessians generated by the SR1 method show faster progress towards the true Hessian than popular alternatives (BFGS or DFP) in preliminary numerical experiments. [1] [2] The SR1 method also has computational advantages for sparse or partially separable problems. [3]

A twice continuously differentiable function $x \mapsto f(x)$ has a gradient $\nabla f$ and Hessian matrix $B$: the function $f$ has an expansion as a Taylor series at $x_0$, which can be truncated

$f(x_0 + \Delta x) \approx f(x_0) + \nabla f(x_0)^T \Delta x + \tfrac{1}{2}\, \Delta x^T B\, \Delta x$;

its gradient has a Taylor-series approximation also

$\nabla f(x_0 + \Delta x) \approx \nabla f(x_0) + B\, \Delta x$,

which is used to update $B$. Requiring the updated approximation to reproduce the observed change in gradient gives the secant equation

$B_{k+1}\, \Delta x_k = y_k$,

which need not have a unique solution $B_{k+1}$. The SR1 formula computes (via an update of rank 1) the unique symmetric rank-1 modification of the current approximation $B_k$ that satisfies it, provided the denominator below is nonzero:

$B_{k+1} = B_k + \dfrac{(y_k - B_k \Delta x_k)(y_k - B_k \Delta x_k)^T}{(y_k - B_k \Delta x_k)^T \Delta x_k}$,

where

$y_k = \nabla f(x_k + \Delta x_k) - \nabla f(x_k)$.

The corresponding update to the approximate inverse-Hessian $H_k = B_k^{-1}$ is

$H_{k+1} = H_k + \dfrac{(\Delta x_k - H_k y_k)(\Delta x_k - H_k y_k)^T}{(\Delta x_k - H_k y_k)^T y_k}$.
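
As a concrete illustration, here is a minimal Python/NumPy sketch of one SR1 update of $B_k$ and of $H_k$ (the names sr1_update, sr1_inverse_update, B, H, dx, and y are ours, not from any particular library); the case of a near-vanishing denominator is handled by the safeguard discussed below.

```python
import numpy as np

def sr1_update(B, dx, y):
    """One SR1 update of a symmetric Hessian approximation B.

    dx = x_{k+1} - x_k;  y = grad f(x_{k+1}) - grad f(x_k).
    """
    v = y - B @ dx                      # residual of the secant equation
    denom = v @ dx                      # (y - B dx)^T dx; may be negative or tiny
    return B + np.outer(v, v) / denom   # symmetric rank-1 correction

def sr1_inverse_update(H, dx, y):
    """The corresponding SR1 update of the inverse approximation H = B^{-1}."""
    w = dx - H @ y
    return H + np.outer(w, w) / (w @ y)
```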

One might wonder why positive-definiteness is not preserved; after all, a rank-1 update of the form $B_{k+1} = B_k + vv^T$ is positive-definite if $B_k$ is. The explanation is that the update might instead be of the form $B_{k+1} = B_k - vv^T$, because the denominator $(y_k - B_k \Delta x_k)^T \Delta x_k$ can be negative, and in that case there are no guarantees about positive-definiteness.
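
As a minimal worked example (ours, for illustration): in one dimension take $B_k = 1$, $\Delta x_k = 1$, and $y_k = 0$. Then $y_k - B_k \Delta x_k = -1$, the denominator is $(y_k - B_k \Delta x_k)^T \Delta x_k = -1 < 0$, and the update gives $B_{k+1} = 1 + \frac{(-1)^2}{-1} = 0$, which is not positive definite even though $B_k$ was.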

The SR1 formula has been rediscovered a number of times. Since the denominator can vanish, some authors have suggested that the update be applied only if

$\big| \Delta x_k^T (y_k - B_k \Delta x_k) \big| \geq r\, \|\Delta x_k\| \cdot \|y_k - B_k \Delta x_k\|$,

where $r \in (0, 1)$ is a small number, e.g. $r = 10^{-8}$. [4]
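
In code, this safeguard is a single test before the division; a minimal sketch under the same illustrative naming as above (sr1_update_safe is our name):

```python
import numpy as np

def sr1_update_safe(B, dx, y, r=1e-8):
    """Apply the SR1 update only if the denominator is safely away from zero."""
    v = y - B @ dx
    denom = dx @ v
    if abs(denom) >= r * np.linalg.norm(dx) * np.linalg.norm(v):
        return B + np.outer(v, v) / denom
    return B  # skip the update and keep the current approximation
```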

Limited Memory

The SR1 update maintains a dense $n \times n$ matrix, which can be prohibitive for large problems. Similar to the L-BFGS method, a limited-memory SR1 (L-SR1) algorithm exists. [5] Instead of storing the full Hessian approximation, an L-SR1 method stores only the $m$ most recent pairs $(s_i, y_i)$, where $s_i = x_{i+1} - x_i$, $y_i = \nabla f(x_{i+1}) - \nabla f(x_i)$, and $m$ is an integer much smaller than the problem size ($m \ll n$). The limited-memory matrix is given by the compact representation

$B_k = B_0 + (Y_k - B_0 S_k)\,(D_k + L_k + L_k^T - S_k^T B_0 S_k)^{-1}\,(Y_k - B_0 S_k)^T$,

where $S_k$ and $Y_k$ collect the stored pairs $s_i$ and $y_i$ as columns, $D_k$ and $L_k$ are the diagonal and strictly lower triangular parts of $S_k^T Y_k$, and $B_0$ is an initial approximation, typically a multiple of the identity.

Since the update can be indefinite, the L-SR1 algorithm is well suited to a trust-region strategy, which, unlike a line search, does not require a positive-definite model. Because of the limited-memory matrix, the trust-region L-SR1 algorithm scales linearly with the problem size, just like L-BFGS.
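
To illustrate the linear scaling, here is a hedged sketch of a product $B_k v$ using the compact representation above with $B_0 = \gamma I$ (the name lsr1_matvec and the dense storage of $S$ and $Y$ are ours; a production implementation would cache and factor the small matrix):

```python
import numpy as np

def lsr1_matvec(S, Y, v, gamma=1.0):
    """Compute B_k @ v from the m stored pairs without forming B_k.

    S, Y: n-by-m arrays whose columns are the stored pairs s_i, y_i.
    For fixed m, storage and work grow linearly in the problem size n.
    """
    Psi = Y - gamma * S                     # Psi = Y - B_0 S
    SY = S.T @ Y
    D = np.diag(np.diag(SY))                # diagonal part of S^T Y
    L = np.tril(SY, k=-1)                   # strictly lower triangular part
    M = D + L + L.T - gamma * (S.T @ S)     # D + L + L^T - S^T B_0 S
    return gamma * v + Psi @ np.linalg.solve(M, Psi.T @ v)
```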


References

  1. Conn, A. R.; Gould, N. I. M.; Toint, Ph. L. (March 1991). "Convergence of quasi-Newton matrices generated by the symmetric rank one update". Mathematical Programming. 50 (1). Springer Berlin/Heidelberg: 177–195. doi:10.1007/BF01594934. ISSN 0025-5610. S2CID 28028770.
  2. Khalfan, H. Fayez; et al. (1993). "A Theoretical and Experimental Study of the Symmetric Rank-One Update". SIAM Journal on Optimization. 3 (1): 1–24. doi:10.1137/0803001.
  3. Byrd, Richard H.; et al. (1996). "Analysis of a Symmetric Rank-One Trust Region Method". SIAM Journal on Optimization. 6 (4): 1025–1039. doi:10.1137/S1052623493252985.
  4. Nocedal, Jorge; Wright, Stephen J. (1999). Numerical Optimization. Springer. ISBN 0-387-98793-2.
  5. Brust, J.; et al. (2017). "On solving L-SR1 trust-region subproblems". Computational Optimization and Applications. 66: 245–266. doi:10.1007/s10589-016-9868-3.