Symmetric rank-one

Last updated October 14, 2024

The Symmetric Rank 1 (SR1) method is a quasi-Newton method that updates an approximation to the second derivative (Hessian) using the derivatives (gradients) calculated at two points. It is a generalization of the secant method to multidimensional problems. The update maintains the symmetry of the matrix but does not guarantee that the updated matrix is positive definite.

In theory, the sequence of Hessian approximations generated by the SR1 method converges to the true Hessian under mild conditions. In practice, preliminary numerical experiments show the SR1 approximations progressing towards the true Hessian faster than those of popular alternatives (BFGS or DFP).^[1]^[2] The SR1 method also has computational advantages for sparse or partially separable problems.^[3]
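The convergence claim can be illustrated in the simplest setting, a quadratic function, where it reduces to a finite-termination property: after $n$ linearly independent steps with well-defined updates, the SR1 approximation equals the exact Hessian. The sketch below uses the SR1 update formula stated later in this article. The matrix `A`, the starting matrix and the choice of unit-vector steps are assumptions made only for this example.

```python
import numpy as np

# Hypothetical test problem: f(x) = 0.5 * x^T A x, whose Hessian is A.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

B = np.eye(3)  # initial Hessian approximation B_0 (an arbitrary choice)

# Take n = 3 linearly independent steps (the unit vectors, for simplicity).
for s in np.eye(3):
    y = A @ s                      # gradient difference on a quadratic
    v = y - B @ s                  # residual of the secant equation
    B = B + np.outer(v, v) / (v @ s)

print(B)  # reproduces A up to rounding
```

On general nonlinear functions the approximation is not exact after $n$ steps; the quadratic case only shows the mechanism behind the convergence result.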

A twice continuously differentiable function $x\mapsto f(x)$ has a gradient $\nabla f$ and a Hessian matrix $B$. The function $f$ has a Taylor-series expansion at $x_{0}$, which can be truncated to

$f(x_{0}+\Delta x)\approx f(x_{0})+\nabla f(x_{0})^{T}\Delta x+{\frac {1}{2}}\Delta x^{T}{B}\Delta x;$

its gradient likewise has a Taylor-series approximation

$\nabla f(x_{0}+\Delta x)\approx \nabla f(x_{0})+B\Delta x,$

which is used to update $B$. This secant equation need not have a unique solution $B$. The SR1 formula computes (via an update of rank 1) the symmetric solution that is closest to the current approximation $B_{k}$:

$B_{k+1}=B_{k}+{\frac {(y_{k}-B_{k}\Delta x_{k})(y_{k}-B_{k}\Delta x_{k})^{T}}{(y_{k}-B_{k}\Delta x_{k})^{T}\Delta x_{k}}},$

where

$y_{k}=\nabla f(x_{k}+\Delta x_{k})-\nabla f(x_{k}).$
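A minimal NumPy sketch of this update is given below. The function name `sr1_update` and the test data are hypothetical, and the sketch assumes the denominator is nonzero. It checks that the updated matrix is symmetric and satisfies the secant equation $B_{k+1}\Delta x_{k}=y_{k}$.

```python
import numpy as np

def sr1_update(B, s, y):
    """Return B + r r^T / (r^T s) with r = y - B s.

    B: current symmetric approximation, s: step (Delta x_k),
    y: gradient difference. Assumes r^T s is nonzero.
    """
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)

# Hypothetical data: gradient differences of f(x) = 0.5 * x^T A x.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
B0 = np.eye(2)
s0 = np.array([1.0, 0.5])
y0 = A @ s0

B1 = sr1_update(B0, s0, y0)
print(np.allclose(B1 @ s0, y0))  # secant equation holds
print(np.allclose(B1, B1.T))     # symmetry is preserved
```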

The corresponding update to the approximate inverse Hessian $H_{k}=B_{k}^{-1}$ is

$H_{k+1}=H_{k}+{\frac {(\Delta x_{k}-H_{k}y_{k})(\Delta x_{k}-H_{k}y_{k})^{T}}{(\Delta x_{k}-H_{k}y_{k})^{T}y_{k}}}.$
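The two formulas are related by the Sherman–Morrison identity, so updating $H_{k}$ directly should give the inverse of the updated $B_{k}$. The sketch below checks this numerically. The function names and the data are hypothetical, and both denominators are assumed to be nonzero.

```python
import numpy as np

def sr1_update(B, s, y):
    """Direct SR1 update of the Hessian approximation B."""
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)

def sr1_inverse_update(H, s, y):
    """SR1 update of the inverse approximation H (roles of s and y swapped)."""
    u = s - H @ y
    return H + np.outer(u, u) / (u @ y)

# Hypothetical data for illustration only.
B0 = np.diag([2.0, 1.0])
H0 = np.linalg.inv(B0)
s = np.array([1.0, 0.5])
y = np.array([4.5, 2.5])

B1 = sr1_update(B0, s, y)
H1 = sr1_inverse_update(H0, s, y)

print(np.allclose(H1, np.linalg.inv(B1)))  # H1 is the inverse of B1
print(np.allclose(H1 @ y, s))              # inverse form of the secant equation
```

Working with $H_{k}$ avoids solving a linear system when a step of the form $-H_{k}\nabla f(x_{k})$ is needed.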

One might wonder why positive-definiteness is not preserved, given that a rank-1 update of the form $B_{k+1}=B_{k}+vv^{T}$ is positive-definite if $B_{k}$ is. The explanation is that the denominator can be negative, in which case the update has the form $B_{k+1}=B_{k}-vv^{T}$ and there are no guarantees about positive-definiteness.
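A small numerical illustration, with values chosen only for this example: starting from the identity matrix and taking a step along a direction in which $f(x)=(-x_{1}^{2}+x_{2}^{2})/2$ has negative curvature, the denominator is negative and the updated matrix becomes indefinite.

```python
import numpy as np

B = np.eye(2)                  # positive-definite starting matrix
s = np.array([1.0, 0.0])       # step along the first coordinate
y = np.array([-1.0, 0.0])      # gradient change of f(x) = (-x1^2 + x2^2) / 2

r = y - B @ s                  # r = [-2, 0]
denom = r @ s                  # -2: a negative denominator
B_new = B + np.outer(r, r) / denom

eigs = np.linalg.eigvalsh(B_new)
print(denom)   # -2.0
print(eigs)    # [-1.  1.]
```

Here the indefinite result is the desired one, since the updated matrix reproduces the negative curvature of $f$ along the step.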

The SR1 formula has been rediscovered a number of times. Since the denominator can vanish, some authors have suggested that the update be applied only if

$|\Delta x_{k}^{T}(y_{k}-B_{k}\Delta x_{k})|\geq r\|\Delta x_{k}\|\cdot \|y_{k}-B_{k}\Delta x_{k}\|,$

where $r\in (0,1)$ is a small number, e.g. $10^{-8}$.^[4]
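The safeguard can be sketched as follows. The function name and return convention are hypothetical. The extra check for an exactly zero denominator is an addition of this sketch, not part of the stated condition: it covers the case $y_{k}=B_{k}\Delta x_{k}$, where both sides of the inequality are zero and no update is needed.

```python
import numpy as np

def sr1_update_safeguarded(B, s, y, r=1e-8):
    """Apply the SR1 update only when |s^T v| >= r * ||s|| * ||v||, v = y - B s.

    Returns (B_next, applied). When the test fails, B is returned unchanged.
    """
    v = y - B @ s
    denom = s @ v
    threshold = r * np.linalg.norm(s) * np.linalg.norm(v)
    # Assumption of this sketch: also skip when denom is exactly zero.
    if denom == 0.0 or abs(denom) < threshold:
        return B, False
    return B + np.outer(v, v) / denom, True

B = np.eye(2)
s = np.array([1.0, 0.0])

# v = [0, 1] is orthogonal to s, so the denominator vanishes: update skipped.
B_skip, applied_skip = sr1_update_safeguarded(B, s, np.array([1.0, 1.0]))

# v = [2, 0] is parallel to s: the update is applied.
B_upd, applied_upd = sr1_update_safeguarded(B, s, np.array([3.0, 0.0]))
print(applied_skip, applied_upd)  # False True
```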

Limited Memory

The SR1 update maintains a dense matrix, which can be prohibitive for large problems. As with the L-BFGS method, a limited-memory SR1 (L-SR1) algorithm exists.^[5] Instead of storing the full Hessian approximation, an L-SR1 method stores only the $m$ most recent pairs $\{(s_{i},y_{i})\}_{i=k-m}^{k-1}$, where $s_{i}:=\Delta x_{i}$ and $m$ is an integer much smaller than the problem size ($m\ll n$). The limited-memory matrix is based on a compact matrix representation

$B_{k}=B_{0}+J_{k}N_{k}^{-1}J_{k}^{T},\quad J_{k}=Y_{k}-B_{0}S_{k},\quad N_{k}=D_{k}+L_{k}+L_{k}^{T}-S_{k}^{T}B_{0}S_{k}$

$S_{k}={\begin{bmatrix}s_{k-m}&s_{k-m+1}&\ldots &s_{k-1}\end{bmatrix}},$ $Y_{k}={\begin{bmatrix}y_{k-m}&y_{k-m+1}&\ldots &y_{k-1}\end{bmatrix}},$

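where $L_{k}$ is the strictly lower triangular part and $D_{k}$ the diagonal part of $S_{k}^{T}Y_{k}$: ${\big (}L_{k}{\big )}_{ij}=s_{k-m-1+i}^{T}y_{k-m-1+j}$ for $i>j$ (and zero otherwise), and $D_{k}=\operatorname {diag} (s_{k-m}^{T}y_{k-m},\ldots ,s_{k-1}^{T}y_{k-1})$.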

Since the update can be indefinite, the L-SR1 algorithm is suitable for a trust-region strategy. Because of the limited-memory matrix, the trust-region L-SR1 algorithm scales linearly with the problem size, just like L-BFGS.
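The compact representation can be checked against the recursive rank-one updates on a small example. The sketch below takes $L_{k}$ and $D_{k}$ to be the strictly lower triangular and diagonal parts of $S_{k}^{T}Y_{k}$, which is how the compact representation is usually stated. The function names and the data are hypothetical. It forms a dense matrix only for comparison, whereas a practical L-SR1 code would keep just $S_{k}$, $Y_{k}$ and the small $m\times m$ matrix $N_{k}$.

```python
import numpy as np

def sr1_compact(B0, S, Y):
    """Dense B_k from the compact SR1 representation.

    S, Y: n x m arrays whose columns are the pairs s_i, y_i (oldest first).
    """
    SY = S.T @ Y
    D = np.diag(np.diag(SY))          # diagonal part of S^T Y
    L = np.tril(SY, k=-1)             # strictly lower triangular part
    J = Y - B0 @ S
    N = D + L + L.T - S.T @ B0 @ S    # small m x m matrix
    return B0 + J @ np.linalg.solve(N, J.T)

def sr1_recursive(B0, S, Y):
    """Apply the rank-one SR1 update once per stored pair."""
    B = B0.copy()
    for s, y in zip(S.T, Y.T):
        v = y - B @ s
        B = B + np.outer(v, v) / (v @ s)
    return B

# Hypothetical pairs with n = 3, m = 2 (S^T Y is deliberately non-symmetric).
B0 = np.eye(3)
S = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
Y = np.array([[2.0, 0.0],
              [1.0, 3.0],
              [0.0, 1.0]])

B_compact = sr1_compact(B0, S, Y)
B_recursive = sr1_recursive(B0, S, Y)
print(B_compact)  # [[3, 0, -1], [0, 3, 1], [-1, 1, 2]], same as B_recursive
```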

References

[1] Conn, A. R.; Gould, N. I. M.; Toint, Ph. L. (March 1991). "Convergence of quasi-Newton matrices generated by the symmetric rank one update". Mathematical Programming. 50 (1). Springer Berlin/Heidelberg: 177–195. doi:10.1007/BF01594934. ISSN 0025-5610. S2CID 28028770.
[2] Khalfan, H. Fayez; et al. (1993). "A Theoretical and Experimental Study of the Symmetric Rank-One Update". SIAM Journal on Optimization. 3 (1): 1–24. doi:10.1137/0803001.
[3] Byrd, Richard H.; et al. (1996). "Analysis of a Symmetric Rank-One Trust Region Method". SIAM Journal on Optimization. 6 (4): 1025–1039. doi:10.1137/S1052623493252985.
[4] Nocedal, Jorge; Wright, Stephen J. (1999). Numerical Optimization. Springer. ISBN 0-387-98793-2.
[5] Brust, J.; et al. (2017). "On solving L-SR1 trust-region subproblems". Computational Optimization and Applications. 66: 245–266. doi:10.1007/s10589-016-9868-3.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
