Broyden's method

Last updated November 22, 2023

In numerical analysis, Broyden's method is a quasi-Newton method for finding roots in $k$ variables. It was originally described by C. G. Broyden in 1965.^[1]

Newton's method for solving $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ uses the Jacobian matrix, $\mathbf{J}$, at every iteration. However, computing this Jacobian can be a difficult and expensive operation. The idea behind Broyden's method is to compute the whole Jacobian at most once, at the first iteration, and to do rank-one updates at the other iterations.

In 1979 Gay proved that when Broyden's method is applied to a linear system of size $n \times n$, it terminates in at most $2n$ steps,^[2] although, like all quasi-Newton methods, it may not converge for nonlinear systems.

Description of the method

Solving single-variable equation

In the secant method, we replace the first derivative $f'$ at $x_n$ with the finite-difference approximation:

f'(x_{n})\simeq {\frac {f(x_{n})-f(x_{n-1})}{x_{n}-x_{n-1}}},

and proceed similarly to Newton's method:

x_{n+1}=x_{n}-{\frac {f(x_{n})}{f^{\prime }(x_{n})}}

where $n$ is the iteration index.
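The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `secant`, the tolerance, the iteration cap, and the test function $x^2 - 2$ are all assumptions made for the example.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Approximate a root of the scalar function f from two starting points."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) < tol:
            return x1
        if f1 == f0:
            # The secant line is flat, so the next iterate is undefined.
            raise ZeroDivisionError("flat secant line")
        # Finite-difference slope that stands in for f'(x_n).
        slope = (f1 - f0) / (x1 - x0)
        # Newton-like step using the approximate derivative.
        x0, x1 = x1, x1 - f1 / slope
        f0, f1 = f1, f(x1)
    return x1


# Hypothetical usage: a root of x^2 - 2 starting from 1 and 2.
root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
```

Only function values are needed; no derivative of `f` is ever evaluated.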

Solving a system of nonlinear equations

Consider a system of $k$ nonlinear equations

\mathbf {f} (\mathbf {x} )=\mathbf {0} ,

where $\mathbf{f}$ is a vector-valued function of the vector $\mathbf{x}$:

\mathbf {x} =(x_{1},x_{2},x_{3},\dotsc ,x_{k}),

\mathbf {f} (\mathbf {x} )={\big (}f_{1}(x_{1},x_{2},\dotsc ,x_{k}),f_{2}(x_{1},x_{2},\dotsc ,x_{k}),\dotsc ,f_{k}(x_{1},x_{2},\dotsc ,x_{k}){\big )}.

For such problems, Broyden gives a generalization of the one-dimensional Newton's method, replacing the derivative with the Jacobian $\mathbf{J}$. The Jacobian matrix is determined iteratively, based on the secant equation in the finite-difference approximation:

\mathbf {J} _{n}(\mathbf {x} _{n}-\mathbf {x} _{n-1})\simeq \mathbf {f} (\mathbf {x} _{n})-\mathbf {f} (\mathbf {x} _{n-1}),

where $n$ is the iteration index. For clarity, let us define:

\mathbf {f} _{n}=\mathbf {f} (\mathbf {x} _{n}),

\Delta \mathbf {x} _{n}=\mathbf {x} _{n}-\mathbf {x} _{n-1},

\Delta \mathbf {f} _{n}=\mathbf {f} _{n}-\mathbf {f} _{n-1},

so the above may be rewritten as

\mathbf {J} _{n}\Delta \mathbf {x} _{n}\simeq \Delta \mathbf {f} _{n}.

The above equation is underdetermined when $k$ is greater than one. Broyden suggests using the current estimate of the Jacobian matrix $\mathbf{J}_{n-1}$ and improving upon it by taking the solution to the secant equation that is a minimal modification to $\mathbf{J}_{n-1}$:

\mathbf {J} _{n}=\mathbf {J} _{n-1}+{\frac {\Delta \mathbf {f} _{n}-\mathbf {J} _{n-1}\Delta \mathbf {x} _{n}}{\|\Delta \mathbf {x} _{n}\|^{2}}}\Delta \mathbf {x} _{n}^{\mathrm {T} }.

Among all matrices that satisfy the secant equation, this choice minimizes the following Frobenius norm:

\|\mathbf {J} _{n}-\mathbf {J} _{n-1}\|_{\rm {F}}.
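The rank-one update can be checked numerically with a small sketch. It uses plain Python lists to stay dependency-free (a practical implementation would more likely use a linear algebra library), and the names `broyden_update` and `matvec` as well as the sample vectors are assumptions chosen for illustration.

```python
def matvec(A, v):
    """Multiply a list-of-rows matrix A by a vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def broyden_update(J, dx, df):
    """Return J + ((df - J dx) / ||dx||^2) dx^T."""
    k = len(dx)
    norm_sq = sum(d * d for d in dx)
    if norm_sq == 0.0:
        raise ValueError("dx must be nonzero")
    J_dx = matvec(J, dx)
    # Residual of the secant equation under the old estimate, scaled.
    r = [(df[i] - J_dx[i]) / norm_sq for i in range(k)]
    # Rank-one correction: outer product of r and dx.
    return [[J[i][j] + r[i] * dx[j] for j in range(k)] for i in range(k)]


# Hypothetical data: identity as the old estimate and one observed step.
J_old = [[1.0, 0.0], [0.0, 1.0]]
dx = [1.0, 2.0]
df = [3.0, -1.0]
J_new = broyden_update(J_old, dx, df)

# The new estimate reproduces the observed change: J_new dx equals df.
secant_lhs = matvec(J_new, dx)
# Directions orthogonal to dx are left untouched by the update.
w = [2.0, -1.0]
unchanged = (matvec(J_new, w), matvec(J_old, w))
```

The second check illustrates why the modification is minimal: the update changes the action of the matrix only along $\Delta \mathbf{x}_n$.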

We may then proceed in the Newton direction:

\mathbf {x} _{n+1}=\mathbf {x} _{n}-\mathbf {J} _{n}^{-1}\mathbf {f} (\mathbf {x} _{n}).

Broyden also suggested using the Sherman–Morrison formula to update the inverse of the Jacobian matrix directly:

\mathbf {J} _{n}^{-1}=\mathbf {J} _{n-1}^{-1}+{\frac {\Delta \mathbf {x} _{n}-\mathbf {J} _{n-1}^{-1}\Delta \mathbf {f} _{n}}{\Delta \mathbf {x} _{n}^{\mathrm {T} }\mathbf {J} _{n-1}^{-1}\Delta \mathbf {f} _{n}}}\Delta \mathbf {x} _{n}^{\mathrm {T} }\mathbf {J} _{n-1}^{-1}.

This first method is commonly known as the "good Broyden's method".
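A complete iteration built on the inverse update might look like the following sketch. It is written in plain Python for self-containment; the function name `broyden_good`, the stopping rule, the test system, and the choice of the exact inverse Jacobian at the starting point as `H0` are all assumptions made for this example.

```python
def matvec(A, v):
    """Multiply a list-of-rows matrix A by a vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def broyden_good(f, x0, H0, tol=1e-10, max_iter=100):
    """Solve f(x) = 0 while updating H, an approximation of the inverse Jacobian."""
    k = len(x0)
    x, H = list(x0), [list(row) for row in H0]
    fx = f(x)
    for _ in range(max_iter):
        if max(abs(v) for v in fx) < tol:
            break
        # Newton-like step with the current inverse estimate.
        step = matvec(H, fx)
        x_new = [xi - si for xi, si in zip(x, step)]
        f_new = f(x_new)
        dx = [a - b for a, b in zip(x_new, x)]
        df = [a - b for a, b in zip(f_new, fx)]
        H_df = matvec(H, df)
        denom = sum(a * b for a, b in zip(dx, H_df))
        if denom == 0.0:
            raise ZeroDivisionError("Sherman-Morrison denominator vanished")
        # Row vector dx^T H.
        dxT_H = [sum(dx[i] * H[i][j] for i in range(k)) for j in range(k)]
        u = [(dx[i] - H_df[i]) / denom for i in range(k)]
        # Sherman-Morrison form of the rank-one update.
        H = [[H[i][j] + u[i] * dxT_H[j] for j in range(k)] for i in range(k)]
        x, fx = x_new, f_new
    return x


# Hypothetical test system: x^2 + y^2 = 4 and x = y, with a root at (sqrt 2, sqrt 2).
def system(v):
    return [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]]


start = [1.5, 1.3]
# Exact inverse of the Jacobian [[2x, 2y], [1, -1]] at the starting point.
a, b = 2.0 * start[0], 2.0 * start[1]
det = -a - b
H0 = [[-1.0 / det, -b / det], [-1.0 / det, a / det]]
sol = broyden_good(system, start, H0)
```

After the initial `H0`, no Jacobian evaluation and no linear solve is required; each iteration costs a few matrix-vector products.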

A similar technique can be derived by using a slightly different modification to $\mathbf{J}_{n-1}$. This yields a second method, the so-called "bad Broyden's method" (but see^[3]):

\mathbf {J} _{n}^{-1}=\mathbf {J} _{n-1}^{-1}+{\frac {\Delta \mathbf {x} _{n}-\mathbf {J} _{n-1}^{-1}\Delta \mathbf {f} _{n}}{\|\Delta \mathbf {f} _{n}\|^{2}}}\Delta \mathbf {f} _{n}^{\mathrm {T} }.

Among all updates that satisfy the secant equation, this one minimizes a different Frobenius norm:

\|\mathbf {J} _{n}^{-1}-\mathbf {J} _{n-1}^{-1}\|_{\rm {F}}.
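For comparison, the "bad" update of the inverse can be sketched in the same style. The name `bad_broyden_update` and the sample data are assumptions for illustration; the check confirms that the updated inverse maps $\Delta \mathbf{f}_n$ to $\Delta \mathbf{x}_n$, which is the secant equation written for the inverse.

```python
def matvec(A, v):
    """Multiply a list-of-rows matrix A by a vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def bad_broyden_update(H, dx, df):
    """Return H + ((dx - H df) / ||df||^2) df^T, where H approximates J^{-1}."""
    k = len(dx)
    norm_sq = sum(d * d for d in df)
    if norm_sq == 0.0:
        raise ValueError("df must be nonzero")
    H_df = matvec(H, df)
    u = [(dx[i] - H_df[i]) / norm_sq for i in range(k)]
    # Rank-one correction: outer product of u and df.
    return [[H[i][j] + u[i] * df[j] for j in range(k)] for i in range(k)]


# Hypothetical data: identity as the old inverse estimate and one observed step.
H_old = [[1.0, 0.0], [0.0, 1.0]]
dx = [1.0, 2.0]
df = [3.0, -1.0]
H_new = bad_broyden_update(H_old, dx, df)

# Inverse secant equation: H_new df equals dx.
inverse_secant_lhs = matvec(H_new, df)
```

Unlike the Sherman–Morrison form, this update needs no product with $\Delta \mathbf{x}_n^{\mathrm{T}}$ and its denominator depends only on $\Delta \mathbf{f}_n$.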

Many other quasi-Newton schemes have been suggested in optimization, where one seeks a maximum or minimum by finding the root of the first derivative (the gradient in multiple dimensions). The Jacobian of the gradient is called the Hessian and is symmetric, which adds further constraints to its update.

Related Methods

In addition to the two methods described above, Broyden defined a whole class of related methods.^[1]^: 578 Other methods in this class have been introduced by other authors.

The Davidon–Fletcher–Powell update is the only member of this class that was published before the two members defined by Broyden.^[1]^: 582
Schubert's or sparse Broyden algorithm – a modification for sparse Jacobian matrices.^[4]
Klement (2014) – uses fewer iterations to solve many equation systems.^[5]^[6]

References

[1] Broyden, C. G. (October 1965). "A Class of Methods for Solving Nonlinear Simultaneous Equations". Mathematics of Computation. American Mathematical Society. 19 (92): 577–593. doi:10.1090/S0025-5718-1965-0198670-6. JSTOR 2003941.
[2] Gay, D. M. (August 1979). "Some convergence properties of Broyden's method". SIAM Journal on Numerical Analysis. SIAM. 16 (4): 623–630. doi:10.1137/0716047.
[3] Kvaalen, Eric (November 1991). "A faster Broyden method". BIT Numerical Mathematics. SIAM. 31 (2): 369–372. doi:10.1007/BF01931297.
[4] Schubert, L. K. (1970-01-01). "Modification of a quasi-Newton method for nonlinear equations with a sparse Jacobian". Mathematics of Computation. 24 (109): 27–30. doi:10.1090/S0025-5718-1970-0258276-9. ISSN 0025-5718.
[5] Klement, Jan (2014-11-23). "On Using Quasi-Newton Algorithms of the Broyden Class for Model-to-Test Correlation". Journal of Aerospace Technology and Management. 6 (4): 407–414. doi:10.5028/jatm.v6i4.373. ISSN 2175-9146.
[6] "Broyden class methods – File Exchange – MATLAB Central". www.mathworks.com. Retrieved 2016-02-04.

External links

Simple basic explanation: The story of the blind archer

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
