Compact quasi-Newton representation

The compact representation for quasi-Newton methods is a matrix decomposition, which is typically used in gradient based optimization algorithms or for solving nonlinear systems. The decomposition uses a low-rank representation for the direct and/or inverse Hessian or the Jacobian of a nonlinear system. Because of this, the compact representation is often used for large problems and constrained optimization.

The compact representation (right) of a dense Hessian approximation (left) is an initial matrix (typically diagonal) plus a low-rank decomposition. It has a small memory footprint (shaded areas) and enables efficient matrix computations.

Definition

The compact representation of a quasi-Newton matrix for the inverse Hessian $H_k$ or direct Hessian $B_k$ of a nonlinear objective function $f:\mathbb{R}^n \to \mathbb{R}$ expresses a sequence of recursive rank-1 or rank-2 matrix updates as one rank-$k$ or rank-$2k$ update of an initial matrix. [1] [2] Because it is derived from quasi-Newton updates, it uses the differences of iterates and gradients,

$s_i = x_{i+1} - x_i, \qquad y_i = \nabla f(x_{i+1}) - \nabla f(x_i),$

in its definition. In particular, for $H_k \approx (\nabla^2 f(x_k))^{-1}$ or $B_k \approx \nabla^2 f(x_k)$, the rectangular matrices $S_k = [\, s_0 \;\cdots\; s_{k-1} \,] \in \mathbb{R}^{n \times k}$ and $Y_k = [\, y_0 \;\cdots\; y_{k-1} \,] \in \mathbb{R}^{n \times k}$ and the square symmetric systems $M_k$ and $\widetilde{M}_k$ depend on the $s_i$'s and $y_i$'s and define the quasi-Newton representations

$H_k = H_0 + J_k M_k J_k^T, \qquad B_k = B_0 + \widetilde{J}_k \widetilde{M}_k \widetilde{J}_k^T,$

where the columns of the rectangular matrices $J_k$ and $\widetilde{J}_k$ are built from $S_k$, $Y_k$ and the initializations $H_0$, $B_0$ (for BFGS, for example, $J_k = [\, S_k \;\; H_0 Y_k \,]$).

Applications

Because of this special matrix decomposition, the compact representation is implemented in state-of-the-art optimization software. [3] [4] [5] [6] When combined with limited-memory techniques, it is a popular approach for constrained optimization with gradients. [7] Linear algebra operations, such as matrix-vector products, linear solves, or eigendecompositions, can be done efficiently. The representation can be combined with line-search and trust-region techniques, and it has been developed for many quasi-Newton updates. For instance, writing the compact representation of the direct quasi-Newton Hessian as $B_k = B_0 + \widetilde{J}_k \widetilde{M}_k \widetilde{J}_k^T$ with $\widetilde{J}_k \in \mathbb{R}^{n \times 2k}$, the matrix-vector product with an arbitrary vector $g \in \mathbb{R}^n$ is

$B_k g = B_0 g + \widetilde{J}_k \left( \widetilde{M}_k \left( \widetilde{J}_k^T g \right) \right),$

which costs $O(nk)$ operations when $B_0$ is diagonal, compared with $O(n^2)$ for a dense matrix-vector product.
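As a sketch of this efficiency, the product can be computed from the compact factors without ever forming the dense $n \times n$ matrix. A minimal NumPy example, with a generic symmetric middle matrix $M$ and factor $J$ standing in for the compact pieces:

```python
import numpy as np

def compact_matvec(gamma, J, M, g):
    """Product B @ g for B = gamma*I + J @ M @ J.T, without forming B.

    For J of size n x 2k and M of size 2k x 2k the cost is O(nk),
    versus O(n^2) for a dense matrix-vector product.
    """
    return gamma * g + J @ (M @ (J.T @ g))

rng = np.random.default_rng(0)
n, k = 100, 5
J = rng.standard_normal((n, 2 * k))
M = rng.standard_normal((2 * k, 2 * k))
M = M + M.T                                  # symmetric middle matrix
g = rng.standard_normal(n)
gamma = 2.0

B_dense = gamma * np.eye(n) + J @ M @ J.T    # the dense matrix we avoid forming
assert np.allclose(compact_matvec(gamma, J, M, g), B_dense @ g)
```

Only the tall-skinny factor and the small middle matrix are stored; the dense matrix above is built solely to check the result.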

Background

In the context of the GMRES method, Walker [8] showed that a product of Householder transformations (each an identity plus a rank-1 matrix) can be expressed as a compact matrix formula. This led to the derivation of an explicit matrix expression for the product of general identity-plus-rank-1 matrices. [7] Specifically, for vectors $u_i, v_i \in \mathbb{R}^n$ with $1 \le i \le k$, collected as $U_k = [\, u_1 \;\cdots\; u_k \,]$ and $V_k = [\, v_1 \;\cdots\; v_k \,]$, the product of rank-1 updates to the identity is

$\prod_{i=1}^{k} \left( I + u_i v_i^T \right) = I + U_k \left( I - N_k \right)^{-1} V_k^T,$

where $N_k$ is the strictly upper triangular part of $V_k^T U_k$. The BFGS recursion

$H_{k+1} = \left( I - \rho_k y_k s_k^T \right)^T H_k \left( I - \rho_k y_k s_k^T \right) + \rho_k s_k s_k^T, \qquad \rho_k = \frac{1}{y_k^T s_k}, \qquad (1)$

can be expressed in terms of products of the rank-1 updates $I - \rho_i y_i s_i^T$, which have a compact matrix formula. Therefore, the BFGS recursion can exploit these block matrix representations.
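The compact formula for a product of identity-plus-rank-1 matrices, $\prod_{i=1}^{k}(I + u_i v_i^T) = I + U_k (I - N_k)^{-1} V_k^T$ with $N_k$ the strictly upper triangular part of $V_k^T U_k$, can be checked numerically with a short NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))

# Left-hand side: explicit product (I + u_1 v_1^T)(I + u_2 v_2^T)...(I + u_k v_k^T)
P = np.eye(n)
for i in range(k):
    P = P @ (np.eye(n) + np.outer(U[:, i], V[:, i]))

# Right-hand side: compact formula I + U (I - N)^{-1} V^T,
# where N is the strictly upper triangular part of V^T U
N = np.triu(V.T @ U, 1)
Q = np.eye(n) + U @ np.linalg.solve(np.eye(k) - N, V.T)

assert np.allclose(P, Q)
```

Since $I - N_k$ is unit upper triangular, it is always invertible, and the solve costs only $O(k^2)$ per right-hand side.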

Recursive quasi-Newton updates

A parametric family of quasi-Newton updates includes many of the best known formulas. [9] For arbitrary vectors $a_k$ and $b_k$ such that $a_k^T y_k \neq 0$ and $b_k^T s_k \neq 0$, general recursive update formulas for the inverse and direct Hessian estimates are

$H_{k+1} = H_k + \frac{(s_k - H_k y_k) a_k^T + a_k (s_k - H_k y_k)^T}{a_k^T y_k} - \frac{y_k^T (s_k - H_k y_k)}{\left( a_k^T y_k \right)^2} a_k a_k^T \qquad (2)$

$B_{k+1} = B_k + \frac{(y_k - B_k s_k) b_k^T + b_k (y_k - B_k s_k)^T}{b_k^T s_k} - \frac{s_k^T (y_k - B_k s_k)}{\left( b_k^T s_k \right)^2} b_k b_k^T \qquad (3)$

By making specific choices for the parameter vectors $a_k$ and $b_k$, well-known methods are recovered:

Table 1: Quasi-Newton updates parametrized by the vectors $a_k$ and $b_k$

  Inverse update, choice of $a_k$ in (2)  |  Direct update, choice of $b_k$ in (3)
  $a_k = s_k$: BFGS                       |  $b_k = s_k$: PSB (Powell Symmetric Broyden)
                                          |  $b_k = y_k$: DFP
  $a_k = s_k - H_k y_k$: SR1              |  $b_k = y_k - B_k s_k$: SR1
                                          |  MSS (Multipoint-Symmetric-Secant) [10]
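Two of the choices in the table can be checked numerically against the general family (2): $a_k = s_k$ reproduces the BFGS inverse update and $a_k = s_k - H_k y_k$ reproduces SR1. A small NumPy sketch (synthetic data; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 7
C = rng.standard_normal((n, n))
H = C @ C.T + np.eye(n)                    # current inverse Hessian estimate (SPD)
s = rng.standard_normal((n, 1))
y = s + 0.3 * rng.standard_normal((n, 1))  # ensures s^T y > 0 (curvature condition)

def family_update(H, s, y, a):
    """General rank-2 update (2) with parameter vector a."""
    r = s - H @ y                          # secant residual
    ay = (a.T @ y).item()
    return H + (r @ a.T + a @ r.T) / ay - (y.T @ r).item() / ay**2 * (a @ a.T)

# a_k = s_k recovers the BFGS inverse update (1)
rho = 1.0 / (y.T @ s).item()
V = np.eye(n) - rho * (y @ s.T)
H_bfgs = V.T @ H @ V + rho * (s @ s.T)
assert np.allclose(family_update(H, s, y, s), H_bfgs)

# a_k = s_k - H_k y_k recovers the SR1 update
r = s - H @ y
H_sr1 = H + (r @ r.T) / (r.T @ y).item()
assert np.allclose(family_update(H, s, y, r), H_sr1)
```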


Compact Representations

Collecting the updating vectors of the recursive formulas into matrices, define

$S_k = [\, s_0 \;\cdots\; s_{k-1} \,] \in \mathbb{R}^{n \times k}, \qquad Y_k = [\, y_0 \;\cdots\; y_{k-1} \,] \in \mathbb{R}^{n \times k},$

the upper triangular matrix

$R_k = \operatorname{triu}\left( S_k^T Y_k \right), \qquad (R_k)_{ij} = s_{i-1}^T y_{j-1} \;\; \text{for} \;\; i \le j,$

the strictly lower triangular matrix

$L_k = S_k^T Y_k - R_k,$

and the diagonal matrix

$D_k = \operatorname{diag}\left( s_0^T y_0, \ldots, s_{k-1}^T y_{k-1} \right).$
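These three matrices split the small $k \times k$ product $S_k^T Y_k = R_k + L_k$, with $D_k$ the diagonal of $R_k$; a short NumPy sketch of the definitions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 6, 3
S = rng.standard_normal((n, k))
Y = rng.standard_normal((n, k))

StY = S.T @ Y
R = np.triu(StY)              # upper triangular R_k = triu(S_k^T Y_k)
L = StY - R                   # strictly lower triangular L_k
D = np.diag(np.diag(StY))     # diagonal D_k

assert np.allclose(StY, R + L)           # the parts decompose S_k^T Y_k
assert np.allclose(L, np.tril(L, -1))    # L_k is strictly lower triangular
```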

With these definitions, the compact representations of the general rank-2 updates in (2) and (3) (including the well-known quasi-Newton updates in Table 1) have been developed in Brust: [11] the representation of the inverse Hessian is

$H_k = H_0 + J_k M_k J_k^T \qquad (4)$

and the formula for the direct Hessian is

$B_k = B_0 + \widetilde{J}_k \widetilde{M}_k \widetilde{J}_k^T \qquad (5)$

where the rectangular matrices $J_k, \widetilde{J}_k \in \mathbb{R}^{n \times 2k}$ collect columns of $S_k$, $Y_k$, $H_0 Y_k$ and $B_0 S_k$, and the small symmetric $2k \times 2k$ middle matrices $M_k$ and $\widetilde{M}_k$ are built from $R_k$, $L_k$, $D_k$ and the parameter vectors $a_k$, $b_k$. For instance, when $a_k = s_k$ the representation in (4) is the compact formula for the BFGS recursion in (1).

Specific Representations

Prior to the development of the compact representations of the general updates in (2) and (3), equivalent representations had been discovered for most of the known updates (see Table 1).

BFGS

Along with the SR1 representation, the BFGS (Broyden-Fletcher-Goldfarb-Shanno) compact representation was the first compact formula known. [7] In particular, the inverse representation is given by

$H_k = H_0 + \begin{bmatrix} S_k & H_0 Y_k \end{bmatrix} \begin{bmatrix} R_k^{-T} \left( D_k + Y_k^T H_0 Y_k \right) R_k^{-1} & -R_k^{-T} \\ -R_k^{-1} & 0 \end{bmatrix} \begin{bmatrix} S_k^T \\ Y_k^T H_0 \end{bmatrix}.$

The direct Hessian approximation can be found by applying the Sherman-Morrison-Woodbury identity to the inverse Hessian:

$B_k = B_0 - \begin{bmatrix} B_0 S_k & Y_k \end{bmatrix} \begin{bmatrix} S_k^T B_0 S_k & L_k \\ L_k^T & -D_k \end{bmatrix}^{-1} \begin{bmatrix} S_k^T B_0 \\ Y_k^T \end{bmatrix}.$
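The equivalence between the BFGS recursion and the inverse compact formula of Byrd, Nocedal and Schnabel can be verified numerically. A NumPy sketch with $H_0 = \gamma I$ and synthetic curvature pairs $y_i = A s_i$ for an SPD matrix $A$, so that $s_i^T y_i > 0$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 4
C = rng.standard_normal((n, n))
A = C @ C.T + n * np.eye(n)        # SPD model Hessian, so s_i^T y_i > 0
S = rng.standard_normal((n, k))    # steps s_0, ..., s_{k-1}
Y = A @ S                          # gradient differences y_i = A s_i
gamma = 1.3                        # H_0 = gamma * I

# Recursive BFGS updates of the inverse Hessian, eq. (1)
H = gamma * np.eye(n)
for i in range(k):
    s, y = S[:, [i]], Y[:, [i]]
    rho = 1.0 / (y.T @ s).item()
    V = np.eye(n) - rho * (y @ s.T)
    H = V.T @ H @ V + rho * (s @ s.T)

# Compact representation of the same matrix
StY = S.T @ Y
R = np.triu(StY)
D = np.diag(np.diag(StY))
Rinv = np.linalg.inv(R)
M = np.block([[Rinv.T @ (D + gamma * Y.T @ Y) @ Rinv, -Rinv.T],
              [-Rinv,                                  np.zeros((k, k))]])
J = np.hstack([S, gamma * Y])      # [S_k, H_0 Y_k]
H_compact = gamma * np.eye(n) + J @ M @ J.T

assert np.allclose(H, H_compact)
```

In practice $R_k^{-1}$ is never formed explicitly; triangular solves against $R_k$ are used instead.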

SR1

The SR1 (Symmetric Rank-1) compact representation was first proposed in [7]. Using the definitions of $R_k$, $L_k$ and $D_k$ from above, the inverse Hessian formula is given by

$H_k = H_0 + \left( S_k - H_0 Y_k \right) \left( R_k + R_k^T - D_k - Y_k^T H_0 Y_k \right)^{-1} \left( S_k - H_0 Y_k \right)^T.$

The direct Hessian is obtained by the Sherman-Morrison-Woodbury identity and has the form

$B_k = B_0 + \left( Y_k - B_0 S_k \right) \left( D_k + L_k + L_k^T - S_k^T B_0 S_k \right)^{-1} \left( Y_k - B_0 S_k \right)^T.$
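The direct SR1 compact formula can likewise be checked against the rank-1 recursion $B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k}$. A NumPy sketch with synthetic pairs $y_i = A s_i$ for a symmetric $A$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 8, 3
C = rng.standard_normal((n, n))
A = C + C.T                        # symmetric model Hessian
S = rng.standard_normal((n, k))
Y = A @ S                          # y_i = A s_i
gamma = 0.5                        # B_0 = gamma * I

# Recursive SR1 updates of the direct Hessian
B = gamma * np.eye(n)
for i in range(k):
    s, y = S[:, [i]], Y[:, [i]]
    r = y - B @ s                  # residual of the secant equation
    B = B + (r @ r.T) / (r.T @ s).item()

# Compact SR1 representation
StY = S.T @ Y
R = np.triu(StY)
L = StY - R
D = np.diag(np.diag(StY))
Psi = Y - gamma * S                # Y_k - B_0 S_k
middle = D + L + L.T - gamma * (S.T @ S)
B_compact = gamma * np.eye(n) + Psi @ np.linalg.solve(middle, Psi.T)

assert np.allclose(B, B_compact)
```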

MSS

The multipoint symmetric secant (MSS) method aims to satisfy multiple secant equations simultaneously. The recursive update formula was originally developed by Burdakov. [12] The compact representation for the direct Hessian was derived in [13].

Another equivalent compact representation for the MSS matrix can be derived by an algebraic rewriting of the update. [14] The inverse representation can be obtained by applying the Sherman-Morrison-Woodbury identity.

DFP

Since the DFP (Davidon-Fletcher-Powell) update is the dual of the BFGS formula (i.e., it is obtained by interchanging the roles of $s_k$ and $y_k$, and of $H_k$ and $B_k$, in the BFGS update), the compact representation for DFP can be immediately obtained from the one for BFGS. [15]

PSB

The PSB (Powell-Symmetric-Broyden) compact representation was developed for the direct Hessian approximation. [16] It is equivalent to substituting $b_k = s_k$ in (5).

Structured BFGS

For structured optimization problems in which the objective function can be decomposed into two parts, $f(x) = g(x) + h(x)$, where the gradient and Hessian of $g$ are known but only the gradient of $h$ is known, structured BFGS formulas exist. The compact representation of these methods has the general form of (5), with specific choices of $\widetilde{J}_k$ and $\widetilde{M}_k$. [17]

Reduced BFGS

The reduced compact representation (RCR) of BFGS is for linear equality constrained optimization, $\min_x f(x)$ subject to $Ax = b$, where the system $Ax = b$ with $A \in \mathbb{R}^{m \times n}$, $m < n$, is underdetermined. In addition to the matrices $S_k$ and $Y_k$, the RCR also stores the projections of the update vectors onto the nullspace of $A$.

For the compact representation of the BFGS matrix (with an initialization that is a multiple of the identity, $B_0 = \gamma I$), the (1,1) block of the inverse KKT matrix has a compact representation. [18]

Limited Memory

Pattern in a limited-memory updating strategy. With a memory parameter of $m = 7$, the first $k \leq m$ iterations fill the matrix (e.g., an upper triangular matrix for the compact representation, $R_k = \operatorname{triu}(S_k^T Y_k)$). For $k > m$ the limited-memory technique discards the oldest information and adds a new column at the end.


The most common use of the compact representations is the limited-memory setting, where $m \ll n$ denotes the memory parameter, with typical values between $m = 5$ and $m = 10$ (see e.g., [18] [7]). Then, instead of storing the history of all vectors, one limits this to the $m$ most recent pairs $(s_i, y_i)$, $k - m \le i \le k - 1$, and possibly the products $H_0 Y_k$ or $B_0 S_k$. Further, the initialization is typically chosen as an adaptive multiple of the identity, $H_0 = \gamma_k I$ with $\gamma_k = \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}}$, and $B_0 = \frac{1}{\gamma_k} I$. Limited-memory methods are frequently used for large-scale problems with many variables (i.e., $n$ can be large), in which the limited-memory matrices $S_k$ and $Y_k$ (and possibly $H_0 Y_k$ or $B_0 S_k$) are tall and very skinny: $S_k, Y_k \in \mathbb{R}^{n \times m}$ with $n \gg m$.
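The sliding-window storage of $S_k$ and $Y_k$ described above can be sketched as follows (the function name lm_update is illustrative):

```python
import numpy as np

def lm_update(S, Y, s_new, y_new, m):
    """Keep at most the m most recent (s, y) pairs (sliding window)."""
    if S.shape[1] == m:            # memory full: discard the oldest column
        S, Y = S[:, 1:], Y[:, 1:]
    return np.hstack([S, s_new]), np.hstack([Y, y_new])

n, m = 5, 3
S, Y = np.empty((n, 0)), np.empty((n, 0))
rng = np.random.default_rng(6)
for _ in range(7):                 # more iterations than memory (k > m)
    S, Y = lm_update(S, Y, rng.standard_normal((n, 1)),
                     rng.standard_normal((n, 1)), m)

assert S.shape == (n, m) and Y.shape == (n, m)
```

After the window is full, each iteration touches only one column of each factor, so the per-iteration storage and update cost stay $O(nm)$.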

Implementations

Open source implementations include:

  - the Fortran L-BFGS-B package for bound-constrained optimization [4] [20]
  - the interior-point solver IPOPT [6]
  - the MATLAB limited-memory SR1 trust-region package SC-SR1 [5] [19]

Non open source implementations include:

  - the nonlinear optimization solver KNITRO [3]

Works cited

  1. Nocedal, J.; Wright, S. J. (2006). Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer New York, NY. doi:10.1007/978-0-387-40065-5. ISBN 978-0-387-30303-1.
  2. Brust, J. J. (2018). Large-Scale Quasi-Newton Trust-Region Methods: High-Accuracy Solvers, Dense Initializations, and Extensions (PhD thesis). University of California, Merced.
  3. Byrd, R. H.; Nocedal, J.; Waltz, R. A. (2006). "KNITRO: An integrated package for nonlinear optimization". In Di Pillo, G.; Roma, M. (eds.), Large-Scale Nonlinear Optimization. Nonconvex Optimization and Its Applications, vol. 83. Springer, Boston, MA. pp. 35–59. doi:10.1007/0-387-30065-1_4. ISBN 978-0-387-30063-4.
  4. Zhu, C.; Byrd, R. H.; Lu, P.; Nocedal, J. (1997). "Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization". ACM Transactions on Mathematical Software. 23 (4): 550–560. doi:10.1145/279232.279236.
  5. Brust, J.; Burdakov, O.; Erway, J.; Marcia, R. (2022). "Algorithm 1030: SC-SR1: MATLAB software for limited-memory SR1 trust-region methods". ACM Transactions on Mathematical Software. 48 (4): 1–33. doi:10.1145/3550269.
  6. Wächter, A.; Biegler, L. T. (2006). "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming". Mathematical Programming. 106: 25–57. doi:10.1007/s10107-004-0559-y.
  7. Byrd, R. H.; Nocedal, J.; Schnabel, R. B. (1994). "Representations of quasi-Newton matrices and their use in limited memory methods". Mathematical Programming. 63 (4): 129–156. doi:10.1007/BF01582063.
  8. Walker, H. F. (1988). "Implementation of the GMRES method using Householder transformations". SIAM Journal on Scientific and Statistical Computing. 9 (1): 152–163. doi:10.1137/0909010.
  9. Dennis, J. E., Jr.; Moré, J. J. (1977). "Quasi-Newton methods, motivation and theory". SIAM Review. 19 (1): 46–89. doi:10.1137/1019005. hdl:1813/6056.
  10. Brust, J. J. (2024). "Useful compact representations for data-fitting". arXiv:2403.12206 [math.OC].
  11. Burdakov, O. P. (1983). "Methods of the secant type for systems of equations with symmetric Jacobian matrix". Numerical Functional Analysis and Optimization. 6 (2): 1–18. doi:10.1080/01630568308816160.
  12. Burdakov, O. P.; Martínez, J. M.; Pilotta, E. A. (2002). "A limited-memory multipoint symmetric secant method for bound constrained optimization". Annals of Operations Research. 117: 51–70. doi:10.1023/A:1021561204463.
  13. Brust, J. J.; Erway, J. B.; Marcia, R. F. (2024). "Shape-changing trust-region methods using multipoint symmetric secant matrices". Optimization Methods and Software. 39 (5): 990–1007. arXiv:2209.12057. doi:10.1080/10556788.2023.2296441.
  14. Erway, J. B.; Jain, V.; Marcia, R. F. (2013). "Shifted limited-memory DFP systems". 2013 Asilomar Conference on Signals, Systems and Computers. IEEE. pp. 1033–1037.
  15. Kanzow, C.; Steck, D. (2023). "Regularization of limited memory quasi-Newton methods for large-scale nonconvex minimization". Mathematical Programming Computation. 15 (3): 417–444. arXiv:1911.04584. doi:10.1007/s12532-023-00238-4.
  16. Brust, J. J.; Di, Z.; Leyffer, S.; Petra, C. G. (2021). "Compact representations of structured BFGS matrices". Computational Optimization and Applications. 80 (1): 55–88. arXiv:2208.00057. doi:10.1007/s10589-021-00297-0.
  17. Brust, J. J.; Marcia, R. F.; Petra, C. G.; Saunders, M. A. (2022). "Large-scale optimization with linear equality constraints using reduced compact representation". SIAM Journal on Scientific Computing. 44 (1): A103–A127. arXiv:2101.11048. doi:10.1137/21M1393819.
  18. "Collected Algorithms of the ACM (CALGO)". calgo.acm.org.
  19. "TOMS Alg. 1030". calgo.acm.org/1030.zip.
  20. Zhu, C.; Byrd, R. H.; Lu, P.; Nocedal, J. (1997). "Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization". ACM Transactions on Mathematical Software. 23 (4): 550–560. doi:10.1145/279232.279236.
