The compact representation for quasi-Newton methods is a matrix decomposition, which is typically used in gradient based optimization algorithms or for solving nonlinear systems. The decomposition uses a low-rank representation for the direct and/or inverse Hessian or the Jacobian of a nonlinear system. Because of this, the compact representation is often used for large problems and constrained optimization.
The compact representation of a quasi-Newton matrix for the inverse Hessian or direct Hessian of a nonlinear objective function expresses a sequence of recursive rank-1 or rank-2 matrix updates as one rank- or rank- update of an initial matrix. [1] [2] Because it is derived from quasi-Newton updates, it uses differences of iterates and gradients in its definition . In particular, for or the rectangular matrices and the square symmetric systems depend on the 's and define the quasi-Newton representations
Because of the special matrix decomposition the compact representation is implemented in state-of-the-art optimization software. [3] [4] [5] [6] When combined with limited-memory techniques it is a popular technique for constrained optimization with gradients. [7] Linear algebra operations can be done efficiently, like matrix-vector products, solves or eigendecompositions. It can be combined with line-search and trust region techniques, and the representation has been developed for many quasi-Newton updates. For instance, the matrix vector product with the direct quasi-Newton Hessian and an arbitrary vector is:
In the context of the GMRES method, Walker [8] showed that a product of Householder transformations (an identity plus rank-1) can be expressed as a compact matrix formula. This led to the derivation of an explicit matrix expression for the product of identity plus rank-1 matrices. [7] Specifically, for and when the product of rank-1 updates to the identity is The BFGS update can be expressed in terms of products of the 's, which have a compact matrix formula. Therefore, the BFGS recursion can exploit these block matrix representations
1 |
A parametric family of quasi-Newton updates includes many of the most known formulas. [9] For arbitrary vectors and such that and general recursive update formulas for the inverse and direct Hessian estimates are
2 |
3 |
By making specific choices for the parameter vectors and well known methods are recovered
BFGS | PSB (Powell Symmetric Broyden) | ||
DFP | |||
SR1 | SR1 | ||
[10] | MSS (Multipoint-Symmetric-Secant) |
Collecting the updating vectors of the recursive formulas into matrices, define
upper triangular
lower triangular
and diagonal
With these definitions the compact representations of general rank-2 updates in ( 2 ) and ( 3 ) (including the well known quasi-Newton updates in Table 1) have been developed in Brust: [11]
4 |
and the formula for the direct Hessian is
5 |
For instance, when the representation in ( 4 ) is the compact formula for the BFGS recursion in ( 1 ).
Prior to the development of the compact representations of ( 2 ) and ( 3 ), equivalent representations have been discovered for most known updates (see Table 1).
Along with the SR1 representation, the BFGS (Broyden-Fletcher-Goldfarb-Shanno) compact representation was the first compact formula known. [7] In particular, the inverse representation is given by
The direct Hessian approximation can be found by applying the Sherman-Morrison-Woodbury identity to the inverse Hessian:
The SR1 (Symmetric Rank-1) compact representation was first proposed in. [7] Using the definitions of and from above, the inverse Hessian formula is given by
The direct Hessian is obtained by the Sherman-Morrison-Woodbury identity and has the form
The multipoint symmetric secant (MSS) method is a method that aims to satisfy multiple secant equations. The recursive update formula was originally developed by Burdakov. [12] The compact representation for the direct Hessian was derived in [13]
Another equivalent compact representation for the MSS matrix is derived by rewriting in terms of . [14] The inverse representation can be obtained by application for the Sherman-Morrison-Woodbury identity.
Since the DFP (Davidon Fletcher Powell) update is the dual of the BFGS formula (i.e., swapping , and in the BFGS update), the compact representation for DFP can be immediately obtained from the one for BFGS. [15]
The PSB (Powell-Symmetric-Broyden) compact representation was developed for the direct Hessian approximation. [16] It is equivalent to substituting in ( 5 )
For structured optimization problems in which the objective function can be decomposed into two parts , where the gradients and Hessian of are known but only the gradient of is known, structured BFGS formulas exist. The compact representation of these methods has the general form of ( 5 ), with specific and . [17]
The reduced compact representation (RCR) of BFGS is for linear equality constrained optimization , where is underdetermined. In addition to the matrices the RCR also stores the projections of the 's onto the nullspace of
For the compact representation of the BFGS matrix (with a multiple of the identity ) the (1,1) block of the inverse KKT matrix has the compact representation [18]
The most common use of the compact representations is for the limited-memory setting where denotes the memory parameter, with typical values around (see e.g., [18] [7] ). Then, instead of storing the history of all vectors one limits this to the most recent vectors and possibly or . Further, typically the initialization is chosen as an adaptive multiple of the identity , with and . Limited-memory methods are frequently used for large-scale problems with many variables (i.e., can be large), in which the limited-memory matrices and (and possibly ) are tall and very skinny: and .
Open source implementations include:
optim
general-purpose optimizer routine uses the L-BFGS-B method.Non open source implementations include:
{{cite book}}
: CS1 maint: location (link){{cite journal}}
: CS1 maint: multiple names: authors list (link)