The compact representation for quasi-Newton methods is a matrix decomposition that is typically used in gradient-based optimization algorithms or for solving nonlinear systems. The decomposition uses a low-rank representation of the direct and/or inverse Hessian, or of the Jacobian of a nonlinear system. Because of this structure, the compact representation is often used for large-scale problems and for constrained optimization.
The compact representation of a quasi-Newton matrix for the inverse Hessian H_k or direct Hessian B_k of a nonlinear objective function f(x) expresses a sequence of recursive rank-1 or rank-2 matrix updates as one rank-k or rank-2k update of an initial matrix. [1] [2] Because it is derived from quasi-Newton updates, it uses the differences of iterates and gradients, s_i = x_{i+1} - x_i and y_i = \nabla f(x_{i+1}) - \nabla f(x_i), in its definition. In particular, for H_k or B_k the rectangular matrices S_k = [s_0, \ldots, s_{k-1}] and Y_k = [y_0, \ldots, y_{k-1}] and small square symmetric systems depend on the s_i's and y_i's and define the quasi-Newton representations

H_k = H_0 + U_k M_k U_k^T, \qquad B_k = B_0 + J_k \widehat{M}_k J_k^T,

where the columns of the rectangular matrices U_k and J_k are formed from S_k, Y_k and the initializations H_0, B_0, and M_k, \widehat{M}_k are small symmetric matrices.
Because of this special matrix decomposition the compact representation is implemented in state-of-the-art optimization software. [3] [4] [5] [6] When combined with limited-memory techniques it is a popular approach for constrained optimization with gradients. [7] Linear algebra operations, such as matrix-vector products, linear solves, or eigendecompositions, can be carried out efficiently with it. The representation can be combined with line-search and trust-region techniques, and it has been developed for many quasi-Newton updates. For instance, the matrix-vector product of the direct quasi-Newton Hessian with an arbitrary vector v is

B_k v = B_0 v + J_k \big( \widehat{M}_k (J_k^T v) \big),

where J_k and \widehat{M}_k are the factors of the compact representation; when J_k has only a few columns this costs roughly O(nk) operations instead of the O(n^2) of a dense product.
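As an illustration of such an efficient matrix-vector product, the sketch below uses the well known compact BFGS representation of the direct Hessian with B_0 = γI. The function name bfgs_matvec and the synthetic test data are illustrative, not taken from any particular library; the result is checked against the standard dense recursive BFGS update.

```python
import numpy as np

def bfgs_matvec(v, S, Y, gamma):
    """Compute B_k @ v via the compact BFGS representation with B0 = gamma*I.
    Only n-by-k arrays and a small 2k-by-2k system appear; the dense n-by-n
    matrix B_k is never formed."""
    SY = S.T @ Y
    Lk = np.tril(SY, -1)                  # strictly lower triangular part of S^T Y
    Dk = np.diag(np.diag(SY))             # diagonal of s_i^T y_i
    Q = np.hstack([gamma * S, Y])         # n x 2k factor
    Mid = np.block([[gamma * (S.T @ S), Lk],
                    [Lk.T,             -Dk]])
    return gamma * v - Q @ np.linalg.solve(Mid, Q.T @ v)

# Synthetic data: y_i = A s_i for an SPD matrix A guarantees s_i^T y_i > 0
rng = np.random.default_rng(0)
n, k = 30, 4
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
S = rng.standard_normal((n, k))
Y = A @ S
gamma, v = 1.5, rng.standard_normal(n)

# Reference: build B_k densely by the standard recursive BFGS update
B = gamma * np.eye(n)
for i in range(k):
    s, y = S[:, i], Y[:, i]
    Bs = B @ s
    B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)

assert np.allclose(bfgs_matvec(v, S, Y, gamma), B @ v)
```

The compact product touches only n×k arrays, which is the source of the efficiency claimed above.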
In the context of the GMRES method, Walker [8] showed that a product of Householder transformations (an identity plus a rank-1 matrix) can be expressed as a compact matrix formula. This led to the derivation of an explicit matrix expression for the product of k identity-plus-rank-1 matrices. [7] Specifically, for U_k = [u_0, \ldots, u_{k-1}] and V_k = [v_0, \ldots, v_{k-1}], the product of rank-1 updates to the identity is

(I + u_0 v_0^T)(I + u_1 v_1^T) \cdots (I + u_{k-1} v_{k-1}^T) = I + U_k (I - \bar{R}_k)^{-1} V_k^T,

where \bar{R}_k is the strictly upper triangular part of V_k^T U_k, i.e., (\bar{R}_k)_{ij} = v_{i-1}^T u_{j-1} for i < j and zero otherwise. The BFGS update can be expressed in terms of products of the matrices V_i = I - \rho_i y_i s_i^T with \rho_i = 1/(y_i^T s_i), which have such a compact matrix formula. Therefore, the BFGS recursion

H_k = (V_0 \cdots V_{k-1})^T H_0 (V_0 \cdots V_{k-1}) + \sum_{i=0}^{k-1} \rho_i (V_{i+1} \cdots V_{k-1})^T s_i s_i^T (V_{i+1} \cdots V_{k-1})    (1)

can exploit these block matrix representations.
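The product formula for identity-plus-rank-1 matrices can be checked numerically. In the sketch below the matrices U and V hold arbitrary random columns (they are not quasi-Newton quantities), and the left-to-right product is compared with the compact expression using the strictly upper triangular part of V^T U.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 4
U = rng.standard_normal((n, k))       # columns u_0, ..., u_{k-1}
V = rng.standard_normal((n, k))       # columns v_0, ..., v_{k-1}

# Left-to-right product (I + u_0 v_0^T)(I + u_1 v_1^T) ... (I + u_{k-1} v_{k-1}^T)
P = np.eye(n)
for i in range(k):
    P = P @ (np.eye(n) + np.outer(U[:, i], V[:, i]))

# Compact formula: P = I + U (I - Rbar)^{-1} V^T,
# where Rbar is the strictly upper triangular part of V^T U
Rbar = np.triu(V.T @ U, 1)
P_compact = np.eye(n) + U @ np.linalg.solve(np.eye(k) - Rbar, V.T)

assert np.allclose(P, P_compact)
```

Because Rbar is strictly triangular, I - Rbar is always invertible, so the compact form exists for any choice of the vectors.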
A parametric family of quasi-Newton updates includes many of the best known formulas. [9] For arbitrary vectors v_k and w_k such that v_k^T y_k \ne 0 and w_k^T s_k \ne 0, general recursive update formulas for the inverse and direct Hessian estimates are

H_{k+1} = H_k + \frac{(s_k - H_k y_k) v_k^T + v_k (s_k - H_k y_k)^T}{v_k^T y_k} - \frac{y_k^T (s_k - H_k y_k)}{(v_k^T y_k)^2} v_k v_k^T    (2)

B_{k+1} = B_k + \frac{(y_k - B_k s_k) w_k^T + w_k (y_k - B_k s_k)^T}{w_k^T s_k} - \frac{s_k^T (y_k - B_k s_k)}{(w_k^T s_k)^2} w_k w_k^T    (3)

Both updates are symmetric and satisfy the secant conditions H_{k+1} y_k = s_k and B_{k+1} s_k = y_k for any admissible choice of v_k and w_k.
By making specific choices for the parameter vectors v_k and w_k, well known methods are recovered:
Inverse update (choice of v_k in (2)) | Direct update (choice of w_k in (3))
BFGS: v_k = s_k | PSB (Powell Symmetric Broyden): w_k = s_k
DFP | |
SR1: v_k = s_k - H_k y_k | SR1: w_k = y_k - B_k s_k
[10] | MSS (Multipoint Symmetric Secant)
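Two of the table entries can be verified directly. The sketch below (synthetic data; the SPD matrix A stands in for a quadratic model's Hessian so that s^T y > 0, and the helper name family_update is illustrative) applies the general inverse update ( 2 ) and confirms that v_k = s_k reproduces the BFGS inverse update while v_k = s_k - H_k y_k reproduces SR1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n))
H = np.eye(n) + 0.1 * (M + M.T)          # symmetric inverse-Hessian estimate
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
s = rng.standard_normal(n)
y = A @ s                                 # guarantees s^T y > 0

def family_update(H, s, y, v):
    """General rank-2 inverse update with parameter vector v."""
    w = s - H @ y                         # secant residual
    vy = v @ y
    return (H + (np.outer(w, v) + np.outer(v, w)) / vy
              - ((y @ w) / vy**2) * np.outer(v, v))

# v = s recovers the BFGS inverse update
rho = 1.0 / (s @ y)
Vm = np.eye(n) - rho * np.outer(y, s)
H_bfgs = Vm.T @ H @ Vm + rho * np.outer(s, s)
assert np.allclose(family_update(H, s, y, v=s), H_bfgs)

# v = s - H y recovers the SR1 inverse update
w = s - H @ y
H_sr1 = H + np.outer(w, w) / (w @ y)
assert np.allclose(family_update(H, s, y, v=w), H_sr1)
```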
Collecting the updating vectors of the recursive formulas into matrices, define

S_k = [s_0, \ldots, s_{k-1}], \qquad Y_k = [y_0, \ldots, y_{k-1}],

the upper triangular matrix

(R_k)_{ij} = s_{i-1}^T y_{j-1} \text{ for } i \le j, \text{ and zero otherwise},

the strictly lower triangular matrix

(L_k)_{ij} = s_{i-1}^T y_{j-1} \text{ for } i > j, \text{ and zero otherwise},

and the diagonal matrix

(D_k)_{ii} = s_{i-1}^T y_{i-1}.
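These factors are cheap to form from the stored pairs, since R_k, L_k and D_k together partition the small k×k matrix S_k^T Y_k. A minimal sketch (synthetic iterate and gradient histories; the array names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 10, 5

# Hypothetical iterate and gradient histories (k+1 points)
X = rng.standard_normal((n, k + 1))   # columns x_0, ..., x_k
G = rng.standard_normal((n, k + 1))   # columns g_0, ..., g_k (gradients)

S = X[:, 1:] - X[:, :-1]              # s_i = x_{i+1} - x_i
Y = G[:, 1:] - G[:, :-1]              # y_i = g_{i+1} - g_i

SY = S.T @ Y                          # k x k matrix of inner products s_i^T y_j
R = np.triu(SY)                       # upper triangular part (i <= j)
L = np.tril(SY, -1)                   # strictly lower triangular part (i > j)
D = np.diag(np.diag(SY))              # diagonal of s_i^T y_i

assert np.allclose(R + L, SY)         # R and L together recover S^T Y
```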
With these definitions, the compact representations of the general rank-2 updates in ( 2 ) and ( 3 ) (including the well known quasi-Newton updates in Table 1) have been developed in Brust: [11] collecting the parameter vectors into V_k = [v_0, \ldots, v_{k-1}], the inverse Hessian has the form

H_k = H_0 + U_k M_k U_k^T, \qquad U_k = \begin{bmatrix} V_k & S_k - H_0 Y_k \end{bmatrix},    (4)

and the formula for the direct Hessian is, with W_k = [w_0, \ldots, w_{k-1}],

B_k = B_0 + J_k \widehat{M}_k J_k^T, \qquad J_k = \begin{bmatrix} W_k & Y_k - B_0 S_k \end{bmatrix},    (5)

where M_k and \widehat{M}_k are small square symmetric matrices assembled from inner products among the columns of V_k, W_k, S_k and Y_k (such as the triangular and diagonal factors above).
For instance, when v_k = s_k, the representation in ( 4 ) is the compact formula for the BFGS recursion in ( 1 ).
Prior to the development of the compact representations of the general updates ( 2 ) and ( 3 ), equivalent representations had been discovered for most known individual updates (see Table 1).
Along with the SR1 representation, the BFGS (Broyden-Fletcher-Goldfarb-Shanno) compact representation was the first compact formula known. [7] In particular, the inverse representation is given by

H_k = H_0 + \begin{bmatrix} S_k & H_0 Y_k \end{bmatrix} \begin{bmatrix} R_k^{-T} (D_k + Y_k^T H_0 Y_k) R_k^{-1} & -R_k^{-T} \\ -R_k^{-1} & 0 \end{bmatrix} \begin{bmatrix} S_k^T \\ Y_k^T H_0 \end{bmatrix}.

The direct Hessian approximation can be found by applying the Sherman-Morrison-Woodbury identity to the inverse Hessian:

B_k = B_0 - \begin{bmatrix} B_0 S_k & Y_k \end{bmatrix} \begin{bmatrix} S_k^T B_0 S_k & L_k \\ L_k^T & -D_k \end{bmatrix}^{-1} \begin{bmatrix} S_k^T B_0 \\ Y_k^T \end{bmatrix}.
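As a sanity check, the compact inverse BFGS formula can be compared against the recursive update it condenses. The sketch below uses synthetic data (an SPD matrix A playing the role of a quadratic model's Hessian, so all curvature products s_i^T y_i are positive and R_k is invertible):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 8, 4
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
S = rng.standard_normal((n, k))
Y = A @ S                                 # s_i^T y_i > 0
H0 = 0.5 * np.eye(n)

# Recursive BFGS update of the inverse Hessian
H = H0.copy()
for i in range(k):
    s, y = S[:, i], Y[:, i]
    rho = 1.0 / (s @ y)
    V = np.eye(n) - rho * np.outer(y, s)
    H = V.T @ H @ V + rho * np.outer(s, s)

# Compact representation of the same matrix
SY = S.T @ Y
R = np.triu(SY)                           # upper triangular factor
D = np.diag(np.diag(SY))                  # diagonal factor
Rinv = np.linalg.inv(R)
Mid = np.block([[Rinv.T @ (D + Y.T @ H0 @ Y) @ Rinv, -Rinv.T],
                [-Rinv,                    np.zeros((k, k))]])
Q = np.hstack([S, H0 @ Y])
H_compact = H0 + Q @ Mid @ Q.T

assert np.allclose(H, H_compact)
```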
The SR1 (Symmetric Rank-1) compact representation was first proposed in [7]. Using the definitions of R_k, L_k and D_k from above, the inverse Hessian formula is given by

H_k = H_0 + (S_k - H_0 Y_k) \left( R_k + R_k^T - D_k - Y_k^T H_0 Y_k \right)^{-1} (S_k - H_0 Y_k)^T.

The direct Hessian is obtained by the Sherman-Morrison-Woodbury identity and has the form

B_k = B_0 + (Y_k - B_0 S_k) \left( D_k + L_k + L_k^T - S_k^T B_0 S_k \right)^{-1} (Y_k - B_0 S_k)^T.
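The direct SR1 compact formula can likewise be verified against the rank-1 recursion, again on synthetic quadratic-model data (the SPD matrix A is an assumption of the sketch, chosen so every denominator is safely nonzero):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 8, 3
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
S = rng.standard_normal((n, k))
Y = A @ S
B0 = 2.0 * np.eye(n)

# Recursive SR1 update of the direct Hessian
B = B0.copy()
for i in range(k):
    s, y = S[:, i], Y[:, i]
    r = y - B @ s                         # secant residual
    B = B + np.outer(r, r) / (r @ s)

# Compact representation of the same matrix
SY = S.T @ Y
D = np.diag(np.diag(SY))
L = np.tril(SY, -1)
Psi = Y - B0 @ S
Mid = D + L + L.T - S.T @ B0 @ S
B_compact = B0 + Psi @ np.linalg.solve(Mid, Psi.T)

assert np.allclose(B, B_compact)
```

Note that the entire update history is condensed into the single k×k middle matrix, which is what makes storing and applying SR1 matrices cheap.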
The multipoint symmetric secant (MSS) method aims to satisfy multiple secant equations simultaneously. The recursive update formula was originally developed by Burdakov. [12] The compact representation for the direct Hessian was derived in [13].
Another equivalent compact representation for the MSS matrix is derived by rewriting the update in terms of S_k and Y_k. [14] The inverse representation can be obtained by application of the Sherman-Morrison-Woodbury identity.
Since the DFP (Davidon-Fletcher-Powell) update is the dual of the BFGS formula (i.e., obtained by interchanging the roles of s_k and y_k, and of H_k and B_k, in the BFGS update), the compact representation for DFP can be obtained immediately from the one for BFGS. [15]
The PSB (Powell-Symmetric-Broyden) compact representation was developed for the direct Hessian approximation. [16] It is equivalent to substituting w_k = s_k in ( 5 ).
For structured optimization problems in which the objective function can be decomposed into two parts, f(x) = \hat{f}(x) + u(x), where the gradient and Hessian of \hat{f} are known but only the gradient of u is known, structured BFGS formulas exist. The compact representation of these methods has the general form of ( 5 ), with specific rectangular and square middle factors. [17]
The reduced compact representation (RCR) of BFGS is for linear equality constrained optimization, i.e., minimizing f(x) subject to Ax = b, where the system Ax = b is underdetermined. In addition to the matrices S_k and Y_k, the RCR also stores the projections of the update vectors onto the nullspace of A.
For the compact representation of the BFGS matrix with an initialization that is a multiple of the identity, B_0 = \delta I, the (1,1) block of the inverse KKT matrix has a compact representation. [18]
The most common use of the compact representations is in the limited-memory setting, where m denotes the memory parameter, with typical small values around m = 5 (see e.g., [18] [7]). Then, instead of storing the history of all k vectors, one keeps only the m most recent pairs s_i and y_i, and possibly the small inner-product matrices formed from them. Further, the initialization is typically chosen as an adaptive multiple of the identity, H_0 = \gamma_k I with \gamma_k = s_{k-1}^T y_{k-1} / (y_{k-1}^T y_{k-1}) (equivalently B_0 = \gamma_k^{-1} I). Limited-memory methods are frequently used for large-scale problems with many variables (i.e., n can be large), in which the limited-memory matrices S_k and Y_k (and possibly further stored factors) are tall and very skinny: S_k, Y_k \in \mathbb{R}^{n \times m} with m \ll n.
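A minimal sketch of the limited-memory storage scheme follows. The helper names push and factors and the choice m = 5 are illustrative; the point is that old pairs are discarded automatically, and the adaptive scaling γ is recomputed from the most recent pair.

```python
from collections import deque
import numpy as np

m = 5
pairs = deque(maxlen=m)                   # only the m most recent (s, y) pairs survive

def push(s, y):
    """Store a new curvature pair, evicting the oldest when memory is full."""
    pairs.append((s, y))

def factors():
    """Return S, Y and the adaptive scaling gamma for H0 = gamma * I."""
    S = np.column_stack([s for s, _ in pairs])
    Y = np.column_stack([y for _, y in pairs])
    s, y = pairs[-1]
    gamma = (s @ y) / (y @ y)             # common adaptive choice of scaling
    return S, Y, gamma

rng = np.random.default_rng(6)
n = 20
for _ in range(12):                       # more updates than memory: only m kept
    push(rng.standard_normal(n), rng.standard_normal(n))

S, Y, gamma = factors()
assert S.shape == (n, m) and Y.shape == (n, m)
```

Because only the tall, skinny S and Y are retained, the per-iteration storage is O(nm) rather than the O(n^2) of a dense Hessian approximation.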
Open source implementations include:

- The R optim general-purpose optimizer routine, which uses the L-BFGS-B method.

Non open source implementations also exist.