Adjoint state method

Last updated

The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem. [1] It has applications in geophysics, seismic imaging, photonics and more recently in neural networks. [2]

Contents

The adjoint state space is chosen to simplify the physical interpretation of equation constraints. [3]

Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, including a new unknown variable.

The adjoint method formulates the gradient of a function towards its parameters in a constraint optimization form. By using the dual form of this constraint optimization problem, it can be used to calculate the gradient very fast. A nice property is that the number of computations is independent of the number of parameters for which you want the gradient. The adjoint method is derived from the dual problem [4] and is used e.g. in the Landweber iteration method. [5]

The name adjoint state method refers to the dual form of the problem, where the adjoint matrix is used.

When the initial problem consists of calculating the product and must satisfy , the dual problem can be realized as calculating the product (), where must satisfy . And is called the adjoint state vector.

General case

The original adjoint calculation method goes back to Jean Cea, [6] with the use of the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.

For a state variable , an optimization variable , an objective functional is defined. The state variable is often implicitly dependant on through the (direct) state equation (usually the weak form of a partial differential equation), thus the considered objective is . Usually, one would be interested in calculating using the chain rule:

Unfortunately, the term is often very hard to differentiate analytically since the dependance is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of , the problem

has an associate Lagrangian functional defined by

where is a Lagrange multiplier or adjoint state variable and is an inner product on . The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the lagrangian, namely

where is the Gateaux derivative of with respect to in the direction . The last equation is equivalent to , the state equation, to which the solution is . The first equation is the so-called adjoint state equation,

because the operator involved is the adjoint operator of , . Resolving this equation yields the adjoint state . The gradient of the quantity of interest with respect to is (the second equation with and ), thus it can be easily identified by subsequently resolving the direct and adjoint state equations. The process is even simpler when the operator is self-adjoint or symmetric since the direct and adjoint state equations differ only by their right-hand side.

Example: Linear case

In a real finite dimensional linear programming context, the objective function could be , for , and , and let the state equation be , with and .

The lagrangian function of the problem is , where .

The derivative of with respect to yields the state equation as shown before, and the state variable is . The derivative of with respect to is equivalent to the adjoint equation, which is, for every ,

Thus, we can write symbolically . The gradient would be

where is a third order tensor, is the dyadic product between the direct and adjoint states and denotes a double tensor contraction. It is assumed that has a known analytic expression that can be differentiated easily.

Numerical consideration for the self-adjoint case

If the operator was self-adjoint, , the direct state equation and the adjoint state equation would have the same left-hand side. In the goal of never inverting a matrix, which is a very slow process numerically, a LU decomposition can be used instead to solve the state equation, in operations for the decomposition and operations for the resolution. That same decomposition can then be used to solve the adjoint state equation in only operations since the matrices are the same.

See also

Related Research Articles

In mathematical optimization, the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equation constraints. It is named after the mathematician Joseph-Louis Lagrange.

In physics, a Langevin equation is a stochastic differential equation describing how a system evolves when subjected to a combination of deterministic and fluctuating ("random") forces. The dependent variables in a Langevin equation typically are collective (macroscopic) variables changing only slowly in comparison to the other (microscopic) variables of the system. The fast (microscopic) variables are responsible for the stochastic nature of the Langevin equation. One application is to Brownian motion, which models the fluctuating motion of a small particle in a fluid.

<span class="mw-page-title-main">Fokker–Planck equation</span> Partial differential equation

In statistical mechanics and information theory, the Fokker–Planck equation is a partial differential equation that describes the time evolution of the probability density function of the velocity of a particle under the influence of drag forces and random forces, as in Brownian motion. The equation can be generalized to other observables as well. The Fokker-Planck equation has multiple applications in information theory, graph theory, data science, finance, economics etc.

In mathematics, a self-adjoint operator on an infinite-dimensional complex vector space V with inner product is a linear map A that is its own adjoint. If V is finite-dimensional with a given orthonormal basis, this is equivalent to the condition that the matrix of A is a Hermitian matrix, i.e., equal to its conjugate transpose A. By the finite-dimensional spectral theorem, V has an orthonormal basis such that the matrix of A relative to this basis is a diagonal matrix with entries in the real numbers. This article deals with applying generalizations of this concept to operators on Hilbert spaces of arbitrary dimension.

<span class="mw-page-title-main">Lattice model (physics)</span>

In mathematical physics, a lattice model is a mathematical model of a physical system that is defined on a lattice, as opposed to a continuum, such as the continuum of space or spacetime. Lattice models originally occurred in the context of condensed matter physics, where the atoms of a crystal automatically form a lattice. Currently, lattice models are quite popular in theoretical physics, for many reasons. Some models are exactly solvable, and thus offer insight into physics beyond what can be learned from perturbation theory. Lattice models are also ideal for study by the methods of computational physics, as the discretization of any continuum model automatically turns it into a lattice model. The exact solution to many of these models includes the presence of solitons. Techniques for solving these include the inverse scattering transform and the method of Lax pairs, the Yang–Baxter equation and quantum groups. The solution of these models has given insights into the nature of phase transitions, magnetization and scaling behaviour, as well as insights into the nature of quantum field theory. Physical lattice models frequently occur as an approximation to a continuum theory, either to give an ultraviolet cutoff to the theory to prevent divergences or to perform numerical computations. An example of a continuum theory that is widely studied by lattice models is the QCD lattice model, a discretization of quantum chromodynamics. However, digital physics considers nature fundamentally discrete at the Planck scale, which imposes upper limit to the density of information, aka Holographic principle. More generally, lattice gauge theory and lattice field theory are areas of study. Lattice models are also used to simulate the structure and dynamics of polymers.

In mathematics, the Hodge star operator or Hodge star is a linear map defined on the exterior algebra of a finite-dimensional oriented vector space endowed with a nondegenerate symmetric bilinear form. Applying the operator to an element of the algebra produces the Hodge dual of the element. This map was introduced by W. V. D. Hodge.

In mathematics, the Hessian matrix, Hessian or Hesse matrix is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him. Hesse originally used the term "functional determinants". The Hessian is sometimes denoted by H or, ambiguously, by ∇2.

In quantum mechanics, the canonical commutation relation is the fundamental relation between canonical conjugate quantities. For example,

In probability theory and related fields, Malliavin calculus is a set of mathematical techniques and ideas that extend the mathematical field of calculus of variations from deterministic functions to stochastic processes. In particular, it allows the computation of derivatives of random variables. Malliavin calculus is also called the stochastic calculus of variations. P. Malliavin first initiated the calculus on infinite dimensional space. Then, the significant contributors such as S. Kusuoka, D. Stroock, J-M. Bismut, S. Watanabe, I. Shigekawa, and so on finally completed the foundations.

In functional analysis, a branch of mathematics, the Borel functional calculus is a functional calculus, which has particularly broad scope. Thus for instance if T is an operator, applying the squaring function ss2 to T yields the operator T2. Using the functional calculus for larger classes of functions, we can for example define rigorously the "square root" of the (negative) Laplacian operator −Δ or the exponential

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately. Early versions of MTL were called "hints".

In differential geometry, the Laplace–Beltrami operator is a generalization of the Laplace operator to functions defined on submanifolds in Euclidean space and, even more generally, on Riemannian and pseudo-Riemannian manifolds. It is named after Pierre-Simon Laplace and Eugenio Beltrami.

In 3-dimensional topology, a part of the mathematical field of geometric topology, the Casson invariant is an integer-valued invariant of oriented integral homology 3-spheres, introduced by Andrew Casson.

<span class="mw-page-title-main">Lagrangian coherent structure</span> Distinguished surfaces of dynamic trajectories

Lagrangian coherent structures (LCSs) are distinguished surfaces of trajectories in a dynamical system that exert a major influence on nearby trajectories over a time interval of interest. The type of this influence may vary, but it invariably creates a coherent trajectory pattern for which the underlying LCS serves as a theoretical centerpiece. In observations of tracer patterns in nature, one readily identifies coherent features, but it is often the underlying structure creating these features that is of interest.

In computational fluid dynamics, the Stochastic Eulerian Lagrangian Method (SELM) is an approach to capture essential features of fluid-structure interactions subject to thermal fluctuations while introducing approximations which facilitate analysis and the development of tractable numerical methods. SELM is a hybrid approach utilizing an Eulerian description for the continuum hydrodynamic fields and a Lagrangian description for elastic structures. Thermal fluctuations are introduced through stochastic driving fields. Approaches also are introduced for the stochastic fields of the SPDEs to obtain numerical methods taking into account the numerical discretization artifacts to maintain statistical principles, such as fluctuation-dissipation balance and other properties in statistical mechanics.

Proximal gradientmethods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable. One such example is regularization of the form

<span class="mw-page-title-main">Causal fermion systems</span> Candidate unified theory of physics

The theory of causal fermion systems is an approach to describe fundamental physics. It provides a unification of the weak, the strong and the electromagnetic forces with gravity at the level of classical field theory. Moreover, it gives quantum mechanics as a limiting case and has revealed close connections to quantum field theory. Therefore, it is a candidate for a unified physical theory. Instead of introducing physical objects on a preexisting spacetime manifold, the general concept is to derive spacetime as well as all the objects therein as secondary objects from the structures of an underlying causal fermion system. This concept also makes it possible to generalize notions of differential geometry to the non-smooth setting. In particular, one can describe situations when spacetime no longer has a manifold structure on the microscopic scale. As a result, the theory of causal fermion systems is a proposal for quantum geometry and an approach to quantum gravity.

In mathematical optimization, the proximal operator is an operator associated with a proper, lower semi-continuous convex function from a Hilbert space to , and is defined by:

Tau functions are an important ingredient in the modern mathematical theory of integrable systems, and have numerous applications in a variety of other domains. They were originally introduced by Ryogo Hirota in his direct method approach to soliton equations, based on expressing them in an equivalent bilinear form.

Hamiltonian truncation is a numerical method used to study quantum field theories (QFTs) in spacetime dimensions. Hamiltonian truncation is an adaptation of the Rayleigh–Ritz method from quantum mechanics. It is closely related to the exact diagonalization method used to treat spin systems in condensed matter physics. The method is typically used to study QFTs on spacetimes of the form , specifically to compute the spectrum of the Hamiltonian along . A key feature of Hamiltonian truncation is that an explicit ultraviolet cutoff is introduced, akin to the lattice spacing a in lattice Monte Carlo methods. Since Hamiltonian truncation is a nonperturbative method, it can be used to study strong-coupling phenomena like spontaneous symmetry breaking.

References

  1. Pollini, Nicolò; Lavan, Oren; Amir, Oded (2018-06-01). "Adjoint sensitivity analysis and optimization of hysteretic dynamic systems with nonlinear viscous dampers". Structural and Multidisciplinary Optimization. 57 (6): 2273–2289. doi:10.1007/s00158-017-1858-2. ISSN   1615-1488. S2CID   125712091.
  2. Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud Neural Ordinary Differential Equations Available online
  3. Plessix, R-E. "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications." Geophysical Journal International, 2006, 167(2): 495-503. free access on GJI website
  4. McNamara, Antoine; Treuille, Adrien; Popović, Zoran; Stam, Jos (August 2004). "Fluid control using the adjoint method" (PDF). ACM Transactions on Graphics. 23 (3): 449–456. doi:10.1145/1015706.1015744. Archived (PDF) from the original on 29 January 2022. Retrieved 28 October 2022.
  5. Lundvall, Johan (2007). "Data Assimilation in Fluid Dynamics using Adjoint Optimization" (PDF). Sweden: Linköping University of Technology. Archived (PDF) from the original on 9 October 2022. Retrieved 28 October 2022.
  6. Cea, Jean (1986). "Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût". ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique (in French). 20 (3): 371–402. doi: 10.1051/m2an/1986200303711 .