Hamilton–Jacobi–Bellman equation

The Hamilton–Jacobi–Bellman (HJB) equation is a nonlinear partial differential equation that provides necessary and sufficient conditions for optimality of a control with respect to a loss function. [1] Its solution is the value function of the optimal control problem which, once known, can be used to obtain the optimal control by taking the maximizer (or minimizer) of the Hamiltonian involved in the HJB equation. [2] [3]

The equation is a result of the theory of dynamic programming which was pioneered in the 1950s by Richard Bellman and coworkers. [4] [5] [6] The connection to the Hamilton–Jacobi equation from classical physics was first drawn by Rudolf Kálmán. [7] In discrete-time problems, the analogous difference equation is usually referred to as the Bellman equation.

While classical variational problems, such as the brachistochrone problem, can be solved using the Hamilton–Jacobi–Bellman equation, [8] the method can be applied to a broader spectrum of problems. Further, it can be generalized to stochastic systems, in which case the HJB equation is a second-order elliptic partial differential equation. [9] A major drawback, however, is that the HJB equation admits classical solutions only for a sufficiently smooth value function, which is not guaranteed in most situations. Instead, the notion of a viscosity solution is required, in which conventional derivatives are replaced by (set-valued) subderivatives. [10]

Optimal Control Problems

Consider the following problem in deterministic optimal control over the time period $[0, T]$:

$$V(x(0), 0) = \min_{u} \left\{ \int_0^T C[x(t), u(t)]\,dt + D[x(T)] \right\}$$

where $C[\cdot]$ is the scalar cost rate function and $D[\cdot]$ is a function that gives the bequest value at the final state, $x(t)$ is the system state vector, $x(0)$ is assumed given, and $u(t)$ for $0 \leq t \leq T$ is the control vector that we are trying to find. Thus, $V(x, t)$ is the value function.

The system must also be subject to

$$\dot{x}(t) = F[x(t), u(t)]$$

where $F[\cdot]$ gives the vector determining physical evolution of the state vector over time.
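
For concreteness, a minimal instance of this setup, used again in the sketches below (the specific choices are illustrative assumptions, not part of the general formulation), is the scalar problem

$$F(x, u) = u, \qquad C(x, u) = x^2 + u^2, \qquad D(x) = x^2, \qquad u(t) \in [-1, 1],$$

in which the control steers the state directly and the cost penalizes both deviation from the origin and control effort.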

The Partial Differential Equation

For this simple system, the Hamilton–Jacobi–Bellman partial differential equation is

$$\frac{\partial V(x, t)}{\partial t} + \min_{u}\left\{ \frac{\partial V(x, t)}{\partial x} \cdot F(x, u) + C(x, u) \right\} = 0$$

subject to the terminal condition

$$V(x, T) = D(x).$$

As before, the unknown scalar function $V(x, t)$ in the above partial differential equation is the Bellman value function, which represents the cost incurred from starting in state $x$ at time $t$ and controlling the system optimally from then until time $T$.
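
For the illustrative problem above (and ignoring the bound on $u$ for the moment), the inner minimization can be carried out in closed form: minimizing $\frac{\partial V}{\partial x} u + x^2 + u^2$ over $u$ gives $u^* = -\frac{1}{2}\frac{\partial V}{\partial x}$, so the HJB equation reduces to

$$\frac{\partial V(x, t)}{\partial t} + x^2 - \frac{1}{4}\left(\frac{\partial V(x, t)}{\partial x}\right)^2 = 0, \qquad V(x, T) = x^2.$$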

Deriving the Equation

Intuitively, the HJB equation can be derived as follows. If $V(x(t), t)$ is the optimal cost-to-go function (also called the 'value function'), then by Richard Bellman's principle of optimality, going from time $t$ to $t + dt$, we have

$$V(x(t), t) = \min_{u}\left\{ \int_t^{t+dt} C(x(s), u(s))\,ds + V(x(t+dt), t+dt) \right\}.$$

Note that the Taylor expansion of the last term, $V(x(t+dt), t+dt)$, is

$$V(x(t+dt), t+dt) = V(x(t), t) + \frac{\partial V(x, t)}{\partial t}\,dt + \frac{\partial V(x, t)}{\partial x} \cdot \dot{x}(t)\,dt + o(dt),$$

where $o(dt)$ denotes the terms in the Taylor expansion of higher order than one in little-o notation. Then if we subtract $V(x(t), t)$ from both sides, divide by $dt$, and take the limit as $dt$ approaches zero, we obtain the HJB equation defined above.
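
Carrying out these last steps explicitly: approximating the running-cost integral by $C(x(t), u(t))\,dt$ to first order, substituting the Taylor expansion, and cancelling $V(x(t), t)$ leaves

$$0 = \min_{u}\left\{ C(x(t), u(t))\,dt + \frac{\partial V(x, t)}{\partial t}\,dt + \frac{\partial V(x, t)}{\partial x} \cdot F(x(t), u(t))\,dt + o(dt) \right\}.$$

Dividing by $dt$, letting $dt \to 0$, and pulling the $\frac{\partial V}{\partial t}$ term out of the minimization (it does not depend on $u$) yields

$$0 = \frac{\partial V(x, t)}{\partial t} + \min_{u}\left\{ \frac{\partial V(x, t)}{\partial x} \cdot F(x, u) + C(x, u) \right\},$$

which is the HJB equation above.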

Solving the Equation

The HJB equation is usually solved backwards in time, starting from $t = T$ and ending at $t = 0$. [11]

When solved over the whole of the state space and $V(x, t)$ is continuously differentiable, the HJB equation is a necessary and sufficient condition for an optimum when the terminal state is unconstrained. [12] If we can solve for $V$, then we can find from it a control $u$ that achieves the minimum cost.
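
As a concrete illustration of this backward-in-time procedure, the following sketch solves the small example problem introduced earlier ($\dot{x} = u$, $C = x^2 + u^2$, $D(x) = x^2$, $u \in [-1, 1]$) with an explicit upwind finite-difference scheme on a state grid. The grid sizes, time step, and discretized control set are illustrative assumptions, not a prescribed method.

```python
# Minimal sketch: backward-in-time finite-difference solution of a 1-D HJB equation.
# Problem data (assumed for illustration): dx/dt = u, C(x,u) = x^2 + u^2, D(x) = x^2.
import numpy as np

T, nt = 1.0, 2000                      # horizon and number of backward time steps
xs = np.linspace(-2.0, 2.0, 201)       # state grid
us = np.linspace(-1.0, 1.0, 21)        # discretized control set
dt, dx = T / nt, xs[1] - xs[0]

def C(x, u):                           # running cost rate
    return x**2 + u**2

V = xs**2                              # terminal condition V(x, T) = D(x)

for _ in range(nt):                    # march backwards from t = T to t = 0
    # one-sided (upwind) differences for dV/dx; crude copy of the neighbouring
    # difference at the grid edges
    dV_fwd = np.append(np.diff(V) / dx, (V[-1] - V[-2]) / dx)
    dV_bwd = np.insert(np.diff(V) / dx, 0, (V[1] - V[0]) / dx)
    # Hamiltonian: min over u of dV/dx * F(x, u) + C(x, u), with F(x, u) = u
    H = np.full_like(V, np.inf)
    for u in us:
        dV = dV_fwd if u > 0 else dV_bwd   # upwind in the direction of the drift
        H = np.minimum(H, dV * u + C(xs, u))
    # dV/dt = -min_u {...}, so stepping from t to t - dt adds dt * min_u {...}
    V = V + dt * H

print("approximate optimal cost-to-go from x = 1 at t = 0:",
      V[np.argmin(np.abs(xs - 1.0))])
```

Once $V$ has been computed on the grid, an approximately optimal control at any state and time is recovered by taking the minimizing $u$ in the same Hamiltonian expression.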

In the general case, the HJB equation does not have a classical (smooth) solution. Several notions of generalized solutions have been developed to cover such situations, including the viscosity solution (Pierre-Louis Lions and Michael Crandall), [13] the minimax solution (Andrei Izmailovich Subbotin), and others.

Approximate dynamic programming has been introduced by D. P. Bertsekas and J. N. Tsitsiklis with the use of artificial neural networks (multilayer perceptrons) for approximating the Bellman function in general. [14] This is an effective mitigation strategy for reducing the impact of dimensionality, since memorizing the complete function mapping over the whole state-space domain is replaced by memorizing only the neural network parameters. In particular, for continuous-time systems, an approximate dynamic programming approach that combines policy iterations with neural networks was introduced. [15] In discrete time, an approach to solving the HJB equation combining value iterations and neural networks was introduced. [16]
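
A minimal sketch of this idea, in the spirit of generic fitted value iteration rather than the specific algorithms of the cited works: a small multilayer perceptron is trained on one-step Bellman backup targets for a simple discrete-time, discounted problem, so that only the network weights, rather than a table over the state space, need to be stored. The dynamics, cost, discount factor, and network size below are illustrative assumptions.

```python
# Minimal sketch of fitted value iteration with a neural value-function approximator.
# Assumed toy problem: x' = x + 0.1*u, stage cost x^2 + u^2, discount factor gamma.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
gamma = 0.95                                    # discount factor (assumption)
us = np.linspace(-1.0, 1.0, 21)                 # discretized control set
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=1000, random_state=0)

xs = rng.uniform(-2.0, 2.0, size=1000)          # sampled training states
for sweep in range(20):
    # next states x + 0.1*u for every sampled state and every candidate control
    nxt = xs[:, None] + 0.1 * us[None, :]
    if sweep == 0:
        V_next = np.zeros_like(nxt)             # start from V = 0
    else:
        V_next = net.predict(nxt.reshape(-1, 1)).reshape(nxt.shape)
    # one-step Bellman backup: target(x) = min_u { x^2 + u^2 + gamma * V(x + 0.1 u) }
    targets = (xs[:, None] ** 2 + us[None, :] ** 2 + gamma * V_next).min(axis=1)
    net.fit(xs.reshape(-1, 1), targets)         # the weights replace a tabulated V

print("approximate value at x = 1:", net.predict(np.array([[1.0]]))[0])
```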

Alternatively, it has been shown that sum-of-squares optimization can yield an approximate polynomial solution to the Hamilton–Jacobi–Bellman equation arbitrarily well with respect to the $L^1$ norm. [17]

Extension to Stochastic Problems

The idea of solving a control problem by applying Bellman's principle of optimality and then working out an optimizing strategy backwards in time can be generalized to stochastic control problems. Consider a problem similar to the one above:

$$\min_{u} \; \mathbb{E}\left\{ \int_0^T C(t, X_t, u_t)\,dt + D(X_T) \right\}$$

now with $(X_t)_{t \in [0, T]}$ the stochastic process to optimize and $(u_t)_{t \in [0, T]}$ the steering. By first using Bellman and then expanding $V(X_t, t)$ with Itô's rule, one finds the stochastic HJB equation

$$\min_{u}\left\{ \mathcal{A} V(x, t) + C(t, x, u) \right\} = 0,$$

where $\mathcal{A}$ represents the stochastic differentiation operator, and subject to the terminal condition

$$V(x, T) = D(x).$$
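
For instance, if the state follows an Itô diffusion $dX_t = \mu(t, X_t, u_t)\,dt + \sigma(t, X_t, u_t)\,dW_t$ (a standard special case, written out here for concreteness), the operator acts on a smooth value function as

$$\mathcal{A} V(x, t) = \frac{\partial V(x, t)}{\partial t} + \mu(t, x, u) \cdot \nabla_x V(x, t) + \frac{1}{2} \operatorname{tr}\!\left( \sigma(t, x, u)\,\sigma(t, x, u)^{\top} \nabla_x^2 V(x, t) \right),$$

with the second-order term coming from the quadratic variation in Itô's rule; this is why the stochastic HJB equation is a second-order partial differential equation.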

Note that the randomness has disappeared. In this case a solution $V$ of the latter does not necessarily solve the primal problem; it is a candidate only, and a further verification argument is required. This technique is widely used in financial mathematics to determine optimal investment strategies in the market (see for example Merton's portfolio problem).

Application to LQG-Control

As an example, we can look at a system with linear stochastic dynamics and quadratic cost. If the system dynamics is given by

$$dx_t = (a x_t + b u_t)\,dt + \sigma\,dw_t,$$

and the cost accumulates at rate $C(x_t, u_t) = \frac{1}{2} r(t) u_t^2 + \frac{1}{2} q(t) x_t^2$, the HJB equation is given by

$$-\frac{\partial V(x, t)}{\partial t} = \frac{1}{2} q(t) x^2 + \frac{\partial V(x, t)}{\partial x} a x - \frac{b^2}{2 r(t)} \left( \frac{\partial V(x, t)}{\partial x} \right)^2 + \frac{\sigma^2}{2} \frac{\partial^2 V(x, t)}{\partial x^2}$$

with optimal action given by

$$u_t = -\frac{b}{r(t)} \frac{\partial V(x, t)}{\partial x}.$$

Assuming a quadratic form for the value function, we obtain the usual Riccati equation for the Hessian of the value function, as is usual for linear-quadratic-Gaussian control.
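
To make the last step concrete (a standard calculation, written in the scalar notation of this example): writing $V(x, t) = \tfrac{1}{2} P(t) x^2 + g(t)$ gives $\partial V/\partial x = P(t) x$ and $\partial^2 V/\partial x^2 = P(t)$, and substituting into the HJB equation and matching powers of $x$ yields

$$-\dot{P}(t) = q(t) + 2 a P(t) - \frac{b^2}{r(t)} P(t)^2, \qquad -\dot{g}(t) = \frac{\sigma^2}{2} P(t),$$

a scalar Riccati differential equation for $P(t)$ that is integrated backwards from the terminal condition, with the optimal feedback $u_t = -\tfrac{b}{r(t)} P(t) x_t$.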

See also

Bellman equation
Black–Scholes equation
Calculus of variations
Differential equation
Dynamic programming
Eikonal equation
Filtering problem (stochastic processes)
Hamilton–Jacobi equation
Hamiltonian (control theory)
Markov decision process
Mean-field game theory
Merton's portfolio problem
Nonlinear system
Optimal control
Ornstein–Uhlenbeck process
Pontryagin's maximum principle
Richard E. Bellman
Stochastic mechanics
Value function
Viscosity solution

References

  1. Kirk, Donald E. (1970). Optimal Control Theory: An Introduction. Englewood Cliffs, NJ: Prentice-Hall. pp. 86–90. ISBN 0-13-638098-0.
  2. Yong, Jiongmin; Zhou, Xun Yu (1999). "Dynamic Programming and HJB Equations". Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer. pp. 157–215 [p. 163]. ISBN 0-387-98723-1.
  3. Naidu, Desineni S. (2003). "The Hamilton–Jacobi–Bellman Equation". Optimal Control Systems. Boca Raton: CRC Press. pp. 277–283 [p. 280]. ISBN 0-8493-0892-5.
  4. Bellman, R. E. (1954). "Dynamic Programming and a new formalism in the calculus of variations". Proc. Natl. Acad. Sci. 40 (4): 231–235. Bibcode:1954PNAS...40..231B. doi:10.1073/pnas.40.4.231. PMC 527981. PMID 16589462.
  5. Bellman, R. E. (1957). Dynamic Programming. Princeton, NJ: Princeton University Press.
  6. Bellman, R.; Dreyfus, S. (1959). "An Application of Dynamic Programming to the Determination of Optimal Satellite Trajectories". J. Br. Interplanet. Soc. 17: 78–83.
  7. Kálmán, Rudolf E. (1963). "The Theory of Optimal Control and the Calculus of Variations". In Bellman, Richard (ed.). Mathematical Optimization Techniques. Berkeley: University of California Press. pp. 309–331. OCLC 1033974.
  8. Kemajou-Brown, Isabelle (2016). "Brief History of Optimal Control Theory and Some Recent Developments". In Budzban, Gregory; Hughes, Harry Randolph; Schurz, Henri (eds.). Probability on Algebraic and Geometric Structures. Contemporary Mathematics. Vol. 668. pp. 119–130. doi:10.1090/conm/668/13400. ISBN 9781470419455.
  9. Chang, Fwu-Ranq (2004). Stochastic Optimization in Continuous Time. Cambridge, UK: Cambridge University Press. pp. 113–168. ISBN 0-521-83406-6.
  10. Bardi, Martino; Capuzzo-Dolcetta, Italo (1997). Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Boston: Birkhäuser. ISBN 0-8176-3640-4.
  11. Lewis, Frank L.; Vrabie, Draguna; Syrmos, Vassilis L. (2012). Optimal Control (3rd ed.). Wiley. p. 278. ISBN 978-0-470-63349-6.
  12. Bertsekas, Dimitri P. (2005). Dynamic Programming and Optimal Control. Athena Scientific.
  13. Bardi, Martino; Capuzzo-Dolcetta, Italo (1997). Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Boston: Birkhäuser. ISBN 0-8176-3640-4.
  14. Bertsekas, Dimitri P.; Tsitsiklis, John N. (1996). Neuro-dynamic Programming. Athena Scientific. ISBN 978-1-886529-10-6.
  15. Abu-Khalaf, Murad; Lewis, Frank L. (2005). "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach". Automatica. 41 (5): 779–791. doi:10.1016/j.automatica.2004.11.034. S2CID 14757582.
  16. Al-Tamimi, Asma; Lewis, Frank L.; Abu-Khalaf, Murad (2008). "Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof". IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics. 38 (4): 943–949. doi:10.1109/TSMCB.2008.926614. PMID 18632382. S2CID 14202785.
  17. Jones, Morgan; Peet, Matthew (2020). "Polynomial Approximation of Value Functions and Nonlinear Controller Design with Performance Bounds". arXiv:2010.06828 [math.OC].

Further reading