Value function

The value function of an optimization problem gives the value attained by the objective function at a solution, while depending only on the parameters of the problem. [1] [2] In a controlled dynamical system, the value function represents the optimal payoff of the system over the interval $[t, t_{1}]$ when started at the time-$t$ state variable $x(t)=x$. [3] If the objective function represents some cost that is to be minimized, the value function can be interpreted as the cost to finish the optimal program, and is thus referred to as the "cost-to-go function". [4] [5] In an economic context, where the objective function usually represents utility, the value function is conceptually equivalent to the indirect utility function. [6] [7]
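
As a minimal static illustration (an assumed example, not drawn from the cited sources), consider maximizing $f(x; a) = ax - x^{2}$ over $x \in \mathbb{R}$ for a parameter $a$. The first-order condition gives the solution $x^{\ast}(a) = a/2$, so the value function records the attained objective as a function of the parameter alone:

$$ V(a) = \max_{x} \left( ax - x^{2} \right) = a \cdot \frac{a}{2} - \left( \frac{a}{2} \right)^{2} = \frac{a^{2}}{4}. $$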

In a problem of optimal control, the value function is defined as the supremum of the objective function taken over the set of admissible controls. Given $(t_{0}, x_{0}) \in [0, t_{1}] \times \mathbb{R}^{d}$, a typical optimal control problem is to

$$ \text{maximize} \quad J(t_{0}, x_{0}; u) = \int_{t_{0}}^{t_{1}} I(t, x(t), u(t)) \, dt + \phi(x(t_{1})) $$

subject to

$$ \frac{dx(t)}{dt} = f(t, x(t), u(t)) $$

with initial state variable $x(t_{0}) = x_{0}$. [8] The objective function $J(t_{0}, x_{0}; u)$ is to be maximized over all admissible controls $u \in U[t_{0}, t_{1}]$, where $u$ is a Lebesgue measurable function from $[t_{0}, t_{1}]$ to some prescribed arbitrary set in $\mathbb{R}^{m}$. The value function is then defined as

$$ V(t, x(t)) = \max_{u \in U} \left\{ \int_{t}^{t_{1}} I(\tau, x(\tau), u(\tau)) \, d\tau + \phi(x(t_{1})) \right\} $$

with $V(t_{1}, x(t_{1})) = \phi(x(t_{1}))$, where $\phi(x(t_{1}))$ is the "scrap value". If the optimal pair of control and state trajectories is $(x^{\ast}, u^{\ast})$, then $V(t_{0}, x_{0}) = J(t_{0}, x_{0}; u^{\ast})$. The function $h$ that gives the optimal control $u^{\ast}$ based on the current state $x$ is called a feedback control policy, [4] or simply a policy function. [9]
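
Since the continuous-time problem rarely admits a closed form, the value function is often approximated numerically. The following sketch (not from the cited sources; the payoff $I$, dynamics $f$, scrap value $\phi$, grids, and horizon are all illustrative assumptions) computes an approximate value function and feedback policy by backward induction on discretized state and control grids:

```python
import numpy as np

# Minimal sketch: approximate the value function of the scalar problem
#   maximize  sum_t -(x_t^2 + u_t^2) dt + phi(x_T),   x_{t+1} = x_t + u_t dt
# by backward induction on state/control grids (all choices illustrative).
dt = 0.1
T = 20                                  # number of time steps to the horizon
xs = np.linspace(-2.0, 2.0, 81)         # state grid
us = np.linspace(-1.0, 1.0, 41)         # admissible control grid

I = lambda x, u: -(x**2 + u**2)         # running payoff (a negated cost)
f = lambda x, u: x + u * dt             # one-step dynamics
phi = lambda x: -x**2                   # "scrap value" at the horizon

V = phi(xs)                             # terminal condition V(t_1, x) = phi(x)
policy = np.zeros((T, xs.size))         # feedback policy u = h(t, x) on the grid
X, U = np.meshgrid(xs, us, indexing="ij")  # all (state, control) pairs

for t in reversed(range(T)):
    # Payoff now plus (linearly interpolated) value-to-go at the next state.
    Q = I(X, U) * dt + np.interp(f(X, U), xs, V)
    best = Q.argmax(axis=1)             # maximize over admissible controls
    policy[t] = us[best]
    V = Q[np.arange(xs.size), best]     # Bellman backup

print("V(t_0, x=1.0) ≈", np.interp(1.0, xs, V))
print("h(t_0, x=1.0) ≈", np.interp(1.0, xs, policy[0]))
```

Off-grid next states are clamped to the grid endpoints by `np.interp`, a deliberately crude boundary treatment that keeps the sketch short.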

Bellman's principle of optimality roughly states that any optimal policy at time $t$, $t_{0} \leq t \leq t_{1}$, taking the current state $x(t)$ as "new" initial condition must be optimal for the remaining problem. If the value function happens to be continuously differentiable, [10] this gives rise to an important partial differential equation known as the Hamilton–Jacobi–Bellman equation,

$$ -\frac{\partial V(t,x)}{\partial t} = \max_{u} \left\{ I(t,x,u) + \frac{\partial V(t,x)}{\partial x} f(t,x,u) \right\} $$

where the maximand on the right-hand side can also be re-written as the Hamiltonian, $H(t, x, u, \lambda) = I(t,x,u) + \lambda(t) f(t,x,u)$, as

$$ -\frac{\partial V(t,x)}{\partial t} = \max_{u} H(t,x,u,\lambda) $$

with $\lambda(t) = \frac{\partial V(t,x)}{\partial x}$ playing the role of the costate variables. [11] Given this definition, we further have $\lambda(t_{1}) = \frac{\partial \phi(x(t_{1}))}{\partial x}$, and after differentiating both sides of the HJB equation with respect to $x$,

$$ -\frac{\partial^{2} V(t,x)}{\partial t \, \partial x} = \frac{\partial I}{\partial x} + \frac{\partial^{2} V(t,x)}{\partial x^{2}} f(t,x,u) + \frac{\partial V(t,x)}{\partial x} \frac{\partial f(t,x,u)}{\partial x} $$

which after replacing the appropriate terms recovers the costate equation

$$ -\dot{\lambda}(t) = \frac{\partial I}{\partial x} + \lambda(t) \frac{\partial f}{\partial x} = \frac{\partial H}{\partial x} $$

where $\dot{\lambda}(t)$ is Newton notation for the derivative with respect to time. [12]
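
As a worked illustration of the HJB equation (the linear-quadratic problem and the quadratic ansatz below are assumptions, not taken from the cited sources), consider maximizing $-\int_{t}^{t_{1}} \left( x(\tau)^{2} + u(\tau)^{2} \right) d\tau$ subject to $\dot{x} = u$ with zero scrap value. Guessing $V(t,x) = -P(t)\,x^{2}$ gives $\partial V / \partial x = -2P(t)\,x$, and the inner maximization of $-(x^{2}+u^{2}) - 2P(t)\,x\,u$ over $u$ yields the linear feedback policy $u^{\ast} = h(t,x) = -P(t)\,x$. Substituting back, the HJB partial differential equation collapses to an ordinary (Riccati) differential equation:

$$ \dot{P}(t) = P(t)^{2} - 1, \qquad P(t_{1}) = 0 \quad\Longrightarrow\quad P(t) = \tanh(t_{1} - t). $$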

The value function is the unique viscosity solution to the Hamilton–Jacobi–Bellman equation. [13] In online closed-loop approximate optimal control, the value function is also a Lyapunov function that establishes global asymptotic stability of the closed-loop system. [14]


References

  1. Fleming, Wendell H.; Rishel, Raymond W. (1975). Deterministic and Stochastic Optimal Control. New York: Springer. pp. 81–83. ISBN 0-387-90155-8.
  2. Caputo, Michael R. (2005). Foundations of Dynamic Economic Analysis: Optimal Control Theory and Applications. New York: Cambridge University Press. p. 185. ISBN 0-521-60368-4.
  3. Weber, Thomas A. (2011). Optimal Control Theory: with Applications in Economics. Cambridge: The MIT Press. p. 82. ISBN 978-0-262-01573-8.
  4. Bertsekas, Dimitri P.; Tsitsiklis, John N. (1996). Neuro-Dynamic Programming. Belmont: Athena Scientific. p. 2. ISBN 1-886529-10-8.
  5. "EE365: Dynamic Programming" (PDF).
  6. Mas-Colell, Andreu; Whinston, Michael D.; Green, Jerry R. (1995). Microeconomic Theory. New York: Oxford University Press. p. 964. ISBN 0-19-507340-1.
  7. Corbae, Dean; Stinchcombe, Maxwell B.; Zeman, Juraj (2009). An Introduction to Mathematical Analysis for Economic Theory and Econometrics. Princeton University Press. p. 145. ISBN 978-0-691-11867-3.
  8. Kamien, Morton I.; Schwartz, Nancy L. (1991). Dynamic Optimization: The Calculus of Variations and Optimal Control in Economics and Management (2nd ed.). Amsterdam: North-Holland. p. 259. ISBN 0-444-01609-0.
  9. Ljungqvist, Lars; Sargent, Thomas J. (2018). Recursive Macroeconomic Theory (4th ed.). Cambridge: MIT Press. p. 106. ISBN 978-0-262-03866-9.
  10. Benveniste and Scheinkman established sufficient conditions for the differentiability of the value function, which in turn allows an application of the envelope theorem; see Benveniste, L. M.; Scheinkman, J. A. (1979). "On the Differentiability of the Value Function in Dynamic Models of Economics". Econometrica. 47 (3): 727–732. doi:10.2307/1910417. JSTOR 1910417. Also see Seierstad, Atle (1982). "Differentiability Properties of the Optimal Value Function in Control Theory". Journal of Economic Dynamics and Control. 4: 303–310. doi:10.1016/0165-1889(82)90019-7.
  11. Kirk, Donald E. (1970). Optimal Control Theory. Englewood Cliffs, NJ: Prentice-Hall. p. 88. ISBN 0-13-638098-0.
  12. Zhou, X. Y. (1990). "Maximum Principle, Dynamic Programming, and their Connection in Deterministic Control". Journal of Optimization Theory and Applications. 65 (2): 363–373. doi:10.1007/BF01102352. S2CID 122333807.
  13. Theorem 10.1 in Bressan, Alberto (2019). "Viscosity Solutions of Hamilton-Jacobi Equations and Optimal Control Problems" (PDF). Lecture Notes.
  14. Kamalapurkar, Rushikesh; Walters, Patrick; Rosenfeld, Joel; Dixon, Warren (2018). "Optimal Control and Lyapunov Stability". Reinforcement Learning for Optimal Feedback Control: A Lyapunov-Based Approach. Berlin: Springer. pp. 26–27. ISBN 978-3-319-78383-3.
