In mathematics, the Legendre transformation (or Legendre transform), first introduced by Adrien-Marie Legendre in 1787 when studying the minimal surface problem, [1] is an involutive transformation on real-valued functions that are convex on a real variable. Specifically, if a real-valued multivariable function is convex on one of its independent real variables, then the Legendre transform with respect to this variable is applicable to the function.
In physical problems, the Legendre transform is used to convert functions of one quantity (such as position, pressure, or temperature) into functions of the conjugate quantity (momentum, volume, and entropy, respectively). In this way, it is commonly used in classical mechanics to derive the Hamiltonian formalism out of the Lagrangian formalism (or vice versa) and in thermodynamics to derive the thermodynamic potentials, as well as in the solution of differential equations of several variables.
For sufficiently smooth functions on the real line, the Legendre transform $f^*$ of a function $f$ can be specified, up to an additive constant, by the condition that the functions' first derivatives are inverse functions of each other. This can be expressed in Euler's derivative notation as $Df(\cdot) = (Df^*)^{-1}(\cdot)$, where $D$ is an operator of differentiation, $\cdot$ represents an argument or input to the associated function, and $(\phi)^{-1}(\cdot)$ is an inverse function such that $(\phi)^{-1}(\phi(x)) = x$, or equivalently, as $f'(f^{*\prime}(x^*)) = x^*$ and $f^{*\prime}(f'(x)) = x$ in Lagrange's notation.
The generalization of the Legendre transformation to affine spaces and non-convex functions is known as the convex conjugate (also called the Legendre–Fenchel transformation), which can be used to construct a function's convex hull.
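Let $I \subset \mathbb{R}$ be an interval, and $f : I \to \mathbb{R}$ a convex function; then the Legendre transform of $f$ is the function $f^* : I^* \to \mathbb{R}$ defined by
$$f^*(x^*) = \sup_{x \in I}\,(x^* x - f(x)), \qquad I^* = \{x^* \in \mathbb{R} : f^*(x^*) < \infty\},$$
where $\sup$ denotes the supremum over $I$: either $x \in I$ is chosen such that $x^* x - f(x)$ is maximized at each $x^*$, or $x^*$ is such that $x^* x - f(x)$ stays bounded throughout $I$ (e.g., when $f(x)$ is a linear function).
The function $f^*$ is called the convex conjugate function of $f$. For historical reasons (rooted in analytic mechanics), the conjugate variable is often denoted $p$ instead of $x^*$. If the convex function $f$ is defined on the whole line and is everywhere differentiable, then $f^*(p)$ can be interpreted as the negative of the $y$-intercept of the tangent line to the graph of $f$ that has slope $p$.
The generalization to convex functions $f : X \to \mathbb{R}$ on a convex set $X \subset \mathbb{R}^n$ is straightforward: $f^* : X^* \to \mathbb{R}$ has domain $X^* = \{x^* \in \mathbb{R}^n : \sup_{x \in X}(\langle x^*, x\rangle - f(x)) < \infty\}$ and is defined by $f^*(x^*) = \sup_{x \in X}(\langle x^*, x\rangle - f(x))$, where $\langle x^*, x\rangle$ denotes the dot product of $x^*$ and $x$.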
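As a rough numerical illustration of the one-dimensional definition, the supremum can be approximated by a maximum over sample points. The following is a minimal sketch under stated assumptions: the helper name `legendre_transform`, the test function $f(x) = x^2$, and the sampling grid are hypothetical choices made only for this example.

```python
def legendre_transform(f, xs, p):
    """Approximate f*(p) = sup_x (p*x - f(x)) by a maximum over the samples xs."""
    return max(p * x - f(x) for x in xs)


def f(x):
    # Assumed test function; its exact transform is p**2 / 4.
    return x * x


# Uniform grid on [-10, 10] with spacing 0.001 (an arbitrary choice).
xs = [i / 1000.0 for i in range(-10000, 10001)]

for p in (-2.0, 0.0, 1.0, 3.0):
    approx = legendre_transform(f, xs, p)
    exact = p * p / 4.0
    print(p, approx, exact)
```

The grid maximum is only a lower bound on the supremum, and it is meaningful only when the maximizer lies inside the sampled range.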
The Legendre transformation is an application of the duality relationship between points and lines. The functional relationship specified by $f$ can be represented equally well as a set of $(x, y)$ points, or as a set of tangent lines specified by their slope and intercept values.
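For a differentiable convex function $f$ on the real line with the first derivative $f'$ and its inverse $(f')^{-1}$, the Legendre transform of $f$, $f^*$, can be specified, up to an additive constant, by the condition that the functions' first derivatives are inverse functions of each other, i.e., $f' = ((f^*)')^{-1}$ and $(f^*)' = (f')^{-1}$.
To see this, first note that if $f$ as a convex function on the real line is differentiable and $\bar{x}$ is a critical point of the function $x \mapsto p \cdot x - f(x)$, then the supremum is achieved at $\bar{x}$ (by convexity, see the first figure in this Wikipedia page). Therefore, the Legendre transform of $f$ is $f^*(p) = p \cdot \bar{x} - f(\bar{x})$.
Then, suppose that the first derivative $f'$ is invertible and let the inverse be $g = (f')^{-1}$. Then for each $p$, the point $g(p)$ is the unique critical point $\bar{x}$ of the function $x \mapsto px - f(x)$ (i.e., $\bar{x} = g(p)$), because $f'(g(p)) = p$ and the function's first derivative with respect to $x$ at $g(p)$ is $p - f'(g(p)) = 0$. Hence we have $f^*(p) = p \cdot g(p) - f(g(p))$ for each $p$. By differentiating with respect to $p$, we find
$$(f^*)'(p) = g(p) + p \cdot g'(p) - f'(g(p)) \cdot g'(p).$$
Since $f'(g(p)) = p$, this simplifies to $(f^*)'(p) = g(p) = (f')^{-1}(p)$. In other words, $(f^*)'$ and $f'$ are inverses to each other.
In general, if $h' = (f')^{-1}$ is the inverse of $f'$, then $h' = (f^*)'$, so integration gives $f^* = h + c$ with a constant $c$.
In practical terms, given $f(x)$, the parametric plot of $x f'(x) - f(x)$ versus $f'(x)$ amounts to the graph of $f^*(p)$ versus $p$.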
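The parametric description can be checked on a concrete case. The sketch below assumes the hypothetical test function $f(x) = x^4/4$, for which $f'(x) = x^3$ and the transform has the closed form $f^*(p) = \tfrac{3}{4}|p|^{4/3}$; the function names are illustrative only.

```python
def f(x):
    # Assumed test function (convex, differentiable).
    return x ** 4 / 4.0


def df(x):
    # Its derivative f'(x) = x**3.
    return x ** 3


def parametric_point(x):
    """Return the point (p, f*(p)) traced out by the parameter x."""
    p = df(x)
    return p, x * p - f(x)


def f_star_closed(p):
    # Closed form of the transform of x**4 / 4.
    return 0.75 * abs(p) ** (4.0 / 3.0)


for x in (0.5, 1.0, 2.0):
    p, value = parametric_point(x)
    print(p, value, f_star_closed(p))
```

Each printed pair of values should agree up to floating-point rounding, since both are evaluations of the same function $f^*$ at the slope $p = f'(x)$.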
In some cases (e.g. thermodynamic potentials, below), a non-standard requirement is used, amounting to an alternative definition of $f^*$ with a minus sign, $f(x) - f^*(p) = xp$.
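In analytical mechanics and thermodynamics, the Legendre transformation is usually defined as follows: suppose $f$ is a function of $x$; then we have
$$df = \frac{df}{dx}\,dx.$$
Performing the Legendre transformation on this function means that we take $p = \frac{df}{dx}$ as the independent variable, so that the above expression can be written as
$$df = p\,dx,$$
and according to Leibniz's rule $d(uv) = u\,dv + v\,du$, we then have
$$d(xp - f) = x\,dp + p\,dx - df = x\,dp,$$
and taking $f^* = xp - f$, we have $df^* = x\,dp$, which means $\frac{df^*}{dp} = x$.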
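When $f$ is a function of $n$ variables $x_1, x_2, \ldots, x_n$, then we can perform the Legendre transformation on each one or several variables: we have
$$df = p_1\,dx_1 + p_2\,dx_2 + \cdots + p_n\,dx_n,$$
where $p_i = \frac{\partial f}{\partial x_i}$. Then if we want to perform the Legendre transformation on, e.g., $x_1$, then we take $p_1$ together with $x_2, \ldots, x_n$ as independent variables, and with Leibniz's rule we have
$$d(f - x_1 p_1) = -x_1\,dp_1 + p_2\,dx_2 + \cdots + p_n\,dx_n.$$
So for the function $\varphi(p_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n) - x_1 p_1$, we have
$$\frac{\partial \varphi}{\partial p_1} = -x_1, \quad \frac{\partial \varphi}{\partial x_2} = p_2, \quad \ldots, \quad \frac{\partial \varphi}{\partial x_n} = p_n.$$
We can also do this transformation for variables $x_2, \ldots, x_n$. If we do it to all the variables, then we have
$$d\varphi = -x_1\,dp_1 - x_2\,dp_2 - \cdots - x_n\,dp_n, \quad \text{where } \varphi = f - x_1 p_1 - x_2 p_2 - \cdots - x_n p_n.$$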
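In analytical mechanics, people perform this transformation on the variables $\dot{q}_1, \dot{q}_2, \ldots, \dot{q}_n$ of the Lagrangian $L(q_1, \ldots, q_n, \dot{q}_1, \ldots, \dot{q}_n)$ to get the Hamiltonian:
$$H(q_1, \ldots, q_n, p_1, \ldots, p_n) = \sum_{i=1}^{n} p_i \dot{q}_i - L(q_1, \ldots, q_n, \dot{q}_1, \ldots, \dot{q}_n).$$
In thermodynamics, people perform this transformation on variables according to the type of thermodynamic system they want; for example, starting from the cardinal function of state, the internal energy $U(S, V)$, we have
$$dU = T\,dS - p\,dV,$$
so we can perform the Legendre transformation on either or both of $S, V$ to yield
$$dH = d(U + pV) = T\,dS + V\,dp,$$
$$dF = d(U - TS) = -S\,dT - p\,dV,$$
$$dG = d(U - TS + pV) = -S\,dT + V\,dp,$$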
and each of these three expressions has a physical meaning.
This definition of the Legendre transformation is the one originally introduced by Legendre in his work in 1787,[1] and is still applied by physicists nowadays. Indeed, this definition can be made mathematically rigorous if we treat all the variables and functions defined above, for example $f, x_1, \ldots, x_n, p_1, \ldots, p_n$, as differentiable functions defined on an open set of $\mathbb{R}^n$ or on a differentiable manifold, and $df, dx_i, dp_i$ as their differentials (which are treated as cotangent vector fields in the context of a differentiable manifold). This definition is equivalent to the modern mathematicians' definition as long as $f$ is differentiable and convex in the variables $x_1, x_2, \ldots, x_n$.
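As shown above, for a convex function $f(x)$, with $x = \bar{x}$ maximizing or making $px - f(x)$ bounded at each $p$ to define the Legendre transform $f^*(p) = p\bar{x} - f(\bar{x})$, and with $g \equiv (f')^{-1}$, the following identities hold: $f'(\bar{x}) = p$, $\bar{x} = g(p)$, and $(f^*)'(p) = g(p)$.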
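Consider the exponential function $f(x) = e^x$, which has the domain $I = \mathbb{R}$. From the definition, the Legendre transform is
$$f^*(x^*) = \sup_{x \in \mathbb{R}}\,(x^* x - e^x), \quad x^* \in I^*,$$
where $I^*$ remains to be determined. To evaluate the supremum, compute the derivative of $x^* x - e^x$ with respect to $x$ and set it equal to zero: $x^* - e^x = 0$. The second derivative $-e^x$ is negative everywhere, so the maximal value is achieved at $x = \ln(x^*)$. Thus, the Legendre transform is
$$f^*(x^*) = x^* \ln(x^*) - e^{\ln(x^*)} = x^*(\ln(x^*) - 1)$$
and has domain $I^* = (0, \infty)$. This illustrates that the domains of a function and its Legendre transform can be different.
To find the Legendre transformation of the Legendre transformation of $f$,
$$f^{**}(x) = \sup_{x^* \in I^*}\,\bigl(x x^* - x^*(\ln(x^*) - 1)\bigr), \quad x \in I,$$
where the variable $x$ is intentionally used as the argument of $f^{**}$ to show the involution property $f^{**} = f$, we compute
$$0 = \frac{d}{dx^*}\bigl(x x^* - x^*(\ln(x^*) - 1)\bigr) = x - \ln(x^*).$$
Thus the maximum occurs at $x^* = e^x$, because the second derivative $-\frac{1}{x^*}$ is negative over the domain $I^* = (0, \infty)$ of $f^*$. As a result, $f^{**}$ is found as
$$f^{**}(x) = x e^x - e^x(\ln(e^x) - 1) = e^x,$$
thereby confirming that $f = f^{**}$, as expected.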
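The exponential example can be replayed numerically by plugging the stationary points found above into the defining expressions. This is a minimal sketch; the function names `f_star` and `f_star_star` are hypothetical, and the code assumes $p > 0$ so that the logarithm is defined.

```python
import math


def f_star(p):
    """Transform of exp at p > 0, using the maximizer x = ln(p)."""
    x_bar = math.log(p)
    return p * x_bar - math.exp(x_bar)


def f_star_star(x):
    """Transform of f_star at x, using the maximizer p = exp(x)."""
    p_bar = math.exp(x)
    return x * p_bar - f_star(p_bar)


for x in (-1.0, 0.0, 1.3):
    # The two printed numbers should agree up to rounding.
    print(x, f_star_star(x), math.exp(x))
```

The check relies on the stationary points derived in the text rather than on a search, so it confirms the algebra but does not independently locate the maximum.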
Let $f(x) = cx^2$ be defined on $\mathbb{R}$, where $c > 0$ is a fixed constant.
For $x^*$ fixed, the function of $x$, $x^* x - f(x) = x^* x - cx^2$, has the first derivative $x^* - 2cx$ and second derivative $-2c$; there is one stationary point at $x = x^*/(2c)$, which is always a maximum.
Thus, $I^* = \mathbb{R}$ and
$$f^*(x^*) = \frac{{x^*}^2}{4c}.$$
The first derivatives of $f$, $2cx$, and of $f^*$, $x^*/(2c)$, are inverse functions to each other. Clearly, furthermore,
$$f^{**}(x) = \frac{1}{4\,(1/4c)}\,x^2 = cx^2,$$
namely $f^{**} = f$.
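The involution property for the quadratic can also be observed numerically by applying a grid-based transform twice. The sketch below is illustrative only: the value $c = 1.5$, both grids, and all names are assumptions, and the result is accurate only to roughly the grid resolution.

```python
c = 1.5  # assumed positive constant


def f(x):
    return c * x * x


# Sample points for x and for the conjugate variable p (arbitrary ranges).
xs = [-5.0 + 0.01 * i for i in range(1001)]
ps = [-8.0 + 0.02 * j for j in range(801)]

# First transform: tabulate f*(p) on the p-grid.
f_star_on_grid = [max(p * x - f(x) for x in xs) for p in ps]


def f_star_star(x):
    """Second transform, evaluated from the tabulated f* values."""
    return max(x * p - fs for p, fs in zip(ps, f_star_on_grid))


for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, f_star_star(x), f(x))
```

The ranges are chosen so that the maximizers $x = p/(2c)$ and $p = 2cx$ fall inside the grids for the test points; outside that regime the grid maximum no longer approximates the supremum.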
Let $f(x) = x^2$ for $x \in I = [2, 3]$.
For $x^*$ fixed, $x^* x - f(x)$ is continuous on the compact interval $I$, hence it always attains a finite maximum there; it follows that the domain of the Legendre transform of $f$ is $I^* = \mathbb{R}$.
The stationary point at $x = x^*/2$ (found by setting the first derivative of $x^* x - f(x)$ with respect to $x$ equal to zero) is in the domain $[2, 3]$ if and only if $4 \le x^* \le 6$. Otherwise the maximum is taken either at $x = 2$ or at $x = 3$, because the second derivative of $x^* x - f(x)$ with respect to $x$ is negative, namely $-2$: for $x^* < 4$ the maximum of $x^* x - f(x)$ over $x \in [2, 3]$ is attained at $x = 2$, while for $x^* > 6$ it is attained at $x = 3$. Thus, it follows that
$$f^*(x^*) = \begin{cases} 2x^* - 4, & x^* < 4, \\ \dfrac{{x^*}^2}{4}, & 4 \le x^* \le 6, \\ 3x^* - 9, & x^* > 6. \end{cases}$$
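The three regimes of this example can be compared against a brute-force maximum over the interval $[2, 3]$. This is a minimal sketch: the names `f_star_closed` and `f_star_grid` and the grid size are hypothetical choices, and the piecewise formula encoded below is the one that follows from evaluating $x^* x - x^2$ at $x = 2$, $x = x^*/2$, and $x = 3$.

```python
def f_star_closed(p):
    """Piecewise transform of x**2 restricted to [2, 3]."""
    if p < 4:
        return 2 * p - 4      # maximum at the left endpoint x = 2
    if p <= 6:
        return p * p / 4      # interior stationary point x = p / 2
    return 3 * p - 9          # maximum at the right endpoint x = 3


def f_star_grid(p, n=10000):
    """Brute-force maximum of p*x - x**2 over n + 1 samples of [2, 3]."""
    xs = [2 + i / n for i in range(n + 1)]
    return max(p * x - x * x for x in xs)


for p in (0.0, 4.5, 5.0, 10.0):
    print(p, f_star_grid(p), f_star_closed(p))
```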
The function $f(x) = cx$ is convex on all of $\mathbb{R}$ (strict convexity is not required for the Legendre transformation to be well defined). Clearly $x^* x - f(x) = (x^* - c)x$ is never bounded from above as a function of $x$, unless $x^* - c = 0$. Hence $f^*$ is defined on $I^* = \{c\}$ and $f^*(c) = 0$. (The definition of the Legendre transform requires the existence of the supremum, which requires an upper bound.)
One may check involutivity: of course, $x^* x - f^*(x^*)$ is always bounded as a function of $x^* \in \{c\}$, hence $I^{**} = \mathbb{R}$. Then, for all $x$ one has
$$\sup_{x^* \in \{c\}}\,(x x^* - f^*(x^*)) = xc,$$
and hence $f^{**}(x) = cx = f(x)$.
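As an example of a convex continuous function that is not everywhere differentiable, consider $f(x) = |x|$. This gives
$$f^*(x^*) = \sup_x\,(x x^* - |x|) = \max\Bigl(\sup_{x \ge 0} x(x^* - 1),\ \sup_{x \le 0} x(x^* + 1)\Bigr),$$
and thus $f^*(x^*) = 0$ on its domain $I^* = [-1, 1]$.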
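Let $f(x) = \langle x, Ax\rangle + c$ be defined on $X = \mathbb{R}^n$, where $A$ is a real, positive definite matrix.
Then $f$ is convex, and $\langle p, x\rangle - f(x) = \langle p, x\rangle - \langle x, Ax\rangle - c$ has gradient $p - 2Ax$ and Hessian $-2A$, which is negative definite; hence the stationary point $x = A^{-1}p/2$ is a maximum.
We have $X^* = \mathbb{R}^n$, and
$$f^*(p) = \frac{1}{4}\langle p, A^{-1}p\rangle - c.$$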
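For a concrete check of the quadratic-form case, one can evaluate $\langle p, x\rangle - f(x)$ at the stationary point $x = A^{-1}p/2$ and compare it with $\tfrac{1}{4}\langle p, A^{-1}p\rangle - c$. The sketch below assumes a hypothetical symmetric positive definite $2 \times 2$ matrix and a hypothetical constant; the small linear-algebra helpers are written out by hand so that only the standard library is needed.

```python
A = [[2.0, 1.0], [1.0, 3.0]]  # assumed symmetric positive definite matrix
c = 0.5                        # assumed additive constant


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def inverse_2x2(M):
    (a, b), (c_, d) = M
    det = a * d - b * c_
    return [[d / det, -b / det], [-c_ / det, a / det]]


def f(x):
    return dot(x, matvec(A, x)) + c


def f_star_closed(p):
    return 0.25 * dot(p, matvec(inverse_2x2(A), p)) - c


def f_star_via_maximizer(p):
    # Stationary point x = A^{-1} p / 2 of <p, x> - f(x).
    x_bar = [0.5 * t for t in matvec(inverse_2x2(A), p)]
    return dot(p, x_bar) - f(x_bar)


p = [1.0, 2.0]
print(f_star_via_maximizer(p), f_star_closed(p))
```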
The Legendre transform is linked to integration by parts, p dx = d(px) − x dp.
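Let $f(x, y)$ be a function of two independent variables $x$ and $y$, with the differential
$$df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy = p\,dx + v\,dy.$$
Assume that the function $f$ is convex in $x$ for all $y$, so that one may perform the Legendre transform on $f$ in $x$, with $p$ the variable conjugate to $x$ (for information, there is a relation $\left.\frac{\partial f}{\partial x}\right|_{\bar{x}} = p$, where $\bar{x}$ is a point in $x$ maximizing $px - f(x, y)$ or making it bounded for given $p$ and $y$). Since the new independent variable of the transform with respect to $f$ is $p$, the differentials $dx$ and $dy$ in $df$ devolve to $dp$ and $dy$ in the differential of the transform, i.e., we build another function with its differential expressed in terms of the new basis $dp$ and $dy$.
We thus consider the function $g(p, y) = f - px$ so that
$$dg = df - p\,dx - x\,dp = -x\,dp + v\,dy, \qquad x = -\frac{\partial g}{\partial p}, \qquad v = \frac{\partial g}{\partial y}.$$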
The function −g(p, y) is the Legendre transform of f(x, y), where only the independent variable x has been supplanted by p. This is widely used in thermodynamics, as illustrated below.
A Legendre transform is used in classical mechanics to derive the Hamiltonian formulation from the Lagrangian formulation, and conversely. A typical Lagrangian has the form
$$L(v, q) = \tfrac{1}{2}\langle v, Mv\rangle - V(q),$$
where $(v, q)$ are coordinates on $\mathbb{R}^n \times \mathbb{R}^n$, $M$ is a positive definite real matrix, and $\langle x, y\rangle = \sum_j x_j y_j$.
For every $q$ fixed, $L(v, q)$ is a convex function of $v$, while $V(q)$ plays the role of a constant.
Hence the Legendre transform of $L(v, q)$ as a function of $v$ is the Hamiltonian function,
$$H(p, q) = \tfrac{1}{2}\langle p, M^{-1}p\rangle + V(q).$$
In a more general setting, $(v, q)$ are local coordinates on the tangent bundle $T\mathcal{M}$ of a manifold $\mathcal{M}$. For each $q$, $L(v, q)$ is a convex function of the tangent space $V_q$. The Legendre transform gives the Hamiltonian $H(p, q)$ as a function of the coordinates $(p, q)$ of the cotangent bundle $T^*\mathcal{M}$; the inner product used to define the Legendre transform is inherited from the pertinent canonical symplectic structure. In this abstract setting, the Legendre transformation corresponds to the tautological one-form.
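For a single degree of freedom, the passage from a quadratic Lagrangian to the Hamiltonian can be traced step by step. The following sketch assumes the hypothetical case $L(v, q) = \tfrac{1}{2}mv^2 - V(q)$ with a harmonic potential $V(q) = \tfrac{1}{2}kq^2$; the numerical values of `m` and `k` and the function names are illustrative only.

```python
m, k = 2.0, 3.0  # assumed mass and spring constant


def V(q):
    return 0.5 * k * q * q


def lagrangian(v, q):
    return 0.5 * m * v * v - V(q)


def hamiltonian_by_transform(p, q):
    """Legendre transform in v: evaluate p*v - L at the v solving p = dL/dv."""
    v_bar = p / m  # from p = m * v
    return p * v_bar - lagrangian(v_bar, q)


def hamiltonian_closed(p, q):
    """Expected closed form p**2 / (2m) + V(q)."""
    return p * p / (2.0 * m) + V(q)


print(hamiltonian_by_transform(4.0, 1.0), hamiltonian_closed(4.0, 1.0))
```

The sign of the potential term flips because $-V(q)$ enters the Lagrangian as a constant with respect to $v$ and is subtracted again in $pv - L$.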
The strategy behind the use of Legendre transforms in thermodynamics is to shift from a function that depends on a variable to a new (conjugate) function that depends on a new variable, the conjugate of the original one. The new variable is the partial derivative of the original function with respect to the original variable. The new function is the difference between the original function and the product of the old and new variables. Typically, this transformation is useful because it shifts the dependence of, e.g., the energy from an extensive variable to its conjugate intensive variable, which can often be controlled more easily in a physical experiment.
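For example, the internal energy $U$ is an explicit function of the extensive variables entropy $S$, volume $V$, and chemical composition $N_i$ (e.g., $i = 1, 2, 3, \ldots$), $U = U(S, V, \{N_i\})$, which has a total differential
$$dU = T\,dS - P\,dV + \sum_i \mu_i\,dN_i,$$
where $T = \left.\frac{\partial U}{\partial S}\right|_{V, N_i}$, $P = -\left.\frac{\partial U}{\partial V}\right|_{S, N_i}$, and $\mu_i = \left.\frac{\partial U}{\partial N_i}\right|_{S, V, N_{j \ne i}}$.
(Subscripts are not necessary by the definition of partial derivatives but are left here to clarify the variables.) Stipulating some common reference state, by using the (non-standard) Legendre transform of the internal energy $U$ with respect to volume $V$, the enthalpy $H$ may be obtained as follows.
To get the (standard) Legendre transform $U^*$ of the internal energy $U$ with respect to volume $V$, the function $u(p, S, V, \{N_i\}) = pV - U$ is defined first, and then it is maximized or bounded in $V$. To do this, the condition $\frac{\partial u}{\partial V} = p - \frac{\partial U}{\partial V} = 0$, i.e. $p = \frac{\partial U}{\partial V}$, needs to be satisfied, so $U^* = \frac{\partial U}{\partial V}V - U$ is obtained. This approach is justified because $U$ is a linear function with respect to $V$ (so a convex function on $V$) by the definition of extensive variables. The non-standard Legendre transform here is obtained by negating the standard version, so $-U^* = H = U - \frac{\partial U}{\partial V}V = U + PV$.
$H$ is definitely a state function, as it is obtained by adding $PV$ ($P$ and $V$ being state variables) to a state function $U = U(S, V, \{N_i\})$, so its differential is an exact differential. Because $dH = T\,dS + V\,dP + \sum_i \mu_i\,dN_i$ and because it must be an exact differential, $H = H(S, P, \{N_i\})$.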
The enthalpy is suitable for description of processes in which the pressure is controlled from the surroundings.
It is likewise possible to shift the dependence of the energy from the extensive variable of entropy, $S$, to the (often more convenient) intensive variable $T$, resulting in the Helmholtz and Gibbs free energies. The Helmholtz free energy $A$ and the Gibbs energy $G$ are obtained by performing Legendre transforms of the internal energy and enthalpy, respectively: $A = U - TS$ and $G = H - TS = U + PV - TS$.
The Helmholtz free energy is often the most useful thermodynamic potential when temperature and volume are controlled from the surroundings, while the Gibbs energy is often the most useful when temperature and pressure are controlled from the surroundings.
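As a worked check of why these potentials suit those control variables, their differentials can be written out term by term. The derivation below is a sketch that assumes the fundamental relation $dU = T\,dS - P\,dV + \sum_i \mu_i\,dN_i$ together with the definitions $H = U + PV$, $A = U - TS$ and $G = H - TS$; it uses nothing beyond the product rule.

```latex
\begin{align*}
dH &= dU + P\,dV + V\,dP = T\,dS + V\,dP + \sum_i \mu_i\,dN_i, \\
dA &= dU - T\,dS - S\,dT = -S\,dT - P\,dV + \sum_i \mu_i\,dN_i, \\
dG &= dH - T\,dS - S\,dT = -S\,dT + V\,dP + \sum_i \mu_i\,dN_i.
\end{align*}
```

Read this way, $A$ is naturally a function of $(T, V, \{N_i\})$ and $G$ of $(T, P, \{N_i\})$, since those are the variables whose differentials appear on the right-hand sides.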
As another example from physics, consider a parallel conductive plate capacitor, in which the plates can move relative to one another. Such a capacitor would allow transfer of the electric energy which is stored in the capacitor into external mechanical work, done by the force acting on the plates. One may think of the electric charge as analogous to the "charge" of a gas in a cylinder, with the resulting mechanical force exerted on a piston.
Compute the force on the plates as a function of x, the distance which separates them. To find the force, compute the potential energy, and then apply the definition of force as the gradient of the potential energy function.
The electrostatic potential energy stored in a capacitor of capacitance $C(x)$ with a positive electric charge $+Q$ or negative charge $-Q$ on each conductive plate is (using the definition of the capacitance, $C = \frac{Q}{V}$),
$$U(Q, x) = \frac{1}{2} Q\,V(Q, x) = \frac{1}{2}\frac{Q^2}{C(x)},$$
where the dependence on the area of the plates, the dielectric constant of the insulation material between the plates, and the separation x are abstracted away as the capacitance C(x). (For a parallel plate capacitor, this is proportional to the area of the plates and inversely proportional to the separation.)
The force $F$ between the plates due to the electric field created by the charge separation is then
$$F(x) = -\frac{dU}{dx}.$$
If the capacitor is not connected to any electric circuit, then the electric charges on the plates remain constant and the voltage varies when the plates move with respect to each other, and the force is the negative gradient of the electrostatic potential energy,
$$F(x) = \frac{1}{2}\frac{dC(x)}{dx}\frac{Q^2}{C(x)^2} = \frac{1}{2}\frac{dC(x)}{dx}V(x)^2,$$
where $V(Q, x) = V(x)$, as the charge is fixed in this configuration.
However, suppose instead that the voltage $V$ between the plates is maintained constant as the plate moves, by connection to a battery, which is a reservoir for electric charges at a constant potential difference. Then the amount of charge $Q$ is a variable instead of the voltage; $Q$ and $V$ are Legendre conjugates of each other. To find the force, first compute the non-standard Legendre transform $U^*$ with respect to $Q$ (again using $C = \frac{Q}{V}$),
$$U^* = U - \left.\frac{\partial U}{\partial Q}\right|_x Q = U - \frac{1}{2C(x)}\left.\frac{\partial Q^2}{\partial Q}\right|_x Q = U - QV = \frac{1}{2}QV - QV = -\frac{1}{2}QV = -\frac{1}{2}V^2 C(x).$$
This transformation is possible because $U$ is now a linear function of $Q$ and so is convex in it. The force now becomes the negative gradient of this Legendre transform, resulting in the same force obtained from the original function $U$,
$$F(x) = -\frac{dU^*}{dx} = \frac{1}{2}\frac{dC(x)}{dx}V^2.$$
The two conjugate energies $U$ and $U^*$ happen to stand opposite to each other (their signs are opposite) only because of the linearity of the capacitance, except that now $Q$ is no longer a constant. They reflect the two different pathways of storing energy into the capacitor, resulting in, for instance, the same "pull" between a capacitor's plates.
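The agreement of the two force computations can be checked with finite differences. The sketch below assumes a hypothetical parallel-plate law $C(x) = \varepsilon A / x$ with $\varepsilon A = 1$ in arbitrary units; the function names, the sample point, and the step size are illustrative choices only.

```python
EPS_A = 1.0  # assumed product of permittivity and plate area (arbitrary units)


def C(x):
    return EPS_A / x


def U_fixed_charge(x, Q):
    """Stored energy at constant charge: Q**2 / (2 C(x))."""
    return 0.5 * Q * Q / C(x)


def U_star_fixed_voltage(x, V):
    """Non-standard Legendre transform at constant voltage: -V**2 C(x) / 2."""
    return -0.5 * V * V * C(x)


def neg_derivative(g, x, h=1e-6):
    """Central-difference estimate of -dg/dx."""
    return -(g(x + h) - g(x - h)) / (2.0 * h)


x0, V0 = 2.0, 3.0
Q0 = C(x0) * V0  # charge consistent with V0 at separation x0

F_charge = neg_derivative(lambda x: U_fixed_charge(x, Q0), x0)
F_voltage = neg_derivative(lambda x: U_star_fixed_voltage(x, V0), x0)
F_closed = 0.5 * (-EPS_A / x0 ** 2) * V0 ** 2  # (1/2) C'(x) V**2

print(F_charge, F_voltage, F_closed)
```

All three values are negative for this capacitance law, which corresponds to a force that reduces the separation $x$, i.e. an attraction between the plates.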
In large deviations theory, the rate function is defined as the Legendre transformation of the logarithm of the moment generating function of a random variable. An important application of the rate function is in the calculation of tail probabilities of sums of i.i.d. random variables, in particular in Cramér's theorem.
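If $X_n$ are i.i.d. random variables, let $S_n = X_1 + \cdots + X_n$ be the associated random walk and $M(\xi)$ the moment generating function of $X_1$. For $\xi \in \mathbb{R}$, $E[e^{\xi S_n}] = M(\xi)^n$. Hence, by Markov's inequality, one has for $\xi \ge 0$ and $a \in \mathbb{R}$
$$P(S_n/n > a) \le e^{-n\xi a} M(\xi)^n = \exp[-n(\xi a - \Lambda(\xi))],$$
where $\Lambda(\xi) = \log M(\xi)$. Since the left-hand side is independent of $\xi$, we may take the infimum of the right-hand side, which leads one to consider the supremum of $\xi a - \Lambda(\xi)$, i.e., the Legendre transform of $\Lambda$, evaluated at $x = a$.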
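To make the rate function concrete, the supremum can be approximated numerically for a simple distribution. The sketch below assumes fair coin flips $X_i \in \{0, 1\}$, for which $\Lambda(\xi) = \log\frac{1 + e^{\xi}}{2}$ and the transform has the closed form $a\log a + (1 - a)\log(1 - a) + \log 2$ on $0 < a < 1$; the function names and the grid are hypothetical.

```python
import math


def log_mgf(xi):
    """Lambda(xi) = log E[exp(xi * X)] for a fair coin X in {0, 1} (assumed)."""
    return math.log((1.0 + math.exp(xi)) / 2.0)


def rate_function_grid(a, lo=-20.0, hi=20.0, steps=40000):
    """Approximate sup_xi (xi * a - Lambda(xi)) by a maximum over a grid."""
    best = -math.inf
    for i in range(steps + 1):
        xi = lo + (hi - lo) * i / steps
        best = max(best, xi * a - log_mgf(xi))
    return best


def rate_function_closed(a):
    """Closed form for the fair coin, valid for 0 < a < 1."""
    return a * math.log(a) + (1.0 - a) * math.log(1.0 - a) + math.log(2.0)


a, n = 0.7, 100
print(rate_function_grid(a), rate_function_closed(a))
# Resulting upper bound on P(S_n / n > a) from the inequality above.
print(math.exp(-n * rate_function_closed(a)))
```

For $a$ above the mean $1/2$ the maximizing $\xi$ is positive, so restricting to $\xi \ge 0$ as in the Markov bound does not change the supremum in this example.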
Legendre transformation arises naturally in microeconomics in the process of finding the supply S(P) of some product given a fixed price P on the market knowing the cost function C(Q), i.e. the cost for the producer to make/mine/etc. Q units of the given product.
A simple theory explains the shape of the supply curve based solely on the cost function. Let us suppose the market price for one unit of our product is $P$. For a company selling this good, the best strategy is to adjust the production $Q$ so that its profit is maximized. We can maximize the profit
$$\text{profit} = \text{revenue} - \text{costs} = PQ - C(Q)$$
by differentiating with respect to $Q$ and solving
$$P - C'(Q_\text{opt}) = 0.$$
$Q_\text{opt}$ represents the optimal quantity $Q$ of goods that the producer is willing to supply, which is indeed the supply itself:
$$S(P) = Q_\text{opt}(P) = (C')^{-1}(P).$$
If we consider the maximal profit as a function of price, $\text{profit}_\text{max}(P)$, we see that it is the Legendre transform of the cost function $C(Q)$.
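A small numerical example may help fix the idea. The sketch below assumes a hypothetical quadratic cost $C(Q) = cQ^2$ with $c = 0.5$, so that $C'(Q) = 2cQ$ and the supply is $S(P) = P/(2c)$; the names and numbers are illustrative and not taken from any economic data.

```python
c = 0.5  # assumed cost coefficient


def cost(Q):
    return c * Q * Q


def supply(P):
    """Optimal quantity from P - C'(Q) = 0 with C'(Q) = 2*c*Q."""
    return P / (2.0 * c)


def max_profit(P):
    """Profit at the optimal quantity, i.e. the Legendre transform of cost at P."""
    Q_opt = supply(P)
    return P * Q_opt - cost(Q_opt)


P = 3.0
# Brute-force comparison over quantities 0.00, 0.01, ..., 10.00.
brute_force = max(P * Q - cost(Q) for Q in (i / 100.0 for i in range(0, 1001)))
print(supply(P), max_profit(P), brute_force)
```

For this cost function the maximal profit is $P^2/(4c)$, the same expression as the Legendre transform of $cx^2$.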
For a strictly convex function, the Legendre transformation can be interpreted as a mapping between the graph of the function and the family of tangents of the graph. (For a function of one variable, the tangents are well-defined at all but at most countably many points, since a convex function is differentiable at all but at most countably many points.)
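The equation of a line with slope $p$ and $y$-intercept $b$ is given by $y = px + b$. For this line to be tangent to the graph of a function $f$ at the point $(x_0, f(x_0))$ requires
$$f(x_0) = p x_0 + b$$
and
$$p = f'(x_0).$$
Being the derivative of a strictly convex function, the function $f'$ is strictly monotone and thus injective. The second equation can be solved for $x_0 = (f')^{-1}(p)$, allowing elimination of $x_0$ from the first, and solving for the $y$-intercept $b$ of the tangent as a function of its slope $p$,
$$b = f(x_0) - p x_0 = f((f')^{-1}(p)) - p \cdot (f')^{-1}(p) = -f^*(p),$$
where $f^*$ denotes the Legendre transform of $f$.
The family of tangent lines of the graph of $f$ parameterized by the slope $p$ is therefore given by $y = px - f^*(p)$, or, written implicitly, by the solutions of the equation
$$F(x, y, p) = y + f^*(p) - px = 0.$$
The graph of the original function can be reconstructed from this family of lines as the envelope of this family by demanding
$$\frac{\partial F(x, y, p)}{\partial p} = (f^*)'(p) - x = 0.$$
Eliminating $p$ from these two equations gives
$$y = x \cdot ((f^*)')^{-1}(x) - f^*\bigl(((f^*)')^{-1}(x)\bigr).$$
Identifying $y$ with $f(x)$ and recognizing the right side of the preceding equation as the Legendre transform of $f^*$ yields $f = f^{**}$.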
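For a differentiable real-valued function on an open convex subset $U$ of $\mathbb{R}^n$, the Legendre conjugate of the pair $(U, f)$ is defined to be the pair $(V, g)$, where $V$ is the image of $U$ under the gradient mapping $Df$, and $g$ is the function on $V$ given by the formula
$$g(y) = \langle y, x\rangle - f(x), \qquad x = (Df)^{-1}(y),$$
where $\langle u, v\rangle = \sum_{k=1}^{n} u_k v_k$ is the scalar product on $\mathbb{R}^n$. The multidimensional transform can be interpreted as an encoding of the convex hull of the function's epigraph in terms of its supporting hyperplanes.[2] This can be seen as a consequence of the following two observations. On the one hand, the hyperplane tangent to the epigraph of $f$ at some point $(\mathbf{x}, f(\mathbf{x})) \in U \times \mathbb{R}$ has normal vector $(\nabla f(\mathbf{x}), -1) \in \mathbb{R}^{n+1}$. On the other hand, any closed convex set $C \subseteq \mathbb{R}^m$ can be characterized via the set of its supporting hyperplanes by the equations $\mathbf{z} \cdot \mathbf{n} = h_C(\mathbf{n})$, where $h_C$ is the support function of $C$. But the definition of the Legendre transform via the maximization matches precisely that of the support function, that is, $f^*(\mathbf{p}) = h_{\operatorname{epi}(f)}(\mathbf{p}, -1)$. We thus conclude that the Legendre transform characterizes the epigraph in the sense that the tangent plane to the epigraph at any point $(\mathbf{x}, f(\mathbf{x}))$ is given explicitly by
$$\{\mathbf{z} \in \mathbb{R}^{n+1} : \mathbf{z} \cdot (\nabla f(\mathbf{x}), -1) = f^*(\nabla f(\mathbf{x}))\}.$$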
Alternatively, if $X$ is a vector space and $Y$ is its dual vector space, then for each point $x$ of $X$ and $y$ of $Y$, there is a natural identification of the cotangent spaces $T^*X_x$ with $Y$ and $T^*Y_y$ with $X$. If $f$ is a real differentiable function over $X$, then its exterior derivative, $df$, is a section of the cotangent bundle $T^*X$ and as such, we can construct a map from $X$ to $Y$. Similarly, if $g$ is a real differentiable function over $Y$, then $dg$ defines a map from $Y$ to $X$. If both maps happen to be inverses of each other, we say we have a Legendre transform. The notion of the tautological one-form is commonly used in this setting.
When the function is not differentiable, the Legendre transform can still be extended, and is known as the Legendre–Fenchel transformation. In this more general setting, a few properties are lost: for example, the Legendre transform is no longer its own inverse (unless there are extra assumptions, like convexity).
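Let $M$ be a smooth manifold, and let $E$ and $\pi : E \to M$ be a vector bundle on $M$ and its associated bundle projection, respectively. Let $L : E \to \mathbb{R}$ be a smooth function. We think of $L$ as a Lagrangian by analogy with the classical case where $M = \mathbb{R}$, $E = TM = \mathbb{R} \times \mathbb{R}$, and $L(x, v) = \frac{1}{2}mv^2 - V(x)$ for some positive number $m \in \mathbb{R}$ and function $V : M \to \mathbb{R}$.
As usual, the dual of $E$ is denoted by $E^*$. The fiber of $\pi$ over $x \in M$ is denoted $E_x$, and the restriction of $L$ to $E_x$ is denoted by $L|_{E_x} : E_x \to \mathbb{R}$. The Legendre transformation of $L$ is the smooth morphism $\mathbf{F}L : E \to E^*$ defined by $\mathbf{F}L(v) = d(L|_{E_x})_v \in E_x^*$, where $x = \pi(v)$. Here we use the fact that since $E_x$ is a vector space, $T_v(E_x)$ can be identified with $E_x$. In other words, $\mathbf{F}L(v) \in E_x^*$ is the covector that sends $w \in E_x$ to the directional derivative $\left.\frac{d}{dt}\right|_{t=0} L(v + tw) \in \mathbb{R}$.
To describe the Legendre transformation locally, let $U \subseteq M$ be a coordinate chart over which $E$ is trivial. Picking a trivialization of $E$ over $U$, we obtain charts $E_U \cong U \times \mathbb{R}^r$ and $E_U^* \cong U \times \mathbb{R}^r$. In terms of these charts, we have $\mathbf{F}L(x; v_1, \ldots, v_r) = (x; p_1, \ldots, p_r)$, where $p_i = \frac{\partial L}{\partial v_i}(x; v_1, \ldots, v_r)$ for all $i = 1, \ldots, r$. If, as in the classical case, the restriction of $L : E \to \mathbb{R}$ to each fiber $E_x$ is strictly convex and bounded below by a positive definite quadratic form minus a constant, then the Legendre transform $\mathbf{F}L : E \to E^*$ is a diffeomorphism.[3] Suppose that $\mathbf{F}L$ is a diffeomorphism and let $H : E^* \to \mathbb{R}$ be the "Hamiltonian" function defined by $H(p) = p \cdot v - L(v)$, where $v = (\mathbf{F}L)^{-1}(p)$. Using the natural isomorphism $E \cong E^{**}$, we may view the Legendre transformation of $H$ as a map $\mathbf{F}H : E^* \to E$. Then we have[3] $(\mathbf{F}L)^{-1} = \mathbf{F}H$.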
The Legendre transformation has the following scaling properties: for $a > 0$,
$$f(x) = a \cdot g(x) \;\Rightarrow\; f^*(p) = a \cdot g^*\!\left(\frac{p}{a}\right), \qquad f(x) = g(a \cdot x) \;\Rightarrow\; f^*(p) = g^*\!\left(\frac{p}{a}\right).$$
It follows that if a function is homogeneous of degree $r$ then its image under the Legendre transformation is a homogeneous function of degree $s$, where $1/r + 1/s = 1$. (Since $f(x) = x^r/r$, with $r > 1$, implies $f^*(p) = p^s/s$.) Thus, the only monomial whose degree is invariant under the Legendre transform is the quadratic.
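Both scaling rules, $f = a \cdot g \Rightarrow f^*(p) = a\,g^*(p/a)$ and $f(x) = g(ax) \Rightarrow f^*(p) = g^*(p/a)$, can be spot-checked numerically. The sketch below assumes the hypothetical choice $g(x) = e^x$, whose transform $g^*(p) = p(\ln p - 1)$ is known in closed form for $p > 0$; the helper `sup_on_grid`, the grid, and the values of `a` and `p` are illustrative.

```python
import math


def sup_on_grid(h, p, lo=-10.0, hi=10.0, steps=20000):
    """Approximate sup_x (p*x - h(x)) by a maximum over a uniform grid."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(p * x - h(x) for x in grid)


def g(x):
    return math.exp(x)


def g_star(p):
    # Closed-form transform of exp, valid for p > 0.
    return p * (math.log(p) - 1.0)


a, p = 3.0, 2.0

# Rule 1: f(x) = a * g(x)  ->  f*(p) = a * g*(p / a)
lhs_scaled_value = sup_on_grid(lambda x: a * g(x), p)
rhs_scaled_value = a * g_star(p / a)

# Rule 2: f(x) = g(a * x)  ->  f*(p) = g*(p / a)
lhs_scaled_argument = sup_on_grid(lambda x: g(a * x), p)
rhs_scaled_argument = g_star(p / a)

print(lhs_scaled_value, rhs_scaled_value)
print(lhs_scaled_argument, rhs_scaled_argument)
```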
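Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. For any convex function $f$ on $\mathbb{R}^n$, one has $(Af)^* = f^* A^*$, where $A^*$ is the adjoint operator of $A$ defined by $\langle Ax, y^*\rangle = \langle x, A^* y^*\rangle$, and $Af$ is the push-forward of $f$ along $A$, $(Af)(y) = \inf\{f(x) : x \in X, Ax = y\}$.
A closed convex function $f$ is symmetric with respect to a given set $G$ of orthogonal linear transformations, $f(Ax) = f(x)$ for all $x$ and all $A \in G$, if and only if $f^*$ is symmetric with respect to $G$.
The infimal convolution of two functions $f$ and $g$ is defined as
$$(f \star_{\inf} g)(x) = \inf\{f(x - y) + g(y) \mid y \in \mathbb{R}^n\}.$$
Let $f_1, \ldots, f_m$ be proper convex functions on $\mathbb{R}^n$. Then
$$(f_1 \star_{\inf} \cdots \star_{\inf} f_m)^* = f_1^* + \cdots + f_m^*.$$
For any function $f$ and its convex conjugate $f^*$, Fenchel's inequality (also known as the Fenchel–Young inequality) holds for every $x \in X$ and $p \in X^*$, i.e., for independent $x, p$ pairs,
$$\langle p, x\rangle \le f(x) + f^*(p).$$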
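Fenchel's inequality is easy to probe on a simple pair. The sketch below assumes the hypothetical pair $f(x) = x^2$ and $f^*(p) = p^2/4$, for which the gap $f(x) + f^*(p) - px$ equals $(x - p/2)^2$; the grid and names are illustrative.

```python
def f(x):
    return x * x


def f_star(p):
    # Transform of x**2 (the case c = 1 of the quadratic example).
    return p * p / 4.0


def fenchel_gap(x, p):
    """f(x) + f*(p) - p*x, which Fenchel's inequality says is nonnegative."""
    return f(x) + f_star(p) - p * x


# Independent (x, p) pairs on a coarse grid over [-5, 5] x [-5, 5].
grid = [i / 10.0 for i in range(-50, 51)]
smallest_gap = min(fenchel_gap(x, p) for x in grid for p in grid)

print(smallest_gap)
print(fenchel_gap(1.5, 3.0))  # a pair with p = f'(x) = 2x
```

The gap vanishes exactly on pairs with $p = f'(x)$, which is the equality case of the inequality for differentiable convex $f$.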