In mathematics, the total derivative of a function f at a point is the best linear approximation near this point of the function with respect to its arguments. Unlike partial derivatives, the total derivative approximates the function with respect to all of its arguments, not just a single one. In many situations, this is the same as considering all partial derivatives simultaneously. The term "total derivative" is primarily used when f is a function of several variables, because when f is a function of a single variable, the total derivative is the same as the ordinary derivative of the function. [1]: 198–203
Let $U \subseteq \mathbb{R}^n$ be an open subset. Then a function $f : U \to \mathbb{R}^m$ is said to be (totally) differentiable at a point $a \in U$ if there exists a linear transformation $df_a : \mathbb{R}^n \to \mathbb{R}^m$ such that

$$\lim_{x \to a} \frac{\lVert f(x) - f(a) - df_a(x - a) \rVert}{\lVert x - a \rVert} = 0.$$
The linear map $df_a$ is called the (total) derivative or (total) differential of $f$ at $a$. Other notations for the total derivative include $D_a f$ and $f'(a)$. A function is (totally) differentiable if its total derivative exists at every point in its domain.
Conceptually, the definition of the total derivative expresses the idea that $df_a$ is the best linear approximation to $f$ at the point $a$. This can be made precise by quantifying the error in the linear approximation determined by $df_a$. To do so, write

$$f(a + h) = f(a) + df_a(h) + \varepsilon(h),$$

where $\varepsilon(h)$ equals the error in the approximation. To say that the derivative of $f$ at $a$ is $df_a$ is equivalent to the statement

$$\varepsilon(h) = o(\lVert h \rVert),$$

where $o$ is little-o notation and indicates that $\varepsilon(h)$ is much smaller than $\lVert h \rVert$ as $h \to 0$. The total derivative $df_a$ is the unique linear transformation for which the error term is this small, and this is the sense in which it is the best linear approximation to $f$.
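The little-o condition can be illustrated numerically. The sketch below uses an assumed test function $f(x, y) = xy + x^2$ (not from the article) and checks that the ratio $\lVert \varepsilon(h) \rVert / \lVert h \rVert$ shrinks to zero along a sequence of increments:

```python
import math

def f(x, y):
    # Hypothetical test function, assumed for illustration.
    return x * y + x ** 2

# At the point a = (1, 2) the gradient is (∂f/∂x, ∂f/∂y) = (y + 2x, x) = (4, 1).
a = (1.0, 2.0)
grad = (4.0, 1.0)

def error_ratio(t):
    """Compute |ε(h)| / ||h|| for the increment h = (t, t)."""
    h = (t, t)
    linear = grad[0] * h[0] + grad[1] * h[1]
    eps = f(a[0] + h[0], a[1] + h[1]) - f(*a) - linear
    return abs(eps) / math.hypot(*h)

# The ratio tends to 0 as h -> 0, which is the o(||h||) condition.
ratios = [error_ratio(10.0 ** -k) for k in range(1, 6)]
```

For this quadratic $f$, the error is exactly $2t^2$ along the diagonal, so the ratio decays linearly in $t$, consistent with the definition.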
The function $f$ is differentiable if and only if each of its components $f_i : U \to \mathbb{R}$ is differentiable, so when studying total derivatives, it is often possible to work one coordinate at a time in the codomain. However, the same is not true of the coordinates in the domain. It is true that if $f$ is differentiable at $a$, then each partial derivative $\partial f / \partial x_i$ exists at $a$. The converse does not hold: it can happen that all of the partial derivatives of $f$ at $a$ exist, but $f$ is not differentiable at $a$ (a standard example is $f(x, y) = xy / (x^2 + y^2)$ with $f(0, 0) = 0$, whose partial derivatives at the origin both exist, yet $f$ is not even continuous there). This means that the function is very "rough" at $a$, to such an extreme that its behavior cannot be adequately described by its behavior in the coordinate directions. When $f$ is not so rough, this cannot happen. More precisely, if all the partial derivatives of $f$ at $a$ exist and are continuous in a neighborhood of $a$, then $f$ is differentiable at $a$. When this happens, then in addition, the total derivative of $f$ is the linear transformation corresponding to the Jacobian matrix of partial derivatives at that point. [2]
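The gap between "partials exist" and "differentiable" can be seen numerically with the classic counterexample $f(x, y) = xy / (x^2 + y^2)$, $f(0, 0) = 0$: both partial derivatives at the origin are zero, yet the function is constantly $1/2$ along the diagonal, so it is not continuous (hence not differentiable) there.

```python
def f(x, y):
    # Classic counterexample: partial derivatives exist at the origin,
    # but f is not continuous there, hence not differentiable.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x ** 2 + y ** 2)

t = 1e-8
# Along both coordinate axes f vanishes identically, so the difference
# quotients for the partial derivatives at (0, 0) are exactly 0:
fx = (f(t, 0.0) - f(0.0, 0.0)) / t
fy = (f(0.0, t) - f(0.0, 0.0)) / t
# Yet along the diagonal y = x the value is constantly 1/2, however
# close we come to the origin:
diag = f(t, t)
```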
When the function under consideration is real-valued, the total derivative can be recast using differential forms. For example, suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is a differentiable function of variables $x_1, \ldots, x_n$. The total derivative of $f$ at $a$ may be written in terms of its Jacobian matrix, which in this instance is a row matrix:

$$Df_a = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(a) & \cdots & \dfrac{\partial f}{\partial x_n}(a) \end{bmatrix}.$$

The linear approximation property of the total derivative implies that if

$$\Delta x = \begin{bmatrix} \Delta x_1 & \cdots & \Delta x_n \end{bmatrix}^{\mathsf T}$$

is a small vector (where the $\mathsf T$ denotes transpose, so that this vector is a column vector), then

$$f(a + \Delta x) - f(a) \approx Df_a \cdot \Delta x = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(a)\, \Delta x_i.$$

Heuristically, this suggests that if $dx_1, \ldots, dx_n$ are infinitesimal increments in the coordinate directions, then

$$df_a = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(a)\, dx_i.$$
In fact, the notion of the infinitesimal, which is merely symbolic here, can be equipped with extensive mathematical structure. Techniques, such as the theory of differential forms, effectively give analytical and algebraic descriptions of objects like infinitesimal increments, $dx_i$. For instance, $dx_i$ may be interpreted as a linear functional on the vector space $\mathbb{R}^n$. Evaluating $dx_i$ at a vector $h$ in $\mathbb{R}^n$ measures how much $h$ points in the $i$th coordinate direction. The total derivative $df_a$ is a linear combination of linear functionals and hence is itself a linear functional. The evaluation $df_a(h)$ measures how much $h$ points in the direction determined by $f$ at $a$, and this direction is the gradient. This point of view makes the total derivative an instance of the exterior derivative.
Suppose now that $f$ is a vector-valued function, that is, $f : \mathbb{R}^n \to \mathbb{R}^m$. In this case, the components $f_i$ of $f$ are real-valued functions, so they have associated differential forms $df_i$. The total derivative $df$ amalgamates these forms into a single object and is therefore an instance of a vector-valued differential form.
The chain rule has a particularly elegant statement in terms of total derivatives. It says that, for two functions $f$ and $g$, the total derivative of the composite function $g \circ f$ at $a$ satisfies

$$d(g \circ f)_a = dg_{f(a)} \circ df_a.$$
If the total derivatives of $f$ and $g$ are identified with their Jacobian matrices, then the composite on the right-hand side is simply matrix multiplication. This is enormously useful in applications, as it makes it possible to account for essentially arbitrary dependencies among the arguments of a composite function.
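The "composition is matrix multiplication" statement can be checked numerically. The maps below, $f(x, y) = (x + y^2,\, xy)$ and $g(u, v) = uv$, are assumptions chosen for illustration; the product of their Jacobians is compared against finite differences of the composite:

```python
def f(x, y):
    # Hypothetical inner map f : R^2 -> R^2, assumed for illustration.
    return (x + y ** 2, x * y)

def g(u, v):
    # Hypothetical outer map g : R^2 -> R.
    return u * v

def Df(x, y):
    # Jacobian of f, a 2x2 matrix.
    return [[1.0, 2.0 * y], [y, x]]

def Dg(u, v):
    # Jacobian of g, a 1x2 row matrix.
    return [v, u]

a = (1.0, 2.0)
row = Dg(*f(*a))   # Dg evaluated at f(a)
J = Df(*a)         # Df evaluated at a
# Chain rule: D(g∘f)(a) = Dg(f(a)) · Df(a), a row times a 2x2 matrix.
chain = [row[0] * J[0][0] + row[1] * J[1][0],
         row[0] * J[0][1] + row[1] * J[1][1]]

# Compare with centered finite differences of the composite g(f(x, y)).
def composite(x, y):
    return g(*f(x, y))

h = 1e-6
fd = [(composite(a[0] + h, a[1]) - composite(a[0] - h, a[1])) / (2 * h),
      (composite(a[0], a[1] + h) - composite(a[0], a[1] - h)) / (2 * h)]
```

Here the composite is $g(f(x, y)) = x^2 y + x y^3$, whose partials at $(1, 2)$ are $12$ and $13$, matching the matrix product.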
Suppose that $f$ is a function of two variables, $x$ and $y$. If these two variables are independent, so that the domain of $f$ is $\mathbb{R}^2$, then the behavior of $f$ may be understood in terms of its partial derivatives in the $x$ and $y$ directions. However, in some situations, $x$ and $y$ may be dependent. For example, it might happen that $f$ is constrained to a curve $y = y(x)$. In this case, we are actually interested in the behavior of the composite function $f(x, y(x))$. The partial derivative of $f$ with respect to $x$ does not give the true rate of change of $f$ with respect to changing $x$, because changing $x$ necessarily changes $y$. However, the chain rule for the total derivative takes such dependencies into account. Write $\gamma(x) = (x, y(x))$. Then, the chain rule says

$$d(f \circ \gamma)_{x_0} = df_{\gamma(x_0)} \circ d\gamma_{x_0}.$$

By expressing the total derivative using Jacobian matrices, this becomes:

$$\frac{d f(x, y(x))}{dx}\bigg|_{x_0} = \frac{\partial f}{\partial x}\bigg|_{(x_0,\, y(x_0))} \cdot \frac{dx}{dx}\bigg|_{x_0} + \frac{\partial f}{\partial y}\bigg|_{(x_0,\, y(x_0))} \cdot \frac{dy}{dx}\bigg|_{x_0}.$$

Suppressing the evaluation at $x_0$ for legibility, we may also write this as

$$\frac{d f(x, y(x))}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\,\frac{dy}{dx}.$$

This gives a straightforward formula for the derivative of $f(x, y(x))$ in terms of the partial derivatives of $f$ and the derivative of $y(x)$.
For example, suppose

$$f(x, y) = xy.$$

The rate of change of f with respect to x is usually the partial derivative of f with respect to x; in this case,

$$\frac{\partial f}{\partial x} = y.$$

However, if y depends on x, the partial derivative does not give the true rate of change of f as x changes because the partial derivative assumes that y is fixed. Suppose we are constrained to the line

$$y = x.$$

Then

$$f(x, y) = f(x, x) = x^2,$$

and the total derivative of f with respect to x is

$$\frac{df}{dx} = 2x,$$

which we see is not equal to the partial derivative $\partial f / \partial x = y = x$. Instead of immediately substituting for y in terms of x, however, we can also use the chain rule as above:

$$\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\,\frac{dy}{dx} = y + x \cdot 1 = x + x = 2x.$$
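The difference between the partial and the total derivative can be verified numerically, taking $f(x, y) = xy$ constrained to the line $y = x$ (matching the discussion above). The partial derivative holds $y$ fixed; the total derivative lets $y$ move with $x$:

```python
def f(x, y):
    return x * y

def y_of_x(x):
    # Constraint curve y = x.
    return x

x0 = 3.0
h = 1e-6
# Partial derivative ∂f/∂x: y is held fixed at y(x0) while x varies.
partial = (f(x0 + h, y_of_x(x0)) - f(x0 - h, y_of_x(x0))) / (2 * h)
# Total derivative df/dx: y varies along the constraint as x varies.
total = (f(x0 + h, y_of_x(x0 + h)) - f(x0 - h, y_of_x(x0 - h))) / (2 * h)
```

At $x_0 = 3$ the partial derivative is $y = 3$, while the total derivative is $2x = 6$, confirming that the two quantities genuinely differ when the variables are dependent.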
While one can often perform substitutions to eliminate indirect dependencies, the chain rule provides for a more efficient and general technique. Suppose $L(t, x_1, \dots, x_n)$ is a function of time $t$ and $n$ variables $x_i$ which themselves depend on time. Then, the time derivative of $L$ is

$$\frac{dL}{dt} = \frac{d}{dt} L\bigl(t, x_1(t), \ldots, x_n(t)\bigr).$$

The chain rule expresses this derivative in terms of the partial derivatives of $L$ and the time derivatives of the functions $x_i$:

$$\frac{dL}{dt} = \frac{\partial L}{\partial t} + \sum_{i=1}^n \frac{\partial L}{\partial x_i}\,\frac{dx_i}{dt} = \left( \frac{\partial}{\partial t} + \sum_{i=1}^n \frac{dx_i}{dt}\,\frac{\partial}{\partial x_i} \right) L.$$
This expression is often used in physics for a gauge transformation of the Lagrangian, as two Lagrangians that differ only by the total time derivative of a function of time and the generalized coordinates lead to the same equations of motion. An interesting example concerns the resolution of causality in the Wheeler–Feynman time-symmetric theory. The operator in brackets (in the final expression above) is also called the total derivative operator (with respect to $t$).
For example, the total derivative of $f(x(t), y(t))$ is

$$\frac{df}{dt} = \frac{\partial f}{\partial x}\,\frac{dx}{dt} + \frac{\partial f}{\partial y}\,\frac{dy}{dt}.$$

Here there is no $\partial f / \partial t$ term since $f$ itself does not depend on the independent variable $t$ directly.
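This total time derivative can be checked numerically. The functions below, $f(x, y) = x^2 y$ with $x(t) = \cos t$ and $y(t) = t^2$, are assumptions chosen for illustration:

```python
import math

def f(x, y):
    # Hypothetical function of the state variables only (no explicit t).
    return x ** 2 * y

def x_of(t):
    return math.cos(t)

def y_of(t):
    return t ** 2

t0 = 1.0
xv, yv = x_of(t0), y_of(t0)
# Chain rule: df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt); no ∂f/∂t term.
chain = (2 * xv * yv) * (-math.sin(t0)) + (xv ** 2) * (2 * t0)

# Compare with a centered finite difference of t -> f(x(t), y(t)).
h = 1e-6
fd = (f(x_of(t0 + h), y_of(t0 + h)) - f(x_of(t0 - h), y_of(t0 - h))) / (2 * h)
```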
A total differential equation is a differential equation expressed in terms of total derivatives. Since the exterior derivative is coordinate-free, in a sense that can be given a technical meaning, such equations are intrinsic and geometric.
In economics, it is common for the total derivative to arise in the context of a system of equations. [1]: 217–220 For example, a simple supply–demand system might specify the quantity $q$ of a product demanded as a function $D$ of its price $p$ and consumers' income $I$, the latter being an exogenous variable, and might specify the quantity supplied by producers as a function $S$ of its price and two exogenous resource cost variables $r$ and $w$. The resulting system of equations

$$q = D(p, I),$$
$$q = S(p, r, w),$$
determines the market equilibrium values of the variables $p$ and $q$. The total derivative $dp/dr$ of $p$ with respect to $r$, for example, gives the sign and magnitude of the reaction of the market price to the exogenous variable $r$. In the indicated system, there are a total of six possible total derivatives, also known in this context as comparative static derivatives: $dp/dr$, $dp/dw$, $dp/dI$, $dq/dr$, $dq/dw$, and $dq/dI$. The total derivatives are found by totally differentiating the system of equations, dividing through by, say, $dr$, treating $dq/dr$ and $dp/dr$ as the unknowns, setting $dI = dw = 0$, and solving the two totally differentiated equations simultaneously, typically by using Cramer's rule.
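The procedure can be sketched with a hypothetical linear specification (the functional forms and coefficient values below are assumptions, not from the text): demand $q = a - bp + cI$ and supply $q = d + ep - gr - hw$. Totally differentiating, setting $dI = dw = 0$, and solving the resulting 2x2 linear system for $dp/dr$ and $dq/dr$ by Cramer's rule gives:

```python
# Assumed linear supply-demand system, for illustration only:
#   demand:  q = a - b*p + c*I
#   supply:  q = d + e*p - g*r - h*w
a, b, c = 100.0, 2.0, 0.5
d, e, g, h = 10.0, 3.0, 1.5, 0.8

# Total differentiation with dI = dw = 0 gives two equations in the
# unknowns dp/dr and dq/dr:
#   dq = -b*dp          ->   b*(dp/dr) + 1*(dq/dr) = 0
#   dq =  e*dp - g*dr   ->  -e*(dp/dr) + 1*(dq/dr) = -g
# Solve the 2x2 system by Cramer's rule.
det = b * 1.0 - 1.0 * (-e)                # determinant = b + e
dp_dr = (0.0 * 1.0 - 1.0 * (-g)) / det    # = g / (b + e)
dq_dr = (b * (-g) - (-e) * 0.0) / det     # = -b*g / (b + e)
```

With these signs, a rise in the resource cost $r$ raises the equilibrium price ($dp/dr > 0$) and lowers the equilibrium quantity ($dq/dr < 0$), as economic intuition suggests.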
In calculus, the chain rule is a formula that expresses the derivative of the composition of two differentiable functions $f$ and $g$ in terms of the derivatives of $f$ and $g$. More precisely, if $h = f \circ g$ is the function such that $h(x) = f(g(x))$ for every $x$, then the chain rule is, in Lagrange's notation, $h'(x) = f'(g(x))\, g'(x)$, or, equivalently, $h' = (f' \circ g) \cdot g'$.
In the field of complex analysis in mathematics, the Cauchy–Riemann equations, named after Augustin Cauchy and Bernhard Riemann, consist of a system of two partial differential equations which form a necessary and sufficient condition for a complex function of a complex variable to be complex differentiable.
In mathematics, the derivative is a fundamental tool that quantifies the sensitivity of change of a function's output with respect to its input. The derivative of a function of a single variable at a chosen input value, when it exists, is the slope of the tangent line to the graph of the function at that point. The tangent line is the best linear approximation of the function near that input value. For this reason, the derivative is often described as the instantaneous rate of change, the ratio of the instantaneous change in the dependent variable to that of the independent variable. The process of finding a derivative is called differentiation.
In vector calculus, the gradient of a scalar-valued differentiable function $f$ of several variables is the vector field $\nabla f$ whose value at a point $p$ gives the direction and the rate of fastest increase. The gradient transforms like a vector under change of basis of the space of variables of $f$. If the gradient of a function is non-zero at a point $p$, the direction of the gradient is the direction in which the function increases most quickly from $p$, and the magnitude of the gradient is the rate of increase in that direction, the greatest absolute directional derivative. Further, a point where the gradient is the zero vector is known as a stationary point. The gradient thus plays a fundamental role in optimization theory, where it is used to minimize a function by gradient descent. In coordinate-free terms, the gradient of a function $f(\mathbf{r})$ may be defined by

$$df = \nabla f \cdot d\mathbf{r},$$

where $df$ is the total infinitesimal change in $f$ for an infinitesimal displacement $d\mathbf{r}$.
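The gradient-descent use of the gradient mentioned above can be sketched numerically. The quadratic objective $f(x, y) = (x - 1)^2 + 2(y + 2)^2$ and the step size are assumptions chosen for illustration:

```python
def grad(x, y):
    # Gradient of the assumed objective f(x, y) = (x - 1)^2 + 2*(y + 2)^2,
    # whose unique minimizer is (1, -2).
    return (2 * (x - 1), 4 * (y + 2))

x, y = 0.0, 0.0   # starting point
lr = 0.1          # step size (learning rate)
for _ in range(200):
    gx, gy = grad(x, y)
    # Step against the gradient, the direction of fastest decrease.
    x -= lr * gx
    y -= lr * gy
```

For this convex quadratic each coordinate update is a contraction, so the iterates converge to the minimizer $(1, -2)$.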
In calculus, the product rule is a formula used to find the derivatives of products of two or more functions. For two functions, it may be stated in Lagrange's notation as

$$(u \cdot v)' = u' \cdot v + u \cdot v'$$

or in Leibniz's notation as

$$\frac{d}{dx}(u \cdot v) = \frac{du}{dx} \cdot v + u \cdot \frac{dv}{dx}.$$
In mathematics, a differential operator is an operator defined as a function of the differentiation operator. It is helpful, as a matter of notation first, to consider differentiation as an abstract operation that accepts a function and returns another function.
In mathematics, a linear differential equation is a differential equation that is defined by a linear polynomial in the unknown function and its derivatives, that is an equation of the form

$$a_0(x) y + a_1(x) y' + a_2(x) y'' + \cdots + a_n(x) y^{(n)} = b(x),$$

where $a_0(x), \ldots, a_n(x)$ and $b(x)$ are arbitrary differentiable functions that do not need to be linear, and $y', \ldots, y^{(n)}$ are the successive derivatives of an unknown function $y$ of the variable $x$.
In mathematics, the Legendre transformation, first introduced by Adrien-Marie Legendre in 1787 when studying the minimal surface problem, is an involutive transformation on real-valued functions that are convex on a real variable. Specifically, if a real-valued multivariable function is convex on one of its independent real variables, then the Legendre transform with respect to this variable is applicable to the function.
In multivariable calculus, the implicit function theorem is a tool that allows relations to be converted to functions of several real variables. It does so by representing the relation as the graph of a function. There may not be a single function whose graph can represent the entire relation, but there may be such a function on a restriction of the domain of the relation. The implicit function theorem gives a sufficient condition to ensure that there is such a function.
Multi-index notation is a mathematical notation that simplifies formulas used in multivariable calculus, partial differential equations and the theory of distributions, by generalising the concept of an integer index to an ordered tuple of indices.
In mathematics, differential refers to several related notions derived from the early days of calculus, put on a rigorous footing, such as infinitesimal differences and the derivatives of functions.
In mathematical analysis, and applications in geometry, applied mathematics, engineering, and natural sciences, a function of a real variable is a function whose domain is the real numbers $\mathbb{R}$, or a subset of $\mathbb{R}$ that contains an interval of positive length. Most real functions that are considered and studied are differentiable in some interval. The most widely considered such functions are the real functions, which are the real-valued functions of a real variable, that is, the functions of a real variable whose codomain is the set of real numbers.
In mathematics, the derivative is a fundamental construction of differential calculus and admits many possible generalizations within the fields of mathematical analysis, combinatorics, algebra, geometry, etc.
In mathematics, the Gateaux differential or Gateaux derivative is a generalization of the concept of directional derivative in differential calculus. Named after René Gateaux, it is defined for functions between locally convex topological vector spaces such as Banach spaces. Like the Fréchet derivative on a Banach space, the Gateaux differential is often used to formalize the functional derivative commonly used in the calculus of variations and physics.
In mathematics, the Fréchet derivative is a derivative defined on normed spaces. Named after Maurice Fréchet, it is commonly used to generalize the derivative of a real-valued function of a single real variable to the case of a vector-valued function of multiple real variables, and to define the functional derivative used widely in the calculus of variations.
In differential calculus, there is no single uniform notation for differentiation. Instead, various notations for the derivative of a function or variable have been proposed by various mathematicians. The usefulness of each notation varies with the context, and it is sometimes advantageous to use more than one notation in a given context. The most common notations for differentiation are listed below.
This is a summary of differentiation rules, that is, rules for computing the derivative of a function in calculus.
In calculus, the differential represents the principal part of the change in a function $y = f(x)$ with respect to changes in the independent variable. The differential $dy$ is defined by

$$dy = f'(x)\, dx,$$

where $f'(x)$ is the derivative of $f$ with respect to $x$, and $dx$ is an additional real variable. The notation is such that the equation

$$dy = \frac{dy}{dx}\, dx$$

holds, where the derivative is represented in the Leibniz notation $dy/dx$, and this is consistent with regarding the derivative as the quotient of the differentials.
In mathematical analysis and its applications, a function of several real variables or real multivariate function is a function with more than one argument, with all arguments being real variables. This concept extends the idea of a function of a real variable to several variables. The "input" variables take real values, while the "output", also called the "value of the function", may be real or complex. However, the study of the complex-valued functions may be easily reduced to the study of the real-valued functions, by considering the real and imaginary parts of the complex function; therefore, unless explicitly specified, only real-valued functions will be considered in this article.
In mathematics, calculus on Euclidean space is a generalization of calculus of functions in one or several variables to calculus of functions on Euclidean space as well as a finite-dimensional real vector space. This calculus is also known as advanced calculus, especially in the United States. It is similar to multivariable calculus but is somewhat more sophisticated in that it uses linear algebra more extensively and covers some concepts from differential geometry such as differential forms and Stokes' formula in terms of differential forms. This extensive use of linear algebra also allows a natural generalization of multivariable calculus to calculus on Banach spaces or topological vector spaces.