Jacobi's formula

In matrix calculus, Jacobi's formula expresses the derivative of the determinant of a matrix A in terms of the adjugate of A and the derivative of A.[1]

If A is a differentiable map from the real numbers to n × n matrices, then

$$\frac{d}{dt} \det A(t) = \operatorname{tr}\left( \operatorname{adj}(A(t)) \, \frac{dA(t)}{dt} \right) = \det(A(t)) \cdot \operatorname{tr}\left( A(t)^{-1} \, \frac{dA(t)}{dt} \right)$$

where tr(X) is the trace of the matrix X and adj(X) denotes its adjugate. (The latter equality only holds if A(t) is invertible.)

As a special case,

$$\frac{\partial \det(A)}{\partial A_{ij}} = \operatorname{adj}^{\mathsf T}(A)_{ij} = \operatorname{adj}(A)_{ji}.$$

Equivalently, if dA stands for the differential of A, the general formula is

$$d \det(A) = \operatorname{tr}\left( \operatorname{adj}(A) \, dA \right).$$

The formula is named after the mathematician Carl Gustav Jacob Jacobi.
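
As a quick numerical illustration (a minimal sketch, not part of the original article; the curve A(t), the evaluation point, and the step size are arbitrary choices), the formula can be checked against a finite-difference derivative, computing the adjugate via adj(A) = det(A)·A⁻¹, which is valid only for invertible A:

```python
import numpy as np

def adjugate(a):
    # adj(A) = det(A) * inv(A); valid only when A is invertible
    return np.linalg.det(a) * np.linalg.inv(a)

def A(t):   # an arbitrary differentiable matrix-valued curve
    return np.array([[np.cos(t), t],
                     [t**2,      np.exp(t)]])

def dA(t):  # its entrywise derivative
    return np.array([[-np.sin(t), 1.0],
                     [2*t,        np.exp(t)]])

t, h = 0.7, 1e-6
lhs = (np.linalg.det(A(t + h)) - np.linalg.det(A(t - h))) / (2*h)  # d/dt det A(t)
rhs = np.trace(adjugate(A(t)) @ dA(t))                              # tr(adj(A) dA/dt)
print(lhs, rhs)  # the two values agree up to finite-difference error
```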

Derivation

Via Matrix Computation

We first prove a preliminary lemma:

Lemma. Let A and B be a pair of square matrices of the same dimension n. Then

$$\sum_i \sum_j A_{ij} B_{ij} = \operatorname{tr}\left( A^{\mathsf T} B \right).$$

Proof. The product AB of the pair of matrices has components

$$(AB)_{jk} = \sum_i A_{ji} B_{ik}.$$

Replacing the matrix A by its transpose A^T is equivalent to permuting the indices of its components:

$$(A^{\mathsf T} B)_{jk} = \sum_i A_{ij} B_{ik}.$$

The result follows by taking the trace of both sides:

$$\operatorname{tr}\left( A^{\mathsf T} B \right) = \sum_j \left( A^{\mathsf T} B \right)_{jj} = \sum_j \sum_i A_{ij} B_{ij} = \sum_i \sum_j A_{ij} B_{ij}. \qquad \blacksquare$$

Theorem. (Jacobi's formula) For any differentiable map A from the real numbers to n × n matrices,

$$d \det(A) = \operatorname{tr}\left( \operatorname{adj}(A) \, dA \right).$$

Proof. Laplace's formula for the determinant of a matrix A can be stated as

$$\det(A) = \sum_j A_{ij} \operatorname{adj}^{\mathsf T}(A)_{ij}.$$

Notice that the summation is performed over some arbitrary row i of the matrix.

The determinant of A can be considered to be a function of the elements of A:

$$\det(A) = F\left( A_{11}, A_{12}, \ldots, A_{21}, A_{22}, \ldots, A_{nn} \right),$$

so that, by the chain rule, its differential is

$$d \det(A) = \sum_i \sum_j \frac{\partial F}{\partial A_{ij}} \, dA_{ij}.$$

This summation is performed over all n×n elements of the matrix.

To find ∂F/∂A_ij, consider that on the right-hand side of Laplace's formula the index i can be chosen at will. (Any choice eventually yields the same result, but some choices keep the calculation much simpler than others.) In particular, i can be chosen to match the first index of ∂/∂A_ij:

$$\frac{\partial \det(A)}{\partial A_{ij}} = \frac{\partial \left( \sum_k A_{ik} \operatorname{adj}^{\mathsf T}(A)_{ik} \right)}{\partial A_{ij}} = \sum_k \frac{\partial \left( A_{ik} \operatorname{adj}^{\mathsf T}(A)_{ik} \right)}{\partial A_{ij}}$$

Thus, by the product rule,

$$\frac{\partial \det(A)}{\partial A_{ij}} = \sum_k \frac{\partial A_{ik}}{\partial A_{ij}} \operatorname{adj}^{\mathsf T}(A)_{ik} + \sum_k A_{ik} \frac{\partial \operatorname{adj}^{\mathsf T}(A)_{ik}}{\partial A_{ij}}.$$

Now, if an element A_ij of the matrix and a cofactor adj^T(A)_ik of element A_ik lie on the same row (or column), then the cofactor is not a function of A_ij, because the cofactor of A_ik is expressed in terms of elements not in its own row (nor column). Thus,

$$\frac{\partial \operatorname{adj}^{\mathsf T}(A)_{ik}}{\partial A_{ij}} = 0,$$

so

$$\frac{\partial \det(A)}{\partial A_{ij}} = \sum_k \operatorname{adj}^{\mathsf T}(A)_{ik} \frac{\partial A_{ik}}{\partial A_{ij}}.$$

All the elements of A are independent of each other, i.e.

$$\frac{\partial A_{ik}}{\partial A_{ij}} = \delta_{jk},$$

where δ is the Kronecker delta, so

$$\frac{\partial \det(A)}{\partial A_{ij}} = \sum_k \operatorname{adj}^{\mathsf T}(A)_{ik} \, \delta_{jk} = \operatorname{adj}^{\mathsf T}(A)_{ij}.$$

Therefore,

$$d \det(A) = \sum_i \sum_j \operatorname{adj}^{\mathsf T}(A)_{ij} \, dA_{ij},$$

and applying the Lemma yields

$$d \det(A) = \operatorname{tr}\left( \operatorname{adj}(A) \, dA \right). \qquad \blacksquare$$
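
The entrywise identity derived above, ∂det(A)/∂A_ij = adj^T(A)_ij, also admits a direct numerical check (a minimal sketch with a random test matrix; the adjugate is again computed as det(A)·A⁻¹, which assumes A is invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
adjT = (np.linalg.det(A) * np.linalg.inv(A)).T  # adj(A)^T, valid for invertible A

h = 1e-6
grad = np.zeros_like(A)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(A)
        E[i, j] = h
        # central difference of det with respect to the (i, j) entry
        grad[i, j] = (np.linalg.det(A + E) - np.linalg.det(A - E)) / (2*h)

print(np.max(np.abs(grad - adjT)))  # ≈ 0 up to finite-difference error
```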

Via Chain Rule

Lemma 1. $\det'(I) = \operatorname{tr}$, where $\det'$ is the differential of $\det$.

This equation means that the differential of $\det$, evaluated at the identity matrix, is equal to the trace. The differential $\det'(I)$ is a linear operator that maps an n × n matrix to a real number.

Proof. Using the definition of a directional derivative together with one of its basic properties for differentiable functions, we have

$$\det'(I)(T) = \nabla_T \det(I) = \lim_{\varepsilon \to 0} \frac{\det(I + \varepsilon T) - \det I}{\varepsilon}$$

Now, $\det(I + \varepsilon T)$ is a polynomial in $\varepsilon$ of order n. It is closely related to the characteristic polynomial of $T$. The constant term in that polynomial (the term with $\varepsilon = 0$) is 1, while the linear term in $\varepsilon$ is $\operatorname{tr} T$, so $\det'(I)(T) = \operatorname{tr} T$. $\blacksquare$
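
Lemma 1 can itself be probed numerically (a minimal sketch with a random matrix T and an arbitrary small ε): the first-order coefficient of det(I + εT) should be tr T.

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((3, 3))
eps = 1e-7

# det(I + eps*T) = 1 + eps*tr(T) + O(eps^2)
first_order = (np.linalg.det(np.eye(3) + eps * T) - 1.0) / eps
print(first_order, np.trace(T))  # agree to roughly 6 decimal places
```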

Lemma 2. For an invertible matrix A, we have:

$$\det'(A)(T) = \det A \, \operatorname{tr}\left( A^{-1} T \right).$$

Proof. Consider the following function of X:

$$\det X = \det A \, \det\left( A^{-1} X \right)$$

We calculate the differential of $\det X$ and evaluate it at $X = A$ using Lemma 1, the equation above, and the chain rule:

$$\det'(A)(T) = \det A \, \det'(I)\left( A^{-1} T \right) = \det A \, \operatorname{tr}\left( A^{-1} T \right). \qquad \blacksquare$$

Theorem. (Jacobi's formula)

$$\frac{d}{dt} \det A = \operatorname{tr}\left( \operatorname{adj}(A) \, \frac{dA}{dt} \right).$$

Proof. If A is invertible, Lemma 2 with $T = dA/dt$ gives

$$\frac{d}{dt} \det A = \det A \, \operatorname{tr}\left( A^{-1} \frac{dA}{dt} \right) = \operatorname{tr}\left( \operatorname{adj}(A) \, \frac{dA}{dt} \right),$$

using the equation relating the adjugate of $A$ to $A^{-1}$, namely $\operatorname{adj}(A) = \det(A) \, A^{-1}$. The formula then holds for all matrices, since both sides are continuous in $A$ and the set of invertible matrices is dense in the space of matrices. $\blacksquare$

Via Diagonalization

Both sides of the Jacobi formula are polynomials in the matrix coefficients of A and A'. It is therefore sufficient to verify the polynomial identity on the dense subset where the eigenvalues of A are distinct and nonzero.

If A factors differentiably as $A = BC$, then

$$\operatorname{tr}\left( A^{-1} A' \right) = \operatorname{tr}\left( (BC)^{-1} (BC)' \right) = \operatorname{tr}\left( C^{-1} B^{-1} \left( B'C + BC' \right) \right) = \operatorname{tr}\left( B^{-1} B' \right) + \operatorname{tr}\left( C^{-1} C' \right).$$

In particular, if L is invertible, then $I = L^{-1} L$ and

$$0 = \operatorname{tr}\left( I^{-1} I' \right) = \operatorname{tr}\left( L \left( L^{-1} \right)' \right) + \operatorname{tr}\left( L^{-1} L' \right).$$

Since A has distinct eigenvalues, there exists a differentiable complex invertible matrix L such that $A = L^{-1} D L$ and D is diagonal. Then

$$\operatorname{tr}\left( A^{-1} A' \right) = \operatorname{tr}\left( L \left( L^{-1} \right)' \right) + \operatorname{tr}\left( D^{-1} D' \right) + \operatorname{tr}\left( L^{-1} L' \right) = \operatorname{tr}\left( D^{-1} D' \right).$$

Let $\lambda_i$, $i = 1, \ldots, n$, be the eigenvalues of A. Then

$$\frac{(\det A)'}{\det A} = \sum_{i=1}^n \frac{\lambda_i'}{\lambda_i} = \operatorname{tr}\left( D^{-1} D' \right) = \operatorname{tr}\left( A^{-1} A' \right),$$

which is the Jacobi formula for matrices A with distinct nonzero eigenvalues.
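
For the diagonal factor, the first equality above can be verified directly (a short supplementary computation): since $\det D = \prod_i \lambda_i$, the product rule gives

$$(\det D)' = \sum_i \lambda_i' \prod_{j \neq i} \lambda_j = \det D \sum_i \frac{\lambda_i'}{\lambda_i},$$

which is exactly the logarithmic-derivative identity used in the last step.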

Corollary

The following is a useful relation connecting the trace to the determinant of the associated matrix exponential:

$$\det e^{tB} = e^{\operatorname{tr}(tB)}$$

This statement is clear for diagonal matrices, and a proof of the general claim follows.

For any invertible matrix $A(t)$, in the previous section "Via Chain Rule", we showed that

$$\frac{d}{dt} \det A(t) = \det A(t) \, \operatorname{tr}\left( A(t)^{-1} \frac{d}{dt} A(t) \right)$$

Considering $A(t) = e^{tB}$ in this equation yields:

$$\frac{d}{dt} \det e^{tB} = \operatorname{tr}(B) \, \det e^{tB}$$

The desired result follows as the solution to this ordinary differential equation: with the initial condition $\det e^{0 \cdot B} = \det I = 1$, the unique solution is $\det e^{tB} = e^{t \operatorname{tr}(B)} = e^{\operatorname{tr}(tB)}$.
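
This identity is easy to check numerically (a minimal sketch, assuming SciPy is available for the matrix exponential; B is an arbitrary random test matrix):

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))  # arbitrary test matrix
t = 0.3

lhs = np.linalg.det(expm(t * B))  # det(e^{tB})
rhs = np.exp(np.trace(t * B))     # e^{tr(tB)}
print(lhs, rhs)  # agree up to floating-point error
```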

Applications

Several forms of the formula underlie the Faddeev–LeVerrier algorithm for computing the characteristic polynomial, and explicit applications of the Cayley–Hamilton theorem. For example, starting from the following equation, which was proved above:

$$\frac{d}{dt} \det A(t) = \det A(t) \, \operatorname{tr}\left( A(t)^{-1} \frac{d}{dt} A(t) \right)$$

and using $A(t) = tI - B$, we get:

$$\frac{d}{dt} \det(tI - B) = \det(tI - B) \, \operatorname{tr}\left[ (tI - B)^{-1} \right] = \operatorname{tr}\left[ \operatorname{adj}(tI - B) \right],$$

where adj denotes the adjugate matrix.
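
For concreteness, here is a short implementation of the Faddeev–LeVerrier recursion mentioned above (a sketch, not taken from the article; it computes the coefficients of the characteristic polynomial det(tI − B) and compares against NumPy):

```python
import numpy as np

def faddeev_leverrier(B):
    """Coefficients c with det(tI - B) = t^n + c[1] t^(n-1) + ... + c[n],
    via the recursion M_k = B M_{k-1} + c_{k-1} I,  c_k = -tr(B M_k) / k."""
    n = B.shape[0]
    M = np.zeros((n, n))  # M_0 = 0
    c = np.zeros(n + 1)
    c[0] = 1.0            # monic leading coefficient
    for k in range(1, n + 1):
        M = B @ M + c[k - 1] * np.eye(n)
        c[k] = -np.trace(B @ M) / k
    return c

B = np.array([[2.0, 1.0],
              [0.0, 3.0]])
print(faddeev_leverrier(B))  # [ 1. -5.  6.], i.e. t^2 - 5t + 6 = (t-2)(t-3)
print(np.poly(B))            # NumPy's characteristic polynomial, for comparison
```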

Remarks

  1. Magnus & Neudecker (1999, pp. 149–150), Part Three, Section 8.3

References

Magnus, Jan R.; Neudecker, Heinz (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley.