Jacobi's formula

In matrix calculus, Jacobi's formula expresses the derivative of the determinant of a matrix A in terms of the adjugate of A and the derivative of A.[1]

If A is a differentiable map from the real numbers to n × n matrices, then

$$\frac{d}{dt} \det A(t) = \operatorname{tr}\left( \operatorname{adj}(A(t)) \, \frac{dA(t)}{dt} \right) = \det(A(t)) \cdot \operatorname{tr}\left( A(t)^{-1} \, \frac{dA(t)}{dt} \right)$$

where tr(X) is the trace of the matrix X and adj(X) denotes its adjugate. (The latter equality only holds if A(t) is invertible.)

As a special case,

$$\frac{\partial \det(A)}{\partial A_{ij}} = \operatorname{adj}^{\mathsf T}(A)_{ij} = \operatorname{adj}(A)_{ji}.$$

Equivalently, if dA stands for the differential of A, the general formula is

$$d \det(A) = \operatorname{tr}\left( \operatorname{adj}(A) \, dA \right).$$

The formula is named after the mathematician Carl Gustav Jacob Jacobi.
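
As a quick numerical illustration (a minimal sketch, not part of the original article; the curve A(t), the evaluation point, and the step size are arbitrary choices), the formula can be checked against a finite-difference derivative, computing the adjugate via adj(A) = det(A)·A⁻¹, which is valid only for invertible A:

```python
import numpy as np

def adjugate(a):
    # adj(A) = det(A) * inv(A); valid only when A is invertible
    return np.linalg.det(a) * np.linalg.inv(a)

def A(t):   # an arbitrary differentiable matrix-valued curve
    return np.array([[np.cos(t), t],
                     [t**2,      np.exp(t)]])

def dA(t):  # its entrywise derivative
    return np.array([[-np.sin(t), 1.0],
                     [2*t,        np.exp(t)]])

t, h = 0.7, 1e-6
lhs = (np.linalg.det(A(t + h)) - np.linalg.det(A(t - h))) / (2*h)  # d/dt det A(t)
rhs = np.trace(adjugate(A(t)) @ dA(t))                              # tr(adj(A) dA/dt)
print(lhs, rhs)  # the two values agree up to finite-difference error
```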

Derivation

Via Matrix Computation

We first prove a preliminary lemma:

Lemma. Let A and B be a pair of square matrices of the same dimension n. Then

$$\sum_i \sum_j A_{ij} B_{ij} = \operatorname{tr}\left( A^{\mathsf T} B \right).$$

Proof. The product AB of the pair of matrices has components

$$(AB)_{jk} = \sum_i A_{ji} B_{ik}.$$

Replacing the matrix A by its transpose A^T is equivalent to permuting the indices of its components:

$$(A^{\mathsf T} B)_{jk} = \sum_i A_{ij} B_{ik}.$$

The result follows by taking the trace of both sides:

$$\operatorname{tr}\left( A^{\mathsf T} B \right) = \sum_j \left( A^{\mathsf T} B \right)_{jj} = \sum_j \sum_i A_{ij} B_{ij} = \sum_i \sum_j A_{ij} B_{ij}. \qquad \blacksquare$$

Theorem. (Jacobi's formula) For any differentiable map A from the real numbers to n × n matrices,

$$d \det(A) = \operatorname{tr}\left( \operatorname{adj}(A) \, dA \right).$$

Proof. Laplace's formula for the determinant of a matrix A can be stated as

$$\det(A) = \sum_j A_{ij} \operatorname{adj}^{\mathsf T}(A)_{ij}.$$

Notice that the summation is performed over some arbitrary row i of the matrix.

The determinant of A can be considered to be a function of the elements of A:

$$\det(A) = F\left( A_{11}, A_{12}, \ldots, A_{21}, A_{22}, \ldots, A_{nn} \right),$$

so that, by the chain rule, its differential is

$$d \det(A) = \sum_i \sum_j \frac{\partial F}{\partial A_{ij}} \, dA_{ij}.$$

This summation is performed over all n×n elements of the matrix.

To find ∂F/∂A_ij, consider that on the right-hand side of Laplace's formula the index i can be chosen at will. (Any choice eventually yields the same result, but some choices keep the calculation much simpler than others.) In particular, i can be chosen to match the first index of ∂/∂A_ij:

$$\frac{\partial \det(A)}{\partial A_{ij}} = \frac{\partial \left( \sum_k A_{ik} \operatorname{adj}^{\mathsf T}(A)_{ik} \right)}{\partial A_{ij}} = \sum_k \frac{\partial \left( A_{ik} \operatorname{adj}^{\mathsf T}(A)_{ik} \right)}{\partial A_{ij}}$$

Thus, by the product rule,

$$\frac{\partial \det(A)}{\partial A_{ij}} = \sum_k \frac{\partial A_{ik}}{\partial A_{ij}} \operatorname{adj}^{\mathsf T}(A)_{ik} + \sum_k A_{ik} \frac{\partial \operatorname{adj}^{\mathsf T}(A)_{ik}}{\partial A_{ij}}.$$

Now, if an element A_ij of the matrix and a cofactor adj^T(A)_ik of element A_ik lie on the same row (or column), then the cofactor is not a function of A_ij, because the cofactor of A_ik is expressed in terms of elements not in its own row (nor column). Thus,

$$\frac{\partial \operatorname{adj}^{\mathsf T}(A)_{ik}}{\partial A_{ij}} = 0,$$

so

$$\frac{\partial \det(A)}{\partial A_{ij}} = \sum_k \operatorname{adj}^{\mathsf T}(A)_{ik} \frac{\partial A_{ik}}{\partial A_{ij}}.$$

All the elements of A are independent of each other, i.e.

$$\frac{\partial A_{ik}}{\partial A_{ij}} = \delta_{jk},$$

where δ is the Kronecker delta, so

$$\frac{\partial \det(A)}{\partial A_{ij}} = \sum_k \operatorname{adj}^{\mathsf T}(A)_{ik} \, \delta_{jk} = \operatorname{adj}^{\mathsf T}(A)_{ij}.$$

Therefore,

$$d \det(A) = \sum_i \sum_j \operatorname{adj}^{\mathsf T}(A)_{ij} \, dA_{ij},$$

and applying the Lemma yields

$$d \det(A) = \operatorname{tr}\left( \operatorname{adj}(A) \, dA \right). \qquad \blacksquare$$
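
The entrywise identity derived above, ∂det(A)/∂A_ij = adj^T(A)_ij, also admits a direct numerical check (a minimal sketch with a random test matrix; the adjugate is again computed as det(A)·A⁻¹, which assumes A is invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
adjT = (np.linalg.det(A) * np.linalg.inv(A)).T  # adj(A)^T, valid for invertible A

h = 1e-6
grad = np.zeros_like(A)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(A)
        E[i, j] = h
        # central difference of det with respect to the (i, j) entry
        grad[i, j] = (np.linalg.det(A + E) - np.linalg.det(A - E)) / (2*h)

print(np.max(np.abs(grad - adjT)))  # ≈ 0 up to finite-difference error
```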

Via Chain Rule

Lemma 1. $\det'(I) = \operatorname{tr}$, where $\det'$ is the differential of $\det$.

This equation means that the differential of $\det$, evaluated at the identity matrix, is equal to the trace. The differential $\det'(I)$ is a linear operator that maps an n × n matrix to a real number.

Proof. Using the definition of a directional derivative together with one of its basic properties for differentiable functions, we have

$$\det'(I)(T) = \nabla_T \det(I) = \lim_{\varepsilon \to 0} \frac{\det(I + \varepsilon T) - \det I}{\varepsilon}$$

Now, $\det(I + \varepsilon T)$ is a polynomial in $\varepsilon$ of order n. It is closely related to the characteristic polynomial of $T$. The constant term in that polynomial (the term with $\varepsilon = 0$) is 1, while the linear term in $\varepsilon$ is $\operatorname{tr} T$, so $\det'(I)(T) = \operatorname{tr} T$. $\blacksquare$
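
Lemma 1 can itself be probed numerically (a minimal sketch with a random matrix T and an arbitrary small ε): the first-order coefficient of det(I + εT) should be tr T.

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((3, 3))
eps = 1e-7

# det(I + eps*T) = 1 + eps*tr(T) + O(eps^2)
first_order = (np.linalg.det(np.eye(3) + eps * T) - 1.0) / eps
print(first_order, np.trace(T))  # agree to roughly 6 decimal places
```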

Lemma 2. For an invertible matrix A, we have:

$$\det'(A)(T) = \det A \, \operatorname{tr}\left( A^{-1} T \right).$$

Proof. Consider the following function of X:

$$\det X = \det A \, \det\left( A^{-1} X \right)$$

We calculate the differential of $\det X$ and evaluate it at $X = A$ using Lemma 1, the equation above, and the chain rule:

$$\det'(A)(T) = \det A \, \det'(I)\left( A^{-1} T \right) = \det A \, \operatorname{tr}\left( A^{-1} T \right). \qquad \blacksquare$$

Theorem. (Jacobi's formula)

$$\frac{d}{dt} \det A = \operatorname{tr}\left( \operatorname{adj}(A) \, \frac{dA}{dt} \right).$$

Proof. If A is invertible, Lemma 2 with $T = dA/dt$ gives

$$\frac{d}{dt} \det A = \det A \, \operatorname{tr}\left( A^{-1} \frac{dA}{dt} \right) = \operatorname{tr}\left( \operatorname{adj}(A) \, \frac{dA}{dt} \right),$$

using the equation relating the adjugate of $A$ to $A^{-1}$, namely $\operatorname{adj}(A) = \det(A) \, A^{-1}$. The formula then holds for all matrices, since both sides are continuous in $A$ and the set of invertible matrices is dense in the space of matrices. $\blacksquare$

Via Diagonalization

Both sides of the Jacobi formula are polynomials in the matrix coefficients of A and A'. It is therefore sufficient to verify the polynomial identity on the dense subset where the eigenvalues of A are distinct and nonzero.

If A factors differentiably as $A = BC$, then

$$\operatorname{tr}\left( A^{-1} A' \right) = \operatorname{tr}\left( (BC)^{-1} (BC)' \right) = \operatorname{tr}\left( C^{-1} B^{-1} \left( B'C + BC' \right) \right) = \operatorname{tr}\left( B^{-1} B' \right) + \operatorname{tr}\left( C^{-1} C' \right).$$

In particular, if L is invertible, then $I = L^{-1} L$ and

$$0 = \operatorname{tr}\left( I^{-1} I' \right) = \operatorname{tr}\left( L \left( L^{-1} \right)' \right) + \operatorname{tr}\left( L^{-1} L' \right).$$

Since A has distinct eigenvalues, there exists a differentiable complex invertible matrix L such that $A = L^{-1} D L$ and D is diagonal. Then

$$\operatorname{tr}\left( A^{-1} A' \right) = \operatorname{tr}\left( L \left( L^{-1} \right)' \right) + \operatorname{tr}\left( D^{-1} D' \right) + \operatorname{tr}\left( L^{-1} L' \right) = \operatorname{tr}\left( D^{-1} D' \right).$$

Let $\lambda_i$, $i = 1, \ldots, n$, be the eigenvalues of A. Then

$$\frac{(\det A)'}{\det A} = \sum_{i=1}^n \frac{\lambda_i'}{\lambda_i} = \operatorname{tr}\left( D^{-1} D' \right) = \operatorname{tr}\left( A^{-1} A' \right),$$

which is the Jacobi formula for matrices A with distinct nonzero eigenvalues.
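
For the diagonal factor, the first equality above can be verified directly (a short supplementary computation): since $\det D = \prod_i \lambda_i$, the product rule gives

$$(\det D)' = \sum_i \lambda_i' \prod_{j \neq i} \lambda_j = \det D \sum_i \frac{\lambda_i'}{\lambda_i},$$

which is exactly the logarithmic-derivative identity used in the last step.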

Corollary

The following is a useful relation connecting the trace to the determinant of the associated matrix exponential:

$$\det e^{tB} = e^{\operatorname{tr}(tB)}$$

This statement is clear for diagonal matrices, and a proof of the general claim follows.

For any invertible matrix $A(t)$, in the previous section "Via Chain Rule", we showed that

$$\frac{d}{dt} \det A(t) = \det A(t) \, \operatorname{tr}\left( A(t)^{-1} \frac{d}{dt} A(t) \right)$$

Considering $A(t) = e^{tB}$ in this equation yields:

$$\frac{d}{dt} \det e^{tB} = \operatorname{tr}(B) \, \det e^{tB}$$

The desired result follows as the solution to this ordinary differential equation: with the initial condition $\det e^{0 \cdot B} = \det I = 1$, the unique solution is $\det e^{tB} = e^{t \operatorname{tr}(B)} = e^{\operatorname{tr}(tB)}$.
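
This identity is easy to check numerically (a minimal sketch, assuming SciPy is available for the matrix exponential; B is an arbitrary random test matrix):

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))  # arbitrary test matrix
t = 0.3

lhs = np.linalg.det(expm(t * B))  # det(e^{tB})
rhs = np.exp(np.trace(t * B))     # e^{tr(tB)}
print(lhs, rhs)  # agree up to floating-point error
```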

Applications

Several forms of the formula underlie the Faddeev–LeVerrier algorithm for computing the characteristic polynomial, and explicit applications of the Cayley–Hamilton theorem. For example, starting from the following equation, which was proved above:

$$\frac{d}{dt} \det A(t) = \det A(t) \, \operatorname{tr}\left( A(t)^{-1} \frac{d}{dt} A(t) \right)$$

and using $A(t) = tI - B$, we get:

$$\frac{d}{dt} \det(tI - B) = \det(tI - B) \, \operatorname{tr}\left[ (tI - B)^{-1} \right] = \operatorname{tr}\left[ \operatorname{adj}(tI - B) \right],$$

where adj denotes the adjugate matrix.
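
For concreteness, here is a short implementation of the Faddeev–LeVerrier recursion mentioned above (a sketch, not taken from the article; it computes the coefficients of the characteristic polynomial det(tI − B) and compares against NumPy):

```python
import numpy as np

def faddeev_leverrier(B):
    """Coefficients c with det(tI - B) = t^n + c[1] t^(n-1) + ... + c[n],
    via the recursion M_k = B M_{k-1} + c_{k-1} I,  c_k = -tr(B M_k) / k."""
    n = B.shape[0]
    M = np.zeros((n, n))  # M_0 = 0
    c = np.zeros(n + 1)
    c[0] = 1.0            # monic leading coefficient
    for k in range(1, n + 1):
        M = B @ M + c[k - 1] * np.eye(n)
        c[k] = -np.trace(B @ M) / k
    return c

B = np.array([[2.0, 1.0],
              [0.0, 3.0]])
print(faddeev_leverrier(B))  # [ 1. -5.  6.], i.e. t^2 - 5t + 6 = (t-2)(t-3)
print(np.poly(B))            # NumPy's characteristic polynomial, for comparison
```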

Remarks

  1. Magnus & Neudecker (1999, pp. 149–150), Part Three, Section 8.3

References

Magnus, Jan R.; Neudecker, Heinz (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley.