When X is an n×n diagonal matrix then exp(X) will be an n×n diagonal matrix with each diagonal element equal to the ordinary exponential applied to the corresponding diagonal element of X.
Properties
Elementary properties
Let X and Y be n×n complex matrices and let a and b be arbitrary complex numbers. We denote the n×n identity matrix by I and the zero matrix by 0. The matrix exponential satisfies the following properties.[2]
We begin with the properties that are immediate consequences of the definition as a power series:
e^0 = I
exp(X^T) = (exp X)^T, where X^T denotes the transpose of X.
Next we have the key identity: if X and Y commute (XY = YX), then e^X e^Y = e^{X+Y}. The proof of this identity is the same as the standard power-series argument for the corresponding identity for the exponential of real numbers. That is to say, as long as X and Y commute, it makes no difference to the argument whether X and Y are numbers or matrices. It is important to note that this identity typically does not hold if X and Y do not commute (see the Golden–Thompson inequality below).
Consequences of the preceding identity are the following:
e^{aX} e^{bX} = e^{(a+b)X}
e^X e^{−X} = I
Using the above results, we can easily verify the following claims. If X is symmetric then eX is also symmetric, and if X is skew-symmetric then eX is orthogonal. If X is Hermitian then eX is also Hermitian, and if X is skew-Hermitian then eX is unitary.
Finally, a Laplace transform of matrix exponentials amounts to the resolvent,
∫_0^∞ e^{−ts} e^{tX} dt = (sI − X)^{−1},
for all sufficiently large positive values of s.
Linear differential equation systems
One of the reasons for the importance of the matrix exponential is that it can be used to solve systems of linear ordinary differential equations. The solution of
d/dt y(t) = A y(t),   y(0) = y_0,
where A is a constant matrix and y is a column vector, is given by
y(t) = e^{At} y_0.
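As a quick illustration, here is a minimal SciPy sketch of this solution formula; the matrix A, the initial condition y0, and the evaluation times are illustrative assumptions, not taken from the text.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative system y'(t) = A y(t) with a constant matrix A (assumed example).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
y0 = np.array([1.0, 0.0])

# Closed-form solution y(t) = exp(At) y(0), evaluated at a few times.
for t in [0.0, 0.5, 1.0]:
    y_t = expm(A * t) @ y0
    print(t, y_t)
```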
The matrix exponential can also be used to solve the inhomogeneous equation
d/dt y(t) = A y(t) + b(t),   y(0) = y_0.
See the section on applications below for examples.
There is no closed-form solution for differential equations of the form
d/dt y(t) = A(t) y(t),   y(0) = y_0,
where A is not constant, but the Magnus series gives the solution as an infinite sum.
Determinant of the matrix exponential
By Jacobi's formula, for any complex square matrix the following trace identity holds:
det(e^A) = e^{tr(A)}.
In addition to providing a computational tool, this formula demonstrates that a matrix exponential is always an invertible matrix. This follows from the fact that the right-hand side of the above equation is always non-zero, and so det(e^A) ≠ 0, which implies that e^A must be invertible.
In the real-valued case, the formula also exhibits that the map A ↦ e^A from the real n×n matrices to the real invertible matrices is not surjective, in contrast to the complex case mentioned earlier. This follows from the fact that, for real-valued matrices, the right-hand side of the formula is always positive, while there exist invertible matrices with a negative determinant.
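A short numerical sanity check of the determinant identity det(e^A) = e^{tr(A)}; the random test matrix is an illustrative assumption.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))          # arbitrary real test matrix (assumption)

lhs = np.linalg.det(expm(A))             # det(e^A)
rhs = np.exp(np.trace(A))                # e^{tr A}
print(lhs, rhs, np.isclose(lhs, rhs))    # agree to numerical precision, and lhs > 0
```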
Real symmetric matrices
The matrix exponential of a real symmetric matrix is positive definite. Let A be an n×n real symmetric matrix and x a column vector. Using the elementary properties of the matrix exponential and of symmetric matrices, we have:
x^T e^A x = x^T e^{A/2} e^{A/2} x = x^T (e^{A/2})^T e^{A/2} x = ‖e^{A/2} x‖^2 ≥ 0.
Since e^{A/2} is invertible, the equality only holds for x = 0, and we have x^T e^A x > 0 for all non-zero x. Hence e^A is positive definite.
The exponential of sums
For any real numbers (scalars) x and y we know that the exponential function satisfies e^{x+y} = e^x e^y. The same is true for commuting matrices. If matrices X and Y commute (meaning that XY = YX), then
e^{X+Y} = e^X e^Y.
However, for matrices that do not commute the above equality does not necessarily hold.
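A small sketch contrasting the two situations; the particular matrices are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm

# Commuting pair: any two polynomials in the same matrix commute (illustrative choice).
M = np.array([[1.0, 2.0],
              [0.0, 3.0]])
X, Y = M, M @ M
print(np.allclose(expm(X + Y), expm(X) @ expm(Y)))   # True: XY = YX

# Non-commuting pair: e^{X+Y} generally differs from e^X e^Y.
X = np.array([[0.0, 1.0], [0.0, 0.0]])
Y = np.array([[0.0, 0.0], [1.0, 0.0]])
print(np.allclose(expm(X + Y), expm(X) @ expm(Y)))   # False in general
```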
The Lie product formula
Even if X and Y do not commute, the exponential e^{X+Y} can be computed by the Lie product formula[4]
e^{X+Y} = lim_{k→∞} (e^{X/k} e^{Y/k})^k.
Using a large finite k to approximate the above is the basis of the Suzuki–Trotter expansion, often used in numerical time evolution.
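A brief numerical illustration of the Lie product formula for a hypothetical non-commuting pair; the error decays roughly like 1/k.

```python
import numpy as np
from scipy.linalg import expm

# Non-commuting test matrices (illustrative assumption).
X = np.array([[0.0, 1.0], [0.0, 0.0]])
Y = np.array([[0.0, 0.0], [1.0, 0.0]])

target = expm(X + Y)
for k in [1, 10, 100, 1000]:
    trotter = np.linalg.matrix_power(expm(X / k) @ expm(Y / k), k)
    print(k, np.linalg.norm(trotter - target))   # error shrinks roughly like 1/k
```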
The Baker–Campbell–Hausdorff formula
In the other direction, if X and Y are sufficiently small (but not necessarily commuting) matrices, we have
e^X e^Y = e^Z,
where Z may be computed as a series in commutators of X and Y by means of the Baker–Campbell–Hausdorff formula:[5]
Z = X + Y + (1/2)[X, Y] + (1/12)[X, [X, Y]] − (1/12)[Y, [X, Y]] + ⋯,
where the remaining terms are all iterated commutators involving X and Y. If X and Y commute, then all the commutators are zero and we have simply Z = X + Y.
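A rough numerical check of the series through the second-order commutator terms, using small illustrative matrices; higher-order terms account for the remaining discrepancy.

```python
import numpy as np
from scipy.linalg import expm, logm

# Small non-commuting matrices (illustrative assumption); BCH needs X, Y near 0.
X = 0.1 * np.array([[0.0, 1.0], [0.0, 0.0]])
Y = 0.1 * np.array([[0.0, 0.0], [1.0, 0.0]])

def comm(A, B):
    return A @ B - B @ A

Z_exact = logm(expm(X) @ expm(Y))
Z_bch = X + Y + comm(X, Y) / 2 + comm(X, comm(X, Y)) / 12 - comm(Y, comm(X, Y)) / 12
print(np.linalg.norm(Z_exact - Z_bch))   # small; remaining error is fourth order and higher
```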
Inequalities for exponentials of Hermitian matrices
For Hermitian matrices A and B, the Golden–Thompson inequality bounds the trace of the exponential of a sum:
tr e^{A+B} ≤ tr(e^A e^B).
There is no requirement of commutativity. There are counterexamples to show that the Golden–Thompson inequality cannot be extended to three matrices, and, in any event, tr(exp(A)exp(B)exp(C)) is not guaranteed to be real for Hermitian A, B, C. However, Lieb proved[7][8] that it can be generalized to three matrices if we modify the expression as follows:
tr exp(A + B + C) ≤ ∫_0^∞ tr[ e^A (e^{−B} + t)^{−1} e^C (e^{−B} + t)^{−1} ] dt.
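A quick numerical spot check of the Golden–Thompson inequality for randomly generated Hermitian matrices; this is an illustrative test, not a proof.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)

def random_hermitian(n):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

A, B = random_hermitian(3), random_hermitian(3)
lhs = np.trace(expm(A + B)).real            # tr e^{A+B}
rhs = np.trace(expm(A) @ expm(B)).real      # tr(e^A e^B); real for Hermitian A, B
print(lhs <= rhs + 1e-12)                   # True: Golden-Thompson inequality
```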
The exponential map
The exponential of a matrix is always an invertible matrix. The inverse matrix of e^X is given by e^{−X}. This is analogous to the fact that the exponential of a complex number is always nonzero. The matrix exponential then gives us a map from the space of all n×n matrices to the general linear group of degree n, i.e. the group of all n×n invertible matrices. In fact, this map is surjective, which means that every invertible matrix can be written as the exponential of some other matrix[9] (for this, it is essential to consider the field C of complex numbers and not R).
The map t ↦ e^{tX}, defined for real t, gives a smooth curve in the general linear group which passes through the identity element at t = 0; in fact, it is a one-parameter subgroup, since e^{tX} e^{sX} = e^{(t+s)X}. The derivative of this curve (or tangent vector) at a point t is given by
d/dt e^{tX} = X e^{tX} = e^{tX} X.   (1)
The derivative at t = 0 is just the matrix X, which is to say that X generates this one-parameter subgroup.
More generally,[10] for a generic t-dependent exponent X(t),
d/dt e^{X(t)} = ∫_0^1 e^{α X(t)} (dX(t)/dt) e^{(1−α) X(t)} dα.
Taking the above expression e^{X(t)} outside the integral sign and expanding the integrand with the help of the Hadamard lemma, one can obtain the following useful expression for the derivative of the matrix exponent:[11]
d/dt e^{X(t)} = (dX/dt + (1/2!) [X, dX/dt] + (1/3!) [X, [X, dX/dt]] + ⋯) e^{X(t)}.
The coefficients in the expression above are different from what appears in the exponential. For a closed form, see derivative of the exponential map.
Directional derivatives when restricted to Hermitian matrices
Let A be an n×n Hermitian matrix with distinct eigenvalues. Let A = U diag(λ_1, …, λ_n) U^* be its eigendecomposition, where U is a unitary matrix whose columns are the eigenvectors of A, U^* is its conjugate transpose, and λ = (λ_1, …, λ_n) the vector of corresponding eigenvalues. Then, for any Hermitian matrix V, the directional derivative of exp: A ↦ e^A at A in the direction V is[12][13]
D exp(A)[V] = U (G ⊙ (U^* V U)) U^*,
where the operator ⊙ denotes the Hadamard (entrywise) product and, for all 1 ≤ i, j ≤ n, the matrix G of first divided differences is defined by G_{ij} = (e^{λ_i} − e^{λ_j})/(λ_i − λ_j) for i ≠ j and G_{ii} = e^{λ_i}. In addition, for any Hermitian matrices V_1 and V_2, the second directional derivative at A in directions V_1 and V_2 admits a similar closed-form expression in terms of second divided differences of the exponential at the eigenvalues.[13]
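A sketch that checks the first-derivative formula above numerically, using the first divided differences G defined there and comparing against a symmetric finite difference; the random Hermitian matrices and the step size are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm, eigh

rng = np.random.default_rng(1)
n = 4

def random_hermitian(n):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

A = random_hermitian(n)   # point of differentiation (illustrative)
V = random_hermitian(n)   # direction (illustrative)

lam, U = eigh(A)          # A = U diag(lam) U^*, with real eigenvalues lam

# Matrix of first divided differences of exp at the eigenvalues.
G = np.empty((n, n))
for i in range(n):
    for j in range(n):
        if np.isclose(lam[i], lam[j]):
            G[i, j] = np.exp(lam[i])
        else:
            G[i, j] = (np.exp(lam[i]) - np.exp(lam[j])) / (lam[i] - lam[j])

# Directional derivative: D exp(A)[V] = U (G ∘ (U^* V U)) U^*.
D = U @ (G * (U.conj().T @ V @ U)) @ U.conj().T

# Symmetric finite-difference check.
eps = 1e-6
D_fd = (expm(A + eps * V) - expm(A - eps * V)) / (2 * eps)
print(np.linalg.norm(D - D_fd))   # small
```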
Computing the matrix exponential
Finding reliable and accurate methods to compute the matrix exponential is difficult, and this is still a topic of considerable current research in mathematics and numerical analysis. Matlab, GNU Octave, R, and SciPy all use the Padé approximant.[14][15][16][17] In this section, we discuss methods that are applicable in principle to any matrix, and which can be carried out explicitly for small matrices.[18] Subsequent sections describe methods suitable for numerical evaluation on large matrices.
Diagonalizable case
If a matrix D is diagonal, D = diag(d_1, …, d_n), then its exponential can be obtained by exponentiating each entry on the main diagonal:
e^D = diag(e^{d_1}, …, e^{d_n}).
This result also allows one to exponentiate diagonalizable matrices: if A = U D U^{−1} and D is diagonal, then e^A = U e^D U^{−1}.
Application of Sylvester's formula yields the same result. (To see this, note that addition and multiplication, hence also exponentiation, of diagonal matrices is equivalent to element-wise addition and multiplication, and hence exponentiation; in particular, the "one-dimensional" exponentiation is felt element-wise for the diagonal case.)
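A minimal sketch of the diagonalizable case, assuming an illustrative matrix and comparing U e^D U^{−1} against SciPy's expm.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative diagonalizable matrix (assumption, not the example from the text).
A = np.array([[1.0, 4.0],
              [1.0, 1.0]])

w, U = np.linalg.eig(A)                              # A = U diag(w) U^{-1}
expA = U @ np.diag(np.exp(w)) @ np.linalg.inv(U)     # e^A = U e^D U^{-1}
print(np.allclose(expA, expm(A)))                    # True
```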
Example: Diagonalizable
For example, the matrix can be diagonalized as
Thus,
Nilpotent case
A matrix N is nilpotent if N^q = 0 for some integer q. In this case, the matrix exponential e^N can be computed directly from the series expansion, as the series terminates after a finite number of terms:
e^N = I + N + N^2/2! + N^3/3! + ⋯ + N^{q−1}/(q−1)!.
Since the series has a finite number of terms, it is a matrix polynomial, which can be computed efficiently.
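A minimal sketch for a nilpotent matrix, computing the terminating series and comparing with expm; the particular matrix is an illustrative assumption.

```python
import numpy as np
from scipy.linalg import expm
from math import factorial

# Illustrative nilpotent matrix: strictly upper triangular, so N^3 = 0.
N = np.array([[0.0, 1.0, 2.0],
              [0.0, 0.0, 3.0],
              [0.0, 0.0, 0.0]])

# Finite series e^N = I + N + N^2/2!  (terms with N^3 and beyond vanish).
expN = sum(np.linalg.matrix_power(N, k) / factorial(k) for k in range(3))
print(np.allclose(expN, expm(N)))   # True
```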
Using the Jordan–Chevalley decomposition
Any matrix X with complex entries can be expressed as X = A + N, where A is diagonalizable, N is nilpotent, and A commutes with N (the Jordan–Chevalley decomposition). This means that we can compute the exponential of X by reducing to the previous two cases:
e^X = e^{A+N} = e^A e^N.
Note that we need the commutativity of A and N for the last step to work.
Using the Jordan canonical form
A closely related method is, if the field is algebraically closed, to work with the Jordan form of X. Suppose that X = PJP^{−1} where J is the Jordan form of X. Then
e^X = P e^J P^{−1}.
Also, since J is block diagonal, J = J_{a_1}(λ_1) ⊕ J_{a_2}(λ_2) ⊕ ⋯ ⊕ J_{a_k}(λ_k), we have
e^J = e^{J_{a_1}(λ_1)} ⊕ e^{J_{a_2}(λ_2)} ⊕ ⋯ ⊕ e^{J_{a_k}(λ_k)}.
Therefore, we need only know how to compute the matrix exponential of a Jordan block. But each Jordan block is of the form
J_a(λ) = λI + N,
where N is a special nilpotent matrix (with ones on the superdiagonal and zeros elsewhere). The matrix exponential of this block is then given by
e^{λI + N} = e^λ e^N.
The identity e^{PJP^{−1}} = P e^J P^{−1} follows by expanding the exponential as a power series: each power (PJP^{−1})^k reduces to P J^k P^{−1}, so that P and P^{−1} become common factors of the sum:
e^{PJP^{−1}} = P (Σ_{k=0}^{∞} J^k/k!) P^{−1} = P e^J P^{−1}.
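A sketch for a single Jordan block, using e^{λI + N} = e^λ e^N with the terminating series for e^N; the block size and eigenvalue are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm
from math import factorial

lam, m = 2.0, 4                                       # illustrative Jordan block J_m(lam)
J = lam * np.eye(m) + np.diag(np.ones(m - 1), k=1)    # J = lam*I + N
N = J - lam * np.eye(m)

# Since lam*I commutes with N: e^J = e^{lam} * e^N, and the series for e^N terminates.
expN = sum(np.linalg.matrix_power(N, k) / factorial(k) for k in range(m))
expJ = np.exp(lam) * expN
print(np.allclose(expJ, expm(J)))                     # True
# The entries of e^J above the diagonal are e^{lam}/1!, e^{lam}/2!, ... as in the note below.
```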
Rotation case
For a simple rotation in which the perpendicular unit vectors a and b specify a plane,[19] the rotation matrix R can be expressed in terms of a similar exponential function involving a generator G and angle θ:[20][21]
G = b a^T − a b^T,
R(θ) = e^{Gθ} = I + G sin(θ) + G^2 (1 − cos(θ)) = I − P + P cos(θ) + G sin(θ),   where P = −G^2.
The formula for the exponential results from reducing the powers of G in the series expansion and identifying the respective series coefficients of G^2 and G with −cos(θ) and sin(θ) respectively. The second expression here for e^{Gθ} is the same as the expression for R(θ) in the article containing the derivation of the generator, R(θ) = e^{Gθ}.
In two dimensions, if a = (1, 0)^T and b = (0, 1)^T, then G = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, G^2 = −I, and
R(θ) = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}
reduces to the standard matrix for a plane rotation.
The matrix P = −G^2 projects a vector onto the ab-plane, and the rotation only affects this part of the vector. An example illustrating this is a rotation of 30° = π/6 in the plane spanned by a and b.
Let N = I − P, so N^2 = N and its products with P and G are zero. This will allow us to evaluate powers of R.
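A sketch of the rotation case, building the generator G = b a^T − a b^T from an illustrative pair of orthonormal vectors and checking the closed-form expression against expm.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)

# Two perpendicular unit vectors a, b in R^5 (illustrative dimension), via Gram-Schmidt.
a = rng.standard_normal(5)
a /= np.linalg.norm(a)
b = rng.standard_normal(5)
b -= (b @ a) * a
b /= np.linalg.norm(b)

G = np.outer(b, a) - np.outer(a, b)      # generator of rotations in the a-b plane
P = -G @ G                               # projector onto the a-b plane
theta = np.pi / 6                        # a 30 degree rotation

R = expm(G * theta)
R_closed = np.eye(5) + np.sin(theta) * G + (1 - np.cos(theta)) * (G @ G)
print(np.allclose(R, R_closed))                            # True
print(np.allclose(P, np.outer(a, a) + np.outer(b, b)))     # True: P projects onto the plane
```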
Evaluation by Laurent series
By virtue of the Cayley–Hamilton theorem the matrix exponential is expressible as a polynomial of degree n−1.
If P and Q_t are nonzero polynomials in one variable, such that P(A) = 0, and if the meromorphic function
f(z) = (e^{tz} − Q_t(z)) / P(z)
is entire, then
e^{tA} = Q_t(A).
To prove this, multiply the first of the two above equalities by P(z) and replace z by A.
Such a polynomial Q_t(z) can be found as follows (see Sylvester's formula). Letting a be a root of P, Q_{a,t}(z) is solved from the product of P by the principal part of the Laurent series of f at a: it is proportional to the relevant Frobenius covariant. Then the sum S_t of the Q_{a,t}, where a runs over all the roots of P, can be taken as a particular Q_t. All the other Q_t will be obtained by adding a multiple of P to S_t(z). In particular, S_t(z), the Lagrange–Sylvester polynomial, is the only Q_t whose degree is less than that of P.
Example: Consider the case of an arbitrary 2×2 matrix,
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
Thus, as indicated above, the matrix A having decomposed into the sum of two mutually commuting pieces, the traceful piece ((a + d)/2) I and the traceless piece A − ((a + d)/2) I,
the matrix exponential reduces to a plain product of the exponentials of the two respective pieces. This is a formula often used in physics, as it amounts to the analog of Euler's formula for Pauli spin matrices, that is, rotations of the doublet representation of the group SU(2).
The polynomial S_t can also be given the following "interpolation" characterization. Define e_t(z) ≡ e^{tz} and n ≡ deg P. Then S_t(z) is the unique polynomial of degree < n which satisfies S_t^{(k)}(a) = e_t^{(k)}(a) whenever k is less than the multiplicity of a as a root of P. We assume, as we obviously can, that P is the minimal polynomial of A. We further assume that A is a diagonalizable matrix. In particular, the roots of P are simple, and the "interpolation" characterization indicates that S_t is given by the Lagrange interpolation formula, so it is the Lagrange–Sylvester polynomial.
At the other extreme, if P = (z − a)^n, then
S_t(z) = e^{at} Σ_{k=0}^{n−1} (t^k/k!) (z − a)^k.
The simplest case not covered by the above observations is when P = (z − a)^2 (z − b) with a ≠ b, which yields (by Hermite interpolation at the roots)
S_t(z) = e^{at} + t e^{at} (z − a) + ((e^{bt} − e^{at})/(b − a)^2 − t e^{at}/(b − a)) (z − a)^2.
A practical, expedited computation of the above reduces to the following rapid steps. Recall from above that an n×n matrix exp(tA) amounts to a linear combination of the first n−1 powers of A by the Cayley–Hamilton theorem. For diagonalizable matrices, as illustrated above, e.g. in the 2×2 case, Sylvester's formula yields exp(tA) = Bα exp(tα) + Bβ exp(tβ), where the Bs are the Frobenius covariants of A.
It is easiest, however, to simply solve for these Bs directly, by evaluating this expression and its first derivative at t = 0, in terms of A and I, to find the same answer as above.
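For a concrete illustration, here is a short sketch computing exp(tA) from the Frobenius covariants of an illustrative 2×2 matrix with distinct eigenvalues and comparing with expm.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative 2x2 matrix with distinct eigenvalues alpha, beta (assumption).
A = np.array([[1.0, 4.0],
              [1.0, 1.0]])
alpha, beta = np.linalg.eigvals(A)

# Frobenius covariants from Sylvester's formula.
I = np.eye(2)
B_alpha = (A - beta * I) / (alpha - beta)
B_beta = (A - alpha * I) / (beta - alpha)

t = 0.7
lhs = B_alpha * np.exp(t * alpha) + B_beta * np.exp(t * beta)
print(np.allclose(lhs, expm(t * A)))    # True
```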
But this simple procedure also works for defective matrices, in a generalization due to Buchheim.[22] This is illustrated here for a 4×4 example of a matrix which is not diagonalizable, and the Bs are not projection matrices.
Consider a 4×4 matrix A with eigenvalues λ₁ = 3/4 and λ₂ = 1, each with algebraic multiplicity two.
Consider the exponential of each eigenvalue multiplied by t, exp(λit). Multiply each exponentiated eigenvalue by the corresponding undetermined coefficient matrix Bi. If the eigenvalues have an algebraic multiplicity greater than 1, then repeat the process, but now multiplying by an extra factor of t for each repetition, to ensure linear independence.
(If one eigenvalue had a multiplicity of three, then there would be the three terms: B_{i,1} e^{λ_i t}, B_{i,2} t e^{λ_i t}, B_{i,3} t^2 e^{λ_i t}. By contrast, when all eigenvalues are distinct, the Bs are just the Frobenius covariants, and solving for them as below just amounts to the inversion of the Vandermonde matrix of these 4 eigenvalues.)
Sum all such terms, here four such:
exp(tA) = B_{1,1} e^{(3/4)t} + B_{1,2} t e^{(3/4)t} + B_{2,1} e^{t} + B_{2,2} t e^{t}.
To solve for all of the unknown matrices B in terms of the first three powers of A and the identity, one needs four equations, the above one providing one such at t = 0. Further, differentiate it with respect to t,
and again,
and once more,
(In the general case, n−1 derivatives need be taken.)
Setting t = 0 in these four equations, the four coefficient matrices Bs may now be solved for,
to yield
Substituting with the value for A yields the coefficient matrices
so the final answer is
The procedure is much shorter than Putzer's algorithm sometimes utilized in such cases.
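A sketch of this procedure, assuming an illustrative defective 4×4 matrix built from Jordan blocks with the eigenvalues quoted above; the coefficient matrices B are obtained by solving the (confluent Vandermonde) linear system coming from the value and first three derivatives at t = 0.

```python
import numpy as np
from scipy.linalg import expm, block_diag

# Illustrative defective 4x4 matrix with eigenvalues 3/4 and 1,
# each of algebraic multiplicity two (Jordan blocks conjugated by a random basis).
rng = np.random.default_rng(3)
J = block_diag([[0.75, 1.0], [0.0, 0.75]], [[1.0, 1.0], [0.0, 1.0]])
S = rng.standard_normal((4, 4))
A = S @ J @ np.linalg.inv(S)

l1, l2 = 0.75, 1.0
# Ansatz: exp(tA) = B1 e^{l1 t} + B2 t e^{l1 t} + B3 e^{l2 t} + B4 t e^{l2 t}.
# Matching the value and first three derivatives at t = 0 gives M @ [B1..B4] = [I, A, A^2, A^3],
# where d^k/dt^k e^{l t} |_{t=0} = l^k and d^k/dt^k (t e^{l t}) |_{t=0} = k l^{k-1}.
M = np.array([[l1**k, k * l1**(k - 1) if k else 0.0,
               l2**k, k * l2**(k - 1) if k else 0.0] for k in range(4)])
powers = [np.linalg.matrix_power(A, k) for k in range(4)]
Minv = np.linalg.inv(M)
B = [sum(Minv[j, k] * powers[k] for k in range(4)) for j in range(4)]

t = 0.9
approx = (B[0] * np.exp(l1 * t) + B[1] * t * np.exp(l1 * t)
          + B[2] * np.exp(l2 * t) + B[3] * t * np.exp(l2 * t))
print(np.allclose(approx, expm(t * A)))   # True
```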
The exponential of a 1×1 matrix is just the exponential of the one entry of the matrix, so exp(J₁(4)) = [e^4]. The exponential of J₂(16) can be calculated by the formula e^{λI + N} = e^λ e^N mentioned above; this yields[23]
Therefore, the exponential of the original matrix B is
Applications
Linear differential equations
The matrix exponential has applications to systems of linear differential equations. (See also matrix differential equation.) Recall from earlier in this article that a homogeneous differential equation of the form y' = Ay has solution e^{At} y(0).
If we consider the vector y(t) = (y_1(t), …, y_n(t))^T, we can express a system of inhomogeneous coupled linear differential equations as
y'(t) = A y(t) + b(t).
Making an ansatz to use an integrating factor of e^{−At} and multiplying throughout yields
e^{−At} y' − e^{−At} A y = e^{−At} b,
e^{−At} y' − A e^{−At} y = e^{−At} b,
d/dt (e^{−At} y) = e^{−At} b.
The second step is possible due to the fact that, if AB = BA, then e^{At} B = B e^{At}. So, calculating e^{At} leads to the solution to the system, by simply integrating the third step with respect to t.
A solution to this can be obtained by integrating and multiplying by e^{At} to eliminate the exponent on the LHS. Notice that while e^{At} is a matrix, given that it is a matrix exponential, we can say that e^{At} e^{−At} = I. In other words, e^{At} = (e^{−At})^{−1}.
so that the general solution of the homogeneous system is
amounting to
Example (inhomogeneous)
Consider now the inhomogeneous system
We again have
and
From before, we already have the general solution to the homogeneous equation. Since the sum of the homogeneous and particular solutions gives the general solution to the inhomogeneous problem, we now only need to find the particular solution.
We have, by above,
y_p(t) = e^{At} ∫_0^t e^{−Au} b(u) du + e^{At} c,
which could be further simplified to get the requisite particular solution determined through variation of parameters. Note c = y_p(0). For more rigor, see the following generalization.
Inhomogeneous case generalization: variation of parameters
For the inhomogeneous problem y'(t) = A y(t) + b(t) with initial condition y(0) = y_0, variation of parameters gives the particular solution y_p(t) = e^{At} ∫_0^t e^{−As} b(s) ds, so that the general solution is
y(t) = e^{At} y_0 + ∫_0^t e^{A(t−s)} b(s) ds.
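A numerical sketch of the variation-of-parameters formula, using an illustrative system and forcing term and comparing against a general-purpose ODE solver.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Illustrative inhomogeneous system y' = A y + b(t); matrix and forcing are assumptions.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
b = lambda t: np.array([np.sin(t), 1.0])
y0 = np.array([1.0, 0.0])

def y_exact(t, n_quad=2000):
    # Variation of parameters: y(t) = e^{At} y0 + integral_0^t e^{A(t-s)} b(s) ds,
    # with the integral approximated by the trapezoidal rule.
    s = np.linspace(0.0, t, n_quad)
    vals = np.array([expm(A * (t - si)) @ b(si) for si in s])
    integral = np.trapz(vals, s, axis=0)
    return expm(A * t) @ y0 + integral

sol = solve_ivp(lambda t, y: A @ y + b(t), (0.0, 2.0), y0, rtol=1e-10, atol=1e-12)
print(y_exact(2.0))
print(sol.y[:, -1])   # the two agree to quadrature/ODE-solver accuracy
```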
Matrix-matrix exponential
The matrix exponential of another matrix (matrix-matrix exponential)[24] is defined as
X^Y = e^{log(X) · Y},   ^Y X = e^{Y · log(X)},
for any normal and non-singular n×n matrix X, and any complex n×n matrix Y.
For matrix-matrix exponentials, there is a distinction between the left exponential ^Y X and the right exponential X^Y, because the multiplication operator for matrix-to-matrix is not commutative. Moreover,
If X is normal and non-singular, then X^Y and ^Y X have the same set of eigenvalues.
If X is normal and non-singular, Y is normal, and XY = YX, then X^Y = ^Y X.
If X is normal and non-singular, and X, Y, Z commute with each other, then X^{Y+Z} = X^Y · X^Z and ^{Y+Z}X = ^Y X · ^Z X.
Notes and references
Wilcox, R. M. (1967). "Exponential Operators and Parameter Differentiation in Quantum Physics". Journal of Mathematical Physics. 8 (4): 962–982. Bibcode:1967JMP.....8..962W. doi:10.1063/1.1705306.
Note: this can be generalized; in general, the exponential of J_n(a) is an upper triangular matrix with e^a/0! on the main diagonal, e^a/1! on the one above, e^a/2! on the next one, and so on.
Hall, Brian C. (2015). Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. Graduate Texts in Mathematics. Vol. 222 (2nd ed.). Springer. ISBN 978-3-319-13466-6.
Suzuki, Masuo (1985). "Decomposition formulas of exponential operators and Lie exponentials with some applications to quantum mechanics and statistical physics". Journal of Mathematical Physics. 26 (4): 601–612. Bibcode:1985JMP....26..601S. doi:10.1063/1.526596.