Einstein notation

In mathematics, especially the usage of linear algebra in mathematical physics and differential geometry, Einstein notation (also known as the Einstein summation convention or Einstein summation notation) is a notational convention that implies summation over a set of indexed terms in a formula, thus achieving brevity. As part of mathematics it is a notational subset of Ricci calculus; however, it is often used in physics applications that do not distinguish between tangent and cotangent spaces. It was introduced to physics by Albert Einstein in 1916.[1]

Introduction

Statement of convention

According to this convention, when an index variable appears twice in a single term and is not otherwise defined (see Free and bound variables), it implies summation of that term over all the values of the index. So where the indices can range over the set {1, 2, 3},

$$y = \sum_{i=1}^{3} x^i e_i = x^1 e_1 + x^2 e_2 + x^3 e_3$$

is simplified by the convention to:

$$y = x^i e_i$$

The upper indices are not exponents but are indices of coordinates, coefficients or basis vectors. That is, in this context $x^2$ should be understood as the second component of $x$ rather than the square of $x$ (this can occasionally lead to ambiguity). The upper index position in $x^i$ is because, typically, an index occurs once in an upper (superscript) and once in a lower (subscript) position in a term (see § Application below). Typically, $(x^1, x^2, x^3)$ would be equivalent to the traditional $(x, y, z)$.
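For concreteness, the convention maps directly onto code. The following is a minimal sketch using NumPy, whose einsum function accepts an index string in essentially this notation (the component values here are arbitrary illustrations):

```python
import numpy as np

# Components x^i and basis vectors e_i, for i ranging over {1, 2, 3}
x = np.array([2.0, -1.0, 3.0])   # x^1, x^2, x^3
e = np.eye(3)                    # rows are the basis vectors e_1, e_2, e_3

# y = x^i e_i: the index i appears twice, so it is summed over
y = np.einsum('i,ij->j', x, e)

# Identical to writing out the sum x^1 e_1 + x^2 e_2 + x^3 e_3
assert np.allclose(y, x[0] * e[0] + x[1] * e[1] + x[2] * e[2])
```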

In general relativity, a common convention is that

- the Greek alphabet is used for space and time components, where indices take the values 0, 1, 2, or 3 (frequently used letters are μ, ν, ...),
- the Latin alphabet is used for spatial components only, where indices take the values 1, 2, or 3 (frequently used letters are i, j, ...).

In general, indices can range over any indexing set, including an infinite set. This should not be confused with a typographically similar convention used to distinguish between tensor index notation and the closely related but distinct basis-independent abstract index notation.

An index that is summed over is a summation index, in this case "i". It is also called a dummy index since any symbol can replace "i" without changing the meaning of the expression (provided that it does not collide with other index symbols in the same term).

An index that is not summed over is a free index and should appear only once per term. If such an index does appear, it usually also appears in every other term in an equation. An example of a free index is the "i" in the equation $v_i = a_i b_j x^j$, which is equivalent to the equation $v_i = \sum_j a_i b_j x^j$.
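The distinction can be illustrated in a short NumPy sketch (arbitrary component values, assuming the equation above): the dummy index j can be renamed freely, while the free index i survives into the result.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
x = np.array([7.0, 8.0, 9.0])

# v_i = a_i b_j x^j: j is summed (dummy), i is free and indexes the result
v = np.einsum('i,j,j->i', a, b, x)

# Renaming the dummy index (j -> k) does not change the expression
assert np.allclose(v, np.einsum('i,k,k->i', a, b, x))
```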

Application

Einstein notation can be applied in slightly different ways. Typically, each index occurs once in an upper (superscript) and once in a lower (subscript) position in a term; however, the convention can be applied more generally to any repeated indices within a term.[2] When dealing with covariant and contravariant vectors, where the position of an index also indicates the type of vector, the first case usually applies; a covariant vector can only be contracted with a contravariant vector, corresponding to summation of the products of coefficients. On the other hand, when there is a fixed coordinate basis (or when not considering coordinate vectors), one may choose to use only subscripts; see § Superscripts and subscripts versus only subscripts below.

Vector representations

Superscripts and subscripts versus only subscripts

In terms of covariance and contravariance of vectors,

- upper indices represent components of contravariant vectors (vectors),
- lower indices represent components of covariant vectors (covectors).

They transform contravariantly or covariantly, respectively, with respect to change of basis.

In recognition of this fact, the following notation uses the same symbol both for a vector or covector and its components, as in:

$$v = v^i e_i, \qquad w = w_i e^i,$$

where $v$ is the vector and $v^i$ are its components (not the $i$-th covector $v_i$), $w$ is the covector and $w_i$ are its components. The basis vector elements $e_i$ are each column vectors, and the covector basis elements $e^i$ are each row covectors. (See also § Abstract description; duality, below and the examples.)

In the presence of a non-degenerate form (an isomorphism $V \to V^*$, for instance a Riemannian metric or Minkowski metric), one can raise and lower indices.

A basis gives such a form (via the dual basis), hence when working on $\mathbb{R}^n$ with a Euclidean metric and a fixed orthonormal basis, one has the option to work with only subscripts.

However, if one changes coordinates, the way that coefficients change depends on the variance of the object, and one cannot ignore the distinction; see Covariance and contravariance of vectors.

Mnemonics

In the above example, vectors are represented as n × 1 matrices (column vectors), while covectors are represented as 1 × n matrices (row covectors).

When using the column vector convention:

- "Upper indices go up to down; lower indices go left to right."
- Contravariant vectors are column vectors, so an upper index indicates the row you are in.
- Covariant vectors (covectors) are row vectors, so a lower index indicates the column you are in.

Abstract description

The virtue of Einstein notation is that it represents the invariant quantities with a simple notation.

In physics, a scalar is invariant under transformations of basis. In particular, a Lorentz scalar is invariant under a Lorentz transformation. The individual terms in the sum are not: when the basis is changed, the components of a vector change by a linear transformation described by a matrix. This led Einstein to propose the convention that repeated indices imply that the summation is to be done.

As for covectors, they change by the inverse matrix. This is designed to guarantee that the linear function associated with the covector, the sum above, is the same no matter what the basis is.
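This invariance is easy to verify numerically. The sketch below assumes a generic (hence invertible) random matrix A whose columns express the new basis in terms of the old: vector components then transform by the inverse of A, covector components by its transpose, and the paired sum $w_i v^i$ is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(3)         # contravariant components v^i
w = rng.standard_normal(3)         # covariant components w_i
A = rng.standard_normal((3, 3))    # change-of-basis matrix (generic, so invertible)

# Components transform oppositely: v^i by the inverse of A, w_i by its transpose
v_new = np.linalg.inv(A) @ v
w_new = A.T @ w

# The scalar w_i v^i is the same in either basis
assert np.isclose(np.einsum('i,i->', w, v), np.einsum('i,i->', w_new, v_new))
```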

The value of the Einstein convention is that it applies to other vector spaces built from $V$ using the tensor product and duality. For example, $V \otimes V$, the tensor product of $V$ with itself, has a basis consisting of tensors of the form $e_{ij} = e_i \otimes e_j$. Any tensor $T$ in $V \otimes V$ can be written as:

$$T = T^{ij} e_{ij}.$$

$V^*$, the dual of $V$, has a basis $e^1, e^2, \dots, e^n$ which obeys the rule

$$e^i(e_j) = \delta^i_j,$$

where $\delta$ is the Kronecker delta. As

$$\operatorname{Hom}(V, W) = V^* \otimes W,$$

the row/column coordinates on a matrix correspond to the upper/lower indices on the tensor product.
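As a small numerical check of the dual-basis rule, the sketch below takes a (non-orthonormal) basis of $\mathbb{R}^3$ as the columns of a matrix E; the dual basis covectors are then the rows of E's inverse, and evaluating each covector on each basis vector reproduces the Kronecker delta:

```python
import numpy as np

# Columns of E are the basis vectors e_1, e_2, e_3 (a non-orthonormal example)
E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# Rows of the inverse are the dual basis covectors e^1, e^2, e^3
E_dual = np.linalg.inv(E)

# e^i(e_j) = delta^i_j: evaluating covectors on vectors gives the identity matrix
assert np.allclose(E_dual @ E, np.eye(3))
```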

Common operations in this notation

In Einstein notation, the usual element reference $A_{mn}$ for the $m$-th row and $n$-th column of matrix $A$ becomes $A^m{}_n$. We can then write the following operations in Einstein notation as follows.

Inner product

Using an orthogonal basis, the inner product (vector dot product) is the sum of corresponding components multiplied together:

$$\langle \mathbf{u}, \mathbf{v} \rangle = u_j v^j.$$

This can also be calculated by multiplying the covector on the vector.
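In code, this is a single contraction. A minimal NumPy sketch (arbitrary values):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# <u, v> = u_j v^j: the repeated index j is contracted away, leaving a scalar
inner = np.einsum('j,j->', u, v)
assert np.isclose(inner, np.dot(u, v))
```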

Vector cross product

Again using an orthogonal basis (in 3 dimensions), the cross product intrinsically involves summations over permutations of components:

$$\mathbf{u} \times \mathbf{v} = \varepsilon^i{}_{jk} u^j v^k \, e_i,$$

where

$$\varepsilon^i{}_{jk} = \delta^{il} \varepsilon_{ljk},$$

$\varepsilon_{ijk}$ is the Levi-Civita symbol, and $\delta^{il}$ is the generalized Kronecker delta. Based on this definition of $\varepsilon$, there is no difference between $\varepsilon^i{}_{jk}$ and $\varepsilon_{ijk}$ but the position of indices.
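A sketch of the same formula in NumPy, constructing the Levi-Civita symbol explicitly (component values arbitrary):

```python
import numpy as np

# Levi-Civita symbol in three dimensions: +1 on even permutations, -1 on odd
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# (u x v)^i = eps^i_jk u^j v^k: j and k are contracted, i remains free
cross = np.einsum('ijk,j,k->i', eps, u, v)
assert np.allclose(cross, np.cross(u, v))
```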

Matrix-vector multiplication

The product of a matrix $A^i{}_j$ with a column vector $v^j$ is:

$$u^i = A^i{}_j v^j,$$

equivalent to

$$u^i = \sum_{j=1}^{n} A^i{}_j v^j.$$

This is a special case of matrix multiplication.
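The corresponding einsum contraction, as a sketch with an arbitrary 2 × 3 matrix:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # matrix components A^i_j
v = np.array([1.0, 2.0, 3.0])      # vector components v^j

# u^i = A^i_j v^j: the shared index j is summed over
u = np.einsum('ij,j->i', A, v)
assert np.allclose(u, A @ v)
```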

Matrix multiplication

The matrix product of two matrices $A^i{}_j$ and $B^j{}_k$ is:

$$C^i{}_k = A^i{}_j B^j{}_k,$$

equivalent to

$$C^i{}_k = \sum_{j=1}^{N} A^i{}_j B^j{}_k.$$
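The same contraction in NumPy (arbitrary compatible shapes):

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)    # A^i_j
B = np.arange(12.0).reshape(3, 4)   # B^j_k

# C^i_k = A^i_j B^j_k: the inner index j is contracted
C = np.einsum('ij,jk->ik', A, B)
assert np.allclose(C, A @ B)
```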

Trace

For a square matrix $A^i{}_j$, the trace is the sum of the diagonal elements, hence the sum over a common index: $A^i{}_i$.
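In einsum notation, repeating the index on a single operand performs exactly this contraction:

```python
import numpy as np

A = np.arange(9.0).reshape(3, 3)

# tr(A) = A^i_i: setting both indices equal and summing over them
trace = np.einsum('ii->', A)
assert np.isclose(trace, np.trace(A))
```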

Outer product

The outer product of the column vector $u^i$ by the row vector $v_j$ yields an $m \times n$ matrix $A$:

$$A^i{}_j = u^i v_j.$$

Since i and j represent two different indices, there is no summation and the indices are not eliminated by the multiplication.
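A sketch of the outer product, where no index repeats and hence nothing is summed (m = 2, n = 3 chosen arbitrarily):

```python
import numpy as np

u = np.array([1.0, 2.0])           # u^i, with m = 2
v = np.array([3.0, 4.0, 5.0])      # v_j, with n = 3

# A^i_j = u^i v_j: i and j are distinct free indices, so no summation occurs
A = np.einsum('i,j->ij', u, v)
assert A.shape == (2, 3)
assert np.allclose(A, np.outer(u, v))
```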

Raising and lowering indices

Given a tensor, one can raise an index or lower an index by contracting the tensor with the metric tensor, $g_{\mu\nu}$. For example, taking the tensor $T^{\alpha\beta}$, one can lower an index:

$$g_{\mu\sigma} T^{\sigma\beta} = T_\mu{}^\beta.$$

Or one can raise an index:

$$g^{\mu\sigma} T_\sigma{}^\alpha = T^{\mu\alpha}.$$
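A numerical sketch, assuming for illustration the Minkowski metric with signature (−, +, +, +) and an arbitrary tensor $T^{\alpha\beta}$; raising with the inverse metric undoes the lowering:

```python
import numpy as np

# Minkowski metric g_{mu nu} with signature (-, +, +, +); its inverse raises indices
g = np.diag([-1.0, 1.0, 1.0, 1.0])
g_inv = np.linalg.inv(g)

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 4))    # components T^{alpha beta}

# Lower the first index: T_mu^beta = g_{mu sigma} T^{sigma beta}
T_lowered = np.einsum('ms,sb->mb', g, T)

# Raise it back: g^{mu sigma} T_sigma^beta recovers T^{mu beta}
assert np.allclose(np.einsum('ms,sb->mb', g_inv, T_lowered), T)
```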

Notes

  1. This applies only for numerical indices. The situation is the opposite for abstract indices. Then, vectors themselves carry upper abstract indices and covectors carry lower abstract indices, as per the example in the introduction of this article. Elements of a basis of vectors may carry a lower numerical index and an upper abstract index.

References

  1. Einstein, Albert (1916). "The Foundation of the General Theory of Relativity". Annalen der Physik. 354 (7): 769. Bibcode:1916AnP...354..769E. doi:10.1002/andp.19163540702. Archived from the original (PDF) on 2006-08-29. Retrieved 2006-09-03.
  2. "Einstein Summation". Wolfram Mathworld. Retrieved 13 April 2011.
