In probability theory, Isserlis' theorem or Wick's probability theorem is a formula that allows one to compute higher-order moments of the multivariate normal distribution in terms of its covariance matrix. It is named after Leon Isserlis.
This theorem is also particularly important in particle physics, where it is known as Wick's theorem after the work of Wick (1950). [1] Other applications include the analysis of portfolio returns, [2] quantum field theory [3] and generation of colored noise. [4]
If $(X_1, \dots, X_n)$ is a zero-mean multivariate normal random vector, then
$$\operatorname{E}[X_1 X_2 \cdots X_n] = \sum_{p \in P_n^2} \prod_{\{i,j\} \in p} \operatorname{E}[X_i X_j] = \sum_{p \in P_n^2} \prod_{\{i,j\} \in p} \operatorname{Cov}(X_i, X_j),$$
where the sum is over all the pairings of $\{1, \dots, n\}$, i.e. all distinct ways of partitioning $\{1, \dots, n\}$ into pairs $\{i, j\}$, and the product is over the pairs contained in $p$. [5] [6]
More generally, if $(Z_1, \dots, Z_n)$ is a zero-mean complex-valued multivariate normal random vector, then the formula still holds.
The expression on the right-hand side is also known as the hafnian of the covariance matrix of $(X_1, \dots, X_n)$.
If $n$ is odd, there does not exist any pairing of $\{1, \dots, n\}$. Under this hypothesis, Isserlis' theorem implies that
$$\operatorname{E}[X_1 X_2 \cdots X_n] = 0.$$
This also follows from the fact that $-X = (-X_1, \dots, -X_n)$ has the same distribution as $X = (X_1, \dots, X_n)$, which implies that $\operatorname{E}[X_1 \cdots X_n] = \operatorname{E}[(-X_1) \cdots (-X_n)] = -\operatorname{E}[X_1 \cdots X_n] = 0$.
In his original paper, [7] Leon Isserlis proves this theorem by mathematical induction, generalizing the formula for the $4^{\text{th}}$-order moments, [8] which takes the appearance
$$\operatorname{E}[X_1 X_2 X_3 X_4] = \operatorname{E}[X_1 X_2]\operatorname{E}[X_3 X_4] + \operatorname{E}[X_1 X_3]\operatorname{E}[X_2 X_4] + \operatorname{E}[X_1 X_4]\operatorname{E}[X_2 X_3].$$
If $n = 2m$ is even, there exist $(2m-1)!! = \frac{(2m)!}{2^m m!}$ (see double factorial) pair partitions of $\{1, \dots, 2m\}$: this yields $(2m-1)!!$ terms in the sum. For example, for $4^{\text{th}}$-order moments (i.e. $4$ random variables) there are three terms. For $6^{\text{th}}$-order moments there are $3 \times 5 = 15$ terms, and for $8^{\text{th}}$-order moments there are $3 \times 5 \times 7 = 105$ terms.
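For small $n$ the pairing sum can be evaluated directly by enumerating pair partitions. The following is a minimal Python sketch (the function names and data layout are illustrative, not taken from any particular library), assuming the covariance matrix is supplied as a nested list cov with cov[i][j] = E[X_i X_j]:

```python
# Illustrative sketch: evaluate Isserlis' pairing sum from a given covariance matrix.
from math import prod

def pair_partitions(indices):
    """Yield all ways of partitioning `indices` into unordered pairs."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pair_partitions(remaining):
            yield [(first, partner)] + tail

def isserlis_moment(cov, indices):
    """E[X_{i_1} ... X_{i_n}] for a zero-mean Gaussian vector (indices are 0-based)."""
    if len(indices) % 2 == 1:
        return 0.0  # odd-order moments vanish
    return sum(prod(cov[i][j] for i, j in pairing)
               for pairing in pair_partitions(list(indices)))

cov = [[2.0, 0.5, 0.3, 0.1],
       [0.5, 1.0, 0.2, 0.4],
       [0.3, 0.2, 1.5, 0.6],
       [0.1, 0.4, 0.6, 1.0]]
# E[X1 X2 X3 X4] = 0.5*0.6 + 0.3*0.4 + 0.1*0.2 = 0.44
print(isserlis_moment(cov, [0, 1, 2, 3]))
# The number of pairings grows as (2m-1)!!: 15 terms for 6th-order moments.
print(sum(1 for _ in pair_partitions(list(range(6)))))  # 15
```

The same pairing sum, applied to a symmetric matrix, is exactly the hafnian mentioned above.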
We can evaluate the characteristic function of Gaussians by the Isserlis theorem:
$$\operatorname{E}[e^{i \langle t, X \rangle}] = \sum_{m=0}^{\infty} \frac{(-1)^m}{(2m)!} \operatorname{E}[\langle t, X \rangle^{2m}] = \sum_{m=0}^{\infty} \frac{(-1)^m}{(2m)!} (2m-1)!! \, (t^{T} \Sigma t)^{m} = e^{-\frac{1}{2} t^{T} \Sigma t}.$$
Since both sides of the formula are multilinear in $X_1, \dots, X_n$, if we can prove the real case, we get the complex case for free.
Let $\Sigma$ be the covariance matrix, so that we have the zero-mean multivariate normal random vector $(X_1, \dots, X_n) \sim N(0, \Sigma)$. Since both sides of the formula are continuous with respect to $\Sigma$, it suffices to prove the case when $\Sigma$ is invertible.
Using the quadratic factorization $-\tfrac{1}{2} x^{T} \Sigma^{-1} x + \langle s, x \rangle - \tfrac{1}{2} s^{T} \Sigma s = -\tfrac{1}{2} (x - \Sigma s)^{T} \Sigma^{-1} (x - \Sigma s)$, we get
$$\frac{1}{\sqrt{(2\pi)^n \det \Sigma}} \int e^{-\frac{1}{2} x^{T} \Sigma^{-1} x + \langle s, x \rangle} \, dx = e^{\frac{1}{2} s^{T} \Sigma s}.$$
Differentiate under the integral sign with $\partial_{s_1} \partial_{s_2} \cdots \partial_{s_n} \big|_{s=0}$ to obtain
$$\operatorname{E}[X_1 X_2 \cdots X_n] = \partial_{s_1} \partial_{s_2} \cdots \partial_{s_n} \, e^{\frac{1}{2} s^{T} \Sigma s} \Big|_{s=0}.$$
That is, we need only find the coefficient of the term $s_1 s_2 \cdots s_n$ in the Taylor expansion of $e^{\frac{1}{2} s^{T} \Sigma s}$.
If $n$ is odd, this is zero. So let $n = 2m$; then we need only find the coefficient of the term $s_1 s_2 \cdots s_n$ in the polynomial $\frac{1}{m!} \left( \frac{1}{2} s^{T} \Sigma s \right)^{m}$.
Expanding the polynomial and counting terms, we obtain the formula.
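The differentiation step can be checked symbolically for a small case, say $n = 4$. The sketch below uses sympy with illustrative symbol names $c_{ij}$ for the covariance entries; it differentiates $e^{\frac{1}{2} s^{T} \Sigma s}$ once in each $s_i$ and evaluates at $s = 0$, recovering the three-term fourth-order formula.

```python
# Illustrative sketch: differentiate exp(s^T Sigma s / 2) for n = 4 and evaluate at s = 0.
import sympy as sp

n = 4
s = sp.symbols(f's1:{n + 1}')  # s1, ..., s4
# Symmetric symbolic covariance matrix with entries c_ij = c_ji.
Sigma = sp.Matrix(n, n, lambda i, j: sp.Symbol(f'c{min(i, j) + 1}{max(i, j) + 1}'))

quad = sp.Rational(1, 2) * (sp.Matrix([s]) * Sigma * sp.Matrix([s]).T)[0, 0]
mgf = sp.exp(quad)  # moment generating function E[exp(<s, X>)]

moment = sp.diff(mgf, *s).subs({si: 0 for si in s})
print(sp.expand(moment))  # expected: c12*c34 + c13*c24 + c14*c23
```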
An equivalent formulation of Wick's probability formula is Gaussian integration by parts. If $(X_1, \dots, X_n)$ is a zero-mean multivariate normal random vector, then
$$\operatorname{E}[X_1 f(X_1, \ldots, X_n)] = \sum_{i=1}^{n} \operatorname{Cov}(X_1, X_i) \, \operatorname{E}[\partial_{X_i} f(X_1, \ldots, X_n)].$$
This is a generalization of Stein's lemma.
Wick's probability formula can be recovered by induction, considering the function $f : \mathbb{R}^n \to \mathbb{R}$ defined by $f(x_1, \ldots, x_n) = x_2 \cdots x_n$. Among other things, this formulation is important in Liouville conformal field theory to obtain conformal Ward identities, BPZ equations [9] and to prove the Fyodorov-Bouchaud formula. [10]
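As a quick numerical illustration of the integration-by-parts identity (a sketch only; the covariance matrix, test function and sample size below are arbitrary choices), both sides can be estimated by Monte Carlo with numpy:

```python
# Illustrative Monte Carlo check of Gaussian integration by parts.
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6, 0.2],
                [0.6, 2.0, 0.5],
                [0.2, 0.5, 1.5]])
X = rng.multivariate_normal(np.zeros(3), cov, size=1_000_000)

# Test function f(x1, x2, x3) = sin(x2) * x3**2 and its partial derivatives.
f      = np.sin(X[:, 1]) * X[:, 2] ** 2
df_dx1 = np.zeros(len(X))
df_dx2 = np.cos(X[:, 1]) * X[:, 2] ** 2
df_dx3 = 2 * np.sin(X[:, 1]) * X[:, 2]

lhs = np.mean(X[:, 0] * f)
rhs = (cov[0, 0] * np.mean(df_dx1)
       + cov[0, 1] * np.mean(df_dx2)
       + cov[0, 2] * np.mean(df_dx3))
print(lhs, rhs)  # the two estimates agree up to Monte Carlo error
```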
For non-Gaussian random variables, the moment-cumulants formula [11] replaces Wick's probability formula. If $(X_1, \dots, X_n)$ is a vector of random variables, then
$$\operatorname{E}[X_1 \cdots X_n] = \sum_{p \in P_n} \prod_{b \in p} \kappa\big((X_i)_{i \in b}\big),$$
where the sum is over all the partitions of $\{1, \dots, n\}$, the product is over the blocks of $p$, and $\kappa\big((X_i)_{i \in b}\big)$ is the joint cumulant of $(X_i)_{i \in b}$.
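A small sketch of this moment-cumulant relation (pure Python plus numpy for the sample data; all names are illustrative): joint cumulants are computed from mixed moments by Möbius inversion over set partitions, and the moment is then reassembled as the displayed sum over partitions. For zero-mean Gaussian data only the pair cumulants survive and the sum reduces to Wick's formula.

```python
# Illustrative sketch of the moment-cumulant formula over set partitions.
from math import factorial, prod
import numpy as np

def set_partitions(items):
    """Yield all partitions of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for k in range(len(partition)):            # put `first` into an existing block...
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        yield [[first]] + partition                # ...or into a new singleton block

def joint_cumulant(block, moment):
    """kappa((X_i)_{i in block}) from the mixed-moment function `moment` (Moebius inversion)."""
    return sum((-1) ** (len(p) - 1) * factorial(len(p) - 1)
               * prod(moment(b) for b in p)
               for p in set_partitions(block))

def moment_from_cumulants(indices, moment):
    """Reassemble E[X_{i_1} ... X_{i_n}] as a sum over partitions of products of joint cumulants."""
    return sum(prod(joint_cumulant(b, moment) for b in p)
               for p in set_partitions(indices))

# Dependent, non-Gaussian sample data: columns X_1 = Z_1**3 and X_2 = Z_1 + Z_2**2.
rng = np.random.default_rng(1)
z = rng.standard_normal((200_000, 2))
data = np.stack([z[:, 0] ** 3, z[:, 0] + z[:, 1] ** 2], axis=1)
moment = lambda idx: float(np.mean(np.prod(data[:, list(idx)], axis=1)))
print(moment([0, 1, 1, 0]), moment_from_cumulants([0, 1, 1, 0], moment))  # identical up to rounding
```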
In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2}}.$$
The parameter $\mu$ is the mean or expectation of the distribution, while the parameter $\sigma^2$ is the variance. The standard deviation of the distribution is $\sigma$ (sigma). A random variable with a Gaussian distribution is said to be normally distributed, and is called a normal deviate.
In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. It is the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by $\sigma^2$, $s^2$, $\operatorname{Var}(X)$, $V(X)$, or $\mathbb{V}(X)$.
In probability theory, the central limit theorem (CLT) states that, under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution. This holds even if the original variables themselves are not normally distributed. There are several versions of the CLT, each applying in the context of different conditions.
In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables, each of which clusters around a mean value.
Covariance in probability theory and statistics is a measure of the joint variability of two random variables.
In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero. The errors do not need to be normal, nor do they need to be independent and identically distributed. The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator, ridge regression, or simply any degenerate estimator.
In probability theory and statistics, a covariance matrix is a square matrix giving the covariance between each pair of elements of a given random vector.
In linear algebra, the permanent of a square matrix is a function of the matrix similar to the determinant. The permanent, as well as the determinant, is a polynomial in the entries of the matrix. Both are special cases of a more general function of a matrix called the immanant.
In statistics, the Pearson correlation coefficient (PCC) is a correlation coefficient that measures linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationships or correlations. As a simple example, one would expect the age and height of a sample of children from a primary school to have a Pearson correlation coefficient significantly greater than 0, but less than 1.
In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space), such that every finite collection of those random variables has a multivariate normal distribution. The distribution of a Gaussian process is the joint distribution of all those random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.
In statistics, originally in geostatistics, kriging or Kriging, also known as Gaussian process regression, is a method of interpolation based on a Gaussian process governed by prior covariances. Under suitable assumptions of the prior, kriging gives the best linear unbiased prediction (BLUP) at unsampled locations. Interpolating methods based on other criteria such as smoothness may not yield the BLUP. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as Wiener–Kolmogorov prediction, after Norbert Wiener and Andrey Kolmogorov.
In probability theory, the central limit theorem states that, under certain circumstances, the probability distribution of the scaled mean of a random sample converges to a normal distribution as the sample size increases to infinity. Under stronger assumptions, the Berry–Esseen theorem, or Berry–Esseen inequality, gives a more quantitative result, because it also specifies the rate at which this convergence takes place by giving a bound on the maximal error of approximation between the normal distribution and the true distribution of the scaled sample mean. The approximation is measured by the Kolmogorov–Smirnov distance. In the case of independent samples, the convergence rate is $n^{-1/2}$, where $n$ is the sample size, and the constant is estimated in terms of the third absolute normalized moment.
Stein's lemma, named in honor of Charles Stein, is a theorem of probability theory that is of interest primarily because of its applications to statistical inference — in particular, to James–Stein estimation and empirical Bayes methods — and its applications to portfolio choice theory. The theorem gives a formula for the covariance of one random variable with the value of a function of another, when the two random variables are jointly normally distributed.
In algebra, the Leibniz formula, named in honor of Gottfried Leibniz, expresses the determinant of a square matrix in terms of permutations of the matrix elements. If $A$ is an $n \times n$ matrix, where $a_{ij}$ is the entry in the $i$-th row and $j$-th column of $A$, the formula is
$$\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i, \sigma(i)},$$
where $\operatorname{sgn}$ is the sign function of the permutation $\sigma$ in the permutation group $S_n$.
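As an illustrative (and deliberately inefficient) Python sketch of this formula, summing signed products over all permutations; the helper names are hypothetical:

```python
# Determinant via the Leibniz formula: sum over permutations of signed products.
from itertools import permutations
from math import prod

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based indices."""
    sgn = 1
    seen = [False] * len(perm)
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, j = 0, start
        while not seen[j]:           # walk one cycle of the permutation
            seen[j] = True
            j = perm[j]
            length += 1
        sgn *= (-1) ** (length - 1)  # a cycle of length L contributes (-1)^(L-1)
    return sgn

def det_leibniz(A):
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(det_leibniz([[1, 2], [3, 4]]))  # -2
```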
In statistics, an exchangeable sequence of random variables is a sequence X1, X2, X3, ... whose joint probability distribution does not change when the positions in the sequence in which finitely many of them appear are altered. In other words, the joint distribution is invariant to finite permutation. Thus, for example, the sequences X1, X2, X3, X4, X5, ... and X3, X1, X4, X2, X5, ... both have the same joint probability distribution.
In statistics, the inverse Wishart distribution, also called the inverted Wishart distribution, is a probability distribution defined on real-valued positive-definite matrices. In Bayesian statistics it is used as the conjugate prior for the covariance matrix of a multivariate normal distribution.
In probability theory and statistics, partial correlation measures the degree of association between two random variables, with the effect of a set of controlling random variables removed. When determining the numerical relationship between two variables of interest, using their correlation coefficient will give misleading results if there is another confounding variable that is numerically related to both variables of interest. This misleading information can be avoided by controlling for the confounding variable, which is done by computing the partial correlation coefficient. This is precisely the motivation for including other right-side variables in a multiple regression; but while multiple regression gives unbiased results for the effect size, it does not give a numerical value of a measure of the strength of the relationship between the two variables of interest.
Kernel methods are a well-established tool to analyze the relationship between input data and the corresponding output of a function. Kernels encapsulate the properties of functions in a computationally efficient way and allow algorithms to easily swap functions of varying complexity.
In mathematics, the hafnian is a scalar function of a symmetric matrix that generalizes the permanent.
q-Gaussian processes are deformations of the usual Gaussian distribution. There are several different versions of this; here we treat a multivariate deformation, also addressed as q-Gaussian process, arising from free probability theory and corresponding to deformations of the canonical commutation relations. For other deformations of Gaussian distributions, see q-Gaussian distribution and Gaussian q-distribution.