Isserlis' theorem

In probability theory, Isserlis' theorem or Wick's probability theorem is a formula that allows one to compute higher-order moments of the multivariate normal distribution in terms of its covariance matrix. It is named after Leon Isserlis.

This theorem is also particularly important in particle physics, where it is known as Wick's theorem after the work of Wick (1950).[1] Other applications include the analysis of portfolio returns,[2] quantum field theory[3] and the generation of colored noise.[4]

Statement

If $(X_1, \dots, X_{2n})$ is a zero-mean multivariate normal random vector, then
$$\operatorname{E}[X_1 X_2 \cdots X_{2n}] = \sum_{p \in P_{2n}^2} \prod_{\{i,j\} \in p} \operatorname{E}[X_i X_j] = \sum_{p \in P_{2n}^2} \prod_{\{i,j\} \in p} \operatorname{Cov}(X_i, X_j),$$
where the sum is over all the pairings of $\{1, \dots, 2n\}$, i.e. all distinct ways of partitioning $\{1, \dots, 2n\}$ into pairs $\{i, j\}$, and the product is over the pairs contained in $p$.[5][6]

More generally, if $(Z_1, \dots, Z_{2n})$ is a zero-mean complex-valued multivariate normal random vector, then the formula still holds.

The expression on the right-hand side is also known as the hafnian of the covariance matrix of $(X_1, \dots, X_{2n})$.
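For the fourth-order case ($2n = 4$) the theorem reads $\operatorname{E}[X_1 X_2 X_3 X_4] = \Sigma_{12}\Sigma_{34} + \Sigma_{13}\Sigma_{24} + \Sigma_{14}\Sigma_{23}$, which is easy to verify numerically. The following is a minimal sketch, not part of the original article, comparing a Monte Carlo estimate against the pairing sum; the covariance matrix and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary symmetric positive definite covariance matrix.
A = rng.standard_normal((4, 4))
Sigma = A @ A.T

# Monte Carlo estimate of E[X1 X2 X3 X4] for X ~ N(0, Sigma).
X = rng.multivariate_normal(np.zeros(4), Sigma, size=2_000_000)
mc_estimate = np.mean(X[:, 0] * X[:, 1] * X[:, 2] * X[:, 3])

# Isserlis' theorem: sum over the three pairings of {1, 2, 3, 4}.
pairing_sum = (Sigma[0, 1] * Sigma[2, 3]
               + Sigma[0, 2] * Sigma[1, 3]
               + Sigma[0, 3] * Sigma[1, 2])

print(mc_estimate, pairing_sum)  # the two values agree up to Monte Carlo error
```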

Odd case

If $n$ is odd, there does not exist any pairing of $\{1, \dots, n\}$. Under this hypothesis, Isserlis' theorem implies that
$$\operatorname{E}[X_1 X_2 \cdots X_n] = 0.$$
This also follows from the fact that $-X = (-X_1, \dots, -X_n)$ has the same distribution as $X = (X_1, \dots, X_n)$, which implies that $\operatorname{E}[X_1 \cdots X_n] = \operatorname{E}[(-X_1) \cdots (-X_n)] = -\operatorname{E}[X_1 \cdots X_n]$, hence $\operatorname{E}[X_1 \cdots X_n] = 0$.

Even case

In his original paper,[7] Leon Isserlis proves this theorem by mathematical induction, generalizing the formula for the fourth-order moments,[8] which takes the appearance
$$\operatorname{E}[X_1 X_2 X_3 X_4] = \operatorname{E}[X_1 X_2]\operatorname{E}[X_3 X_4] + \operatorname{E}[X_1 X_3]\operatorname{E}[X_2 X_4] + \operatorname{E}[X_1 X_4]\operatorname{E}[X_2 X_3].$$

If $n = 2m$ is even, there exist $(2m - 1)!! = 1 \cdot 3 \cdots (2m - 1)$ (see double factorial) pair partitions of $\{1, \dots, 2m\}$: this yields $(2m - 1)!!$ terms in the sum. For example, for fourth-order moments (i.e. four random variables) there are three terms. For sixth-order moments there are $3 \times 5 = 15$ terms, and for eighth-order moments there are $3 \times 5 \times 7 = 105$ terms.
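The pairings can be enumerated recursively: fix the lowest remaining index, choose its partner, and recurse on what is left. The sketch below is an illustration rather than part of the original article; it evaluates the right-hand side of Isserlis' formula for any order and checks the $(2m - 1)!!$ count.

```python
from math import prod

def pairings(indices):
    """Yield every pair partition of a list of indices (even length assumed)."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for k, partner in enumerate(rest):
        for sub in pairings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + sub

def isserlis_moment(Sigma, indices):
    """E[X_{i_1} ... X_{i_n}] for a zero-mean Gaussian vector with covariance Sigma."""
    if len(indices) % 2 == 1:
        return 0.0  # odd case: no pairings exist
    return sum(prod(Sigma[i][j] for i, j in p) for p in pairings(list(indices)))

# The number of pairings of 2m indices is (2m - 1)!!: 3, 15, 105, ...
assert sum(1 for _ in pairings(list(range(4)))) == 3
assert sum(1 for _ in pairings(list(range(6)))) == 15
assert sum(1 for _ in pairings(list(range(8)))) == 105
```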

Example

We can evaluate the characteristic function of Gaussians by Isserlis' theorem. For $X \sim N(0, \Sigma)$, the scalar $\langle t, X \rangle$ is a zero-mean Gaussian with variance $t^\top \Sigma t$, so its odd moments vanish and Isserlis' theorem gives $\operatorname{E}[\langle t, X \rangle^{2m}] = (2m - 1)!!\,(t^\top \Sigma t)^m$. Therefore
$$\operatorname{E}\big[e^{i\langle t, X\rangle}\big] = \sum_{m=0}^{\infty} \frac{(-1)^m}{(2m)!}\,(2m - 1)!!\,(t^\top \Sigma t)^m = \sum_{m=0}^{\infty} \frac{1}{m!}\left(-\frac{t^\top \Sigma t}{2}\right)^m = e^{-\frac{1}{2} t^\top \Sigma t},$$
using $(2m - 1)!!/(2m)! = 1/(2^m m!)$.

Proof

Since both sides of the formula are multilinear in $(X_1, \dots, X_n)$, if we can prove the real case, we get the complex case for free.

Let $\Sigma$ be the covariance matrix, so that we have the zero-mean multivariate normal random vector $(X_1, \dots, X_n) \sim N(0, \Sigma)$. Since both sides of the formula are continuous with respect to $\Sigma$, it suffices to prove the case when $\Sigma$ is invertible.

Using the quadratic factorization $-\frac{1}{2}x^\top \Sigma^{-1} x + t^\top x = -\frac{1}{2}(x - \Sigma t)^\top \Sigma^{-1} (x - \Sigma t) + \frac{1}{2} t^\top \Sigma t$, we get
$$\operatorname{E}\big[e^{t^\top X}\big] = \frac{1}{\sqrt{(2\pi)^n \det \Sigma}} \int_{\mathbb{R}^n} e^{-\frac{1}{2}x^\top \Sigma^{-1} x + t^\top x}\,dx = e^{\frac{1}{2} t^\top \Sigma t}.$$

Differentiate under the integral sign with $\partial_{t_1} \partial_{t_2} \cdots \partial_{t_n} \big|_{t=0}$ to obtain

$$\operatorname{E}[X_1 X_2 \cdots X_n] = \partial_{t_1} \partial_{t_2} \cdots \partial_{t_n} \Big|_{t=0}\, e^{\frac{1}{2} t^\top \Sigma t}.$$

That is, we need only find the coefficient of the term $t_1 t_2 \cdots t_n$ in the Taylor expansion of $e^{\frac{1}{2} t^\top \Sigma t}$.

If $n$ is odd, this is zero, since every term in the expansion has even degree. So let $n = 2m$; then we need only find the coefficient of the term $t_1 t_2 \cdots t_{2m}$ in the polynomial $\frac{1}{m!}\left(\frac{1}{2} t^\top \Sigma t\right)^m$.

Expand the polynomial and count: each monomial $t_1 t_2 \cdots t_{2m}$ arises from choosing a pairing $\{\{i_1, j_1\}, \dots, \{i_m, j_m\}\}$ of $\{1, \dots, 2m\}$, and each pairing is produced $2^m m!$ times (from the $m!$ orderings of the factors $t^\top \Sigma t$ and the two orderings within each pair), cancelling the prefactor $\frac{1}{2^m m!}$. We obtain the formula.
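The differentiation step can be checked symbolically. Below is a minimal sketch using sympy for the case $n = 4$ (an illustration, not part of the original proof); the symbols $s_{ij}$ stand for the entries of a symmetric $\Sigma$.

```python
import sympy as sp

n = 4
t = sp.symbols(f"t1:{n + 1}")  # (t1, t2, t3, t4)
# A generic symmetric covariance matrix with entries s_ij.
S = sp.Matrix(n, n, lambda i, j: sp.Symbol(f"s{min(i, j) + 1}{max(i, j) + 1}"))

tv = sp.Matrix(t)
mgf = sp.exp(sp.Rational(1, 2) * (tv.T * S * tv)[0])  # exp(t^T Sigma t / 2)

# Apply d^4 / dt1 dt2 dt3 dt4 and evaluate at t = 0.
moment = sp.diff(mgf, *t).subs({ti: 0 for ti in t})
print(sp.expand(moment))  # s12*s34 + s13*s24 + s14*s23
```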

Generalizations

Gaussian integration by parts

An equivalent formulation of Wick's probability formula is Gaussian integration by parts. If $(X_1, \dots, X_n)$ is a zero-mean multivariate normal random vector, then
$$\operatorname{E}[X_i f(X_1, \dots, X_n)] = \sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j)\, \operatorname{E}\!\left[\frac{\partial f}{\partial X_j}(X_1, \dots, X_n)\right].$$

This is a generalization of Stein's lemma.
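The identity is easy to test numerically. Here is a minimal sketch, under the arbitrary choices of a random covariance and the test function $f(x_1, x_2, x_3) = \sin(x_2)\,x_3$; it is an illustration, not part of the original article.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
Sigma = A @ A.T  # an arbitrary valid covariance matrix
X = rng.multivariate_normal(np.zeros(3), Sigma, size=4_000_000)

# Test function f(x1, x2, x3) = sin(x2) * x3 and its partial derivatives.
f = np.sin(X[:, 1]) * X[:, 2]
df = [np.zeros(len(X)),            # df/dx1
      np.cos(X[:, 1]) * X[:, 2],   # df/dx2
      np.sin(X[:, 1])]             # df/dx3

i = 0
lhs = np.mean(X[:, i] * f)
rhs = sum(Sigma[i, j] * np.mean(df[j]) for j in range(3))
print(lhs, rhs)  # should agree up to Monte Carlo error
```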

Wick's probability formula can be recovered by induction, considering the function $f : \mathbb{R}^n \to \mathbb{R}$ defined by $f(x_1, \dots, x_n) = x_2 \cdots x_n$. Among other things, this formulation is important in Liouville conformal field theory to obtain conformal Ward identities, BPZ equations[9] and to prove the Fyodorov–Bouchaud formula.[10]

Non-Gaussian random variables

For non-Gaussian random variables, the moment-cumulants formula[11] replaces Wick's probability formula. If $(X_1, \dots, X_n)$ is a vector of random variables, then
$$\operatorname{E}[X_1 \cdots X_n] = \sum_{p \in P_n} \prod_{b \in p} \kappa\big((X_i)_{i \in b}\big),$$
where the sum is over all the partitions of $\{1, \dots, n\}$, the product is over the blocks of $p$ and $\kappa\big((X_i)_{i \in b}\big)$ is the joint cumulant of $(X_i)_{i \in b}$.
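For example, for $n = 3$ the five partitions of $\{1, 2, 3\}$ give
$$\operatorname{E}[X_1 X_2 X_3] = \kappa(X_1, X_2, X_3) + \kappa(X_1)\kappa(X_2, X_3) + \kappa(X_2)\kappa(X_1, X_3) + \kappa(X_3)\kappa(X_1, X_2) + \kappa(X_1)\kappa(X_2)\kappa(X_3).$$
For a zero-mean Gaussian vector the first cumulants vanish and all joint cumulants of order three and higher vanish, so only the pair partitions survive and Wick's probability formula is recovered.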

References

  1. Wick, G. C. (1950). "The evaluation of the collision matrix". Physical Review. 80 (2): 268–272. Bibcode:1950PhRv...80..268W. doi:10.1103/PhysRev.80.268.
  2. Repetowicz, Przemysław; Richmond, Peter (2005). "Statistical inference of multivariate distribution parameters for non-Gaussian distributed time series" (PDF). Acta Physica Polonica B. 36 (9): 2785–2796. Bibcode:2005AcPPB..36.2785R.
  3. Perez-Martin, S.; Robledo, L. M. (2007). "Generalized Wick's theorem for multiquasiparticle overlaps as a limit of Gaudin's theorem". Physical Review C. 76 (6): 064314. arXiv:0707.3365. Bibcode:2007PhRvC..76f4314P. doi:10.1103/PhysRevC.76.064314. S2CID 119627477.
  4. Bartosch, L. (2001). "Generation of colored noise". International Journal of Modern Physics C. 12 (6): 851–855. Bibcode:2001IJMPC..12..851B. doi:10.1142/S0129183101002012. S2CID 54500670.
  5. Janson, Svante (1997). Gaussian Hilbert Spaces. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511526169. ISBN 9780521561280.
  6. Michalowicz, J. V.; Nichols, J. M.; Bucholtz, F.; Olson, C. C. (2009). "An Isserlis' theorem for mixed Gaussian variables: application to the auto-bispectral density". Journal of Statistical Physics. 136 (1): 89–102. Bibcode:2009JSP...136...89M. doi:10.1007/s10955-009-9768-3. S2CID 119702133.
  7. Isserlis, L. (1918). "On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables". Biometrika. 12 (1–2): 134–139. doi:10.1093/biomet/12.1-2.134. JSTOR 2331932.
  8. Isserlis, L. (1916). "On Certain Probable Errors and Correlation Coefficients of Multiple Frequency Distributions with Skew Regression". Biometrika. 11 (3): 185–190. doi:10.1093/biomet/11.3.185. JSTOR 2331846.
  9. Kupiainen, Antti; Rhodes, Rémi; Vargas, Vincent (2019). "Local Conformal Structure of Liouville Quantum Gravity". Communications in Mathematical Physics. 371 (3): 1005–1069. arXiv:1512.01802. Bibcode:2019CMaPh.371.1005K. doi:10.1007/s00220-018-3260-3. S2CID 55282482.
  10. Remy, Guillaume (2020). "The Fyodorov–Bouchaud formula and Liouville conformal field theory". Duke Mathematical Journal. 169. arXiv:1710.06897. doi:10.1215/00127094-2019-0045. S2CID 54777103.
  11. Leonov, V. P.; Shiryaev, A. N. (1959). "On a Method of Calculation of Semi-Invariants". Theory of Probability & Its Applications. 4 (3): 319–329. doi:10.1137/1104031.
