Cochran's theorem

In statistics, Cochran's theorem, devised by William G. Cochran, [1] is a theorem used to justify results relating to the probability distributions of statistics that are used in the analysis of variance. [2]

Examples

Sample mean and sample variance

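If $X_1, \ldots, X_n$ are independent normally distributed random variables with mean $\mu$ and standard deviation $\sigma$, then

$$U_i = \frac{X_i - \mu}{\sigma}$$

is standard normal for each $i$. Note that the total of the $Q_i$ is equal to the sum of the squared $U_i$, as shown here:

$$\sum_i Q_i = \sum_{i,j,k} U_j B^{(i)}_{jk} U_k = \sum_{j,k} U_j U_k \sum_i B^{(i)}_{jk} = \sum_{j,k} U_j U_k \delta_{jk} = \sum_j U_j^2,$$

which stems from the original assumption that $B^{(1)} + B^{(2)} + \cdots = I$. So instead we will calculate this quantity and later separate it into the $Q_i$. It is possible to write

$$\sum_{i=1}^n U_i^2 = \sum_{i=1}^n \left(\frac{X_i - \overline{X}}{\sigma}\right)^2 + n\left(\frac{\overline{X} - \mu}{\sigma}\right)^2$$

(here $\overline{X}$ is the sample mean). To see this identity, multiply throughout by $\sigma^2$ and note that

$$\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n (X_i - \overline{X} + \overline{X} - \mu)^2$$

and expand to give

$$\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n (X_i - \overline{X})^2 + \sum_{i=1}^n (\overline{X} - \mu)^2 + 2\sum_{i=1}^n (X_i - \overline{X})(\overline{X} - \mu).$$

The third term is zero because it is equal to a constant times

$$\sum_{i=1}^n (\overline{X} - X_i) = 0,$$

and the second term has just $n$ identical terms added together. Thus

$$\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n (X_i - \overline{X})^2 + n(\overline{X} - \mu)^2,$$

and hence

$$\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^n \left(\frac{X_i - \overline{X}}{\sigma}\right)^2 + n\left(\frac{\overline{X} - \mu}{\sigma}\right)^2 = \underbrace{\sum_{i} \left(U_i - \frac{1}{n}\sum_{j} U_j\right)^2}_{Q_1} + \underbrace{\frac{1}{n}\left(\sum_{j} U_j\right)^2}_{Q_2} = Q_1 + Q_2.$$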

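Now $B^{(2)} = \frac{J_n}{n}$, with $J_n$ the $n$-by-$n$ matrix of ones, which has rank 1. In turn $B^{(1)} = I_n - \frac{J_n}{n}$, given that $I_n = B^{(1)} + B^{(2)}$. This expression can also be obtained by expanding $Q_1$ in matrix notation. The rank of $B^{(1)}$ is $n - 1$: its rows sum to zero, so the rank is at most $n - 1$, and it is at least $n - 1$ because $B^{(1)} + B^{(2)} = I_n$ has rank $n$ while $B^{(2)}$ has rank 1. Thus the conditions for Cochran's theorem are met.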

Cochran's theorem then states that $Q_1$ and $Q_2$ are independent, with chi-squared distributions with $n - 1$ and 1 degrees of freedom respectively. This shows that the sample mean and sample variance are independent. This can also be shown by Basu's theorem, and in fact this property characterizes the normal distribution: for no other distribution are the sample mean and sample variance independent. [3]
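The algebra of this example can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the sample size `n = 6` and the random seed are arbitrary choices made for illustration. It builds the two matrices of the decomposition (the matrix of ones divided by $n$, and its complement), then checks their ranks and the identity $Q_1 + Q_2 = \sum_i U_i^2$ on one simulated sample.

```python
import numpy as np

n = 6                      # illustrative sample size (an assumption)
I = np.eye(n)
J = np.ones((n, n))        # the n-by-n matrix of ones

B2 = J / n                 # quadratic form n * (mean of U)^2
B1 = I - B2                # centering matrix: sum of squared deviations

rng = np.random.default_rng(0)
U = rng.standard_normal(n)  # one sample of standard normal variables

Q1 = U @ B1 @ U
Q2 = U @ B2 @ U

rank_B1 = np.linalg.matrix_rank(B1)
rank_B2 = np.linalg.matrix_rank(B2)

print(rank_B1, rank_B2)                               # ranks n - 1 and 1
print(np.isclose(Q1 + Q2, U @ U))                     # the forms add up to sum of U_i^2
print(np.isclose(Q1, np.sum((U - U.mean()) ** 2)))    # Q1 is the centered sum of squares
print(np.isclose(Q2, n * U.mean() ** 2))              # Q2 is n times the squared mean
```

This only verifies the rank condition and the decomposition; the independence and chi-squared claims are distributional statements that follow from the theorem.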

Distributions

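The result for the distributions is written symbolically as

$$\sum_{i=1}^n (X_i - \overline{X})^2 \sim \sigma^2 \chi^2_{n-1},$$

$$n(\overline{X} - \mu)^2 \sim \sigma^2 \chi^2_1.$$

Both these random variables are proportional to the true but unknown variance $\sigma^2$. Thus their ratio does not depend on $\sigma^2$ and, because they are statistically independent, the distribution of their ratio is given by

$$\frac{n(\overline{X} - \mu)^2}{\frac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X})^2} \sim \frac{\chi^2_1}{\frac{1}{n-1}\chi^2_{n-1}} \sim F_{1,n-1},$$

where $F_{1,n-1}$ is the F-distribution with 1 and $n - 1$ degrees of freedom (see also Student's t-distribution). The final step here is effectively the definition of a random variable having the F-distribution.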
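Two features of this ratio can be illustrated with a short sketch, assuming NumPy; the helper `f_ratio` and the values of the mean and standard deviation are hypothetical names and numbers chosen for the example. The ratio is unchanged when the data are shifted and rescaled, so it does not depend on $\sigma^2$, and it equals the square of the one-sample t statistic.

```python
import numpy as np

def f_ratio(X, mu):
    """n (Xbar - mu)^2 divided by the unbiased sample variance."""
    n = X.size
    xbar = X.mean()
    s2 = X.var(ddof=1)          # (1 / (n - 1)) * sum (X_i - Xbar)^2
    return n * (xbar - mu) ** 2 / s2

rng = np.random.default_rng(1)
n = 10
U = rng.standard_normal(n)      # standard normal sample

F_std = f_ratio(U, 0.0)
# Same sample after X = mu + sigma * U with illustrative mu = 2, sigma = 3.
F_scaled = f_ratio(2.0 + 3.0 * U, 2.0)

# One-sample t statistic computed from the standardized sample.
t = np.sqrt(n) * U.mean() / np.sqrt(U.var(ddof=1))

print(np.isclose(F_std, F_scaled))   # the ratio is free of mu and sigma
print(np.isclose(F_std, t ** 2))     # F with (1, n - 1) d.f. is a squared t
```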

Estimation of variance

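To estimate the variance $\sigma^2$, one estimator that is sometimes used is the maximum likelihood estimator of the variance of a normal distribution

$$\widehat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \overline{X})^2.$$

Cochran's theorem shows that

$$\frac{n\widehat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-1},$$

and the properties of the chi-squared distribution show that

$$E\left(\frac{n\widehat{\sigma}^2}{\sigma^2}\right) = E\left(\chi^2_{n-1}\right) = n - 1, \qquad \text{so} \qquad E\left(\widehat{\sigma}^2\right) = \frac{\sigma^2 (n-1)}{n}.$$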
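The bias of the maximum likelihood estimator can be seen in a small Monte Carlo sketch, assuming NumPy; the parameter values, the seed, and the number of replications are arbitrary choices for illustration. The average of the estimator over many samples should be close to $\sigma^2 (n-1)/n$, while the version with divisor $n - 1$ should average close to $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 0.0, 2.0, 5     # illustrative parameters (assumptions)
reps = 200_000                 # number of simulated samples

X = rng.normal(mu, sigma, size=(reps, n))

mle = X.var(axis=1, ddof=0)        # (1 / n) * sum (X_i - Xbar)^2
unbiased = X.var(axis=1, ddof=1)   # (1 / (n - 1)) * sum (X_i - Xbar)^2

expected_mle = sigma ** 2 * (n - 1) / n

print(mle.mean(), expected_mle)      # both close to sigma^2 (n - 1) / n
print(unbiased.mean(), sigma ** 2)   # both close to sigma^2
```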

Alternative formulation

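The following version is often seen when considering linear regression. [4] Suppose that $Y \sim N_n(0, I_n)$ is a standard multivariate normal random vector (here $I_n$ denotes the n-by-n identity matrix), and that $A_1, \ldots, A_k$ are all n-by-n symmetric matrices with $\sum_{i=1}^k A_i = I_n$. Then, on defining $r_i = \operatorname{rank}(A_i)$, any one of the following conditions implies the other two:

  1. $\sum_{i=1}^k r_i = n$,
  2. $Y^{\mathsf T} A_i Y \sim \chi^2_{r_i}$ for each $i$ (thus the $A_i$ are positive semidefinite),
  3. $Y^{\mathsf T} A_i Y$ is independent of $Y^{\mathsf T} A_j Y$ for $i \neq j$.
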
Statement

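Let $U_1, \ldots, U_N$ be i.i.d. standard normally distributed random variables, and $U = [U_1, \ldots, U_N]^{\mathsf T}$. Let $B^{(1)}, B^{(2)}, \ldots, B^{(k)}$ be symmetric matrices. Define $r_i$ to be the rank of $B^{(i)}$. Define $Q_i = U^{\mathsf T} B^{(i)} U$, so that the $Q_i$ are quadratic forms. Further assume $\sum_i Q_i = U^{\mathsf T} U$.

Cochran's theorem states that the following are equivalent:

  1. $r_1 + \cdots + r_k = N$,
  2. the $Q_i$ are independent,
  3. each $Q_i$ has a chi-squared distribution with $r_i$ degrees of freedom.

Often it is stated as $\sum_i A_i = A$, where $A$ is idempotent, and $\sum_i r_i = N$ is replaced by $\sum_i r_i = \operatorname{rank}(A)$. But after an orthogonal transform, $A = \operatorname{diag}(I_M, 0)$ with $M = \operatorname{rank}(A)$, and so we reduce to the above theorem.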
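Of the three equivalent conditions, the rank condition is the one that can be checked by pure linear algebra. The sketch below, assuming NumPy, does this for a hypothetical one-way layout with 2 groups of 3 observations, which is the kind of decomposition used in the analysis of variance; the helper name `ranks_of_decomposition` and the layout are illustrative assumptions, not part of the theorem.

```python
import numpy as np

def ranks_of_decomposition(Bs, tol=1e-10):
    """Check that the B_i are symmetric and sum to I, then return their ranks."""
    N = Bs[0].shape[0]
    if not all(np.allclose(B, B.T, atol=tol) for B in Bs):
        raise ValueError("every matrix must be symmetric")
    if not np.allclose(sum(Bs), np.eye(N), atol=tol):
        raise ValueError("the matrices must sum to the identity")
    return [int(np.linalg.matrix_rank(B)) for B in Bs]

g, m = 2, 3                     # hypothetical layout: 2 groups of 3 observations
N = g * m

grand = np.ones((N, N)) / N                       # projects onto the grand mean
group = np.kron(np.eye(g), np.ones((m, m)) / m)   # projects onto the group means

Bs = [grand, group - grand, np.eye(N) - group]    # mean, between groups, within groups
ranks = ranks_of_decomposition(Bs)

print(ranks, sum(ranks) == N)   # ranks 1, 1, 4, which sum to N = 6
```

Since the ranks sum to $N$, the theorem then gives independence and chi-squared distributions for the three quadratic forms when $U$ is standard normal.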

Proof

Claim: Let $X$ be a standard Gaussian in $\mathbb{R}^n$. Then for any symmetric matrices $Q, Q'$, if $X^{\mathsf T} Q X$ and $X^{\mathsf T} Q' X$ have the same distribution, then $Q, Q'$ have the same eigenvalues (up to multiplicity).

Proof

Let the eigenvalues of $Q$ be $\lambda_1, \ldots, \lambda_n$, then calculate the characteristic function of $X^{\mathsf T} Q X$. It comes out to be

$$\varphi(t) = \left(\prod_{j=1}^n (1 - 2i\lambda_j t)\right)^{-1/2}.$$

(To calculate it, first diagonalize $Q$, change into that frame, then use the fact that the characteristic function of the sum of independent variables is the product of their characteristic functions.)

For $X^{\mathsf T} Q X$ and $X^{\mathsf T} Q' X$ to be equal in distribution, their characteristic functions must be equal, so $Q, Q'$ have the same eigenvalues (up to multiplicity).
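The calculation of the characteristic function can be sketched as follows. Here $Z, Z_1, \ldots, Z_n$ are symbols introduced only for this illustration: they denote the independent standard normal coordinates of $X$ in the eigenbasis of $Q$, and $\lambda$ is any one eigenvalue.

```latex
\begin{align*}
\mathbb{E}\, e^{it\lambda Z^2}
  &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty}
     e^{-(1 - 2i\lambda t) z^2 / 2} \, dz
   = (1 - 2i\lambda t)^{-1/2}, \\
\varphi_{X^{\mathsf T} Q X}(t)
  &= \prod_{j=1}^{n} \mathbb{E}\, e^{it\lambda_j Z_j^2}
   = \prod_{j=1}^{n} (1 - 2i\lambda_j t)^{-1/2}.
\end{align*}
```

The first line is a Gaussian integral with complex parameter $1 - 2i\lambda t$, whose real part is positive; the second uses $X^{\mathsf T} Q X = \sum_j \lambda_j Z_j^2$ and the independence of the $Z_j$.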

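Claim: $I = \sum_i B^{(i)}$.

Proof

By assumption, $U^{\mathsf T} \left(I - \sum_i B^{(i)}\right) U = 0$. Since $I - \sum_i B^{(i)}$ is symmetric, and $U^{\mathsf T} \left(I - \sum_i B^{(i)}\right) U$ has the same distribution as $U^{\mathsf T} 0 U$, by the previous claim $I - \sum_i B^{(i)}$ has the same eigenvalues as 0. A symmetric matrix whose eigenvalues are all zero is the zero matrix, which proves the claim.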

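Lemma: If $\sum_i M_i = I$, where all $M_i$ are symmetric and have eigenvalues 0, 1, then they are simultaneously diagonalizable.

Proof

Fix $i$, and consider the eigenvectors $v$ of $M_i$ such that $M_i v = v$. Then we have $v^{\mathsf T} v = v^{\mathsf T} I v = v^{\mathsf T} v + \sum_{j \neq i} v^{\mathsf T} M_j v$, so all $v^{\mathsf T} M_j v = 0$, because each $M_j$ is positive semidefinite and the terms therefore cannot cancel; hence $M_j v = 0$ for all $j \neq i$. Thus we obtain a split of $\mathbb{R}^N$ into $V \oplus V^{\perp}$, such that $V$ is the 1-eigenspace of $M_i$ and lies in the 0-eigenspaces of all other $M_j$. Now induct by moving into $V^{\perp}$.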
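The induction in this proof amounts to collecting the 1-eigenspaces of the individual matrices into one orthonormal basis. The sketch below, assuming NumPy, follows that recipe on a hypothetical example built from regression hat matrices; the helpers `hat` and `one_eigenspace` and the random design matrix `X` are illustrative assumptions.

```python
import numpy as np

def hat(X):
    """Orthogonal projection onto the column space of X (full column rank assumed)."""
    return X @ np.linalg.solve(X.T @ X, X.T)

def one_eigenspace(M, tol=1e-8):
    """Orthonormal basis (as columns) of the eigenvalue-1 eigenspace of symmetric M."""
    w, V = np.linalg.eigh(M)
    return V[:, np.abs(w - 1.0) < tol]

rng = np.random.default_rng(3)
N = 6
X = rng.standard_normal((N, 3))     # hypothetical design matrix

H1, H3 = hat(X[:, :1]), hat(X)      # nested column spaces
Ms = [H1, H3 - H1, np.eye(N) - H3]  # symmetric, eigenvalues 0 and 1, sum to I

# Stack the 1-eigenspaces side by side, as in the split V + V-perp of the proof.
S = np.hstack([one_eigenspace(M) for M in Ms])

is_orthogonal = S.shape == (N, N) and np.allclose(S.T @ S, np.eye(N))
diag_forms = [S.T @ M @ S for M in Ms]
all_diagonal = all(np.allclose(D, np.diag(np.diag(D))) for D in diag_forms)
diagonals = [np.round(np.diag(D)).astype(int).tolist() for D in diag_forms]

print(is_orthogonal, all_diagonal)
print(diagonals)   # disjoint blocks of ones of sizes 1, 2 and 3
```

The stacked eigenvectors form an orthogonal matrix only because the 1-eigenspaces are mutually orthogonal, which is exactly what the lemma's argument establishes.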

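Now we prove the original theorem. We prove that the three cases are equivalent by proving that each case implies the next one in a cycle (independence $\Rightarrow$ chi-squared distributions $\Rightarrow$ rank condition $\Rightarrow$ independence).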

Proof

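Case: All $Q_i$ are independent

Fix some $i$, define $C^{(i)} = I - B^{(i)} = \sum_{j \neq i} B^{(j)}$, and diagonalize $B^{(i)}$ by an orthogonal transform $O$. Then consider $O C^{(i)} O^{\mathsf T} = I - O B^{(i)} O^{\mathsf T}$. It is diagonalized as well.

Let $W = OU$; then $W$ is also standard Gaussian. Then we have

$$Q_i = W^{\mathsf T} \left(O B^{(i)} O^{\mathsf T}\right) W, \qquad \sum_{j \neq i} Q_j = W^{\mathsf T} \left(I - O B^{(i)} O^{\mathsf T}\right) W.$$

Inspect their diagonal entries, to see that $Q_i \perp \sum_{j \neq i} Q_j$ implies that their nonzero diagonal entries are disjoint.

Thus all eigenvalues of $B^{(i)}$ are 0, 1, so $Q_i$ has a $\chi^2$ distribution with $r_i$ degrees of freedom.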

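Case: Each $Q_i$ has a $\chi^2(r_i)$ distribution.

Fix any $i$, diagonalize $B^{(i)}$ by an orthogonal transform $O$, and reindex, so that $O B^{(i)} O^{\mathsf T} = \operatorname{diag}(\lambda_1, \ldots, \lambda_{r_i}, 0, \ldots, 0)$. Then $Q_i = \sum_j \lambda_j {U'_j}^2$ for some $U'_j$, the components of a rotation $U' = OU$ of $U$.

Since $Q_i \sim \chi^2(r_i)$, we get all $\lambda_j = 1$. So all $B^{(i)} \succeq 0$, and they have eigenvalues 0, 1.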
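One way to check this step, besides the eigenvalue claim proved earlier, is to compare the first two moments. This is only a sketch: it writes $\lambda_1, \ldots, \lambda_{r_i}$ for the nonzero eigenvalues, and uses the standard facts that a squared standard normal variable has mean 1 and variance 2, and that a chi-squared variable with $r_i$ degrees of freedom has mean $r_i$ and variance $2 r_i$.

```latex
\begin{align*}
\mathbb{E}[Q_i] &= \sum_{j=1}^{r_i} \lambda_j = r_i, \qquad
\operatorname{Var}(Q_i) = 2 \sum_{j=1}^{r_i} \lambda_j^2 = 2 r_i, \\
\sum_{j=1}^{r_i} (\lambda_j - 1)^2
  &= \sum_{j=1}^{r_i} \lambda_j^2 - 2 \sum_{j=1}^{r_i} \lambda_j + r_i
   = r_i - 2 r_i + r_i = 0 .
\end{align*}
```

A sum of squares of real numbers vanishes only if every term does, so each nonzero eigenvalue equals 1.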

So, by the lemma, diagonalize them simultaneously and add them up, to find $\sum_i r_i = N$.

Case: $r_1 + \cdots + r_k = N$.

We first show that the matrices $B^{(i)}$ can be simultaneously diagonalized by an orthogonal matrix and that their non-zero eigenvalues are all equal to +1. Once that is shown, take this orthogonal transform to the simultaneous eigenbasis, in which the random vector $[U_1, \ldots, U_N]^{\mathsf T}$ becomes $[U'_1, \ldots, U'_N]^{\mathsf T}$, but all $U'_i$ are still independent and standard Gaussian. Then the result follows.

Each of the matrices $B^{(i)}$ has rank $r_i$ and thus $r_i$ non-zero eigenvalues. For each $i$, the sum $C^{(i)} \equiv \sum_{j \neq i} B^{(j)}$ has at most rank $\sum_{j \neq i} r_j = N - r_i$. Since $B^{(i)} + C^{(i)} = I_{N \times N}$, it follows that $C^{(i)}$ has exactly rank $N - r_i$.

Therefore $B^{(i)}$ and $C^{(i)}$ can be simultaneously diagonalized. This can be shown by first diagonalizing $B^{(i)}$, by the spectral theorem. In this basis, it is of the form:

$$B^{(i)} = \begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix}, \qquad \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_{r_i}).$$

Thus the lower $N - r_i$ rows are zero. Since $C^{(i)} = I - B^{(i)}$, it follows that these rows in $C^{(i)}$ in this basis contain a right block which is an $(N - r_i) \times (N - r_i)$ unit matrix, with zeros in the rest of these rows. But since $C^{(i)}$ has rank $N - r_i$, it must be zero elsewhere. Thus it is diagonal in this basis as well. It follows that all the non-zero eigenvalues of both $B^{(i)}$ and $C^{(i)}$ are +1. This argument applies for all $i$, thus all $B^{(i)}$ are positive semidefinite.

Moreover, the above analysis can be repeated in the diagonal basis for $C^{(1)} = B^{(2)} + \sum_{j > 2} B^{(j)}$. In this basis $C^{(1)}$ is the identity of an $(N - r_1)$-dimensional vector space, so it follows that both $B^{(2)}$ and $\sum_{j > 2} B^{(j)}$ are simultaneously diagonalizable in this vector space (and hence also together with $B^{(1)}$). By iteration it follows that all the $B^{(i)}$ are simultaneously diagonalizable.

Thus there exists an orthogonal matrix $S$ such that for all $i$, $S^{\mathsf T} B^{(i)} S \equiv B^{(i)\prime}$ is diagonal, where any entry $B^{(i)\prime}_{x,y}$ with indices $x = y$, $\sum_{j=1}^{i-1} r_j < x = y \leq \sum_{j=1}^{i} r_j$, is equal to 1, while any entry with other indices is equal to 0.
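The conclusion can be illustrated by simulation. The following Monte Carlo sketch, assuming NumPy, builds three projection matrices from disjoint blocks of a random orthonormal basis, so that the ranks 1, 2, 3 sum to $N = 6$; the sizes, seed, and number of replications are arbitrary choices. It then compares the sample means and variances of the quadratic forms with the chi-squared values $r_i$ and $2 r_i$, and looks at their sample correlations.

```python
import numpy as np

rng = np.random.default_rng(4)
N, ranks, reps = 6, [1, 2, 3], 200_000   # illustrative sizes, sum(ranks) == N

# Random orthonormal basis of R^N (columns of S) from a QR factorisation.
S, _ = np.linalg.qr(rng.standard_normal((N, N)))

# B_i projects onto its own block of basis vectors, so the B_i sum to I.
edges = np.cumsum([0] + ranks)
Bs = [S[:, a:b] @ S[:, a:b].T for a, b in zip(edges[:-1], edges[1:])]

U = rng.standard_normal((reps, N))       # each row is one draw of U

# Q[:, i] holds U^T B_i U for every simulated draw.
Q = np.stack([((U @ B) * U).sum(axis=1) for B in Bs], axis=1)

means = Q.mean(axis=0)                   # close to r_i
variances = Q.var(axis=0)                # close to 2 * r_i
corr = np.corrcoef(Q, rowvar=False)      # off-diagonal entries close to 0

print(means)
print(variances)
print(np.abs(corr - np.eye(len(ranks))).max())
```

Near-zero correlation is only a necessary consequence of independence, so this check is consistent with the theorem rather than a proof of it.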



References

  1. Cochran, W. G. (April 1934). "The distribution of quadratic forms in a normal system, with applications to the analysis of covariance". Mathematical Proceedings of the Cambridge Philosophical Society. 30 (2): 178–191. doi:10.1017/S0305004100016595.
  2. Bapat, R. B. (2000). Linear Algebra and Linear Models (Second ed.). Springer. ISBN 978-0-387-98871-9.
  3. Geary, R. C. (1936). "The Distribution of "Student's" Ratio for Non-Normal Samples". Supplement to the Journal of the Royal Statistical Society. 3 (2): 178–184. doi:10.2307/2983669. JFM 63.1090.03. JSTOR 2983669.
  4. "Cochran's Theorem (A quick tutorial)" (PDF).
  5. "Cochran's theorem", A Dictionary of Statistics, Oxford University Press, 2008-01-01, doi:10.1093/acref/9780199541454.001.0001/acref-9780199541454-e-294, ISBN 978-0-19-954145-4, retrieved 2022-05-18