Elliptical distribution

In probability and statistics, an elliptical distribution is any member of a broad family of probability distributions that generalize the multivariate normal distribution. Intuitively, in the simplified two- and three-dimensional cases, the joint distribution forms an ellipse and an ellipsoid, respectively, in iso-density plots.

In statistics, the normal distribution is used in classical multivariate analysis, while elliptical distributions are used in generalized multivariate analysis, for the study of symmetric distributions with tails that are heavy, like the multivariate t-distribution, or light (in comparison with the normal distribution). Some statistical methods that were originally motivated by the study of the normal distribution have good performance for general elliptical distributions (with finite variance), particularly for spherical distributions (which are defined below). Elliptical distributions are also used in robust statistics to evaluate proposed multivariate-statistical procedures.

Definition

Elliptical distributions are defined in terms of the characteristic function of probability theory. A random vector $X$ on a Euclidean space has an elliptical distribution if its characteristic function $\phi$ satisfies the following functional equation (for every column-vector $t$)

$$\phi(t) = e^{\,i t^\top \mu}\,\psi(t^\top \Sigma t)$$

for some location parameter $\mu$, some nonnegative-definite matrix $\Sigma$ and some scalar function $\psi$. [1] The definition of elliptical distributions for real random-vectors has been extended to accommodate random vectors in Euclidean spaces over the field of complex numbers, so facilitating applications in time-series analysis. [2] Computational methods are available for generating pseudo-random vectors from elliptical distributions, for use in Monte Carlo simulations for example. [3]
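
The definition above pairs with a well-known stochastic representation, $X = \mu + R\,A\,U$, where $U$ is uniform on the unit sphere, $A$ is any square root of $\Sigma$, and the nonnegative radius $R$ selects the member of the family. The sketch below is a minimal illustration of that route to pseudo-random generation (it is not code from the cited sources; it assumes NumPy, and the function names are ours). Choosing $R$ so that $R^2$ is chi-squared with $n$ degrees of freedom recovers the multivariate normal.

```python
# Minimal sketch: generate elliptical pseudo-random vectors via X = mu + R * A @ U.
# Assumes NumPy; `sample_elliptical` and `normal_radius` are illustrative names.
import numpy as np

def sample_elliptical(mu, sigma, radial_sampler, size, rng=None):
    """Draw `size` vectors from an elliptical distribution with location `mu`
    and scatter matrix `sigma`; `radial_sampler(size, rng)` supplies the radii R."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    n = mu.shape[0]
    a = np.linalg.cholesky(sigma)                      # Sigma = A @ A.T
    z = rng.standard_normal((size, n))
    u = z / np.linalg.norm(z, axis=1, keepdims=True)   # uniform direction on the unit sphere
    r = radial_sampler(size, rng)[:, None]             # radial part, one radius per draw
    return mu + r * (u @ a.T)

# Example: R^2 ~ chi-squared(n) gives the multivariate normal N(mu, sigma).
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
def normal_radius(size, rng):
    return np.sqrt(rng.chisquare(df=len(mu), size=size))

x = sample_elliptical(mu, sigma, normal_radius, size=100_000)
print(x.mean(axis=0))            # approximately mu
print(np.cov(x, rowvar=False))   # approximately sigma
```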

Some elliptical distributions are alternatively defined in terms of their density functions. An elliptical distribution with a density function f has the form:

$$f(x) = k \cdot g\!\left((x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$$

where $k$ is the normalizing constant, $x$ is an $n$-dimensional random vector with median vector $\mu$ (which is also the mean vector if the latter exists), and $\Sigma$ is a positive definite matrix which is proportional to the covariance matrix if the latter exists. [4]
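
As a concrete instance of this density form (a minimal sketch, assuming NumPy; the function name and the choice of generator are ours, not from [4]), the generator $g(u) = e^{-u/2}$, combined with the usual Gaussian normalizing constant, reproduces the multivariate normal density:

```python
# Minimal sketch: an elliptical density is a generator g applied to the quadratic form.
import numpy as np

def elliptical_density(x, mu, sigma, generator, norm_const):
    """Evaluate f(x) = norm_const * g((x - mu)^T Sigma^{-1} (x - mu)) row-wise."""
    d = np.atleast_2d(x) - mu
    q = np.einsum("ij,jk,ik->i", d, np.linalg.inv(sigma), d)  # quadratic form per row
    return norm_const * generator(q)

mu = np.zeros(2)
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
n = len(mu)
gauss_const = 1.0 / ((2.0 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma)))
# g(u) = exp(-u/2) recovers the bivariate normal density at the point (0.5, -0.5).
print(elliptical_density([0.5, -0.5], mu, sigma,
                         generator=lambda u: np.exp(-u / 2.0),
                         norm_const=gauss_const))
```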

Examples

Examples include the multivariate normal distribution, the multivariate t-distribution, the symmetric multivariate stable distribution, [5] the multivariate symmetric generalized-Gaussian distribution, [6] and the multivariate logistic and symmetric generalized hyperbolic distributions. [7]

Properties

In the 2-dimensional case, if the density exists, each iso-density locus (the set of $x_1, x_2$ pairs all giving a particular value of $f(x)$) is an ellipse or a union of ellipses (hence the name elliptical distribution). More generally, for arbitrary $n$, the iso-density loci are unions of ellipsoids. All these ellipsoids or ellipses have the common center $\mu$ and are scaled copies (homothets) of each other.
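
A small numerical check of this homothety (an illustration under our own choice of $\mu$ and $\Sigma$, assuming NumPy): every point of the form $\mu + \sqrt{c}\,A(\cos t, \sin t)^\top$, with $\Sigma = AA^\top$, yields the same quadratic-form value $c$, so each such curve is one iso-density ellipse, and different values of $c$ give scaled copies sharing the center $\mu$.

```python
# Numerical check: points mu + sqrt(c) * A @ [cos t, sin t] share the quadratic form c,
# so they lie on a single iso-density ellipse; varying c scales the ellipse about mu.
import numpy as np

mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
a = np.linalg.cholesky(sigma)                          # Sigma = A @ A.T
theta = np.linspace(0.0, 2.0 * np.pi, 200)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)

for c in (0.5, 1.0, 2.0):                              # larger c -> a scaled copy of the ellipse
    pts = mu + np.sqrt(c) * circle @ a.T
    diff = pts - mu
    q = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)
    assert np.allclose(q, c)                           # constant quadratic form on each locus
```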

The multivariate normal distribution is the special case in which $g(z) = e^{-z/2}$. While the multivariate normal is unbounded (each element of $X$ can take on arbitrarily large positive or negative values with non-zero probability, because $g(z) > 0$ for all non-negative $z$), in general elliptical distributions can be bounded or unbounded; such a distribution is bounded if $g(z) = 0$ for all $z$ greater than some value.
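
For contrast with the unbounded normal case, the following sketch (our own illustration, assuming NumPy) samples a bounded member of the family: the uniform distribution on the interior of an ellipsoid, whose generator is constant on $[0, 1]$ and zero beyond it, so no draw ever leaves the ellipsoid $\{x : (x-\mu)^\top \Sigma^{-1} (x-\mu) \le 1\}$.

```python
# Bounded elliptical example: uniform on the interior of an ellipsoid
# (generator constant on [0, 1], zero beyond it).
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
a = np.linalg.cholesky(sigma)
n, size = len(mu), 100_000

z = rng.standard_normal((size, n))
u = z / np.linalg.norm(z, axis=1, keepdims=True)       # uniform direction on the unit sphere
r = rng.uniform(size=size) ** (1.0 / n)                # radius making the draw uniform in the unit ball
x = mu + (r[:, None] * u) @ a.T

diff = x - mu
q = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)
print(q.max())                                         # never exceeds 1: the support is bounded
```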

There exist elliptical distributions that have undefined mean, such as the Cauchy distribution (even in the univariate case). Because the variable $x$ enters the density function quadratically, all elliptical distributions are symmetric about $\mu$.

If two subsets of a jointly elliptical random vector are uncorrelated, then, if their means exist, they are mean independent of each other (the mean of each subvector conditional on the value of the other subvector equals the unconditional mean). [8]: p. 748
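
A quick simulation of this mean-independence property (our own example, assuming NumPy, not taken from [8]): a bivariate t vector with diagonal scatter has uncorrelated but dependent components, and the conditional mean of one component given the other stays at the unconditional mean.

```python
# Simulation sketch: for an elliptical vector with uncorrelated blocks (bivariate t,
# diagonal scatter), the conditional mean of one component given the other is constant.
import numpy as np

rng = np.random.default_rng(2)
nu, size = 5, 500_000
z = rng.standard_normal((size, 2))
w = np.sqrt(nu / rng.chisquare(df=nu, size=size))[:, None]
x = w * z                                   # bivariate t with scatter I: uncorrelated components

edges = np.quantile(x[:, 0], np.linspace(0.0, 1.0, 11))
labels = np.digitize(x[:, 0], edges[1:-1])  # 10 equally populated bins of the first component
cond_means = [x[labels == b, 1].mean() for b in range(10)]
print(np.round(cond_means, 3))              # all close to the unconditional mean 0
```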

If a random vector X is elliptically distributed, then so is DX for any matrix D with full row rank. Thus any linear combination of the components of X is elliptical (though not necessarily with the same elliptical distribution), and any subset of X is elliptical. [8]: p. 748
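
The moment part of this closure property can be checked by simulation (an illustrative sketch, assuming NumPy; the matrix $D$ and the multivariate t example below are our own): the location maps to $D\mu$ and the scatter maps to $D\Sigma D^\top$.

```python
# Simulation sketch: if X is elliptical with location mu and scatter sigma,
# then D @ X has location D @ mu and scatter D @ sigma @ D.T (full-row-rank D).
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 0.0, -1.0])
sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])
nu = 8                                       # t degrees of freedom (> 2 so the covariance exists)

# Multivariate t via the normal / chi-squared mixture representation.
z = rng.multivariate_normal(np.zeros(3), sigma, size=200_000)
w = np.sqrt(nu / rng.chisquare(df=nu, size=200_000))[:, None]
x = mu + w * z

d = np.array([[1.0, -1.0, 0.0],
              [0.5, 0.5, 1.0]])              # full row rank, maps R^3 to R^2
y = x @ d.T

print(y.mean(axis=0), d @ mu)                # sample location vs. D @ mu
print(np.cov(y, rowvar=False))               # sample covariance of D @ X
print(nu / (nu - 2) * d @ sigma @ d.T)       # theory: (nu / (nu - 2)) * D sigma D^T
```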

Applications

Elliptical distributions are used in statistics and in economics.

In mathematical economics, elliptical distributions have been used to describe portfolios in mathematical finance. [9] [10]

Statistics: Generalized multivariate analysis

In statistics, the multivariate normal distribution (of Gauss) is used in classical multivariate analysis, in which most methods for estimation and hypothesis testing are motivated by the normal distribution. In contrast to classical multivariate analysis, generalized multivariate analysis refers to research on elliptical distributions without the restriction of normality.

For suitable elliptical distributions, some classical methods continue to have good properties. [11] [12] Under finite-variance assumptions, an extension of Cochran's theorem (on the distribution of quadratic forms) holds. [13]

Spherical distribution

An elliptical distribution with a zero mean and a covariance matrix of the form $\sigma^2 I$, where $I$ is the identity matrix, is called a spherical distribution. [14] For spherical distributions, classical results on parameter estimation and hypothesis testing have been extended. [15] [16] Similar results hold for linear models, [17] and indeed also for more complicated models (especially for the growth curve model). The analysis of multivariate models uses multilinear algebra (particularly Kronecker products and vectorization) and matrix calculus. [12] [18] [19]
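
As a minimal illustration of sphericity (our own example, assuming NumPy, not from [14]): a spherical law is invariant under orthogonal rotations, so rotating a sample leaves its distribution, and in particular its covariance $\sigma^2 I$, unchanged.

```python
# Sketch: a spherical (here, normal) sample keeps covariance sigma^2 * I under any rotation.
import numpy as np

rng = np.random.default_rng(1)
n, size, sigma2 = 3, 100_000, 2.0
x = rng.standard_normal((size, n)) * np.sqrt(sigma2)    # spherical: zero mean, scatter sigma2 * I

q, _ = np.linalg.qr(rng.standard_normal((n, n)))        # a random orthogonal matrix Q
y = x @ q.T                                             # rotated sample

print(np.cov(x, rowvar=False))                          # approximately sigma2 * I
print(np.cov(y, rowvar=False))                          # still approximately sigma2 * I
```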

Robust statistics: Asymptotics

Another use of elliptical distributions is in robust statistics, in which researchers examine how statistical procedures perform on the class of elliptical distributions, to gain insight into the procedures' performance on even more general problems, [20] for example by using the limiting theory of statistics ("asymptotics"). [21]

Economics and finance

Elliptical distributions are important in portfolio theory because, if the returns on all assets available for portfolio formation are jointly elliptically distributed, then all portfolios can be characterized completely by their location and scale; that is, any two portfolios with identical location and scale of portfolio return have identical distributions of portfolio return. [22] [8] Various features of portfolio analysis, including mutual fund separation theorems and the Capital Asset Pricing Model, hold for all elliptical distributions. [8]: p. 748
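
A small sketch of that characterization (the numbers below are purely illustrative assumptions, not data from [22] or [8]): a portfolio with weight vector $w$ has return location $w^\top\mu$ and scale $\sqrt{w^\top\Sigma w}$, and under joint ellipticity any two portfolios sharing this pair have the same distribution of portfolio return.

```python
# Sketch: under jointly elliptical returns, a portfolio's return distribution is
# determined by the pair (w @ mu, sqrt(w @ sigma @ w)).  All numbers are illustrative.
import numpy as np

mu = np.array([0.05, 0.08, 0.03])                # assumed asset return locations
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.02]])           # assumed asset return scatter matrix

def portfolio_location_scale(w, mu, sigma):
    """Location and scale of the portfolio return for weight vector w."""
    w = np.asarray(w, dtype=float)
    return w @ mu, float(np.sqrt(w @ sigma @ w))

print(portfolio_location_scale([0.5, 0.3, 0.2], mu, sigma))
# Any other weight vector producing the same (location, scale) pair has an
# identical distribution of portfolio return when returns are jointly elliptical.
```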

Notes

  1. Cambanis, Huang & Simons (1981 , p. 368)
  2. Fang, Kotz & Ng (1990 , Chapter 2.9 "Complex elliptically symmetric distributions", pp. 64-66)
  3. Johnson (1987 , Chapter 6, "Elliptically contoured distributions, pp. 106-124): Johnson, Mark E. (1987). Multivariate statistical simulation: A guide to selecting and generating continuous multivariate distributions. John Wiley and Sons., "an admirably lucid discussion" according to Fang, Kotz & Ng (1990 , p. 27).
  4. Frahm, G., Junker, M., & Szimayer, A. (2003). Elliptical copulas: Applicability and limitations. Statistics & Probability Letters, 63(3), 275–286.
  5. Nolan, John (September 29, 2014). "Multivariate stable densities and distribution functions: general and elliptical case" . Retrieved 2017-05-26.
  6. Pascal, F.; et al. (2013). "Parameter Estimation For Multivariate Generalized Gaussian Distributions". IEEE Transactions on Signal Processing. 61 (23): 5960–5971. arXiv: 1302.6498 . doi:10.1109/TSP.2013.2282909. S2CID   3909632.
  7. 1 2 Schmidt, Rafael (2012). "Credit Risk Modeling and Estimation via Elliptical Copulae". In Bol, George; et al. (eds.). Credit Risk: Measurement, Evaluation and Management. Springer. p. 274. ISBN   9783642593659.
  8. 1 2 3 4 Owen & Rabinovitch (1983)
  9. ( Gupta, Varga & Bodnar 2013 )
  10. (Chamberlain 1983; Owen and Rabinovitch 1983)
  11. Anderson (2004 , The final section of the text (before "Problems") that are always entitled "Elliptically contoured distributions", of the following chapters: Chapters 3 ("Estimation of the mean vector and the covariance matrix", Section 3.6, pp. 101-108), 4 ("The distributions and uses of sample correlation coefficients", Section 4.5, pp. 158-163), 5 ("The generalized T2-statistic", Section 5.7, pp. 199-201), 7 ("The distribution of the sample covariance matrix and the sample generalized variance", Section 7.9, pp. 242-248), 8 ("Testing the general linear hypothesis; multivariate analysis of variance", Section 8.11, pp. 370-374), 9 ("Testing independence of sets of variates", Section 9.11, pp. 404-408), 10 ("Testing hypotheses of equality of covariance matrices and equality of mean vectors and covariance vectors", Section 10.11, pp. 449-454), 11 ("Principal components", Section 11.8, pp. 482-483), 13 ("The distribution of characteristic roots and vectors", Section 13.8, pp. 563-567))
  12. 1 2 Fang & Zhang (1990)
  13. Fang & Zhang (1990 , Chapter 2.8 "Distribution of quadratic forms and Cochran's theorem", pp. 74-81)
  14. Fang & Zhang (1990 , Chapter 2.5 "Spherical distributions", pp. 53-64)
  15. Fang & Zhang (1990 , Chapter IV "Estimation of parameters", pp. 127-153)
  16. Fang & Zhang (1990 , Chapter V "Testing hypotheses", pp. 154-187)
  17. Fang & Zhang (1990 , Chapter VII "Linear models", pp. 188-211)
  18. Pan & Fang (2007 , p. ii)
  19. Kollo & von Rosen (2005 , p. xiii)
  20. Kariya, Takeaki; Sinha, Bimal K. (1989). Robustness of statistical tests. Academic Press. ISBN   0123982308.
  21. Kollo & von Rosen (2005 , p. 221)
  22. Chamberlain (1983)
