Multivariate t-distribution

Multivariate t
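Notation $t_\nu(\boldsymbol\mu, \boldsymbol\Sigma)$
Parameters $\boldsymbol\mu = [\mu_1, \dots, \mu_p]^T$ location (real $p \times 1$ vector)
$\boldsymbol\Sigma$ scale matrix (positive-definite real $p \times p$ matrix)
$\nu > 0$ (real) represents the degrees of freedom
Support $\mathbf{x} \in \mathbb{R}^p$
PDF $\dfrac{\Gamma[(\nu+p)/2]}{\Gamma(\nu/2)\,\nu^{p/2}\pi^{p/2}\,|\boldsymbol\Sigma|^{1/2}} \left[1 + \dfrac{1}{\nu}(\mathbf{x}-\boldsymbol\mu)^T \boldsymbol\Sigma^{-1} (\mathbf{x}-\boldsymbol\mu)\right]^{-(\nu+p)/2}$
CDF No analytic expression, but see text for approximations
Mean $\boldsymbol\mu$ if $\nu > 1$; else undefined
Median $\boldsymbol\mu$
Mode $\boldsymbol\mu$
Variance $\dfrac{\nu}{\nu-2}\boldsymbol\Sigma$ if $\nu > 2$; else undefined
Skewness 0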

In statistics, the multivariate t-distribution (or multivariate Student distribution) is a multivariate probability distribution. It is a generalization to random vectors of the Student's t-distribution, which is a distribution applicable to univariate random variables. While the case of a random matrix could be treated within this structure, the matrix t-distribution is distinct and makes particular use of the matrix structure.

Definition

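One common method of constructing a multivariate t-distribution in $p$ dimensions is based on the following observation. If $\mathbf{y}$ and $u$ are independent and distributed as $N(\mathbf{0}, \boldsymbol\Sigma)$ and $\chi^2_\nu$ (i.e. multivariate normal and chi-squared distributions) respectively, $\boldsymbol\Sigma$ is a p × p matrix, and $\boldsymbol\mu$ is a constant vector, then the random variable $\mathbf{x} = \mathbf{y}/\sqrt{u/\nu} + \boldsymbol\mu$ has the density [1]

$$f(\mathbf{x}) = \frac{\Gamma[(\nu+p)/2]}{\Gamma(\nu/2)\,\nu^{p/2}\pi^{p/2}\,|\boldsymbol\Sigma|^{1/2}} \left[1 + \frac{1}{\nu}(\mathbf{x}-\boldsymbol\mu)^T \boldsymbol\Sigma^{-1} (\mathbf{x}-\boldsymbol\mu)\right]^{-(\nu+p)/2}$$

and is said to be distributed as a multivariate t-distribution with parameters $\boldsymbol\Sigma, \boldsymbol\mu, \nu$. Note that $\boldsymbol\Sigma$ is not the covariance matrix, since the covariance is given by $\nu/(\nu-2)\,\boldsymbol\Sigma$ (for $\nu > 2$).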
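The density above can be evaluated numerically on the log scale. The following is a minimal sketch using only the Python standard library; the function names `cholesky` and `mvt_logpdf` are illustrative choices, not taken from any cited source, and the code assumes the scale matrix is symmetric positive-definite.

```python
import math


def cholesky(a):
    """Return the lower-triangular factor L with L L^T = a (a symmetric positive-definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mvt_logpdf(x, mu, sigma, nu):
    """Log-density of the multivariate t-distribution at the point x."""
    p = len(x)
    low = cholesky(sigma)
    # Forward substitution solves L z = x - mu, so that z.z equals the
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu).
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = []
    for i in range(p):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((diff[i] - s) / low[i][i])
    quad = sum(zi * zi for zi in z)
    # log |Sigma|^(1/2) is the sum of the logs of the diagonal of L.
    half_logdet = sum(math.log(low[i][i]) for i in range(p))
    return (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
            - 0.5 * p * math.log(nu * math.pi) - half_logdet
            - 0.5 * (nu + p) * math.log1p(quad / nu))


# Univariate case with nu = 1: the standard Cauchy density 1 / pi at the origin.
print(math.exp(mvt_logpdf([0.0], [0.0], [[1.0]], 1.0)), 1.0 / math.pi)
```

Working with `lgamma` and `log1p` avoids overflow in the Gamma functions for large degrees of freedom or high dimension.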

The constructive definition of a multivariate t-distribution simultaneously serves as a sampling algorithm:

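  1. Generate $u \sim \chi^2_\nu$ and $\mathbf{y} \sim N(\mathbf{0}, \boldsymbol\Sigma)$, independently.
  2. Compute $\mathbf{x} \gets \sqrt{\nu/u}\,\mathbf{y} + \boldsymbol\mu$.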
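The two steps can be sketched in Python with the standard library alone. The helper name `sample_mvt` is hypothetical, and the sketch assumes the caller supplies a lower-triangular factor `chol` with `chol · cholᵀ` equal to the scale matrix; the chi-squared draw uses the fact that a chi-squared variable with ν degrees of freedom is a gamma variable with shape ν/2 and scale 2.

```python
import math
import random


def sample_mvt(mu, chol, nu, rng):
    """Draw one sample x = mu + sqrt(nu / u) * y with y ~ N(0, Sigma), u ~ chi^2_nu.

    chol is assumed to be a lower-triangular factor L with L L^T = Sigma.
    """
    p = len(mu)
    # Step 1: chi-squared with nu degrees of freedom is Gamma(shape nu/2, scale 2);
    # the normal draw is y = L g with g standard normal.
    u = rng.gammavariate(nu / 2.0, 2.0)
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    y = [sum(chol[i][k] * g[k] for k in range(i + 1)) for i in range(p)]
    # Step 2: rescale the normal draw and shift by the location vector.
    scale = math.sqrt(nu / u)
    return [mu[i] + scale * y[i] for i in range(p)]


rng = random.Random(0)
mu = [1.0, -2.0]
chol = [[1.0, 0.0], [0.5, 1.0]]   # corresponds to Sigma = [[1.0, 0.5], [0.5, 1.25]]
nu = 10.0
draws = [sample_mvt(mu, chol, nu, rng) for _ in range(20000)]
mean = [sum(d[i] for d in draws) / len(draws) for i in range(2)]
var0 = sum((d[0] - mean[0]) ** 2 for d in draws) / len(draws)
# The sample mean should be close to mu, and var0 close to nu / (nu - 2) * Sigma[0][0].
print(mean, var0)
```

For small degrees of freedom the sample variance converges slowly (or not at all when the variance is undefined), so empirical checks of this kind are only meaningful for moderately large ν.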

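This formulation gives rise to the hierarchical representation of a multivariate t-distribution as a scale-mixture of normals: $u \sim \mathrm{Ga}(\nu/2, \nu/2)$, where $\mathrm{Ga}(a, b)$ indicates a gamma distribution with density proportional to $x^{a-1}e^{-bx}$, and $\mathbf{x} \mid u$ conditionally follows $N(\boldsymbol\mu, u^{-1}\boldsymbol\Sigma)$. Here the mixing variable $u$ plays the role of $u/\nu$ in the chi-squared construction.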
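The scale-mixture representation can be checked numerically in one dimension by integrating the normal density against the gamma mixing density and comparing with the Student t density. This is only an illustrative sketch: the function names, the midpoint rule, and the truncation point `upper` are assumptions chosen for simplicity, and the integration is reliable only when ν is not too small.

```python
import math


def student_t_pdf(x, nu):
    """Univariate Student t density with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - 0.5 * (nu + 1) * math.log1p(x * x / nu))


def mixture_pdf(x, nu, upper=80.0, steps=200000):
    """Midpoint-rule integral over u of N(x; 0, 1/u) * Ga(u; shape nu/2, rate nu/2)."""
    a = nu / 2.0
    log_gamma_norm = a * math.log(a) - math.lgamma(a)
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        # Normal density with mean 0 and variance 1/u, evaluated at x.
        log_normal = 0.5 * math.log(u / (2.0 * math.pi)) - 0.5 * u * x * x
        # Gamma density with shape a and rate a, evaluated at u.
        log_gamma = log_gamma_norm + (a - 1.0) * math.log(u) - a * u
        total += math.exp(log_normal + log_gamma)
    return total * h


# The two values should agree closely.
print(mixture_pdf(1.0, 3.0), student_t_pdf(1.0, 3.0))
```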

In the special case $\nu = 1$, the distribution is a multivariate Cauchy distribution.

Derivation

There are in fact many candidates for the multivariate generalization of Student's t-distribution. An extensive survey of the field has been given by Kotz and Nadarajah (2004). The essential issue is to define a probability density function of several variables that is the appropriate generalization of the formula for the univariate case. In one dimension ($p = 1$), with $t = x - \mu$ and $\Sigma = 1$, we have the probability density function

$$f(t) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \left(1 + t^2/\nu\right)^{-(\nu+1)/2}$$

and one approach is to use a corresponding function of several variables. This is the basic idea of elliptical distribution theory, where one writes down a corresponding function of $p$ variables $t_i$ that replaces $t^2$ by a quadratic function of all the $t_i$. It is clear that this only makes sense when all the marginal distributions have the same degrees of freedom $\nu$. With $\mathbf{A} = \boldsymbol\Sigma^{-1}$, one has a simple choice of multivariate density function

$$f(\mathbf{t}) = \frac{\Gamma((\nu+p)/2)\,|\mathbf{A}|^{1/2}}{\sqrt{\nu^p\pi^p}\,\Gamma(\nu/2)} \left(1 + \sum_{i,j=1}^{p} A_{ij}t_it_j/\nu\right)^{-(\nu+p)/2}$$

which is the standard but not the only choice.

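An important special case is the standard bivariate t-distribution, p = 2:

$$f(t_1, t_2) = \frac{|\mathbf{A}|^{1/2}}{2\pi} \left(1 + \sum_{i,j=1}^{2} A_{ij}t_it_j/\nu\right)^{-(\nu+2)/2}$$

Note that $\dfrac{\Gamma\left(\frac{\nu+2}{2}\right)}{\pi\nu\,\Gamma\left(\frac{\nu}{2}\right)} = \dfrac{1}{2\pi}$.

Now, if $\mathbf{A}$ is the identity matrix, the density is

$$f(t_1, t_2) = \frac{1}{2\pi} \left(1 + (t_1^2 + t_2^2)/\nu\right)^{-(\nu+2)/2}.$$

The difficulty with the standard representation is revealed by this formula, which does not factorize into the product of the marginal one-dimensional distributions. When $\boldsymbol\Sigma$ is diagonal, the components can be shown to have zero correlation, but they are not statistically independent.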
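The failure to factorize can be seen numerically by comparing the joint density for the identity matrix with the product of two univariate Student t densities. The following sketch uses illustrative function names and an arbitrary choice of ν = 3.

```python
import math


def bivariate_t_pdf(t1, t2, nu):
    """Standard bivariate t density with A equal to the identity matrix."""
    return (1.0 + (t1 * t1 + t2 * t2) / nu) ** (-(nu + 2) / 2) / (2.0 * math.pi)


def student_t_pdf(t, nu):
    """Univariate Student t density, the marginal of each component."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - 0.5 * (nu + 1) * math.log1p(t * t / nu))


nu = 3.0
joint = bivariate_t_pdf(1.0, 1.0, nu)
product = student_t_pdf(1.0, nu) ** 2
# The joint density differs from the product of the marginals,
# so the two components are dependent even though they are uncorrelated.
print(joint, product)
```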

A notable spontaneous occurrence of the elliptical multivariate distribution is its formal mathematical appearance when least squares methods are applied to multivariate normal data such as the classical Markowitz minimum variance econometric solution for asset portfolios. [2]

Cumulative distribution function

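The definition of the cumulative distribution function (cdf) in one dimension can be extended to multiple dimensions by defining the following probability (here $\mathbf{x}$ is a real vector and the inequality is taken componentwise):

$$F(\mathbf{x}) = \mathbb{P}(\mathbf{X} \le \mathbf{x}), \quad \text{where } \mathbf{X} \sim t_\nu(\boldsymbol\mu, \boldsymbol\Sigma).$$

There is no simple formula for $F(\mathbf{x})$, but it can be approximated numerically via Monte Carlo integration. [3] [4] [5]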
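As a rough illustration, the probability can be estimated by drawing samples with the chi-squared construction and counting how many fall below the target vector in every coordinate. This naive estimator is an assumption made for exposition and is not the method of the cited references, which use more efficient techniques; the name `mvt_cdf_mc` and its parameters are hypothetical.

```python
import math
import random


def mvt_cdf_mc(x, mu, chol, nu, n_draws=20000, seed=0):
    """Naive Monte Carlo estimate of P(X_1 <= x_1, ..., X_p <= x_p).

    chol is assumed to be a lower-triangular factor L with L L^T = Sigma.
    """
    rng = random.Random(seed)
    p = len(mu)
    hits = 0
    for _ in range(n_draws):
        u = rng.gammavariate(nu / 2.0, 2.0)          # chi-squared draw with nu d.o.f.
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        scale = math.sqrt(nu / u)
        inside = True
        for i in range(p):
            yi = sum(chol[i][k] * g[k] for k in range(i + 1))
            if mu[i] + scale * yi > x[i]:
                inside = False
                break
        hits += inside
    return hits / n_draws


identity = [[1.0, 0.0], [0.0, 1.0]]
# For a spherical distribution centred at the origin, the lower-left quadrant
# has probability 1/4, so the estimate should be close to 0.25.
print(mvt_cdf_mc([0.0, 0.0], [0.0, 0.0], identity, nu=4.0))
```

The standard error of such an estimate shrinks only as the inverse square root of the number of draws, which is why small tail probabilities call for the specialised methods in the references.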

Conditional Distribution

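This was developed by Muirhead [6] and Cornish, [7] and later derived using the simpler chi-squared ratio representation above by Roth [1] and Ding. [8] Let the vector $X$ follow a multivariate t-distribution, written here as $t_p(\mu, \Sigma, \nu)$ to make the dimension explicit, and partition it into two subvectors of $p_1$ and $p_2$ elements:

$$X_p = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim t_p\left(\mu_p, \Sigma_{p \times p}, \nu\right)$$

where $p_1 + p_2 = p$, the known mean vectors are $\mu_p = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$ and the scale matrix is $\Sigma_{p \times p} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$.

Roth and Ding find the conditional distribution $p(X_1 | X_2)$ to be a new t-distribution with modified parameters:

$$X_1 | X_2 \sim t_{p_1}\left(\mu_{1|2},\ \frac{\nu + d_2}{\nu + p_2}\Sigma_{11|2},\ \nu + p_2\right)$$

An equivalent expression in Kotz et al. is somewhat less concise.

Forming first an intermediate distribution $X_1 | X_2 \sim t_{p_1}(\mu_{1|2}, \Psi, \tilde\nu)$ with $\Psi = \frac{\nu + d_2}{\tilde\nu}\Sigma_{11|2}$, the explicit conditional distribution renders as:

$$f(X_1 | X_2) = \frac{\Gamma[(\tilde\nu + p_1)/2]}{\Gamma(\tilde\nu/2)\,(\pi\tilde\nu)^{p_1/2}\,|\Psi|^{1/2}} \left[1 + \frac{1}{\tilde\nu}(X_1 - \mu_{1|2})^T \Psi^{-1} (X_1 - \mu_{1|2})\right]^{-(\tilde\nu + p_1)/2}$$

where

$\tilde\nu = \nu + p_2$ is the effective degrees of freedom: $\nu$ augmented by the number of disused variables $p_2$.
$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2)$ is the conditional mean of $X_1$.
$\Sigma_{11|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ is the Schur complement of $\Sigma_{22}$ in $\Sigma$; it is the conditional covariance in the multivariate normal case and, rescaled as $\Psi$, the conditional scale matrix here.
$d_2 = (X_2 - \mu_2)^T \Sigma_{22}^{-1} (X_2 - \mu_2)$ is the squared Mahalanobis distance of $X_2$ from $\mu_2$ with scale matrix $\Sigma_{22}$.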
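For intuition, the conditional parameters can be written out in the bivariate case, where each block of the partition is a scalar ($p_1 = p_2 = 1$) and no matrix inversion is needed. The sketch below follows the Roth and Ding form, in which the conditional location is $\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$, the degrees of freedom become $\nu + p_2$, and the Schur complement is rescaled by $(\nu + d_2)/(\nu + p_2)$; the function name is hypothetical.

```python
def conditional_t_bivariate(x2, mu, sigma, nu):
    """Parameters of X1 | X2 = x2 for a bivariate t-distribution (p1 = p2 = 1).

    Returns (location, scale, degrees of freedom) of the univariate
    conditional t-distribution.
    """
    (s11, s12), (s21, s22) = sigma
    mu1, mu2 = mu
    cond_loc = mu1 + s12 / s22 * (x2 - mu2)      # conditional mean mu_{1|2}
    schur = s11 - s12 * s21 / s22                # Schur complement Sigma_{11|2}
    d2 = (x2 - mu2) ** 2 / s22                   # squared Mahalanobis distance of x2
    cond_df = nu + 1                             # nu + p2 with p2 = 1
    cond_scale = (nu + d2) / cond_df * schur     # rescaled Schur complement
    return cond_loc, cond_scale, cond_df


# Unit scales with off-diagonal 0.5, observed x2 = 2, nu = 3.
print(conditional_t_bivariate(2.0, (0.0, 0.0), ((1.0, 0.5), (0.5, 1.0)), 3.0))
```

Unlike the multivariate normal case, the conditional scale grows with the distance of the observed $x_2$ from its location, so an outlying observation widens the conditional distribution of $X_1$.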

Copulas based on the multivariate t

The use of such distributions is enjoying renewed interest due to applications in mathematical finance, especially through the use of the Student's t copula. [9]

Elliptical Representation

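Constructed as an elliptical distribution, [10] take the simplest centralised case with spherical symmetry and no scaling, $\Sigma = \mathrm{I}$; then the multivariate t-PDF takes the form

$$f_X(X) = g(X^TX) = \frac{\Gamma\left(\frac{1}{2}(\nu + p)\right)}{(\nu\pi)^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)} \left(1 + \nu^{-1}X^TX\right)^{-(\nu + p)/2}$$

where $X = (x_1, \dots, x_p)^T$ is a $p$-vector and $\nu$ = degrees of freedom as defined in Muirhead [6] section 1.5. The covariance of $X$ is

$$\operatorname{E}\left(XX^T\right) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x_1, \dots, x_p)\, XX^T \, dx_1 \dots dx_p = \frac{\nu}{\nu - 2}\mathrm{I}$$

The aim is to convert the Cartesian PDF to a radial one. Kibria and Joarder [11] define the radial measure $r_2 = \frac{X^TX}{p}$ and, noting that the density is dependent only on $r_2$, we get

$$\operatorname{E}[r_2] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x_1, \dots, x_p)\, \frac{X^TX}{p} \, dx_1 \dots dx_p = \frac{\nu}{\nu - 2}$$

which is equivalent to the variance of the $p$-element vector $X$ treated as a univariate heavy-tail zero-mean random sequence with uncorrelated, yet statistically dependent, elements.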

Radial Distribution

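$r_2 = \frac{X^TX}{p}$ follows the Fisher-Snedecor or $F$ distribution:

$$r_2 \sim f_F(p, \nu) = B\left(\frac{p}{2}, \frac{\nu}{2}\right)^{-1} \left(\frac{p}{\nu}\right)^{p/2} r_2^{\,p/2 - 1} \left(1 + \frac{p}{\nu}r_2\right)^{-(p + \nu)/2}$$

having mean value $\operatorname{E}[r_2] = \frac{\nu}{\nu - 2}$. $F$-distributions arise naturally in tests of sums of squares of sampled data after normalization by the sample standard deviation.

By a change of random variable to $y = \frac{p}{\nu}r_2 = \frac{X^TX}{\nu}$ in the equation above, retaining the $p$-vector $X$, we have $\operatorname{E}[y] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(X)\, \frac{X^TX}{\nu} \, dx_1 \dots dx_p = \frac{p}{\nu - 2}$ and probability distribution

$$f_Y(y | p, \nu) = B\left(\frac{p}{2}, \frac{\nu}{2}\right)^{-1} y^{\,p/2 - 1} (1 + y)^{-(\nu + p)/2}$$

which is a regular Beta-prime distribution $y \sim \beta'\left(y; \frac{p}{2}, \frac{\nu}{2}\right)$ having mean value $\frac{p/2}{\nu/2 - 1} = \frac{p}{\nu - 2}$.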

Cumulative Radial Distribution

Given the Beta-prime distribution, the radial cumulative distribution function of $y$ is known:

$$F_Y(y) = I\left(\frac{y}{1 + y}; \frac{p}{2}, \frac{\nu}{2}\right) B\left(\frac{p}{2}, \frac{\nu}{2}\right)^{-1}$$

where $I$ is the incomplete Beta function and applies with a spherical $\Sigma$ assumption.

In the scalar case, $p = 1$, the distribution is equivalent to Student-t with the equivalence $t^2 = \nu y$, the variable $t$ having double-sided tails for CDF purposes, i.e. the "two-tail-t-test".
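For $p = 2$ the Beta-prime distribution of $y = X^TX/\nu$ has first shape parameter 1, and its cumulative distribution function reduces to the closed form $1 - (1 + y)^{-\nu/2}$. The sketch below compares this with an empirical estimate from simulated spherical samples; the function name, the seed, and the evaluation point are arbitrary illustrative choices.

```python
import math
import random


def radial_cdf_p2(y, nu):
    """CDF of y = X^T X / nu for p = 2, where the Beta-prime CDF has a closed form."""
    return 1.0 - (1.0 + y) ** (-nu / 2.0)


rng = random.Random(1)
nu, n_draws, y0 = 5.0, 20000, 0.4
count = 0
for _ in range(n_draws):
    u = rng.gammavariate(nu / 2.0, 2.0)              # chi-squared draw with nu d.o.f.
    scale = math.sqrt(nu / u)
    x1 = rng.gauss(0.0, 1.0) * scale                 # spherical case, Sigma = I
    x2 = rng.gauss(0.0, 1.0) * scale
    if (x1 * x1 + x2 * x2) / nu <= y0:
        count += 1
empirical = count / n_draws
# The empirical fraction should be close to the closed-form value.
print(empirical, radial_cdf_p2(y0, nu))
```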

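The radial distribution can also be derived via a straightforward coordinate transformation from Cartesian to spherical. A constant radius surface at $R = (X^TX)^{1/2}$ with PDF $p_X(X) \propto \left(1 + \nu^{-1}R^2\right)^{-(\nu + p)/2}$ is an iso-density surface. Given this density value, the quantum of probability on a shell of surface area $A_R$ and thickness $\delta R$ at $R$ is $\delta P = p_X(R)\, A_R\, \delta R$.

The enclosed $p$-sphere of radius $R$ has surface area $A_R = \frac{2\pi^{p/2}R^{\,p - 1}}{\Gamma(p/2)}$. Substitution into $\delta P$ shows that the shell has element of probability $\delta P = p_X(R)\, \frac{2\pi^{p/2}R^{\,p - 1}}{\Gamma(p/2)}\, \delta R$, which is equivalent to the radial density function

$$f_R(R) = \frac{\Gamma\left(\frac{1}{2}(\nu + p)\right)}{\nu^{p/2}\pi^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)} \frac{2\pi^{p/2}R^{\,p - 1}}{\Gamma(p/2)} \left(1 + \frac{R^2}{\nu}\right)^{-(\nu + p)/2}$$

which further simplifies to $f_R(R) = \frac{2}{\nu^{1/2} B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)} \left(\frac{R^2}{\nu}\right)^{(p - 1)/2} \left(1 + \frac{R^2}{\nu}\right)^{-(\nu + p)/2}$ where $B(\cdot, \cdot)$ is the Beta function.

Changing the radial variable to $y = R^2/\nu$ returns the previous Beta-prime distribution

$$f_Y(y) = \frac{1}{B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)}\, y^{\,p/2 - 1} (1 + y)^{-(\nu + p)/2}$$

To scale the radial variables without changing the radial shape function, define the scale matrix $\Sigma = \alpha\mathrm{I}$, yielding a 3-parameter Cartesian density function, i.e. the probability $\Delta_P$ in volume element $dx_1 \dots dx_p$ is

$$\Delta_P\left(f_X(X | \alpha, p, \nu)\right) = \frac{\Gamma\left(\frac{1}{2}(\nu + p)\right)}{(\nu\pi)^{p/2}\alpha^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)} \left(1 + \frac{X^TX}{\alpha\nu}\right)^{-(\nu + p)/2} dx_1 \dots dx_p$$

or, in terms of the scalar radial variable $R$,

$$f_R(R | \alpha, p, \nu) = \frac{2}{\alpha^{1/2}\nu^{1/2} B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)} \left(\frac{R^2}{\alpha\nu}\right)^{(p - 1)/2} \left(1 + \frac{R^2}{\alpha\nu}\right)^{-(\nu + p)/2}$$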

Radial Moments

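The moments of all the radial variables, with the spherical distribution assumption, can be derived from the Beta-prime distribution. If $Z \sim \beta'(a, b)$ then $\operatorname{E}(Z^m) = \frac{B(a + m, b - m)}{B(a, b)}$, a known result. Thus, for variable $y = \frac{R^2}{\nu} \sim \beta'\left(\frac{p}{2}, \frac{\nu}{2}\right)$ we have

$$\operatorname{E}(y^m) = \frac{B\left(\frac{1}{2}p + m, \frac{1}{2}\nu - m\right)}{B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)} = \frac{\Gamma\left(\frac{1}{2}p + m\right)\Gamma\left(\frac{1}{2}\nu - m\right)}{\Gamma\left(\frac{1}{2}p\right)\Gamma\left(\frac{1}{2}\nu\right)}, \quad \nu/2 > m$$

The moments of $r_2 = \nu y / p$ are

$$\operatorname{E}(r_2^m) = \left(\frac{\nu}{p}\right)^m \operatorname{E}(y^m)$$

while introducing the scale matrix $\alpha\mathrm{I}$ yields

$$\operatorname{E}(r_2^m | \alpha) = \alpha^m \left(\frac{\nu}{p}\right)^m \operatorname{E}(y^m)$$

Moments relating to the radial variable $R$ are found by setting $R = (\alpha\nu y)^{1/2}$ and $M = 2m$, whereupon

$$\operatorname{E}(R^M) = \operatorname{E}\left((\alpha\nu y)^{M/2}\right) = (\alpha\nu)^{M/2} \operatorname{E}(y^{M/2}) = (\alpha\nu)^{M/2} \frac{B\left(\frac{1}{2}(p + M), \frac{1}{2}(\nu - M)\right)}{B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)}$$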
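The Beta-prime moment result, $\operatorname{E}(Z^m) = B(a + m, b - m)/B(a, b)$ with $a = p/2$ and $b = \nu/2$, is straightforward to evaluate with log-Gamma functions. The sketch below uses hypothetical function names and checks two simple cases: the mean of $y = X^TX/\nu$, and the second moment of the radius $R = (\alpha\nu y)^{1/2}$ under scale matrix $\alpha\mathrm{I}$.

```python
import math


def beta_prime_moment(m, p, nu):
    """E(y^m) for y ~ Beta-prime(p/2, nu/2); requires nu / 2 > m."""
    if nu / 2.0 <= m:
        raise ValueError("moment of order m exists only for nu / 2 > m")
    log_value = (math.lgamma(p / 2.0 + m) + math.lgamma(nu / 2.0 - m)
                 - math.lgamma(p / 2.0) - math.lgamma(nu / 2.0))
    return math.exp(log_value)


def radius_moment(big_m, p, nu, alpha=1.0):
    """E(R^M) for R = (alpha * nu * y)^(1/2), using M = 2m."""
    return (alpha * nu) ** (big_m / 2.0) * beta_prime_moment(big_m / 2.0, p, nu)


# Mean of y, which should equal p / (nu - 2).
print(beta_prime_moment(1, 3, 5.0), 3 / (5.0 - 2))
# E(R^2), which should equal the trace of the covariance, alpha * p * nu / (nu - 2).
print(radius_moment(2, 3, 5.0, alpha=2.0), 2.0 * 3 * 5.0 / (5.0 - 2))
```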

Linear Combinations and Affine Transformation

Full Rank Transform

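This closely relates to the multivariate normal method and is described in Kotz and Nadarajah, Kibria and Joarder, Roth, and Cornish. Starting from a somewhat simplified version of the central MV-t pdf: $f_X(X) = \frac{\mathrm{K}}{|\Sigma|^{1/2}} \left(1 + \nu^{-1}X^T\Sigma^{-1}X\right)^{-(\nu + p)/2}$, where $\mathrm{K}$ is a constant and $\nu$ is arbitrary but fixed, let $\Theta \in \mathbb{R}^{p \times p}$ be a full-rank matrix and form the vector $Y = \Theta X$. Then, by straightforward change of variables

$$f_Y(Y) = \frac{\mathrm{K}}{|\Sigma|^{1/2}} \left(1 + \nu^{-1}Y^T\Theta^{-T}\Sigma^{-1}\Theta^{-1}Y\right)^{-(\nu + p)/2} \left|\frac{\partial Y}{\partial X}\right|^{-1}$$

The matrix of partial derivatives is $\frac{\partial Y_i}{\partial X_j} = \Theta_{i,j}$ and the Jacobian becomes $\left|\frac{\partial Y}{\partial X}\right| = |\Theta|$ (taken in absolute value). Thus

$$f_Y(Y) = \frac{\mathrm{K}}{|\Sigma|^{1/2}|\Theta|} \left(1 + \nu^{-1}Y^T\Theta^{-T}\Sigma^{-1}\Theta^{-1}Y\right)^{-(\nu + p)/2}$$

The denominator reduces to

$$|\Sigma|^{1/2}|\Theta| = |\Sigma|^{1/2}|\Theta|^{1/2}|\Theta^T|^{1/2} = |\Theta\Sigma\Theta^T|^{1/2}$$

In full:

$$f_Y(Y) = \frac{\Gamma[(\nu + p)/2]}{\Gamma(\nu/2)\,(\nu\pi)^{p/2}\,|\Theta\Sigma\Theta^T|^{1/2}} \left(1 + \nu^{-1}Y^T\left(\Theta\Sigma\Theta^T\right)^{-1}Y\right)^{-(\nu + p)/2}$$

which is a regular MV-t distribution.

In general if $X \sim t_p(\mu, \Sigma, \nu)$ and $\Theta^{p \times p}$ has full rank $p$ then

$$\Theta X + c \sim t_p\left(\Theta\mu + c,\ \Theta\Sigma\Theta^T,\ \nu\right)$$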
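The change of variables can be verified numerically in two dimensions: the density of the transformed vector, evaluated with scale matrix $\Theta\Sigma\Theta^T$, should equal the original density divided by the absolute determinant of $\Theta$. The following sketch uses closed-form 2 × 2 algebra; all names and the particular matrices are illustrative assumptions.

```python
import math


def mvt_pdf_2d(x, mu, sigma, nu):
    """Bivariate t density, using the closed-form 2 x 2 determinant and inverse."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    quad = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    # For p = 2 the normalising constant reduces to 1 / (2 pi |Sigma|^(1/2)).
    return (1.0 + quad / nu) ** (-(nu + 2) / 2) / (2.0 * math.pi * math.sqrt(det))


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


sigma = [[2.0, 0.3], [0.3, 1.0]]
theta = [[1.0, 2.0], [0.5, -1.0]]            # full rank: determinant is -2
nu = 4.0
x = [0.7, -0.4]
y = [sum(theta[i][k] * x[k] for k in range(2)) for i in range(2)]
sigma_y = matmul(matmul(theta, sigma), transpose(theta))
det_theta = theta[0][0] * theta[1][1] - theta[0][1] * theta[1][0]

lhs = mvt_pdf_2d(y, [0.0, 0.0], sigma_y, nu)
rhs = mvt_pdf_2d(x, [0.0, 0.0], sigma, nu) / abs(det_theta)
print(lhs, rhs)  # the two values should agree up to rounding
```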

Marginal Distributions

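This is a special case of the rank-reducing linear transform below. Kotz defines marginal distributions as follows. Partition $X \sim t_p(\mu, \Sigma, \nu)$ into two subvectors of $p_1$ and $p_2$ elements:

$$X_p = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim t_{p_1 + p_2}\left(\mu_p, \Sigma_{p \times p}, \nu\right)$$

with $p_1 + p_2 = p$, means $\mu_p = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$, scale matrix $\Sigma_{p \times p} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$

then $X_1 \sim t_{p_1}(\mu_1, \Sigma_{11}, \nu)$ and $X_2 \sim t_{p_2}(\mu_2, \Sigma_{22}, \nu)$, such that

$$f(X_1) = \frac{\Gamma[(\nu + p_1)/2]}{\Gamma(\nu/2)\,(\nu\pi)^{p_1/2}\,|\Sigma_{11}|^{1/2}} \left[1 + \frac{1}{\nu}(X_1 - \mu_1)^T \Sigma_{11}^{-1} (X_1 - \mu_1)\right]^{-(\nu + p_1)/2}$$

$$f(X_2) = \frac{\Gamma[(\nu + p_2)/2]}{\Gamma(\nu/2)\,(\nu\pi)^{p_2/2}\,|\Sigma_{22}|^{1/2}} \left[1 + \frac{1}{\nu}(X_2 - \mu_2)^T \Sigma_{22}^{-1} (X_2 - \mu_2)\right]^{-(\nu + p_2)/2}$$

If a transformation is constructed in the form

$$\Theta_{p_1 \times p} = \begin{bmatrix} \mathrm{I}_{p_1} & \mathbf{0}_{p_1 \times p_2} \end{bmatrix}$$

then the vector $Y = \Theta X$, as discussed below, has the same distribution as the marginal distribution of $X_1$.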

Rank-Reducing Linear Transform

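In the linear transform case, if $\Theta$ is a rectangular matrix $\Theta \in \mathbb{R}^{m \times p}$, $m < p$, of rank $m$, the result is dimensionality reduction. Here, the Jacobian $|\Theta|$ is seemingly rectangular but the value $|\Theta\Sigma\Theta^T|^{1/2}$ in the denominator of the pdf is nevertheless correct. There is a discussion of rectangular matrix product determinants in Aitken. [12] In general if $X \sim t_p(\mu, \Sigma, \nu)$ and $\Theta^{m \times p}$ has full rank $m$ then

$$Y = \Theta X + c \sim t_m\left(\Theta\mu + c,\ \Theta\Sigma\Theta^T,\ \nu\right)$$

$$f_Y(Y) = \frac{\Gamma[(\nu + m)/2]}{\Gamma(\nu/2)\,(\nu\pi)^{m/2}\,|\Theta\Sigma\Theta^T|^{1/2}} \left[1 + \frac{1}{\nu}(Y - c_1)^T \left(\Theta\Sigma\Theta^T\right)^{-1} (Y - c_1)\right]^{-(\nu + m)/2}, \quad c_1 = \Theta\mu + c$$

In extremis, if m = 1 and $\Theta$ becomes a row vector, then the scalar Y follows a univariate double-sided Student-t distribution defined by $t^2 = (Y - c_1)^2/\sigma^2$, $\sigma^2 = \Theta\Sigma\Theta^T$, with the same $\nu$ degrees of freedom. Kibria et al. use the affine transformation to find the marginal distributions, which are also MV-t.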
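The parameter mapping for a rank-reducing transform, location $\Theta\mu + c$ and scale matrix $\Theta\Sigma\Theta^T$ with unchanged degrees of freedom, can be sketched with plain lists of rows. The function name `affine_t_params` and the example matrices are hypothetical; the two calls illustrate a selection matrix, which recovers marginal parameters, and a single row vector (m = 1).

```python
def affine_t_params(theta, mu, sigma, c):
    """Location and scale matrix of Y = Theta X + c for X ~ t_p(mu, Sigma, nu).

    theta is an m x p list of rows assumed to have full row rank;
    the degrees of freedom nu are unchanged by the transform.
    """
    m, p = len(theta), len(mu)
    loc = [sum(theta[i][k] * mu[k] for k in range(p)) + c[i] for i in range(m)]
    theta_sigma = [[sum(theta[i][k] * sigma[k][j] for k in range(p)) for j in range(p)]
                   for i in range(m)]
    scale = [[sum(theta_sigma[i][k] * theta[j][k] for k in range(p)) for j in range(m)]
             for i in range(m)]
    return loc, scale


mu = [1.0, 2.0, 3.0]
sigma = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 3.0]]

# Selection matrix [I 0]: recovers the marginal parameters of the first two components.
marg_loc, marg_scale = affine_t_params([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], mu, sigma, [0.0, 0.0])
print(marg_loc, marg_scale)

# Row vector (m = 1): a univariate t with squared scale Theta Sigma Theta^T.
sum_loc, sum_scale = affine_t_params([[1.0, 1.0, 1.0]], mu, sigma, [0.0])
print(sum_loc, sum_scale)
```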

References

  1. Roth, Michael (17 April 2013). "On the Multivariate t Distribution" (PDF). Automatic Control group. Linköping University, Sweden. Archived (PDF) from the original on 31 July 2022. Retrieved 1 June 2022.
  2. Bodnar, T; Okhrin, Y (2008). "Properties of the Singular, Inverse and Generalized inverse Partitioned Wishart Distribution" (PDF). Journal of Multivariate Analysis. 99 (Eqn.20): 2389–2405.
  3. Botev, Z.; Chen, Y.-L. (2022). "Chapter 4: Truncated Multivariate Student Computations via Exponential Tilting". In Botev, Zdravko; Keller, Alexander; Lemieux, Christiane; Tuffin, Bruno (eds.). Advances in Modeling and Simulation: Festschrift for Pierre L'Ecuyer. Springer. pp. 65–87. ISBN 978-3-031-10192-2.
  4. Botev, Z. I.; L'Ecuyer, P. (6 December 2015). "Efficient probability estimation and simulation of the truncated multivariate student-t distribution". 2015 Winter Simulation Conference (WSC). Huntington Beach, CA, USA: IEEE. pp. 380–391. doi:10.1109/WSC.2015.7408180.
  5. Genz, Alan (2009). Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics. Vol. 195. Springer. doi:10.1007/978-3-642-01689-9. ISBN 978-3-642-01689-9. Archived from the original on 2022-08-27. Retrieved 2017-09-05.
  6. Muirhead, Robb (1982). Aspects of Multivariate Statistical Theory. USA: Wiley. pp. 32–36, Theorem 1.5.4. ISBN 978-0-471-76985-9.
  7. Cornish, E A (1954). "The Multivariate t-Distribution Associated with a Set of Normal Sample Deviates". Australian Journal of Physics. 7: 531–542. doi:10.1071/PH550193.
  8. Ding, Peng (2016). "On the Conditional Distribution of the Multivariate t Distribution". The American Statistician. 70 (3): 293–295. arXiv:1604.00561. doi:10.1080/00031305.2016.1164756. S2CID 55842994.
  9. Demarta, Stefano; McNeil, Alexander (2004). "The t Copula and Related Copulas" (PDF). Risknet.
  10. Osiewalski, Jacek; Steele, Mark (1996). "Posterior Moments of Scale Parameters in Elliptical Sampling Models". Bayesian Analysis in Statistics and Econometrics. Wiley. pp. 323–335. ISBN 0-471-11856-7.
  11. Kibria, K M G; Joarder, A H (Jan 2006). "A short review of multivariate t distribution" (PDF). Journal of Statistical Research. 40 (1): 59–72. doi:10.1007/s42979-021-00503-0. S2CID 232163198.
  12. Aitken, A C (1948). Determinants and Matrices (5th ed.). Edinburgh: Oliver and Boyd. Chapter IV, section 36.
  13. Giron, Javier; del Castilo, Carmen (2010). "The multivariate Behrens–Fisher distribution". Journal of Multivariate Analysis. 101 (9): 2091–2102. doi:10.1016/j.jmva.2010.04.008.
  14. Okhrin, Y; Schmid, W (2006). "Distributional Properties of Portfolio Weights". Journal of Econometrics. 134: 235–256.
  15. Bodnar, T; Dmytriv, S; Parolya, N; Schmid, W (2019). "Tests for the Weights of the Global Minimum Variance Portfolio in a High-Dimensional Setting". IEEE Trans. on Signal Processing. 67 (17): 4479–4493.

Literature