Definition
One common method of construction of a multivariate t-distribution, for the case of $p$ dimensions, is based on the observation that if $\mathbf{y}$ and $u$ are independent and distributed as $N(\mathbf{0},\boldsymbol\Sigma)$ and $\chi^2_\nu$ (i.e. multivariate normal and chi-squared distributions) respectively, the matrix $\boldsymbol\Sigma$ is a p × p matrix, and $\boldsymbol\mu$ is a constant vector, then the random variable $\mathbf{x} = \mathbf{y}/\sqrt{u/\nu} + \boldsymbol\mu$ has the density[1]

$$f(\mathbf{x}) = \frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\,\nu^{p/2}\,\pi^{p/2}\,\left|\boldsymbol\Sigma\right|^{1/2}}\left[1+\frac{1}{\nu}(\mathbf{x}-\boldsymbol\mu)^{\mathsf T}\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right]^{-(\nu+p)/2}$$

and is said to be distributed as a multivariate t-distribution with parameters $\boldsymbol\Sigma, \boldsymbol\mu, \nu$. Note that $\boldsymbol\Sigma$ is not the covariance matrix since the covariance is given by $\frac{\nu}{\nu-2}\boldsymbol\Sigma$ (for $\nu > 2$).
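As a sketch of how this density can be evaluated numerically, the following Python function (illustrative only; the name `multivariate_t_logpdf` and its interface are not from the cited sources) computes the log-density above using NumPy and SciPy:

```python
import numpy as np
from scipy.special import gammaln

def multivariate_t_logpdf(x, mu, Sigma, nu):
    """Log-density of the multivariate t-distribution t_p(mu, Sigma, nu) at x."""
    x, mu, Sigma = map(np.asarray, (x, mu, Sigma))
    p = mu.shape[0]
    diff = x - mu
    # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)
    maha = diff @ np.linalg.solve(Sigma, diff)
    _, logdet = np.linalg.slogdet(Sigma)
    log_norm = (gammaln((nu + p) / 2) - gammaln(nu / 2)
                - 0.5 * (p * np.log(nu * np.pi) + logdet))
    return log_norm - 0.5 * (nu + p) * np.log1p(maha / nu)
```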
The constructive definition of a multivariate t-distribution simultaneously serves as a sampling algorithm:
- Generate $u \sim \chi^2_\nu$ and $\mathbf{y} \sim N(\mathbf{0}, \boldsymbol\Sigma)$, independently.
- Compute $\mathbf{x} \gets \boldsymbol\mu + \mathbf{y}\,\sqrt{\nu/u}$.
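A minimal sketch of this sampler in Python/NumPy (the function name and interface are illustrative, not taken from the cited references):

```python
import numpy as np

def sample_multivariate_t(mu, Sigma, nu, size, rng=None):
    """Draw `size` samples from t_p(mu, Sigma, nu) via the chi-squared ratio construction."""
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, dtype=float)
    p = mu.shape[0]
    # y ~ N(0, Sigma) and u ~ chi^2_nu, drawn independently
    y = rng.multivariate_normal(np.zeros(p), Sigma, size=size)
    u = rng.chisquare(nu, size=size)
    # x = mu + y * sqrt(nu / u)
    return mu + y * np.sqrt(nu / u)[:, None]

# Example: 5 draws from a bivariate t with nu = 4 degrees of freedom
samples = sample_multivariate_t([0.0, 0.0], [[1.0, 0.5], [0.5, 2.0]], nu=4, size=5)
```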
This formulation gives rise to the hierarchical representation of a multivariate t-distribution as a scale-mixture of normals: $u \sim \mathrm{Ga}(\nu/2, \nu/2)$ where $\mathrm{Ga}(a,b)$ indicates a gamma distribution with density proportional to $x^{a-1}e^{-bx}$, and $\mathbf{x}\mid u$ conditionally follows $N(\boldsymbol\mu, u^{-1}\boldsymbol\Sigma)$.
In the special case $\nu = 1$, the distribution is a multivariate Cauchy distribution.
Derivation
There are in fact many candidates for the multivariate generalization of Student's t-distribution. An extensive survey of the field has been given by Kotz and Nadarajah (2004). The essential issue is to define a probability density function of several variables that is the appropriate generalization of the formula for the univariate case. In one dimension ($p = 1$), with $t = x - \mu$ and $\Sigma = 1$, we have the probability density function

$$f(t) = \frac{\Gamma\left[(\nu+1)/2\right]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + t^2/\nu\right)^{-(\nu+1)/2}$$

and one approach is to use a corresponding function of several variables. This is the basic idea of elliptical distribution theory, where one writes down a corresponding function of $p$ variables $t_i$ that replaces $t^2$ by a quadratic function of all the $t_i$. It is clear that this only makes sense when all the marginal distributions have the same degrees of freedom $\nu$. With $\mathbf{A} = \boldsymbol\Sigma^{-1}$, one has a simple choice of multivariate density function

$$f(\mathbf{t}) = \frac{\Gamma\left[(\nu+p)/2\right]\left|\mathbf{A}\right|^{1/2}}{(\nu\pi)^{p/2}\,\Gamma(\nu/2)}\left(1 + \frac{1}{\nu}\sum_{i,j=1}^{p} A_{ij}\, t_i t_j\right)^{-(\nu+p)/2}$$

which is the standard but not the only choice.
An important special case is the standard bivariate t-distribution, p = 2:

$$f(t_1, t_2) = \frac{\left|\mathbf{A}\right|^{1/2}}{2\pi}\left(1 + \frac{1}{\nu}\sum_{i,j=1}^{2} A_{ij}\, t_i t_j\right)^{-(\nu+2)/2}$$

Note that $\dfrac{\Gamma\left(\tfrac{\nu+2}{2}\right)}{\pi\nu\,\Gamma\left(\tfrac{\nu}{2}\right)} = \dfrac{1}{2\pi}$.
Now, if $\mathbf{A}$ is the identity matrix, the density is

$$f(t_1, t_2) = \frac{1}{2\pi}\left(1 + \frac{t_1^2 + t_2^2}{\nu}\right)^{-(\nu+2)/2}.$$

The difficulty with the standard representation is revealed by this formula, which does not factorize into the product of the marginal one-dimensional distributions. When $\boldsymbol\Sigma$ is diagonal the standard representation can be shown to have zero correlation but the marginal distributions are not statistically independent.
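A quick Monte Carlo sketch of this point (Python/NumPy, illustrative only): with a diagonal scale matrix the components are uncorrelated, yet their squares are positively correlated because both coordinates share the same chi-squared divisor.

```python
import numpy as np

rng = np.random.default_rng(0)
nu, n = 10, 200_000
# Bivariate t with identity scale matrix, via the chi-squared ratio construction
y = rng.standard_normal((n, 2))
u = rng.chisquare(nu, size=n)
x = y * np.sqrt(nu / u)[:, None]

print(np.corrcoef(x[:, 0], x[:, 1])[0, 1])        # near 0: components are uncorrelated
print(np.corrcoef(x[:, 0]**2, x[:, 1]**2)[0, 1])  # clearly positive: not independent
```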
A notable spontaneous occurrence of the elliptical multivariate distribution is its formal mathematical appearance when least squares methods are applied to multivariate normal data such as the classical Markowitz minimum variance econometric solution for asset portfolios.[2]
Conditional Distribution
This was developed by Muirhead[6] and Cornish,[7] but later derived using the simpler chi-squared ratio representation above, by Roth[1] and Ding.[8] Let vector $X$ follow a multivariate t distribution and partition it into two subvectors of $p_1$ and $p_2$ elements:

$$X_p = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim t_p\left(\mu_p,\ \Sigma_{p\times p},\ \nu\right)$$

where $p_1 + p_2 = p$, the known mean vectors are $\mu_p = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$ and the scale matrix is $\Sigma_{p\times p} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$.

Roth and Ding find the conditional distribution $p(X_1\mid X_2)$ to be a new t-distribution with modified parameters:

$$X_1\mid X_2 \sim t_{p_1}\!\left(\mu_{1|2},\ \frac{\nu + d_2}{\nu + p_2}\,\Sigma_{11|2},\ \nu + p_2\right).$$

An equivalent expression in Kotz et al. is somewhat less concise.

Thus the conditional distribution is most easily represented as a two-step procedure. Form first the intermediate distribution $X_1\mid X_2 \sim t_{p_1}\!\left(\mu_{1|2},\ \Psi,\ \tilde\nu\right)$ above, then, using the parameters below, the explicit conditional distribution becomes

$$f(X_1\mid X_2) = \frac{\Gamma\left[(\tilde\nu + p_1)/2\right]}{\Gamma(\tilde\nu/2)\,(\pi\tilde\nu)^{p_1/2}\,\left|\Psi\right|^{1/2}}\left[1 + \frac{1}{\tilde\nu}(X_1 - \mu_{1|2})^{\mathsf T}\Psi^{-1}(X_1 - \mu_{1|2})\right]^{-(\tilde\nu + p_1)/2}$$

where
- $\tilde\nu = \nu + p_2$ is the effective degrees of freedom: $\nu$ is augmented by the number of disused variables $p_2$;
- $\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}\left(X_2 - \mu_2\right)$ is the conditional mean of $X_1$;
- $\Sigma_{11|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ is the Schur complement of $\Sigma_{22}$ in $\Sigma$;
- $d_2 = (X_2 - \mu_2)^{\mathsf T}\Sigma_{22}^{-1}(X_2 - \mu_2)$ is the squared Mahalanobis distance of $X_2$ from $\mu_2$ with scale matrix $\Sigma_{22}$;
- $\Psi = \dfrac{\nu + d_2}{\nu + p_2}\,\Sigma_{11|2}$ is the conditional scale matrix for $X_1\mid X_2$; and
- $\dfrac{\nu + d_2}{\nu + p_2 - 2}\,\Sigma_{11|2}$ is the conditional covariance matrix for $X_1\mid X_2$ (defined for $\nu + p_2 > 2$).
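The two-step recipe translates directly into code. A sketch in Python/NumPy (function and variable names are illustrative) that computes the parameters of the conditional distribution:

```python
import numpy as np

def conditional_t_params(mu, Sigma, nu, x2, p1):
    """Parameters of X1 | X2 = x2 when [X1, X2] ~ t_p(mu, Sigma, nu).
    Returns the conditional mean, conditional scale matrix, and degrees of freedom."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    p2 = mu.shape[0] - p1
    mu1, mu2 = mu[:p1], mu[p1:]
    S11, S12 = Sigma[:p1, :p1], Sigma[:p1, p1:]
    S21, S22 = Sigma[p1:, :p1], Sigma[p1:, p1:]
    dev = x2 - mu2
    S22_inv_dev = np.linalg.solve(S22, dev)
    mu_cond = mu1 + S12 @ S22_inv_dev                    # conditional mean mu_{1|2}
    S_schur = S11 - S12 @ np.linalg.solve(S22, S21)      # Schur complement Sigma_{11|2}
    d2 = dev @ S22_inv_dev                               # squared Mahalanobis distance
    nu_cond = nu + p2                                    # effective degrees of freedom
    scale_cond = (nu + d2) / (nu + p2) * S_schur         # conditional scale matrix
    return mu_cond, scale_cond, nu_cond
```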
Elliptical representation
Constructed as an elliptical distribution,[10] take the simplest centralised case with spherical symmetry and no scaling, $\Sigma = \operatorname{I}$; then the multivariate t-PDF takes the form

$$f_X(X) = g\!\left(X^{\mathsf T}X\right) = \frac{\Gamma\left(\tfrac{1}{2}(\nu + p)\right)}{(\nu\pi)^{p/2}\,\Gamma\left(\tfrac{1}{2}\nu\right)}\left(1 + \nu^{-1} X^{\mathsf T}X\right)^{-(\nu+p)/2}$$

where $X = (x_1, \cdots, x_p)^{\mathsf T}$ is a $p$-vector and $\nu > 0$ is the degrees of freedom as defined in Muirhead[6] section 1.5. The covariance of $X$ is

$$\operatorname{E}\left(XX^{\mathsf T}\right) = \frac{\nu}{\nu - 2}\operatorname{I}.$$

The aim is to convert the Cartesian PDF to a radial one. Kibria and Joarder[11] define the radial measure $r_2 = R^2 = \frac{X^{\mathsf T}X}{p}$ and, noting that the density is dependent only on $r^2$, we get

$$\operatorname{E}\left[r_2\right] = \operatorname{E}\left[\frac{X^{\mathsf T}X}{p}\right] = \frac{\nu}{\nu - 2}$$

which is equivalent to the variance of the $p$-element vector $X$ treated as a univariate heavy-tail zero-mean random sequence with uncorrelated, yet statistically dependent, elements.
Radial Distribution
The radial measure $r_2 = \frac{X^{\mathsf T}X}{p}$ follows the Fisher-Snedecor or $F$ distribution:

$$r_2 \sim f_{F}(p, \nu) = B\!\left(\tfrac{p}{2}, \tfrac{\nu}{2}\right)^{-1}\left(\frac{p}{\nu}\right)^{p/2} r_2^{\,p/2 - 1}\left(1 + \frac{p}{\nu}\, r_2\right)^{-(p+\nu)/2}$$

having mean value $\operatorname{E}\left[r_2\right] = \frac{\nu}{\nu - 2}$. $F$-distributions arise naturally in tests of sums of squares of sampled data after normalization by the sample standard deviation.

By a change of random variable to $y = \frac{p}{\nu}\, r_2 = \frac{X^{\mathsf T}X}{\nu}$ in the equation above, retaining the $p$-vector $X$, we have $\operatorname{E}\left[y\right] = \frac{p}{\nu}\operatorname{E}\left[r_2\right] = \frac{p}{\nu - 2}$ and probability distribution

$$f_Y(y\mid p, \nu) = B\!\left(\tfrac{p}{2}, \tfrac{\nu}{2}\right)^{-1} y^{\,p/2 - 1}\left(1 + y\right)^{-(\nu + p)/2}$$

which is a regular Beta-prime distribution $y \sim \beta'\!\left(\tfrac{p}{2}, \tfrac{\nu}{2}\right)$ having mean value $\dfrac{\tfrac{1}{2}p}{\tfrac{1}{2}\nu - 1} = \dfrac{p}{\nu - 2}$.
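A small Monte Carlo check of this radial law (Python with NumPy/SciPy, illustrative only): draw spherical multivariate t vectors and compare $r_2 = X^{\mathsf T}X/p$ with the $F(p,\nu)$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p, nu, n = 3, 8, 100_000
# Spherical multivariate t via the chi-squared ratio construction
y = rng.standard_normal((n, p))
u = rng.chisquare(nu, size=n)
x = y * np.sqrt(nu / u)[:, None]

r2 = np.sum(x**2, axis=1) / p        # radial measure X^T X / p
print(r2.mean(), nu / (nu - 2))      # both close to nu/(nu-2)
print(stats.kstest(r2, stats.f(p, nu).cdf).statistic)  # small: r2 ~ F(p, nu)
```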
Cumulative Radial Distribution
Given the Beta-prime distribution, the radial cumulative distribution function of $y$ is known:

$$F_Y(y) = I_{\frac{y}{1+y}}\!\left(\tfrac{p}{2}, \tfrac{\nu}{2}\right)$$

where $I_x(a,b)$ is the regularized incomplete Beta function; this applies with a spherical $\Sigma$ assumption.

In the scalar case, $p = 1$, the distribution is equivalent to Student-t with the equivalence $t^2 = \nu\, y$, the variable $t$ having double-sided tails for CDF purposes, i.e. the "two-tail-t-test".
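In code, this radial CDF is just the regularized incomplete Beta function; a brief Python/SciPy sketch (illustrative), including the scalar $p = 1$ cross-check against the two-sided Student-t:

```python
import numpy as np
from scipy import special, stats

def radial_cdf(y, p, nu):
    """P(X^T X / nu <= y) for a spherical multivariate t_p(0, I, nu) vector."""
    y = np.asarray(y, dtype=float)
    return special.betainc(p / 2, nu / 2, y / (1 + y))

# Scalar case p = 1: with t^2 = nu * y this matches the two-sided Student-t CDF
t, nu = 1.7, 5
y = t**2 / nu
print(radial_cdf(y, p=1, nu=nu))                  # P(|T| <= t)
print(stats.t(nu).cdf(t) - stats.t(nu).cdf(-t))   # same value
```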
The radial distribution can also be derived via a straightforward coordinate transformation from Cartesian to spherical. A constant radius surface at $R = \left(X^{\mathsf T}X\right)^{1/2}$ with PDF $p_X(X) \propto \left(1 + \nu^{-1}R^2\right)^{-(\nu+p)/2}$ is an iso-density surface. Given this density value, the quantum of probability on a shell of surface area $A_R$ and thickness $dR$ at $R$ is $\delta P = p_X(R)\, A_R\, dR$.

The enclosed $p$-sphere of radius $R$ has surface area $A_R = \dfrac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}$. Substitution into $\delta P$ shows that the shell has element of probability $\delta P = p_X(R)\,\dfrac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}\, dR$ which is equivalent to the radial density function

$$f_R(R) = \frac{\Gamma\left(\tfrac{1}{2}(\nu + p)\right)}{(\nu\pi)^{p/2}\,\Gamma\left(\tfrac{1}{2}\nu\right)}\,\frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}\left(1 + \frac{R^2}{\nu}\right)^{-(\nu+p)/2}$$

which further simplifies to

$$f_R(R) = \frac{2}{\nu^{p/2}\, B\!\left(\tfrac{p}{2}, \tfrac{\nu}{2}\right)}\, R^{p-1}\left(1 + \frac{R^2}{\nu}\right)^{-(\nu+p)/2}$$

where $B(\cdot,\cdot)$ is the Beta function.
Changing the radial variable to $y = R^2/\nu$ returns the previous Beta-prime distribution

$$f_Y(y) = \frac{1}{B\!\left(\tfrac{p}{2}, \tfrac{\nu}{2}\right)}\, y^{\,p/2 - 1}\left(1 + y\right)^{-(\nu + p)/2}.$$
To scale the radial variables without changing the radial shape function, define scale matrix $\Sigma = \alpha\operatorname{I}$, yielding a 3-parameter Cartesian density function, i.e. the probability $\Delta_P$ in volume element $dx_1 \cdots dx_p$ is

$$\Delta_P\left(f_X(X\mid\alpha, p, \nu)\right) = \frac{\Gamma\left(\tfrac{1}{2}(\nu + p)\right)}{(\nu\pi\alpha)^{p/2}\,\Gamma\left(\tfrac{1}{2}\nu\right)}\left(1 + \frac{X^{\mathsf T}X}{\alpha\nu}\right)^{-(\nu+p)/2} dx_1 \cdots dx_p$$

or, in terms of the scalar radial variable $R$,

$$f_R(R\mid\alpha, p, \nu) = \frac{2}{(\alpha\nu)^{p/2}\, B\!\left(\tfrac{p}{2}, \tfrac{\nu}{2}\right)}\, R^{p-1}\left(1 + \frac{R^2}{\alpha\nu}\right)^{-(\nu+p)/2}.$$
Radial Moments
The moments of all the radial variables, with the spherical distribution assumption, can be derived from the Beta-prime distribution. If $Z \sim \beta'(a, b)$ then $\operatorname{E}\left(Z^m\right) = \dfrac{B(a + m,\, b - m)}{B(a, b)}$, a known result. Thus, for variable $y = \frac{X^{\mathsf T}X}{\nu}$ we have

$$\operatorname{E}\left(y^m\right) = \frac{B\!\left(\tfrac{p}{2} + m,\ \tfrac{\nu}{2} - m\right)}{B\!\left(\tfrac{p}{2}, \tfrac{\nu}{2}\right)} = \frac{\Gamma\!\left(\tfrac{p}{2} + m\right)\Gamma\!\left(\tfrac{\nu}{2} - m\right)}{\Gamma\!\left(\tfrac{p}{2}\right)\Gamma\!\left(\tfrac{\nu}{2}\right)}, \qquad \nu/2 > m.$$

The moments of $r_2 = \frac{\nu}{p}\, y$ are

$$\operatorname{E}\left(r_2^m\right) = \left(\frac{\nu}{p}\right)^{m}\operatorname{E}\left(y^m\right)$$

while introducing the scale matrix $\alpha\operatorname{I}$ yields

$$\operatorname{E}\left(r_2^m \mid \alpha\right) = \left(\frac{\alpha\nu}{p}\right)^{m}\operatorname{E}\left(y^m\right).$$

Moments relating to the radial variable $R$ are found by setting $R = (\alpha\nu y)^{1/2}$ and $m' = 2m$, whereupon

$$\operatorname{E}\left(R^{m'}\right) = \operatorname{E}\left[(\alpha\nu y)^{m'/2}\right] = (\alpha\nu)^{m'/2}\,\frac{B\!\left(\tfrac{p}{2} + \tfrac{m'}{2},\ \tfrac{\nu}{2} - \tfrac{m'}{2}\right)}{B\!\left(\tfrac{p}{2}, \tfrac{\nu}{2}\right)}.$$
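These moment formulas are easy to sanity-check numerically; a brief Python/SciPy sketch (illustrative only) compares the Beta-prime expression for $\operatorname{E}(y^m)$ with a Monte Carlo estimate:

```python
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(3)
p, nu, m, n = 3, 12, 2, 500_000   # need nu/2 > m for the moment to exist

# E(y^m) = B(p/2 + m, nu/2 - m) / B(p/2, nu/2) for y ~ beta'(p/2, nu/2)
exact = np.exp(betaln(p / 2 + m, nu / 2 - m) - betaln(p / 2, nu / 2))

# Monte Carlo: y = X^T X / nu for a spherical multivariate t vector X
z = rng.standard_normal((n, p))
u = rng.chisquare(nu, size=n)
y = np.sum((z * np.sqrt(nu / u)[:, None])**2, axis=1) / nu
print(exact, np.mean(y**m))       # the two values should be close
```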
Full Rank Transform
This closely relates to the multivariate normal method and is described in Kotz and Nadarajah, Kibria and Joarder, Roth, and Cornish. Starting from a somewhat simplified version of the central MV-t pdf: $f_X(X) = \dfrac{K}{\left(1 + \nu^{-1}X^{\mathsf T}X\right)^{(\nu+p)/2}}$, where $K$ is a constant and $\nu$ is arbitrary but fixed, let $\Theta \in \mathbb{R}^{p\times p}$ be a full-rank matrix and form the vector $Y = \Theta X$. Then, by straightforward change of variables,

$$f_Y(Y) = \frac{K}{\left(1 + \nu^{-1} Y^{\mathsf T}\Theta^{-\mathsf T}\Theta^{-1} Y\right)^{(\nu+p)/2}}\left|\frac{\partial Y}{\partial X}\right|^{-1}.$$

The matrix of partial derivatives is $\frac{\partial Y_i}{\partial X_j} = \Theta_{ij}$ and the Jacobian becomes $\left|\frac{\partial Y}{\partial X}\right| = \left|\Theta\right|$. Thus

$$f_Y(Y) = \frac{K}{\left|\Theta\right|\left(1 + \nu^{-1} Y^{\mathsf T}\left(\Theta\Theta^{\mathsf T}\right)^{-1} Y\right)^{(\nu+p)/2}}.$$

The denominator reduces to $\left|\Theta\right| = \left|\Theta\Theta^{\mathsf T}\right|^{1/2} = \left|\Sigma\right|^{1/2}$ with $\Sigma = \Theta\Theta^{\mathsf T}$. In full:

$$f_Y(Y) = \frac{\Gamma\left[(\nu+p)/2\right]}{(\nu\pi)^{p/2}\,\Gamma(\nu/2)\,\left|\Sigma\right|^{1/2}}\left(1 + \nu^{-1} Y^{\mathsf T}\Sigma^{-1} Y\right)^{-(\nu+p)/2}$$

which is a regular MV-t distribution.

In general if $X \sim t_p(\mu, \Sigma, \nu)$ and $\Theta_{p\times p}$ has full rank $p$ then

$$\Theta X + c \sim t_p\left(\Theta\mu + c,\ \Theta\Sigma\Theta^{\mathsf T},\ \nu\right).$$
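A short Monte Carlo sketch of the full-rank transform (Python/NumPy, illustrative): samples pushed through $\Theta$ have the empirical covariance predicted by $\frac{\nu}{\nu-2}\,\Theta\Sigma\Theta^{\mathsf T}$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, nu, n = 3, 7, 200_000
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
Theta = np.array([[1.0, 0.5, 0.0],
                  [0.0, 1.0, -1.0],
                  [0.2, 0.0, 1.0]])   # full rank

# X ~ t_p(0, Sigma, nu) via the chi-squared ratio construction
z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
u = rng.chisquare(nu, size=n)
X = z * np.sqrt(nu / u)[:, None]
Y = X @ Theta.T                       # Y = Theta X, applied row-wise

print(np.cov(Y, rowvar=False))                     # empirical covariance of Y
print(nu / (nu - 2) * Theta @ Sigma @ Theta.T)     # predicted nu/(nu-2) * Theta Sigma Theta^T
```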
Marginal Distributions
This is a special case of the rank-reducing linear transform below. Kotz defines marginal distributions as follows. Partition $X \sim t_p(\mu, \Sigma, \nu)$ into two subvectors of $p_1$ and $p_2$ elements:

$$X_p = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim t_p\left(\mu_p,\ \Sigma_{p\times p},\ \nu\right)$$

with $p_1 + p_2 = p$, means $\mu_p = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$, and scale matrix $\Sigma_{p\times p} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$;

then $X_1 \sim t_{p_1}\left(\mu_1, \Sigma_{11}, \nu\right)$, $X_2 \sim t_{p_2}\left(\mu_2, \Sigma_{22}, \nu\right)$ such that

$$f(X_1) = \frac{\Gamma\left[(\nu+p_1)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p_1/2}\,\left|\Sigma_{11}\right|^{1/2}}\left[1+\frac{1}{\nu}(X_1-\mu_1)^{\mathsf T}\Sigma_{11}^{-1}(X_1-\mu_1)\right]^{-(\nu+p_1)/2}$$

$$f(X_2) = \frac{\Gamma\left[(\nu+p_2)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p_2/2}\,\left|\Sigma_{22}\right|^{1/2}}\left[1+\frac{1}{\nu}(X_2-\mu_2)^{\mathsf T}\Sigma_{22}^{-1}(X_2-\mu_2)\right]^{-(\nu+p_2)/2}.$$
If a transformation is constructed in the form

$$\Theta_{p_1 \times p} = \begin{bmatrix} \operatorname{I}_{p_1} & \mathbf{0}_{p_1 \times p_2} \end{bmatrix}$$

then vector $Y = \Theta X$, as discussed below, has the same distribution as the marginal distribution of $X_1$.
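In code, extracting a marginal simply slices the parameter arrays; a small Python/NumPy sketch (the function name is illustrative):

```python
import numpy as np

def marginal_t_params(mu, Sigma, idx):
    """Parameters of the marginal distribution of X[idx]
    when X ~ t_p(mu, Sigma, nu); the degrees of freedom nu are unchanged."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    idx = np.asarray(idx)
    return mu[idx], Sigma[np.ix_(idx, idx)]

# Example: marginal of the first two components of a trivariate t
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
mu_1, Sigma_11 = marginal_t_params(mu, Sigma, idx=[0, 1])
```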
In the linear transform case, if $\Theta$ is a rectangular matrix $\Theta \in \mathbb{R}^{m\times p}$, of rank $m$, the result is dimensionality reduction. Here, the Jacobian $\left|\Theta\right|$ is seemingly rectangular but the value $\left|\Theta\Sigma\Theta^{\mathsf T}\right|^{1/2}$ in the denominator pdf is nevertheless correct. There is a discussion of rectangular matrix product determinants in Aitken.[12] In general if $X \sim t_p(\mu, \Sigma, \nu)$ and $\Theta_{m\times p}$ has full rank $m$ then

$$Y = \Theta X + c \sim t_m\left(\Theta\mu + c,\ \Theta\Sigma\Theta^{\mathsf T},\ \nu\right).$$

In extremis, if $m = 1$ and $\Theta$ becomes a row vector, then scalar $Y$ follows a univariate double-sided Student-t distribution defined by $t = \left(Y - \Theta\mu - c\right)\big/\sqrt{\Theta\Sigma\Theta^{\mathsf T}}$ with the same $\nu$ degrees of freedom. Kibria et al. use the affine transformation to find the marginal distributions, which are also MV-t.
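A Monte Carlo sketch of the $m = 1$ case (Python with NumPy/SciPy, illustrative): project a multivariate t sample onto a row vector and compare the standardized result with a univariate Student-t.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
p, nu, n = 3, 6, 100_000
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
theta = np.array([0.5, -1.0, 2.0])   # row vector, rank 1

# X ~ t_p(0, Sigma, nu), then Y = theta X is scalar
z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
u = rng.chisquare(nu, size=n)
X = z * np.sqrt(nu / u)[:, None]
Y = X @ theta

sigma = np.sqrt(theta @ Sigma @ theta)                      # scale of the projection
print(stats.kstest(Y / sigma, stats.t(nu).cdf).statistic)   # small: Y/sigma ~ t_nu
```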
- During affine transformations of variables with elliptical distributions all vectors must ultimately derive from one initial isotropic spherical vector whose elements remain 'entangled' and are not statistically independent.
- A vector of independent Student-t samples is not consistent with the multivariate t distribution.
- Adding two sample multivariate t vectors generated with independent chi-squared samples and different $\nu$ values, e.g. $\mathbf{y}_1/\sqrt{u_1/\nu_1} + \mathbf{y}_2/\sqrt{u_2/\nu_2}$, will not produce internally consistent distributions, though they will yield a Behrens-Fisher problem.[13]
- Taleb compares many examples of fat-tail elliptical vs non-elliptical multivariate distributions.