Matrix normal distribution

Notation: $\mathbf{X} \sim \mathcal{MN}_{n\times p}(\mathbf{M}, \mathbf{U}, \mathbf{V})$
Parameters:
  $\mathbf{M}$: location ($n \times p$ real matrix)
  $\mathbf{U}$: among-row scale ($n \times n$ positive-definite real matrix)
  $\mathbf{V}$: among-column scale ($p \times p$ positive-definite real matrix)
Support: $\mathbf{X} \in \mathbb{R}^{n \times p}$
Mean: $\mathbf{M}$
Variance: $\mathbf{U}$ (among-row) and $\mathbf{V}$ (among-column)

In statistics, the matrix normal distribution or matrix Gaussian distribution is a probability distribution that is a generalization of the multivariate normal distribution to matrix-valued random variables.

Definition

The probability density function for the random matrix $\mathbf{X}$ ($n \times p$) that follows the matrix normal distribution $\mathcal{MN}_{n\times p}(\mathbf{M}, \mathbf{U}, \mathbf{V})$ has the form:

$$ p(\mathbf{X} \mid \mathbf{M}, \mathbf{U}, \mathbf{V}) = \frac{\exp\left(-\tfrac{1}{2} \operatorname{tr}\left[\mathbf{V}^{-1} (\mathbf{X} - \mathbf{M})^{\mathsf{T}} \mathbf{U}^{-1} (\mathbf{X} - \mathbf{M})\right]\right)}{(2\pi)^{np/2} \, |\mathbf{V}|^{n/2} \, |\mathbf{U}|^{p/2}} $$

where $\operatorname{tr}$ denotes trace, $\mathbf{M}$ is $n \times p$, $\mathbf{U}$ is $n \times n$ and $\mathbf{V}$ is $p \times p$, and the density is understood as the probability density function with respect to the standard Lebesgue measure in $\mathbb{R}^{n \times p}$, i.e. the measure corresponding to integration with respect to $dx_{11}\, dx_{21} \cdots dx_{n1}\, dx_{12} \cdots dx_{np}$.
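
For concreteness, the density can be evaluated directly from this formula. Below is a minimal sketch using NumPy (the function name is illustrative, not from the article); it computes the log-density, which is numerically safer than the density itself:

```python
import numpy as np

def matrix_normal_logpdf(X, M, U, V):
    """Log-density of MN_{n x p}(M, U, V) at X (illustrative helper)."""
    n, p = X.shape
    R = X - M
    # Quadratic form tr[V^{-1} R^T U^{-1} R], using solves instead of inverses.
    quad = np.trace(np.linalg.solve(V, R.T) @ np.linalg.solve(U, R))
    _, logdet_U = np.linalg.slogdet(U)
    _, logdet_V = np.linalg.slogdet(V)
    # Log-normalizer: (np/2) log(2 pi) + (n/2) log|V| + (p/2) log|U|.
    return -0.5 * (quad + n * p * np.log(2 * np.pi)
                   + n * logdet_V + p * logdet_U)
```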

The matrix normal is related to the multivariate normal distribution in the following way:

$$ \mathbf{X} \sim \mathcal{MN}_{n\times p}(\mathbf{M}, \mathbf{U}, \mathbf{V}) $$

if and only if

$$ \operatorname{vec}(\mathbf{X}) \sim \mathcal{N}_{np}\left(\operatorname{vec}(\mathbf{M}), \, \mathbf{V} \otimes \mathbf{U}\right), $$

where $\otimes$ denotes the Kronecker product and $\operatorname{vec}(\mathbf{M})$ denotes the vectorization of $\mathbf{M}$.
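
This identity can be checked numerically. The sketch below assumes SciPy's scipy.stats.matrix_normal, whose rowcov and colcov arguments play the roles of U and V here; note that vec stacks columns, hence order='F':

```python
import numpy as np
from scipy.stats import matrix_normal, multivariate_normal

rng = np.random.default_rng(0)
n, p = 3, 2

M = rng.standard_normal((n, p))
A = rng.standard_normal((n, n)); U = A @ A.T + n * np.eye(n)  # n x n SPD
B = rng.standard_normal((p, p)); V = B @ B.T + p * np.eye(p)  # p x p SPD

X = rng.standard_normal((n, p))  # an arbitrary evaluation point

# Matrix normal density of X ...
lhs = matrix_normal(mean=M, rowcov=U, colcov=V).pdf(X)

# ... equals the multivariate normal density of vec(X) with covariance
# kron(V, U); vec stacks columns, so flatten in Fortran order.
rhs = multivariate_normal(mean=M.flatten(order='F'),
                          cov=np.kron(V, U)).pdf(X.flatten(order='F'))

assert np.isclose(lhs, rhs)
```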

Proof

The equivalence between the above matrix normal and multivariate normal density functions can be shown using several properties of the trace and Kronecker product, as follows. We start with the argument of the exponent of the matrix normal PDF:

$$
\begin{aligned}
&-\tfrac{1}{2} \operatorname{tr}\left[\mathbf{V}^{-1} (\mathbf{X} - \mathbf{M})^{\mathsf{T}} \mathbf{U}^{-1} (\mathbf{X} - \mathbf{M})\right] \\
&\quad = -\tfrac{1}{2} \operatorname{vec}(\mathbf{X} - \mathbf{M})^{\mathsf{T}} \operatorname{vec}\left(\mathbf{U}^{-1} (\mathbf{X} - \mathbf{M}) \mathbf{V}^{-1}\right) \\
&\quad = -\tfrac{1}{2} \operatorname{vec}(\mathbf{X} - \mathbf{M})^{\mathsf{T}} \left(\mathbf{V}^{-1} \otimes \mathbf{U}^{-1}\right) \operatorname{vec}(\mathbf{X} - \mathbf{M}) \\
&\quad = -\tfrac{1}{2} \left[\operatorname{vec}(\mathbf{X}) - \operatorname{vec}(\mathbf{M})\right]^{\mathsf{T}} \left(\mathbf{V} \otimes \mathbf{U}\right)^{-1} \left[\operatorname{vec}(\mathbf{X}) - \operatorname{vec}(\mathbf{M})\right],
\end{aligned}
$$

using the cyclic property of the trace, the identity $\operatorname{tr}(\mathbf{A}^{\mathsf{T}} \mathbf{B}) = \operatorname{vec}(\mathbf{A})^{\mathsf{T}} \operatorname{vec}(\mathbf{B})$, and the identity $\operatorname{vec}(\mathbf{A}\mathbf{B}\mathbf{C}) = (\mathbf{C}^{\mathsf{T}} \otimes \mathbf{A}) \operatorname{vec}(\mathbf{B})$. This is the argument of the exponent of the multivariate normal PDF with respect to Lebesgue measure in $\mathbb{R}^{np}$. The proof is completed by using the determinant property:

$$ |\mathbf{V} \otimes \mathbf{U}| = |\mathbf{V}|^{n} \, |\mathbf{U}|^{p}. $$

Properties

If $\mathbf{X} \sim \mathcal{MN}_{n\times p}(\mathbf{M}, \mathbf{U}, \mathbf{V})$, then we have the following properties: [1] [2]

Expected values

The mean, or expected value, is:

$$ E[\mathbf{X}] = \mathbf{M} $$

and we have the following second-order expectations:

$$ E\left[(\mathbf{X} - \mathbf{M})(\mathbf{X} - \mathbf{M})^{\mathsf{T}}\right] = \mathbf{U} \operatorname{tr}(\mathbf{V}) $$

$$ E\left[(\mathbf{X} - \mathbf{M})^{\mathsf{T}}(\mathbf{X} - \mathbf{M})\right] = \mathbf{V} \operatorname{tr}(\mathbf{U}) $$

where $\operatorname{tr}$ denotes trace.

More generally, for appropriately dimensioned matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$:

$$
\begin{aligned}
E[\mathbf{X} \mathbf{A} \mathbf{X}^{\mathsf{T}}] &= \mathbf{U} \operatorname{tr}(\mathbf{A}^{\mathsf{T}} \mathbf{V}) + \mathbf{M} \mathbf{A} \mathbf{M}^{\mathsf{T}} \\
E[\mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X}] &= \mathbf{V} \operatorname{tr}(\mathbf{U} \mathbf{B}^{\mathsf{T}}) + \mathbf{M}^{\mathsf{T}} \mathbf{B} \mathbf{M} \\
E[\mathbf{X} \mathbf{C} \mathbf{X}] &= \mathbf{U} \mathbf{C}^{\mathsf{T}} \mathbf{V} + \mathbf{M} \mathbf{C} \mathbf{M}
\end{aligned}
$$
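
These moments lend themselves to a quick Monte Carlo sanity check. The sketch below (illustrative parameter values; it again assumes SciPy's scipy.stats.matrix_normal) verifies the first second-order expectation empirically:

```python
import numpy as np
from scipy.stats import matrix_normal

rng = np.random.default_rng(1)
n, p = 3, 2
M = np.zeros((n, p))  # zero mean, so E[X X^T] = U tr(V)
U = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.5]])
V = np.array([[1.0, 0.3],
              [0.3, 0.8]])

samples = matrix_normal(mean=M, rowcov=U, colcov=V).rvs(
    size=200_000, random_state=rng)          # shape (200000, n, p)

# Empirical mean of X X^T across samples.
second_moment = np.einsum('kij,klj->il', samples, samples) / len(samples)

# Loose tolerance: Monte Carlo error at this sample size is ~0.01.
assert np.allclose(second_moment, U * np.trace(V), atol=0.1)
```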

Transformation

Transpose transform:

$$ \mathbf{X}^{\mathsf{T}} \sim \mathcal{MN}_{p\times n}(\mathbf{M}^{\mathsf{T}}, \mathbf{V}, \mathbf{U}) $$

Linear transform: let $\mathbf{D}$ ($r \times n$) be of full rank $r \le n$ and $\mathbf{C}$ ($p \times s$) be of full rank $s \le p$; then:

$$ \mathbf{D}\mathbf{X}\mathbf{C} \sim \mathcal{MN}_{r\times s}(\mathbf{D}\mathbf{M}\mathbf{C}, \, \mathbf{D}\mathbf{U}\mathbf{D}^{\mathsf{T}}, \, \mathbf{C}^{\mathsf{T}}\mathbf{V}\mathbf{C}) $$
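
The covariance part of the linear-transform rule can be traced back to the vec/Kronecker relation: $\operatorname{vec}(\mathbf{D}\mathbf{X}\mathbf{C}) = (\mathbf{C}^{\mathsf{T}} \otimes \mathbf{D}) \operatorname{vec}(\mathbf{X})$, so the transformed Kronecker covariance factors as claimed. A minimal numerical check of that factorization, with arbitrary illustrative dimensions and matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, r, s = 4, 3, 2, 2
D = rng.standard_normal((r, n))  # full rank r <= n (a.s. for Gaussian draws)
C = rng.standard_normal((p, s))  # full rank s <= p
A = rng.standard_normal((n, n)); U = A @ A.T + n * np.eye(n)
B = rng.standard_normal((p, p)); V = B @ B.T + p * np.eye(p)

# (C^T kron D) (V kron U) (C kron D^T) should equal (C^T V C) kron (D U D^T).
lhs = np.kron(C.T, D) @ np.kron(V, U) @ np.kron(C, D.T)
rhs = np.kron(C.T @ V @ C, D @ U @ D.T)
assert np.allclose(lhs, rhs)
```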

Example

Let's imagine a sample of $n$ independent $p$-dimensional random variables identically distributed according to a multivariate normal distribution:

$$ \mathbf{Y}_i \sim \mathcal{N}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \quad i \in \{1, \dots, n\}. $$

When defining the $n \times p$ matrix $\mathbf{X}$ for which the $i$th row is $\mathbf{Y}_i^{\mathsf{T}}$, we obtain:

$$ \mathbf{X} \sim \mathcal{MN}_{n\times p}(\mathbf{M}, \mathbf{U}, \mathbf{V}) $$

where each row of $\mathbf{M}$ is equal to $\boldsymbol{\mu}^{\mathsf{T}}$, that is $\mathbf{M} = \mathbf{1}_n \boldsymbol{\mu}^{\mathsf{T}}$; $\mathbf{U}$ is the $n \times n$ identity matrix, that is, the rows are independent; and $\mathbf{V} = \boldsymbol{\Sigma}$.
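
Equivalently, the product of the $n$ row densities equals the matrix normal density with these parameters, which the following sketch (illustrative values, assuming SciPy) confirms at a single point:

```python
import numpy as np
from scipy.stats import matrix_normal, multivariate_normal

rng = np.random.default_rng(3)
n, p = 4, 2
mu = np.array([1.0, -0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

X = rng.standard_normal((n, p))  # an arbitrary evaluation point

# Product of the n independent row densities ...
lhs = np.prod(multivariate_normal(mean=mu, cov=Sigma).pdf(X))

# ... equals the matrix normal density with M = 1 mu^T, U = I_n, V = Sigma.
M = np.outer(np.ones(n), mu)
rhs = matrix_normal(mean=M, rowcov=np.eye(n), colcov=Sigma).pdf(X)

assert np.isclose(lhs, rhs)
```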

Maximum likelihood parameter estimation

Given $k$ matrices, each of size $n \times p$, denoted $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k$, which we assume have been sampled i.i.d. from a matrix normal distribution, the maximum likelihood estimate of the parameters can be obtained by maximizing:

$$ \prod_{i=1}^{k} p(\mathbf{X}_i \mid \mathbf{M}, \mathbf{U}, \mathbf{V}). $$

The solution for the mean has a closed form, namely

$$ \hat{\mathbf{M}} = \frac{1}{k} \sum_{i=1}^{k} \mathbf{X}_i, $$

but the covariance parameters do not. However, these parameters can be iteratively maximized by zeroing their gradients at:

$$ \hat{\mathbf{U}} = \frac{1}{kp} \sum_{i=1}^{k} (\mathbf{X}_i - \hat{\mathbf{M}}) \hat{\mathbf{V}}^{-1} (\mathbf{X}_i - \hat{\mathbf{M}})^{\mathsf{T}} $$

and

$$ \hat{\mathbf{V}} = \frac{1}{kn} \sum_{i=1}^{k} (\mathbf{X}_i - \hat{\mathbf{M}})^{\mathsf{T}} \hat{\mathbf{U}}^{-1} (\mathbf{X}_i - \hat{\mathbf{M}}). $$

See for example [3] and references therein. The covariance parameters are non-identifiable in the sense that for any scale factor, $s > 0$, we have:

$$ \mathcal{MN}_{n\times p}(\mathbf{M}, \mathbf{U}, \mathbf{V}) = \mathcal{MN}_{n\times p}\left(\mathbf{M}, s\mathbf{U}, \tfrac{1}{s}\mathbf{V}\right). $$
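
A minimal "flip-flop" implementation of these coupled updates might look as follows. This is a sketch rather than the algorithm of [3] (which develops a full EM treatment); the function name and iteration count are illustrative, and the scale of U is pinned each pass because of the non-identifiability just noted:

```python
import numpy as np

def matrix_normal_mle(Xs, n_iter=50):
    """Iterate the stationary-point equations for U and V.
    Xs has shape (k, n, p): k i.i.d. matrix samples."""
    k, n, p = Xs.shape
    M = Xs.mean(axis=0)          # closed-form mean estimate
    R = Xs - M                   # residuals, shape (k, n, p)
    U, V = np.eye(n), np.eye(p)
    for _ in range(n_iter):
        # U <- (1/(k p)) sum_i R_i V^{-1} R_i^T
        U = np.einsum('kia,ab,kjb->ij', R, np.linalg.inv(V), R) / (k * p)
        U /= np.trace(U) / n     # pin the arbitrary scale: tr(U) = n
        # V <- (1/(k n)) sum_i R_i^T U^{-1} R_i
        V = np.einsum('kia,ij,kjb->ab', R, np.linalg.inv(U), R) / (k * n)
    return M, U, V
```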

Drawing values from the distribution

Sampling from the matrix normal distribution is a special case of the sampling procedure for the multivariate normal distribution. Let $\mathbf{X}$ be an $n$ by $p$ matrix of $np$ independent samples from the standard normal distribution, so that

$$ \mathbf{X} \sim \mathcal{MN}_{n\times p}(\mathbf{0}, \mathbf{I}_n, \mathbf{I}_p). $$

Then let

$$ \mathbf{Y} = \mathbf{M} + \mathbf{A}\mathbf{X}\mathbf{B}, $$

so that

$$ \mathbf{Y} \sim \mathcal{MN}_{n\times p}(\mathbf{M}, \mathbf{A}\mathbf{A}^{\mathsf{T}}, \mathbf{B}^{\mathsf{T}}\mathbf{B}), $$

where $\mathbf{A}$ and $\mathbf{B}$ can be chosen by Cholesky decomposition or a similar matrix square-root operation.
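
A minimal sampler following this recipe, assuming only NumPy (the function name is illustrative):

```python
import numpy as np

def sample_matrix_normal(M, U, V, rng):
    """Draw one sample from MN(M, U, V) via Cholesky square roots."""
    n, p = M.shape
    A = np.linalg.cholesky(U)        # lower triangular, A A^T = U
    B = np.linalg.cholesky(V).T      # upper triangular, B^T B = V
    X = rng.standard_normal((n, p))  # X ~ MN(0, I_n, I_p)
    return M + A @ X @ B             # ~ MN(M, U, V)

# Usage: Y = sample_matrix_normal(M, U, V, np.random.default_rng(0))
```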

Relation to other distributions

Dawid (1981) provides a discussion of the relation of the matrix-valued normal distribution to other distributions, including the Wishart distribution, inverse-Wishart distribution and matrix t-distribution, but uses different notation from that employed here.

See also

Multivariate normal distribution
Wishart distribution and inverse-Wishart distribution
Matrix t-distribution and multivariate t-distribution
Normal-Wishart distribution and normal-inverse-Wishart distribution
Complex normal distribution
Kronecker product
Vectorization (mathematics)

References

  1. Gupta, A. K.; Nagar, D. K. (1999). "Chapter 2: Matrix Variate Normal Distribution". Matrix Variate Distributions. CRC Press. ISBN 978-1-58488-046-2. Retrieved 23 May 2014.
  2. Ding, Shanshan; Cook, R. Dennis (2014). "Dimension folding PCA and PFC for matrix-valued predictors". Statistica Sinica. 24 (1): 463–492.
  3. Glanz, Hunter; Carvalho, Luis (2013). "An Expectation-Maximization Algorithm for the Matrix Normal Distribution". arXiv:1309.6609 [stat.ME].
  4. Dawid, A. P. (1981). "Some matrix-variate distribution theory: notational considerations and a Bayesian application". Biometrika. 68 (1): 265–274.