Multivariate random variable

In probability and statistics, a multivariate random variable or random vector is a list or vector of mathematical variables each of whose values is unknown, either because the value has not yet occurred or because there is imperfect knowledge of its value. The individual variables in a random vector are grouped together because they are all part of a single mathematical system; often they represent different properties of an individual statistical unit. For example, while a given person has a specific age, height and weight, the representation of these features of an unspecified person from within a group would be a random vector. Normally each element of a random vector is a real number.

Random vectors are often used as the underlying implementation of various types of aggregate random variables, e.g. a random matrix, random tree, random sequence, stochastic process, etc.

More formally, a multivariate random variable is a column vector $\mathbf{X} = (X_1, \ldots, X_n)^{\mathsf{T}}$ (or its transpose, which is a row vector) whose components are scalar-valued random variables on the same probability space as each other, $(\Omega, \mathcal{F}, P)$, where $\Omega$ is the sample space, $\mathcal{F}$ is the sigma-algebra (the collection of all events), and $P$ is the probability measure (a function returning each event's probability).

Probability distribution

Every random vector gives rise to a probability measure on $\mathbb{R}^n$ with the Borel algebra as the underlying sigma-algebra. This measure is also known as the joint probability distribution, the joint distribution, or the multivariate distribution of the random vector.

The distribution of each component random variable $X_i$ is called a marginal distribution. The conditional probability distribution of $X_i$ given $X_j$ is the probability distribution of $X_i$ when $X_j$ is known to be a particular value.

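The cumulative distribution function $F_{\mathbf{X}} : \mathbb{R}^n \to [0, 1]$ of a random vector $\mathbf{X} = (X_1, \ldots, X_n)^{\mathsf{T}}$ is defined as [1]:p.15

$$F_{\mathbf{X}}(\mathbf{x}) = \operatorname{P}(X_1 \le x_1, \ldots, X_n \le x_n) \qquad \text{(Eq.1)}$$

where $\mathbf{x} = (x_1, \ldots, x_n)^{\mathsf{T}}$.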
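As an illustration of the definition, the joint cumulative distribution function can be estimated from data by counting the observations whose components all lie at or below the evaluation point. The following is a minimal sketch using NumPy; the function name `empirical_joint_cdf` and the four data points are assumptions made for this example only.

```python
import numpy as np

def empirical_joint_cdf(samples, x):
    """Fraction of sample rows whose every component is <= the matching entry of x."""
    samples = np.asarray(samples, dtype=float)
    x = np.asarray(x, dtype=float)
    # np.all(..., axis=1) checks the joint event X_1 <= x_1, ..., X_n <= x_n per row.
    return float(np.mean(np.all(samples <= x, axis=1)))

# Four observations of a 2-dimensional random vector (illustrative data).
data = np.array([[0.0, 0.0],
                 [1.0, 2.0],
                 [2.0, 1.0],
                 [3.0, 3.0]])

cdf_at_2_2 = empirical_joint_cdf(data, [2.0, 2.0])      # three of four rows qualify
cdf_at_15_25 = empirical_joint_cdf(data, [1.5, 2.5])    # two of four rows qualify
```

Note that the comparison is joint: a row counts only if all of its components satisfy their inequalities at the same time.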

Operations on random vectors

Random vectors can be subjected to the same kinds of algebraic operations as can non-random vectors: addition, subtraction, multiplication by a scalar, and the taking of inner products.

Affine transformations

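Similarly, a new random vector $\mathbf{Y}$ can be defined by applying an affine transformation $g : \mathbb{R}^n \to \mathbb{R}^n$ to a random vector $\mathbf{X}$:

$$\mathbf{Y} = A\mathbf{X} + b,$$

where $A$ is an $n \times n$ matrix and $b$ is an $n \times 1$ column vector.

If $A$ is an invertible matrix and $\mathbf{X}$ has a probability density function $f_{\mathbf{X}}$, then the probability density of $\mathbf{Y}$ is

$$f_{\mathbf{Y}}(y) = \frac{f_{\mathbf{X}}\left(A^{-1}(y - b)\right)}{\left|\det A\right|}.$$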
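The density formula for an invertible affine transformation can be checked numerically. The sketch below assumes, purely for illustration, that the original vector has independent standard normal components, in which case the transformed vector is known to be multivariate normal with mean b and covariance A Aᵀ. All function names and the values of `A`, `b` and `y` are hypothetical.

```python
import numpy as np

def standard_normal_pdf(x):
    """Density of a vector of independent standard normal components."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    return float(np.exp(-0.5 * x @ x) / (2.0 * np.pi) ** (n / 2.0))

def affine_pdf(f_x, A, b, y):
    """Density of Y = A X + b at y, given the density f_x of X and an invertible A."""
    A = np.asarray(A, dtype=float)
    # Solve A x = y - b instead of forming the inverse explicitly.
    x = np.linalg.solve(A, np.asarray(y, dtype=float) - np.asarray(b, dtype=float))
    return f_x(x) / abs(np.linalg.det(A))

def gaussian_pdf(y, mean, cov):
    """Closed-form multivariate normal density, used only as a cross-check."""
    d = np.asarray(y, dtype=float) - mean
    n = d.shape[0]
    quad = d @ np.linalg.solve(cov, d)
    return float(np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov)))

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
b = np.array([1.0, -1.0])
y = np.array([0.5, 2.0])

via_formula = affine_pdf(standard_normal_pdf, A, b, y)
via_closed_form = gaussian_pdf(y, b, A @ A.T)
```

The two values agree up to floating-point error, which is what the formula predicts for this particular choice of distribution.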

Invertible mappings

More generally we can study invertible mappings of random vectors. [2] :p.290–291

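Let $g$ be a one-to-one mapping from an open subset $\mathcal{D}$ of $\mathbb{R}^n$ onto a subset $\mathcal{R}$ of $\mathbb{R}^n$, let $g$ have continuous partial derivatives in $\mathcal{D}$ and let the Jacobian determinant of $g$ be zero at no point of $\mathcal{D}$. Assume that the real random vector $\mathbf{X}$ has a probability density function $f_{\mathbf{X}}(\mathbf{x})$ and satisfies $P(\mathbf{X} \in \mathcal{D}) = 1$. Then the random vector $\mathbf{Y} = g(\mathbf{X})$ is of probability density

$$f_{\mathbf{Y}}(\mathbf{y}) = \left. \frac{f_{\mathbf{X}}(\mathbf{x})}{\left|\det \dfrac{\partial g(\mathbf{x})}{\partial \mathbf{x}}\right|} \right|_{\mathbf{x} = g^{-1}(\mathbf{y})} \mathbf{1}(\mathbf{y} \in R_{\mathbf{Y}})$$

where $\mathbf{1}$ denotes the indicator function and the set $R_{\mathbf{Y}} = \{\mathbf{y} = g(\mathbf{x}) : f_{\mathbf{X}}(\mathbf{x}) > 0\} \subseteq \mathcal{R}$ denotes the support of $\mathbf{Y}$.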
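The change-of-variables result for an invertible mapping can be sketched in code. The example below is an assumption-laden illustration, not a general-purpose implementation: it takes the mapping to be the component-wise exponential applied to a vector of independent standard normal components, so each transformed component is lognormal and the result can be compared with the known lognormal density. The names `transformed_pdf`, `g_inverse`, `jacobian_of_g` and `in_support` are hypothetical.

```python
import numpy as np

def standard_normal_pdf(x):
    """Density of a vector of independent standard normal components."""
    x = np.asarray(x, dtype=float)
    return float(np.exp(-0.5 * x @ x) / (2.0 * np.pi) ** (x.shape[0] / 2.0))

def transformed_pdf(f_x, g_inverse, jacobian_of_g, in_support, y):
    """Density of Y = g(X) at y via the change-of-variables formula."""
    y = np.asarray(y, dtype=float)
    if not in_support(y):            # indicator function: zero outside the support
        return 0.0
    x = g_inverse(y)                 # x = g^{-1}(y)
    return float(f_x(x) / abs(np.linalg.det(jacobian_of_g(x))))

# Assumed mapping for this sketch: g(x) = exp(x), applied to each component.
g_inverse = np.log
jacobian_of_g = lambda x: np.diag(np.exp(x))      # diagonal Jacobian of exp
in_support = lambda y: bool(np.all(y > 0.0))      # image of exp is the positive orthant

y = np.array([0.5, 2.0])
density = transformed_pdf(standard_normal_pdf, g_inverse, jacobian_of_g, in_support, y)
outside = transformed_pdf(standard_normal_pdf, g_inverse, jacobian_of_g, in_support,
                          np.array([-1.0, 2.0]))
```

Checking the support before applying the inverse mapping avoids evaluating the logarithm at non-positive values.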

Expected value

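The expected value or mean of a random vector $\mathbf{X}$ is a fixed vector $\operatorname{E}[\mathbf{X}]$ whose elements are the expected values of the respective random variables. [3]:p.333

$$\operatorname{E}[\mathbf{X}] = (\operatorname{E}[X_1], \ldots, \operatorname{E}[X_n])^{\mathsf{T}} \qquad \text{(Eq.2)}$$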

Covariance and cross-covariance

Definitions

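The covariance matrix (also called second central moment or variance-covariance matrix) of an $n \times 1$ random vector is an $n \times n$ matrix whose (i,j)th element is the covariance between the i-th and the j-th random variables. The covariance matrix is the expected value, element by element, of the $n \times n$ matrix computed as $[\mathbf{X} - \operatorname{E}[\mathbf{X}]][\mathbf{X} - \operatorname{E}[\mathbf{X}]]^{\mathsf{T}}$, where the superscript T refers to the transpose of the indicated vector: [2]:p.464 [3]:p.335

$$\operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{Var}[\mathbf{X}] = \operatorname{E}[(\mathbf{X} - \operatorname{E}[\mathbf{X}])(\mathbf{X} - \operatorname{E}[\mathbf{X}])^{\mathsf{T}}] = \operatorname{E}[\mathbf{X}\mathbf{X}^{\mathsf{T}}] - \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{X}]^{\mathsf{T}} \qquad \text{(Eq.3)}$$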
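The two equivalent expressions for the covariance matrix, the central-moment form and the form based on the second moment minus the outer product of the mean, can be compared on sample data. The sketch below uses synthetic draws and replaces expectations with sample averages (dividing by the number of draws); the variable names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 draws of a 3-dimensional random vector, one draw per row (synthetic data).
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
samples = rng.normal(size=(1000, 3)) @ mixing
n_draws = samples.shape[0]

mean = samples.mean(axis=0)                      # sample estimate of E[X]
centered = samples - mean

# Central-moment form: average of (x - mean)(x - mean)^T over the draws.
cov_central = centered.T @ centered / n_draws

# Moment form: average of x x^T minus the outer product of the mean with itself.
cov_moment = samples.T @ samples / n_draws - np.outer(mean, mean)

# NumPy's estimator with the same 1/N normalisation, for comparison.
cov_numpy = np.cov(samples, rowvar=False, bias=True)
```

With the 1/N normalisation the three matrices coincide up to rounding; `np.cov` defaults to 1/(N-1), which is why `bias=True` is passed here.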

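By extension, the cross-covariance matrix between two random vectors $\mathbf{X}$ and $\mathbf{Y}$ ($\mathbf{X}$ having $n$ elements and $\mathbf{Y}$ having $p$ elements) is the $n \times p$ matrix [3]:p.336

$$\operatorname{K}_{\mathbf{X}\mathbf{Y}} = \operatorname{Cov}[\mathbf{X}, \mathbf{Y}] = \operatorname{E}[(\mathbf{X} - \operatorname{E}[\mathbf{X}])(\mathbf{Y} - \operatorname{E}[\mathbf{Y}])^{\mathsf{T}}] = \operatorname{E}[\mathbf{X}\mathbf{Y}^{\mathsf{T}}] - \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^{\mathsf{T}} \qquad \text{(Eq.4)}$$

where again the matrix expectation is taken element-by-element in the matrix. Here the (i,j)th element is the covariance between the i-th element of $\mathbf{X}$ and the j-th element of $\mathbf{Y}$.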

Properties

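The covariance matrix is a symmetric matrix, i.e. [2]:p.466

$$\operatorname{K}_{\mathbf{X}\mathbf{X}}^{\mathsf{T}} = \operatorname{K}_{\mathbf{X}\mathbf{X}}.$$

The covariance matrix is a positive semidefinite matrix, i.e. [2]:p.465

$$\mathbf{a}^{\mathsf{T}} \operatorname{K}_{\mathbf{X}\mathbf{X}} \mathbf{a} \ge 0 \quad \text{for all } \mathbf{a} \in \mathbb{R}^n.$$

The cross-covariance matrix $\operatorname{Cov}[\mathbf{Y}, \mathbf{X}]$ is simply the transpose of the matrix $\operatorname{Cov}[\mathbf{X}, \mathbf{Y}]$, i.e.

$$\operatorname{K}_{\mathbf{Y}\mathbf{X}} = \operatorname{K}_{\mathbf{X}\mathbf{Y}}^{\mathsf{T}}.$$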
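The cross-covariance definition and the three properties listed above (symmetry, positive semidefiniteness, and the transpose relation) can be observed on sample estimates. This is a minimal sketch on synthetic data; the helper `cross_covariance` and the test vector `a` are assumptions introduced for illustration.

```python
import numpy as np

def cross_covariance(x_samples, y_samples):
    """Sample estimate of E[(X - E[X])(Y - E[Y])^T]; rows are joint draws."""
    x = np.asarray(x_samples, dtype=float)
    y = np.asarray(y_samples, dtype=float)
    x_centered = x - x.mean(axis=0)
    y_centered = y - y.mean(axis=0)
    return x_centered.T @ y_centered / x.shape[0]

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 2))                         # X has n = 2 elements
y = np.column_stack([x[:, 0] + rng.normal(size=500),
                     2.0 * x[:, 1],
                     rng.normal(size=500)])           # Y has p = 3 elements

k_xy = cross_covariance(x, y)      # n x p matrix
k_yx = cross_covariance(y, x)      # p x n matrix
k_xx = cross_covariance(x, x)      # covariance matrix of X

a = np.array([1.5, -0.7])          # arbitrary vector for the quadratic-form check
quadratic_form = a @ k_xx @ a      # non-negative for a positive semidefinite matrix
```

Passing the same samples twice to `cross_covariance` recovers the ordinary covariance matrix, which is how `k_xx` is obtained here.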

Uncorrelatedness

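Two random vectors $\mathbf{X} = (X_1, \ldots, X_m)^{\mathsf{T}}$ and $\mathbf{Y} = (Y_1, \ldots, Y_n)^{\mathsf{T}}$ are called uncorrelated if

$$\operatorname{E}[\mathbf{X}\mathbf{Y}^{\mathsf{T}}] = \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^{\mathsf{T}}.$$

They are uncorrelated if and only if their cross-covariance matrix $\operatorname{K}_{\mathbf{X}\mathbf{Y}}$ is zero. [3]:p.337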

Correlation and cross-correlation

Definitions

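The correlation matrix (also called second moment) of an $n \times 1$ random vector is an $n \times n$ matrix whose (i,j)th element is the correlation between the i-th and the j-th random variables. The correlation matrix is the expected value, element by element, of the $n \times n$ matrix computed as $\mathbf{X}\mathbf{X}^{\mathsf{T}}$, where the superscript T refers to the transpose of the indicated vector: [4]:p.190 [3]:p.334

$$\operatorname{R}_{\mathbf{X}\mathbf{X}} = \operatorname{E}[\mathbf{X}\mathbf{X}^{\mathsf{T}}] \qquad \text{(Eq.5)}$$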

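By extension, the cross-correlation matrix between two random vectors $\mathbf{X}$ and $\mathbf{Y}$ ($\mathbf{X}$ having $n$ elements and $\mathbf{Y}$ having $p$ elements) is the $n \times p$ matrix

$$\operatorname{R}_{\mathbf{X}\mathbf{Y}} = \operatorname{E}[\mathbf{X}\mathbf{Y}^{\mathsf{T}}] \qquad \text{(Eq.6)}$$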

Properties

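The correlation matrix is related to the covariance matrix by

$$\operatorname{R}_{\mathbf{X}\mathbf{X}} = \operatorname{K}_{\mathbf{X}\mathbf{X}} + \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{X}]^{\mathsf{T}}.$$

Similarly for the cross-correlation matrix and the cross-covariance matrix:

$$\operatorname{R}_{\mathbf{X}\mathbf{Y}} = \operatorname{K}_{\mathbf{X}\mathbf{Y}} + \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^{\mathsf{T}}.$$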
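The relation between second moments and central moments, for both the correlation matrix and the cross-correlation matrix, holds exactly for sample averages as well, which makes it easy to check. The following sketch uses synthetic data with non-zero means; all variable names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=[1.0, -2.0], size=(400, 2))        # draws of X, one per row
y = rng.normal(loc=[0.5, 0.0, 3.0], size=(400, 3))    # draws of Y, one per row
n_draws = x.shape[0]

mean_x = x.mean(axis=0)
mean_y = y.mean(axis=0)

r_xx = x.T @ x / n_draws                               # correlation matrix E[X X^T]
k_xx = (x - mean_x).T @ (x - mean_x) / n_draws         # covariance matrix

r_xy = x.T @ y / n_draws                               # cross-correlation E[X Y^T]
k_xy = (x - mean_x).T @ (y - mean_y) / n_draws         # cross-covariance

# Right-hand sides of the two relations.
r_xx_from_k = k_xx + np.outer(mean_x, mean_x)
r_xy_from_k = k_xy + np.outer(mean_x, mean_y)
```

When the means are zero the outer-product terms vanish, and the correlation and covariance matrices coincide.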

Orthogonality

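Two random vectors of the same size $\mathbf{X} = (X_1, \ldots, X_n)^{\mathsf{T}}$ and $\mathbf{Y} = (Y_1, \ldots, Y_n)^{\mathsf{T}}$ are called orthogonal if

$$\operatorname{E}[\mathbf{X}^{\mathsf{T}}\mathbf{Y}] = 0.$$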

Independence

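Two random vectors $\mathbf{X} = (X_1, \ldots, X_m)^{\mathsf{T}}$ and $\mathbf{Y} = (Y_1, \ldots, Y_n)^{\mathsf{T}}$ are called independent if for all $\mathbf{x}$ and $\mathbf{y}$

$$F_{\mathbf{X},\mathbf{Y}}(\mathbf{x}, \mathbf{y}) = F_{\mathbf{X}}(\mathbf{x}) \cdot F_{\mathbf{Y}}(\mathbf{y})$$

where $F_{\mathbf{X}}(\mathbf{x})$ and $F_{\mathbf{Y}}(\mathbf{y})$ denote the cumulative distribution functions of $\mathbf{X}$ and $\mathbf{Y}$ and $F_{\mathbf{X},\mathbf{Y}}(\mathbf{x}, \mathbf{y})$ denotes their joint cumulative distribution function. Independence of $\mathbf{X}$ and $\mathbf{Y}$ is often denoted by $\mathbf{X} \perp\!\!\!\perp \mathbf{Y}$. Written component-wise, $\mathbf{X}$ and $\mathbf{Y}$ are called independent if for all $x_1, \ldots, x_m, y_1, \ldots, y_n$

$$F_{X_1,\ldots,X_m,Y_1,\ldots,Y_n}(x_1, \ldots, x_m, y_1, \ldots, y_n) = F_{X_1,\ldots,X_m}(x_1, \ldots, x_m) \cdot F_{Y_1,\ldots,Y_n}(y_1, \ldots, y_n).$$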
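Uncorrelatedness and independence are different conditions: a zero cross-covariance does not by itself imply that the joint distribution factorises. A small discrete example, chosen here as an assumption for illustration and using one-dimensional vectors for brevity, makes this concrete by computing the relevant quantities exactly.

```python
import numpy as np

# X takes the values -1, 0, 1 with equal probability; Y is defined as X squared.
values_x = np.array([-1.0, 0.0, 1.0])
probs = np.full(3, 1.0 / 3.0)
values_y = values_x ** 2

# Exact expectations under the discrete distribution.
e_x = probs @ values_x
e_y = probs @ values_y
e_xy = probs @ (values_x * values_y)
cross_cov = e_xy - e_x * e_y               # zero, so X and Y are uncorrelated

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_joint = probs[values_x == 0.0].sum()     # Y = 0 exactly when X = 0
p_x0 = probs[values_x == 0.0].sum()
p_y0 = probs[values_y == 0.0].sum()
p_product = p_x0 * p_y0                    # differs from p_joint
```

Here the cross-covariance vanishes, yet the joint probability of the event X = 0, Y = 0 is 1/3 while the product of the marginal probabilities is 1/9, so the pair is uncorrelated but not independent.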

Characteristic function

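The characteristic function of a random vector $\mathbf{X}$ with $n$ components is a function $\varphi_{\mathbf{X}} : \mathbb{R}^n \to \mathbb{C}$ that maps every vector $\boldsymbol{\omega} = (\omega_1, \ldots, \omega_n)^{\mathsf{T}}$ to a complex number. It is defined by [2]:p.468

$$\varphi_{\mathbf{X}}(\boldsymbol{\omega}) = \operatorname{E}\left[e^{i(\boldsymbol{\omega}^{\mathsf{T}}\mathbf{X})}\right] = \operatorname{E}\left[e^{i(\omega_1 X_1 + \ldots + \omega_n X_n)}\right].$$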
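For a discrete random vector the expectation in the definition of the characteristic function is a finite sum, so it can be evaluated exactly. The sketch below assumes, for illustration only, a two-component vector whose components are independent and each equal to +1 or -1 with probability 1/2; for that distribution the characteristic function is the product of the cosines of the two frequencies. The function name `characteristic_function` is hypothetical.

```python
import numpy as np

def characteristic_function(outcomes, probs, omega):
    """E[exp(i * omega^T X)] for a discrete random vector with the given outcomes."""
    phases = np.asarray(outcomes, dtype=float) @ np.asarray(omega, dtype=float)
    return complex(np.asarray(probs, dtype=float) @ np.exp(1j * phases))

# All four equally likely outcomes of two independent +/-1 components.
outcomes = np.array([[-1.0, -1.0],
                     [-1.0,  1.0],
                     [ 1.0, -1.0],
                     [ 1.0,  1.0]])
probs = np.full(4, 0.25)

omega = np.array([0.3, -1.2])
phi = characteristic_function(outcomes, probs, omega)
phi_at_zero = characteristic_function(outcomes, probs, [0.0, 0.0])
```

The value at the zero vector is always 1, since the exponential reduces to 1 for every outcome.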

Further properties

Expectation of a quadratic form

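One can take the expectation of a quadratic form in the random vector $\mathbf{X}$ as follows: [5]:p.170–171

$$\operatorname{E}[\mathbf{X}^{\mathsf{T}} A \mathbf{X}] = \operatorname{E}[\mathbf{X}]^{\mathsf{T}} A \operatorname{E}[\mathbf{X}] + \operatorname{tr}(A \operatorname{K}_{\mathbf{X}\mathbf{X}}),$$

where $\operatorname{K}_{\mathbf{X}\mathbf{X}}$ is the covariance matrix of $\mathbf{X}$ and $\operatorname{tr}$ refers to the trace of a matrix, that is, to the sum of the elements on its main diagonal (from upper left to lower right). Since the quadratic form is a scalar, so is its expectation.

Proof: Let $\mathbf{z}$ be an $m \times 1$ random vector with $\operatorname{E}[\mathbf{z}] = \mu$ and $\operatorname{Cov}[\mathbf{z}] = V$ and let $A$ be an $m \times m$ non-stochastic matrix.

Then based on the formula for the covariance, if we denote $\mathbf{z}^{\mathsf{T}} = \mathbf{X}$ and $\mathbf{z}^{\mathsf{T}} A^{\mathsf{T}} = \mathbf{Y}$, we see that:

$$\operatorname{Cov}[\mathbf{X}, \mathbf{Y}] = \operatorname{E}[\mathbf{X}\mathbf{Y}^{\mathsf{T}}] - \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^{\mathsf{T}}$$

Hence

$$\begin{aligned}
\operatorname{E}[\mathbf{X}\mathbf{Y}^{\mathsf{T}}] &= \operatorname{Cov}[\mathbf{X}, \mathbf{Y}] + \operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{Y}]^{\mathsf{T}} \\
\operatorname{E}[\mathbf{z}^{\mathsf{T}} A \mathbf{z}] &= \operatorname{Cov}[\mathbf{z}^{\mathsf{T}}, \mathbf{z}^{\mathsf{T}} A^{\mathsf{T}}] + \operatorname{E}[\mathbf{z}^{\mathsf{T}}]\operatorname{E}[\mathbf{z}^{\mathsf{T}} A^{\mathsf{T}}]^{\mathsf{T}} \\
&= \operatorname{Cov}[\mathbf{z}^{\mathsf{T}}, \mathbf{z}^{\mathsf{T}} A^{\mathsf{T}}] + \mu^{\mathsf{T}} (\mu^{\mathsf{T}} A^{\mathsf{T}})^{\mathsf{T}} \\
&= \operatorname{Cov}[\mathbf{z}^{\mathsf{T}}, \mathbf{z}^{\mathsf{T}} A^{\mathsf{T}}] + \mu^{\mathsf{T}} A \mu,
\end{aligned}$$

which leaves us to show that

$$\operatorname{Cov}[\mathbf{z}^{\mathsf{T}}, \mathbf{z}^{\mathsf{T}} A^{\mathsf{T}}] = \operatorname{tr}(AV).$$

This is true based on the fact that one can cyclically permute matrices when taking a trace without changing the end result (e.g.: $\operatorname{tr}(AB) = \operatorname{tr}(BA)$).

We see that

$$\begin{aligned}
\operatorname{Cov}[\mathbf{z}^{\mathsf{T}}, \mathbf{z}^{\mathsf{T}} A^{\mathsf{T}}] &= \operatorname{E}\left[\left(\mathbf{z}^{\mathsf{T}} - \operatorname{E}(\mathbf{z}^{\mathsf{T}})\right)\left(\mathbf{z}^{\mathsf{T}} A^{\mathsf{T}} - \operatorname{E}(\mathbf{z}^{\mathsf{T}} A^{\mathsf{T}})\right)^{\mathsf{T}}\right] \\
&= \operatorname{E}\left[(\mathbf{z}^{\mathsf{T}} - \mu^{\mathsf{T}})(\mathbf{z}^{\mathsf{T}} A^{\mathsf{T}} - \mu^{\mathsf{T}} A^{\mathsf{T}})^{\mathsf{T}}\right] \\
&= \operatorname{E}\left[(\mathbf{z} - \mu)^{\mathsf{T}} (A\mathbf{z} - A\mu)\right].
\end{aligned}$$

And since

$$(\mathbf{z} - \mu)^{\mathsf{T}} (A\mathbf{z} - A\mu)$$

is a scalar, then

$$(\mathbf{z} - \mu)^{\mathsf{T}} (A\mathbf{z} - A\mu) = \operatorname{tr}\left((\mathbf{z} - \mu)^{\mathsf{T}} (A\mathbf{z} - A\mu)\right) = \operatorname{tr}\left((\mathbf{z} - \mu)^{\mathsf{T}} A (\mathbf{z} - \mu)\right)$$

trivially. Using the permutation we get:

$$\operatorname{tr}\left((\mathbf{z} - \mu)^{\mathsf{T}} A (\mathbf{z} - \mu)\right) = \operatorname{tr}\left(A (\mathbf{z} - \mu)(\mathbf{z} - \mu)^{\mathsf{T}}\right),$$

and by plugging this into the original formula we get:

$$\begin{aligned}
\operatorname{Cov}[\mathbf{z}^{\mathsf{T}}, \mathbf{z}^{\mathsf{T}} A^{\mathsf{T}}] &= \operatorname{E}\left[(\mathbf{z} - \mu)^{\mathsf{T}} (A\mathbf{z} - A\mu)\right] \\
&= \operatorname{E}\left[\operatorname{tr}\left(A (\mathbf{z} - \mu)(\mathbf{z} - \mu)^{\mathsf{T}}\right)\right] \\
&= \operatorname{tr}\left(A \operatorname{E}\left[(\mathbf{z} - \mu)(\mathbf{z} - \mu)^{\mathsf{T}}\right]\right) \\
&= \operatorname{tr}(AV).
\end{aligned}$$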
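The identity for the expectation of a quadratic form (mean term plus trace term) also holds exactly when expectations are replaced by averages over a fixed sample, provided the covariance is computed with the 1/N normalisation. The following sketch checks this on synthetic data; the matrix `A` and the sample are assumptions for illustration, and `A` is deliberately not symmetric.

```python
import numpy as np

rng = np.random.default_rng(3)
samples = rng.normal(loc=[1.0, 0.0, -1.0], size=(2000, 3))   # one draw per row
n_draws = samples.shape[0]

A = np.array([[2.0, 1.0,  0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0,  1.0]])      # fixed (non-stochastic) matrix

mean = samples.mean(axis=0)
cov = (samples - mean).T @ (samples - mean) / n_draws

# Left-hand side: average of x^T A x over all draws.
quadratic_values = np.einsum("ni,ij,nj->n", samples, A, samples)
lhs = quadratic_values.mean()

# Right-hand side: mean^T A mean + tr(A cov).
rhs = mean @ A @ mean + np.trace(A @ cov)
```

The agreement is exact up to rounding because the average of x xᵀ equals the sample covariance plus the outer product of the sample mean, mirroring the relation used in the proof.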

Expectation of the product of two different quadratic forms

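One can take the expectation of the product of two different quadratic forms in a zero-mean Gaussian random vector $\mathbf{X}$ as follows, with $A$ and $B$ taken to be symmetric matrices: [5]:pp.162–176

$$\operatorname{E}\left[(\mathbf{X}^{\mathsf{T}} A \mathbf{X})(\mathbf{X}^{\mathsf{T}} B \mathbf{X})\right] = 2\operatorname{tr}(A \operatorname{K}_{\mathbf{X}\mathbf{X}} B \operatorname{K}_{\mathbf{X}\mathbf{X}}) + \operatorname{tr}(A \operatorname{K}_{\mathbf{X}\mathbf{X}}) \operatorname{tr}(B \operatorname{K}_{\mathbf{X}\mathbf{X}}),$$

where again $\operatorname{K}_{\mathbf{X}\mathbf{X}}$ is the covariance matrix of $\mathbf{X}$. Again, since both quadratic forms are scalars and hence their product is a scalar, the expectation of their product is also a scalar.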
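The formula for the product of two quadratic forms can be cross-checked without simulation. The sketch below is not the derivation used in the cited source; it relies on Isserlis' theorem, which expresses the fourth moments of a zero-mean Gaussian vector as sums of products of covariances, and then contracts those moments with two symmetric matrices. The covariance `K` and the matrices `A` and `B` are hypothetical values chosen for illustration.

```python
import numpy as np

K = np.array([[2.0, 0.5],
              [0.5, 1.0]])      # covariance matrix of the zero-mean Gaussian vector
A = np.array([[1.0, 2.0],
              [2.0, 0.0]])      # symmetric
B = np.array([[ 0.0, -1.0],
              [-1.0,  3.0]])    # symmetric

# Isserlis' theorem: E[x_i x_j x_k x_l] = K_ij K_kl + K_ik K_jl + K_il K_jk.
fourth_moments = (np.einsum("ij,kl->ijkl", K, K)
                  + np.einsum("ik,jl->ijkl", K, K)
                  + np.einsum("il,jk->ijkl", K, K))

# E[(x^T A x)(x^T B x)] = sum over i,j,k,l of A_ij B_kl E[x_i x_j x_k x_l].
lhs = np.einsum("ij,kl,ijkl->", A, B, fourth_moments)

# Closed-form expression in terms of traces.
rhs = 2.0 * np.trace(A @ K @ B @ K) + np.trace(A @ K) * np.trace(B @ K)
```

The first Isserlis term contributes the product of traces, and the remaining two terms each contribute one copy of the trace of the four-matrix product when the matrices are symmetric.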

Applications

Portfolio theory

In portfolio theory in finance, an objective often is to choose a portfolio of risky assets such that the distribution of the random portfolio return has desirable properties. For example, one might want to choose the portfolio return having the lowest variance for a given expected value. Here the random vector is the vector $\mathbf{r}$ of random returns on the individual assets, and the portfolio return p (a random scalar) is the inner product of the vector of random returns with a vector w of portfolio weights, that is, the fractions of the portfolio placed in the respective assets. Since $p = w^{\mathsf{T}}\mathbf{r}$, the expected value of the portfolio return is $w^{\mathsf{T}}\operatorname{E}(\mathbf{r})$ and the variance of the portfolio return can be shown to be $w^{\mathsf{T}} C w$, where C is the covariance matrix of $\mathbf{r}$.
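The portfolio mean and variance described above reduce to one inner product and one quadratic form. A minimal sketch follows; the expected returns, covariance matrix and weights are made-up numbers for three hypothetical assets, not data from any source.

```python
import numpy as np

# Hypothetical inputs for three assets.
expected_returns = np.array([0.08, 0.05, 0.03])      # E(r)
C = np.array([[0.040, 0.006, 0.000],
              [0.006, 0.010, 0.001],
              [0.000, 0.001, 0.002]])                 # covariance matrix of r
w = np.array([0.5, 0.3, 0.2])                         # portfolio weights, summing to 1

portfolio_mean = w @ expected_returns                 # w^T E(r)
portfolio_variance = w @ C @ w                        # w^T C w
portfolio_std = np.sqrt(portfolio_variance)
```

Because the covariance matrix is positive semidefinite, the computed variance cannot be negative for any choice of weights.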

Regression theory

In linear regression theory, we have data on n observations on a dependent variable y and n observations on each of k independent variables xj. The observations on the dependent variable are stacked into a column vector y; the observations on each independent variable are also stacked into column vectors, and these latter column vectors are combined into a design matrix X (not denoting a random vector in this context) of observations on the independent variables. Then the following regression equation is postulated as a description of the process that generated the data:

$$y = X\beta + e,$$

where β is a postulated fixed but unknown vector of k response coefficients, and e is an unknown random vector reflecting random influences on the dependent variable. By some chosen technique such as ordinary least squares, a vector $\hat{\beta}$ is chosen as an estimate of β, and the estimate of the vector e, denoted $\hat{e}$, is computed as

$$\hat{e} = y - X\hat{\beta}.$$

Then the statistician must analyze the properties of $\hat{\beta}$ and $\hat{e}$, which are viewed as random vectors since a randomly different selection of n cases to observe would have resulted in different values for them.
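The estimation step can be sketched with ordinary least squares on simulated data. The "true" coefficients, noise level and sample size below are assumptions chosen for illustration; `np.linalg.lstsq` is used as one possible way to obtain the least-squares estimate.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 50, 3

# Design matrix: an intercept column plus two simulated regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

beta = np.array([1.0, 2.0, -0.5])          # hypothetical "true" coefficients
e = rng.normal(scale=0.1, size=n)          # random influences on y
y = X @ beta + e                           # postulated data-generating equation

# Ordinary least squares estimate of beta and the implied residual vector.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e_hat = y - X @ beta_hat
```

Rerunning the sketch with a different seed gives different values of `beta_hat` and `e_hat`, which is the sense in which both are treated as random vectors.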

Vector time series

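The evolution of a k×1 random vector $\mathbf{X}$ through time can be modelled as a vector autoregression (VAR) as follows:

$$\mathbf{X}_t = c + A_1 \mathbf{X}_{t-1} + A_2 \mathbf{X}_{t-2} + \cdots + A_p \mathbf{X}_{t-p} + \mathbf{e}_t,$$

where the i-periods-back vector observation $\mathbf{X}_{t-i}$ is called the i-th lag of $\mathbf{X}$, c is a k × 1 vector of constants (intercepts), $A_i$ is a time-invariant k × k matrix and $\mathbf{e}_t$ is a k × 1 random vector of error terms.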
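The recursion that defines a vector autoregression is straightforward to simulate. The sketch below handles a general number of lags but is exercised with a single lag; the function name `simulate_var`, the intercept, the lag matrix and the Gaussian error terms are all assumptions made for this example.

```python
import numpy as np

def simulate_var(c, lag_matrices, errors, initial):
    """Simulate X_t = c + A_1 X_{t-1} + ... + A_p X_{t-p} + e_t.

    `initial` holds the p starting vectors, oldest first; one new vector is
    produced for every row of `errors`.
    """
    p = len(lag_matrices)
    history = [np.asarray(v, dtype=float) for v in initial]
    for e_t in errors:
        # history[-i] is the i-th lag of the vector being generated.
        lagged = sum(A @ history[-i] for i, A in enumerate(lag_matrices, start=1))
        history.append(c + lagged + e_t)
    return np.array(history[p:])

c = np.array([0.1, -0.2])                  # k x 1 vector of intercepts, k = 2
A1 = np.array([[0.5, 0.1],
               [0.0, 0.3]])                # time-invariant k x k lag matrix

rng = np.random.default_rng(5)
errors = rng.normal(scale=0.1, size=(100, 2))   # error terms, one row per period

path = simulate_var(c, [A1], errors, initial=[np.zeros(2)])
```

Each row of `path` is one realisation of the random vector at a given period, so the whole array is a single sample path of the process.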

References

  1. Gallager, Robert G. (2013). Stochastic Processes: Theory for Applications. Cambridge University Press. ISBN 978-1-107-03975-9.
  2. Lapidoth, Amos (2009). A Foundation in Digital Communication. Cambridge University Press. ISBN 978-0-521-19395-5.
  3. Gubner, John A. (2006). Probability and Random Processes for Electrical and Computer Engineers. Cambridge University Press. ISBN 978-0-521-86470-1.
  4. Papoulis, Athanasius (1991). Probability, Random Variables and Stochastic Processes (Third ed.). McGraw-Hill. ISBN 0-07-048477-5.
  5. Kendrick, David (1981). Stochastic Control for Economic Models. McGraw-Hill. ISBN 0-07-033962-7.
