Cross-covariance matrix

Last updated January 30, 2025

In probability theory and statistics, a cross-covariance matrix is a matrix whose element in the i, j position is the covariance between the i-th element of a random vector and j-th element of another random vector. When the two random vectors are the same, the cross-covariance matrix is referred to as covariance matrix. A random vector is a random variable with multiple dimensions. Each element of the vector is a scalar random variable. Each element has either a finite number of observed empirical values or a finite or infinite number of potential values. The potential values are specified by a theoretical joint probability distribution. Intuitively, the cross-covariance matrix generalizes the notion of covariance to multiple dimensions.

Definition

For random vectors $\mathbf {X}$ and $\mathbf {Y}$ , each containing random elements whose expected value and variance exist, the cross-covariance matrix of $\mathbf {X}$ and $\mathbf {Y}$ is defined by^[1]^: 336

\operatorname {K} _{\mathbf {X} \mathbf {Y} }=\operatorname {cov} (\mathbf {X} ,\mathbf {Y} ){\stackrel {\mathrm {def} }{=}}\ \operatorname {E} [(\mathbf {X} -\mathbf {\mu _{X}} )(\mathbf {Y} -\mathbf {\mu _{Y}} )^{\rm {T}}]

Eq.1

where $\mathbf {\mu _{X}} =\operatorname {E} [\mathbf {X} ]$ and $\mathbf {\mu _{Y}} =\operatorname {E} [\mathbf {Y} ]$ are vectors containing the expected values of $\mathbf {X}$ and $\mathbf {Y}$ . The vectors $\mathbf {X}$ and $\mathbf {Y}$ need not have the same dimension, and either might be a scalar value.

The cross-covariance matrix is the matrix whose $(i,j)$ entry is the covariance

\operatorname {K} _{X_{i}Y_{j}}=\operatorname {cov} [X_{i},Y_{j}]=\operatorname {E} [(X_{i}-\operatorname {E} [X_{i}])(Y_{j}-\operatorname {E} [Y_{j}])]

between the i-th element of $\mathbf {X}$ and the j-th element of $\mathbf {Y}$ . This gives the following component-wise definition of the cross-covariance matrix.

\operatorname {K} _{\mathbf {X} \mathbf {Y} }={\begin{bmatrix}\mathrm {E} [(X_{1}-\operatorname {E} [X_{1}])(Y_{1}-\operatorname {E} [Y_{1}])]&\mathrm {E} [(X_{1}-\operatorname {E} [X_{1}])(Y_{2}-\operatorname {E} [Y_{2}])]&\cdots &\mathrm {E} [(X_{1}-\operatorname {E} [X_{1}])(Y_{n}-\operatorname {E} [Y_{n}])]\\\\\mathrm {E} [(X_{2}-\operatorname {E} [X_{2}])(Y_{1}-\operatorname {E} [Y_{1}])]&\mathrm {E} [(X_{2}-\operatorname {E} [X_{2}])(Y_{2}-\operatorname {E} [Y_{2}])]&\cdots &\mathrm {E} [(X_{2}-\operatorname {E} [X_{2}])(Y_{n}-\operatorname {E} [Y_{n}])]\\\\\vdots &\vdots &\ddots &\vdots \\\\\mathrm {E} [(X_{m}-\operatorname {E} [X_{m}])(Y_{1}-\operatorname {E} [Y_{1}])]&\mathrm {E} [(X_{m}-\operatorname {E} [X_{m}])(Y_{2}-\operatorname {E} [Y_{2}])]&\cdots &\mathrm {E} [(X_{m}-\operatorname {E} [X_{m}])(Y_{n}-\operatorname {E} [Y_{n}])]\end{bmatrix}}

Example

For example, if $\mathbf {X} =\left(X_{1},X_{2},X_{3}\right)^{\rm {T}}$ and $\mathbf {Y} =\left(Y_{1},Y_{2}\right)^{\rm {T}}$ are random vectors, then $\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )$ is a $3\times 2$ matrix whose $(i,j)$ -th entry is $\operatorname {cov} (X_{i},Y_{j})$ .

Properties

For the cross-covariance matrix, the following basic properties apply:^[2]

$\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )=\operatorname {E} [\mathbf {X} \mathbf {Y} ^{\rm {T}}]-\mathbf {\mu _{X}} \mathbf {\mu _{Y}} ^{\rm {T}}$
$\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )=\operatorname {cov} (\mathbf {Y} ,\mathbf {X} )^{\rm {T}}$
$\operatorname {cov} (\mathbf {X_{1}} +\mathbf {X_{2}} ,\mathbf {Y} )=\operatorname {cov} (\mathbf {X_{1}} ,\mathbf {Y} )+\operatorname {cov} (\mathbf {X_{2}} ,\mathbf {Y} )$
$\operatorname {cov} (A\mathbf {X} +\mathbf {a} ,B^{\rm {T}}\mathbf {Y} +\mathbf {b} )=A\,\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )\,B$
If $\mathbf {X}$ and $\mathbf {Y}$ are independent (or somewhat less restrictedly, if every random variable in $\mathbf {X}$ is uncorrelated with every random variable in $\mathbf {Y}$ ), then $\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )=0_{p\times q}$

where $\mathbf {X}$ , $\mathbf {X_{1}}$ and $\mathbf {X_{2}}$ are random $p\times 1$ vectors, $\mathbf {Y}$ is a random $q\times 1$ vector, $\mathbf {a}$ is a $q\times 1$ vector, $\mathbf {b}$ is a $p\times 1$ vector, $A$ and $B$ are $q\times p$ matrices of constants, and $0_{p\times q}$ is a $p\times q$ matrix of zeroes.

Definition for complex random vectors

If $\mathbf {Z}$ and $\mathbf {W}$ are complex random vectors, the definition of the cross-covariance matrix is slightly changed. Transposition is replaced by Hermitian transposition:

\operatorname {K} _{\mathbf {Z} \mathbf {W} }=\operatorname {cov} (\mathbf {Z} ,\mathbf {W} ){\stackrel {\mathrm {def} }{=}}\ \operatorname {E} [(\mathbf {Z} -\mathbf {\mu _{Z}} )(\mathbf {W} -\mathbf {\mu _{W}} )^{\rm {H}}]

For complex random vectors, another matrix called the pseudo-cross-covariance matrix is defined as follows:

\operatorname {J} _{\mathbf {Z} \mathbf {W} }=\operatorname {cov} (\mathbf {Z} ,{\overline {\mathbf {W} }}){\stackrel {\mathrm {def} }{=}}\ \operatorname {E} [(\mathbf {Z} -\mathbf {\mu _{Z}} )(\mathbf {W} -\mathbf {\mu _{W}} )^{\rm {T}}]

Uncorrelatedness

Two random vectors $\mathbf {X}$ and $\mathbf {Y}$ are called uncorrelated if their cross-covariance matrix $\operatorname {K} _{\mathbf {X} \mathbf {Y} }$ matrix is a zero matrix.^[1]^: 337

Complex random vectors $\mathbf {Z}$ and $\mathbf {W}$ are called uncorrelated if their covariance matrix and pseudo-covariance matrix is zero, i.e. if $\operatorname {K} _{\mathbf {Z} \mathbf {W} }=\operatorname {J} _{\mathbf {Z} \mathbf {W} }=0$ .

Related Research Articles

Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations of a random variable as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its mean. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range. The standard deviation is commonly used in the determination of what constitutes an outlier and what does not. Standard deviation may be abbreviated SD or std dev, and is most commonly represented in mathematical texts and equations by the lowercase Greek letter σ (sigma), for the population standard deviation, or the Latin letter s, for the sample standard deviation.

In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. It is the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by $,,,, or .$

<span class="mw-page-title-main">Central limit theorem</span> Fundamental theorem in probability theory and statistics

In probability theory, the central limit theorem (CLT) states that, under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution. This holds even if the original variables themselves are not normally distributed. There are several versions of the CLT, each applying in the context of different conditions.

In probability, and statistics, a multivariate random variable or random vector is a list or vector of mathematical variables each of whose value is unknown, either because the value has not yet occurred or because there is imperfect knowledge of its value. The individual variables in a random vector are grouped together because they are all part of a single mathematical system — often they represent different properties of an individual statistical unit. For example, while a given person has a specific age, height and weight, the representation of these features of an unspecified person from within a group would be a random vector. Normally each element of a random vector is a real number.

<span class="mw-page-title-main">Multivariate normal distribution</span> Generalization of the one-dimensional normal distribution to higher dimensions

In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables, each of which clusters around a mean value.

In probability theory and statistics, covariance is a measure of the joint variability of two random variables.

In probability theory and statistics, two real-valued random variables, $,, are said to be uncorrelated if their covariance,, is zero. If two variables are uncorrelated, there is no linear relationship between them.$

<span class="mw-page-title-main">Covariance matrix</span> Measure of covariance of components of a random vector

In probability theory and statistics, a covariance matrix is a square matrix giving the covariance between each pair of elements of a given random vector.

The cross-correlation matrix of two random vectors is a matrix containing as elements the cross-correlations of all pairs of elements of the random vectors. The cross-correlation matrix is used in various digital signal processing algorithms.

In signal processing, cross-correlation is a measure of similarity of two series as a function of the displacement of one relative to the other. This is also known as a sliding dot product or sliding inner-product. It is commonly used for searching a long signal for a shorter, known feature. It has applications in pattern recognition, single particle analysis, electron tomography, averaging, cryptanalysis, and neurophysiology. The cross-correlation is similar in nature to the convolution of two functions. In an autocorrelation, which is the cross-correlation of a signal with itself, there will always be a peak at a lag of zero, and its size will be the signal energy.

In statistics, sometimes the covariance matrix of a multivariate random variable is not known but has to be estimated. Estimation of covariance matrices then deals with the question of how to approximate the actual covariance matrix on the basis of a sample from the multivariate distribution. Simple cases, where observations are complete, can be dealt with by using the sample covariance matrix. The sample covariance matrix (SCM) is an unbiased and efficient estimator of the covariance matrix if the space of covariance matrices is viewed as an extrinsic convex cone in R^p×p; however, measured using the intrinsic geometry of positive-definite matrices, the SCM is a biased and inefficient estimator. In addition, if the random variable has a normal distribution, the sample covariance matrix has a Wishart distribution and a slightly differently scaled version of it is the maximum likelihood estimate. Cases involving missing data, heteroscedasticity, or autocorrelated residuals require deeper considerations. Another issue is the robustness to outliers, to which sample covariance matrices are highly sensitive.

Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables as well as unknown parameters and latent variables, with various sorts of relationships among the three types of random variables, as might be described by a graphical model. As typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes:

To provide an analytical approximation to the posterior probability of the unobserved variables, in order to do statistical inference over these variables.
To derive a lower bound for the marginal likelihood of the observed data. This is typically used for performing model selection, the general idea being that a higher marginal likelihood for a given model indicates a better fit of the data by that model and hence a greater probability that the model in question was the one that generated the data.

In probability and statistics, given two stochastic processes $and, the cross-covariance is a function that gives the covariance of one process with the other at pairs of time points. With the usual notation for the expectation operator, if the processes have the mean functions and, then the cross-covariance is given by$

The sample mean or empirical mean, and the sample covariance or empirical covariance are statistics computed from a sample of data on one or more random variables.

In probability theory, the family of complex normal distributions, denoted $or, characterizes complex random variables whose real and imaginary parts are jointly normal. The complex normal family has three parameters: location parameter μ, covariance matrix, and the relation matrix . The standard complex normal is the univariate distribution with,, and .$

In probability theory and statistics, the negative multinomial distribution is a generalization of the negative binomial distribution (NB(x₀, p)) to more than two outcomes.

In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables.

In probability theory and statistics, a complex random vector is typically a tuple of complex-valued random variables, and generally is a random variable taking values in a vector space over the field of complex numbers. If $are complex-valued random variables, then the n -tuple is a complex random vector. Complex random variables can always be considered as pairs of real random vectors: their real and imaginary parts.$

In statistics, modes of variation are a continuously indexed set of vectors or functions that are centered at a mean and are used to depict the variation in a population or sample. Typically, variation patterns in the data can be decomposed in descending order of eigenvalues with the directions represented by the corresponding eigenvectors or eigenfunctions. Modes of variation provide a visualization of this decomposition and an efficient description of variation around the mean. Both in principal component analysis (PCA) and in functional principal component analysis (FPCA), modes of variation play an important role in visualizing and describing the variation in the data contributed by each eigencomponent. In real-world applications, the eigencomponents and associated modes of variation aid to interpret complex data, especially in exploratory data analysis (EDA).

References

1 2 Gubner, John A. (2006). Probability and Random Processes for Electrical and Computer Engineers. Cambridge University Press. ISBN 978-0-521-86470-1.
↑ Taboga, Marco (2010). "Lectures on probability theory and mathematical statistics".

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Gubner-1] 1 2 Gubner, John A. (2006). Probability and Random Processes for Electrical and Computer Engineers. Cambridge University Press. ISBN 978-0-521-86470-1.

[taboga-2] Taboga, Marco (2010). "Lectures on probability theory and mathematical statistics".

[1]

[2]