Generalized chi-squared distribution

Generalized chi-squared distribution
	Probability density function
	Cumulative distribution function
Notation
Parameters	, vector of weights of noncentral chi-square components; , vector of degrees of freedom of noncentral chi-square components; , vector of non-centrality parameters of chi-square components; , scale of normal term; , offset
Support	;
PDF	no closed-form expression
CDF	no closed-form expression
Mean
Variance
MGF
CF

Last updated January 10, 2025

In probability theory and statistics, the generalized chi-squared distribution (or generalized chi-square distribution) is the distribution of a quadratic form of a multinormal variable (normal vector), or a linear combination of different normal variables and squares of normal variables. Equivalently, it is also a linear sum of independent noncentral chi-square variables and a normal variable. There are several other such generalizations for which the same term is sometimes used; some of them are special cases of the family discussed here, for example the gamma distribution.

Definition

The generalized chi-squared variable may be described in multiple ways. One is to write it as a weighted sum of independent noncentral chi-square variables ${{\chi }'}^{2}$ and a standard normal variable $z$ :^[1]^[2]

{\tilde {\chi }}({\boldsymbol {w}},{\boldsymbol {k}},{\boldsymbol {\lambda }},s,m)=\sum _{i}w_{i}{{\chi }'}^{2}(k_{i},\lambda _{i})+sz+m.

Here the parameters are the weights $w_{i}$ , the degrees of freedom $k_{i}$ and non-centralities $\lambda _{i}$ of the constituent non-central chi-squares, and the coefficients $s$ and $m$ of the normal. Some important special cases of this have all weights $w_{i}$ of the same sign, or have central chi-squared components, or omit the normal term.

Since a non-central chi-squared variable is a sum of squares of normal variables with different means, the generalized chi-square variable is also defined as a sum of squares of independent normal variables, plus an independent normal variable: that is, a quadratic in normal variables.

Another equivalent way is to formulate it as a quadratic form of a normal vector ${\boldsymbol {x}}$ :^[3]^[4]

{\tilde {\chi }}=q({\boldsymbol {x}})={\boldsymbol {x}}'\mathbf {Q_{2}} {\boldsymbol {x}}+{\boldsymbol {q_{1}}}'{\boldsymbol {x}}+q_{0}

.

Here $\mathbf {Q_{2}}$ is a matrix, ${\boldsymbol {q_{1}}}$ is a vector, and $q_{0}$ is a scalar. These, together with the mean ${\boldsymbol {\mu }}$ and covariance matrix $\mathbf {\Sigma }$ of the normal vector ${\boldsymbol {x}}$ , parameterize the distribution.

For the most general case, a reduction towards a common standard form can be made by using a representation of the following form:^[5]

X=(z+a)^{\mathrm {T} }A(z+a)+c^{\mathrm {T} }z=(x+b)^{\mathrm {T} }D(x+b)+d^{\mathrm {T} }x+e,

where D is a diagonal matrix and where x represents a vector of uncorrelated standard normal random variables.

Parameter conversions

A generalized chi-square variable or distribution can be parameterized in two ways. The first is in terms of the weights $w_{i}$ , the degrees of freedom $k_{i}$ and non-centralities $\lambda _{i}$ of the constituent non-central chi-squares, and the coefficients $s$ and $m$ of the added normal term. The second parameterization is using the quadratic form of a normal vector, where the paremeters are the matrix $\mathbf {Q_{2}}$ , the vector ${\boldsymbol {q_{1}}}$ , and the scalar $q_{0}$ , and the mean ${\boldsymbol {\mu }}$ and covariance matrix $\mathbf {\Sigma }$ of the normal vector.

The parameters of the first expression (in terms of non-central chi-squares, a normal and a constant) can be calculated in terms of the parameters of the second expression (quadratic form of a normal vector).^[4]

The parameters of the second expression (quadratic form of a normal vector) can also be calculated in terms of the parameters of the first expression (in terms of non-central chi-squares, a normal and a constant).^[6]

There exists Matlab code to convert from one set of parameters to another.

Computing the PDF/CDF/inverse CDF/random numbers

The probability density, cumulative distribution, and inverse cumulative distribution functions of a generalized chi-squared variable do not have simple closed-form expressions. But there exist several methods to compute them numerically: Ruben's method,^[7] Imhof's method,^[8] IFFT method,^[6] ray method,^[6] and ellipse approximation.^[6]

Numerical algorithms ^[5]^[2]^[8]^[4] and computer code (Fortran and C, Matlab, R, Python, Julia) have been published that implement some of these methods to compute the PDF, CDF, and inverse CDF, and to generate random numbers.

The following table shows the best methods to use to compute the CDF and PDF for the different parts of the generalized chi-square distribution in different cases: ^[6]

${\tilde {\chi }}$ type	part	best cdf/pdf method(s)
ellipse: $w_{i}$ same sign, $s=0$	body	Ruben, Imhof, IFFT, ray
	finite tail	Ruben, ray (if $\lambda _{i}=0$ ), ellipse
	infinite tail	Ruben, ray
not ellipse: $w_{i}$ mixed signs, and/or $s\neq 0$	body	Imhof, IFFT, ray
not ellipse: $w_{i}$ mixed signs, and/or $s\neq 0$	infinite tails	ray

Applications

The generalized chi-squared is the distribution of statistical estimates in cases where the usual statistical theory does not hold, as in the examples below.

In model fitting and selection

If a predictive model is fitted by least squares, but the residuals have either autocorrelation or heteroscedasticity, then alternative models can be compared (in model selection) by relating changes in the sum of squares to an asymptotically valid generalized chi-squared distribution.^[3]

Classifying normal vectors using Gaussian discriminant analysis

If ${\boldsymbol {x}}$ is a normal vector, its log likelihood is a quadratic form of ${\boldsymbol {x}}$ , and is hence distributed as a generalized chi-squared. The log likelihood ratio that ${\boldsymbol {x}}$ arises from one normal distribution versus another is also a quadratic form, so distributed as a generalized chi-squared.^[4]

In Gaussian discriminant analysis, samples from multinormal distributions are optimally separated by using a quadratic classifier, a boundary that is a quadratic function (e.g. the curve defined by setting the likelihood ratio between two Gaussians to 1). The classification error rates of different types (false positives and false negatives) are integrals of the normal distributions within the quadratic regions defined by this classifier. Since this is mathematically equivalent to integrating a quadratic form of a normal vector, the result is an integral of a generalized-chi-squared variable.^[4]

In signal processing

The following application arises in the context of Fourier analysis in signal processing, renewal theory in probability theory, and multi-antenna systems in wireless communication. The common factor of these areas is that the sum of exponentially distributed variables is of importance (or identically, the sum of squared magnitudes of circularly-symmetric centered complex Gaussian variables).

If $Z_{i}$ are k independent, circularly-symmetric centered complex Gaussian random variables with mean 0 and variance $\sigma _{i}^{2}$ , then the random variable

{\tilde {Q}}=\sum _{i=1}^{k}|Z_{i}|^{2}

has a generalized chi-squared distribution of a particular form. The difference from the standard chi-squared distribution is that $Z_{i}$ are complex and can have different variances, and the difference from the more general generalized chi-squared distribution is that the relevant scaling matrix A is diagonal. If $\mu =\sigma _{i}^{2}$ for all i, then ${\tilde {Q}}$ , scaled down by $\mu /2$ (i.e. multiplied by $2/\mu$ ), has a chi-squared distribution, $\chi ^{2}(2k)$ , also known as an Erlang distribution. If $\sigma _{i}^{2}$ have distinct values for all i, then ${\tilde {Q}}$ has the pdf^[9]

f(x;k,\sigma _{1}^{2},\ldots ,\sigma _{k}^{2})=\sum _{i=1}^{k}{\frac {e^{-{\frac {x}{\sigma _{i}^{2}}}}}{\sigma _{i}^{2}\prod _{j=1,j\neq i}^{k}\left(1-{\frac {\sigma _{j}^{2}}{\sigma _{i}^{2}}}\right)}}\quad {\text{for }}x\geq 0.

If there are sets of repeated variances among $\sigma _{i}^{2}$ , assume that they are divided into M sets, each representing a certain variance value. Denote $\mathbf {r} =(r_{1},r_{2},\dots ,r_{M})$ to be the number of repetitions in each group. That is, the mth set contains $r_{m}$ variables that have variance $\sigma _{m}^{2}.$ It represents an arbitrary linear combination of independent $\chi ^{2}$ -distributed random variables with different degrees of freedom:

{\tilde {Q}}=\sum _{m=1}^{M}\sigma _{m}^{2}/2*Q_{m},\quad Q_{m}\sim \chi ^{2}(2r_{m})\,.

The pdf of ${\tilde {Q}}$ is^[10]

f(x;\mathbf {r} ,\sigma _{1}^{2},\dots \sigma _{M}^{2})=\prod _{m=1}^{M}{\frac {1}{\sigma _{m}^{2r_{m}}}}\sum _{k=1}^{M}\sum _{l=1}^{r_{k}}{\frac {\Psi _{k,l,\mathbf {r} }}{(r_{k}-l)!}}(-x)^{r_{k}-l}e^{-{\frac {x}{\sigma _{k}^{2}}}},\quad {\text{ for }}x\geq 0,

where

\Psi _{k,l,\mathbf {r} }=(-1)^{r_{k}-1}\sum _{\mathbf {i} \in \Omega _{k,l}}\prod _{j\neq k}{\binom {i_{j}+r_{j}-1}{i_{j}}}\left({\frac {1}{\sigma _{j}^{2}}}\!-\!{\frac {1}{\sigma _{k}^{2}}}\right)^{-(r_{j}+i_{j})},

with $\mathbf {i} =[i_{1},\ldots ,i_{M}]^{T}$ from the set $\Omega _{k,l}$ of all partitions of $l-1$ (with $i_{k}=0$ ) defined as

\Omega _{k,l}=\left\{[i_{1},\ldots ,i_{m}]\in \mathbb {Z} ^{m};\sum _{j=1}^{M}i_{j}\!=l-1,i_{k}=0,i_{j}\geq 0{\text{ for all }}j\right\}.

Related Research Articles

In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is

In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its mean. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range. The standard deviation is commonly used in the determination of what constitutes an outlier and what does not. Standard deviation may be abbreviated SD or std dev, and is most commonly represented in mathematical texts and equations by the lowercase Greek letter σ (sigma), for the population standard deviation, or the Latin letter s, for the sample standard deviation.

<span class="mw-page-title-main">Multivariate normal distribution</span> Generalization of the one-dimensional normal distribution to higher dimensions

In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables, each of which clusters around a mean value.

In probability theory and statistics, the chi-squared distribution with $degrees of freedom is the distribution of a sum of the squares of independent standard normal random variables.$

<span class="mw-page-title-main">Covariance matrix</span> Measure of covariance of components of a random vector

In probability theory and statistics, a covariance matrix is a square matrix giving the covariance between each pair of elements of a given random vector.

In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, including the enabling of the user to calculate expectations, covariances using differentiation based on some useful algebraic properties, as well as for generality, as exponential families are in a sense very natural sets of distributions to consider. The term exponential class is sometimes used in place of "exponential family", or the older term Koopman–Darmois family. Sometimes loosely referred to as "the" exponential family, this class of distributions is distinct because they all possess a variety of desirable properties, most importantly the existence of a sufficient statistic.

In statistics, an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step. It can be used, for example, to estimate a mixture of gaussians, or to solve the multiple linear regression problem.

In statistics, propagation of uncertainty is the effect of variables' uncertainties on the uncertainty of a function based on them. When the variables are the values of experimental measurements they have uncertainties due to measurement limitations which propagate due to the combination of variables in the function.

Hotellings <i>T</i>-squared distribution Type of probability distribution

In statistics, particularly in hypothesis testing, the Hotelling's T-squared distribution (T²), proposed by Harold Hotelling, is a multivariate probability distribution that is tightly related to the F-distribution and is most notable for arising as the distribution of a set of sample statistics that are natural generalizations of the statistics underlying the Student's t-distribution. The Hotelling's t-squared statistic (t²) is a generalization of Student's t-statistic that is used in multivariate hypothesis testing.

In statistics, the theory of minimum norm quadratic unbiased estimation (MINQUE) was developed by C. R. Rao. MINQUE is a theory alongside other estimation methods in estimation theory, such as the method of moments or maximum likelihood estimation. Similar to the theory of best linear unbiased estimation, MINQUE is specifically concerned with linear regression models. The method was originally conceived to estimate heteroscedastic error variance in multiple linear regression. MINQUE estimators also provide an alternative to maximum likelihood estimators or restricted maximum likelihood estimators for variance components in mixed effects models. MINQUE estimators are quadratic forms of the response variable and are used to estimate a linear function of the variances.

In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the input dataset and the output of the (linear) function of the independent variable. Some sources consider OLS to be linear regression.

Weighted least squares (WLS), also known as weighted linear regression, is a generalization of ordinary least squares and linear regression in which knowledge of the unequal variance of observations (heteroscedasticity) is incorporated into the regression. WLS is also a specialization of generalized least squares, when all the off-diagonal entries of the covariance matrix of the errors, are null.

The sensitivity index or discriminability index or detectability index is a dimensionless statistic used in signal detection theory. A higher index indicates that the signal can be more readily detected.

Bayesian linear regression is a type of conditional modeling in which the mean of one variable is described by a linear combination of other variables, with the goal of obtaining the posterior probability of the regression coefficients and ultimately allowing the out-of-sample prediction of the regressandconditional on observed values of the regressors. The simplest and most widely used version of this model is the normal linear model, in which $given is distributed Gaussian. In this model, and under a particular choice of prior probabilities for the parameters—so-called conjugate priors—the posterior can be found analytically. With more arbitrarily chosen priors, the posteriors generally have to be approximated.$

In statistics, the multivariate t-distribution is a multivariate probability distribution. It is a generalization to random vectors of the Student's t-distribution, which is a distribution applicable to univariate random variables. While the case of a random matrix could be treated within this structure, the matrix t-distribution is distinct and makes particular use of the matrix structure.

Non-linear least squares is the form of least squares analysis used to fit a set of m observations with a model that is non-linear in n unknown parameters (m ≥ n). It is used in some forms of nonlinear regression. The basis of the method is to approximate the model by a linear one and to refine the parameters by successive iterations. There are many similarities to linear least squares, but also some significant differences. In economic theory, the non-linear least squares method is applied in (i) the probit regression, (ii) threshold regression, (iii) smooth regression, (iv) logistic link regression, (v) Box–Cox transformed regressors ( $).$

The purpose of this page is to provide supplementary materials for the ordinary least squares article, reducing the load of the main article with mathematics and improving its accessibility, while at the same time retaining the completeness of exposition.

<span class="mw-page-title-main">Logit-normal distribution</span> Probability distribution

In probability theory, a logit-normal distribution is a probability distribution of a random variable whose logit has a normal distribution. If Y is a random variable with a normal distribution, and t is the standard logistic function, then X = t(Y) has a logit-normal distribution; likewise, if X is logit-normally distributed, then Y = logit(X)= log (X/(1-X)) is normally distributed. It is also known as the logistic normal distribution, which often refers to a multinomial logit version (e.g.).

In statistics, the class of vector generalized linear models (VGLMs) was proposed to enlarge the scope of models catered for by generalized linear models (GLMs). In particular, VGLMs allow for response variables outside the classical exponential family and for more than one parameter. Each parameter can be transformed by a link function. The VGLM framework is also large enough to naturally accommodate multiple responses; these are several independent responses each coming from a particular statistical distribution with possibly different parameter values.

In the mathematical theory of probability, multivariate Laplace distributions are extensions of the Laplace distribution and the asymmetric Laplace distribution to multiple variables. The marginal distributions of symmetric multivariate Laplace distribution variables are Laplace distributions. The marginal distributions of asymmetric multivariate Laplace distribution variables are asymmetric Laplace distributions.

References

↑ Davies, R. B. (1973). "Numerical inversion of a characteristic function". Biometrika . 60 (2): 415–417. doi:10.1093/biomet/60.2.415.
1 2 Davies, R. B. (1980). "Algorithm AS155: The distribution of a linear combination of χ² random variables". Journal of the Royal Statistical Society. Series C (Applied Statistics). 29: 323–333. doi:10.2307/2346911.
1 2 Jones, D. A. (1983). "Statistical analysis of empirical models fitted by optimisation". Biometrika . 70 (1): 67–88. doi:10.1093/biomet/70.1.67.
1 2 3 4 5 Das, Abhranil; Wilson S Geisler (2020). "Methods to integrate multinormals and compute classification measures". arXiv: 2012.14331 [stat.ML].
1 2 Sheil, J.; O'Muircheartaigh, I. (1977). "Algorithm AS106: The distribution of non-negative quadratic forms in normal variables". Journal of the Royal Statistical Society. Series C (Applied Statistics). 26 (1): 92–98. doi:10.2307/2346884.
1 2 3 4 5 Das, Abhranil (2024). "New methods to compute the generalized chi-square distribution". arXiv: 2404.05062 .
↑ Ruben, Harold (1962). "Probability content of regions under spherical normal distributions, IV: The distribution of homogeneous and non-homogeneous quadratic functions of normal variables". The Annals of Mathematical Statistics: 542-570.
1 2 Imhof, J. P. (1961). "Computing the Distribution of Quadratic Forms in Normal Variables" (PDF). Biometrika. 48 (3/4): 419–426. doi:10.2307/2332763. JSTOR 2332763.
↑ D. Hammarwall, M. Bengtsson, B. Ottersten (2008) "Acquiring Partial CSI for Spatially Selective Transmission by Instantaneous Channel Norm Feedback", IEEE Transactions on Signal Processing, 56, 1188–1204
↑ E. Björnson, D. Hammarwall, B. Ottersten (2009) "Exploiting Quantized Channel Norm Feedback through Conditional Statistics in Arbitrarily Correlated MIMO Systems", IEEE Transactions on Signal Processing, 57, 4027–4041

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Davies1-1] Davies, R. B. (1973). "Numerical inversion of a characteristic function". Biometrika . 60 (2): 415–417. doi:10.1093/biomet/60.2.415.

[Davies2-2] 1 2 Davies, R. B. (1980). "Algorithm AS155: The distribution of a linear combination of χ² random variables". Journal of the Royal Statistical Society. Series C (Applied Statistics). 29: 323–333. doi:10.2307/2346911.

[Jones1-3] 1 2 Jones, D. A. (1983). "Statistical analysis of empirical models fitted by optimisation". Biometrika . 70 (1): 67–88. doi:10.1093/biomet/70.1.67.

[Das-4] 1 2 3 4 5 Das, Abhranil; Wilson S Geisler (2020). "Methods to integrate multinormals and compute classification measures". arXiv: 2012.14331 [stat.ML].

[Sheil-5] 1 2 Sheil, J.; O'Muircheartaigh, I. (1977). "Algorithm AS106: The distribution of non-negative quadratic forms in normal variables". Journal of the Royal Statistical Society. Series C (Applied Statistics). 26 (1): 92–98. doi:10.2307/2346884.

[Das2-6] 1 2 3 4 5 Das, Abhranil (2024). "New methods to compute the generalized chi-square distribution". arXiv: 2404.05062 .

[7] Ruben, Harold (1962). "Probability content of regions under spherical normal distributions, IV: The distribution of homogeneous and non-homogeneous quadratic functions of normal variables". The Annals of Mathematical Statistics: 542-570.

[Imhof-8] 1 2 Imhof, J. P. (1961). "Computing the Distribution of Quadratic Forms in Normal Variables" (PDF). Biometrika. 48 (3/4): 419–426. doi:10.2307/2332763. JSTOR 2332763.

[9] D. Hammarwall, M. Bengtsson, B. Ottersten (2008) "Acquiring Partial CSI for Spatially Selective Transmission by Instantaneous Channel Norm Feedback", IEEE Transactions on Signal Processing, 56, 1188–1204

[10] E. Björnson, D. Hammarwall, B. Ottersten (2009) "Exploiting Quantized Channel Norm Feedback through Conditional Statistics in Arbitrarily Correlated MIMO Systems", IEEE Transactions on Signal Processing, 57, 4027–4041

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]