Compound probability distribution

In probability and statistics, a compound probability distribution (also known as a mixture distribution or contagious distribution) is the probability distribution that results from assuming that a random variable is distributed according to some parametrized distribution, with (some of) the parameters of that distribution themselves being random variables. If the parameter is a scale parameter, the resulting mixture is also called a scale mixture.

The compound distribution ("unconditional distribution") is the result of marginalizing (integrating) over the latent random variable(s) representing the parameter(s) of the parametrized distribution ("conditional distribution").

Definition

A compound probability distribution is the probability distribution that results from assuming that a random variable $X$ is distributed according to some parametrized distribution $F$ with an unknown parameter $\theta$ that is again distributed according to some other distribution $G$. The resulting distribution $H$ is said to be the distribution that results from compounding $F$ with $G$. The parameter's distribution $G$ is also called the mixing distribution or latent distribution. Technically, the unconditional distribution $H$ results from marginalizing over $G$, i.e., from integrating out the unknown parameter(s) $\theta$. Its probability density function is given by:

$$p_H(x) = \int p_F(x \mid \theta)\, p_G(\theta)\, d\theta$$

The same formula applies analogously if some or all of the variables are vectors.

From the above formula, one can see that a compound distribution essentially is a special case of a marginal distribution: the joint distribution of $x$ and $\theta$ is given by $p(x, \theta) = p(x \mid \theta)\, p(\theta)$, and the compound results as its marginal distribution, $p(x) = \int p(x, \theta)\, d\theta$. If the domain of $\theta$ is discrete, then the distribution is again a special case of a mixture distribution.
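As a concrete worked instance of the integral above (a standard example; the gamma distribution is taken in its shape-rate parameterization, with $x \mid \lambda \sim \operatorname{Exp}(\lambda)$ and $\lambda \sim \operatorname{Gamma}(\alpha, \beta)$):

$$p_H(x) = \int_0^\infty \lambda e^{-\lambda x} \cdot \frac{\beta^\alpha}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\beta \lambda}\, d\lambda = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^\infty \lambda^{\alpha} e^{-(\beta + x)\lambda}\, d\lambda = \frac{\alpha\, \beta^\alpha}{(\beta + x)^{\alpha+1}}, \qquad x \ge 0,$$

which is the density of a Lomax (Pareto type II) distribution.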

Properties

General

The compound distribution $H$ will depend on the specific expression of each distribution, as well as on which parameter of $F$ is distributed according to the distribution $G$, and the parameters of $H$ will include any parameters of $G$ that are not marginalized, or integrated, out. The support of $H$ is the same as that of $F$, and if the latter is a two-parameter distribution parameterized with the mean and variance, some general properties exist.

Mean and variance

The compound distribution's first two moments are given by the law of total expectation and the law of total variance:

$$\operatorname{E}_H[X] = \operatorname{E}_G\!\bigl[\operatorname{E}_F[X \mid \theta]\bigr]$$
$$\operatorname{Var}_H(X) = \operatorname{E}_G\!\bigl[\operatorname{Var}_F(X \mid \theta)\bigr] + \operatorname{Var}_G\!\bigl(\operatorname{E}_F[X \mid \theta]\bigr)$$

If the mean of $F$ is distributed as $G$, which in turn has mean $\mu$ and variance $\sigma^2$, the expressions above imply $\operatorname{E}_H[X] = \mu$ and $\operatorname{Var}_H(X) = \sigma^2 + \tau^2$, where $\tau^2$ is the variance of $F$.

Proof

Let $F$ and $G$ be probability distributions parameterized with mean and variance as

$$X \mid \mu' \sim F(\mu', \tau^2), \qquad \mu' \sim G(\mu, \sigma^2);$$

then, denoting the probability density functions as $f(x \mid \mu')$ and $g(\mu')$ respectively, and $h(x)$ being the probability density of $H$, we have

$$h(x) = \int f(x \mid \mu')\, g(\mu')\, d\mu',$$

and we have from the parameterizations $F(\mu', \tau^2)$ and $G(\mu, \sigma^2)$ that

$$\operatorname{E}_F[X \mid \mu'] = \mu', \quad \operatorname{Var}_F(X \mid \mu') = \tau^2, \qquad \operatorname{E}_G[\mu'] = \mu, \quad \operatorname{Var}_G(\mu') = \sigma^2,$$

and therefore the mean of the compound distribution is

$$\operatorname{E}_H[X] = \operatorname{E}_G\!\bigl[\operatorname{E}_F[X \mid \mu']\bigr] = \operatorname{E}_G[\mu'] = \mu,$$

as per the expression for its first moment above.

The variance of $H$ is given by

$$\operatorname{Var}_H(X) = \operatorname{E}_G\!\bigl[\operatorname{Var}_F(X \mid \mu')\bigr] + \operatorname{Var}_G\!\bigl(\operatorname{E}_F[X \mid \mu']\bigr),$$

and, given the fact that $\operatorname{E}_F[X \mid \mu'] = \mu'$ and $\operatorname{Var}_F(X \mid \mu') = \tau^2$,

$$\operatorname{E}_G\!\bigl[\operatorname{Var}_F(X \mid \mu')\bigr] = \operatorname{E}_G[\tau^2] = \tau^2, \qquad \operatorname{Var}_G\!\bigl(\operatorname{E}_F[X \mid \mu']\bigr) = \operatorname{Var}_G(\mu') = \sigma^2.$$

Finally, we get

$$\operatorname{Var}_H(X) = \tau^2 + \sigma^2.$$

Applications

Testing

Distributions of common test statistics result as compound distributions under their null hypothesis, for example in Student's t-test (where the test statistic results as the ratio of a normal random variable and the square root of a scaled chi-squared random variable), or in the F-test (where the test statistic is the ratio of two scaled chi-squared random variables).
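The connection to compounding can be made explicit here (a standard result, stated for completeness): a normal variable whose variance is itself random with an inverse-gamma mixing distribution has a Student's t marginal,

$$x \mid \sigma^2 \sim N(0, \sigma^2), \quad \sigma^2 \sim \operatorname{Inv-Gamma}\!\left(\tfrac{\nu}{2}, \tfrac{\nu}{2}\right) \;\Longrightarrow\; x \sim t_\nu.$$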

Overdispersion modeling

Compound distributions are useful for modeling outcomes exhibiting overdispersion, i.e., a greater amount of variability than would be expected under a certain model. For example, count data are commonly modeled using the Poisson distribution, whose variance is equal to its mean. The distribution may be generalized by allowing for variability in its rate parameter, implemented via a gamma distribution, which results in a marginal negative binomial distribution. This distribution is similar in its shape to the Poisson distribution, but it allows for larger variances. Similarly, a binomial distribution may be generalized to allow for additional variability by compounding it with a beta distribution for its success probability parameter, which results in a beta-binomial distribution.
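The gamma-Poisson case can be checked empirically. The following is a minimal Monte Carlo sketch (assuming NumPy; the parameters r and p follow NumPy's negative binomial convention and are not taken from the text above):

```python
# Monte Carlo check that a Poisson distribution with a gamma-distributed rate
# has a negative binomial marginal (NumPy's (n, p) convention assumed).
import numpy as np

rng = np.random.default_rng(0)
r, p = 3.0, 0.4            # negative binomial parameters
theta = (1 - p) / p        # gamma scale chosen so that p = 1 / (1 + theta)

lam = rng.gamma(shape=r, scale=theta, size=100_000)   # latent rates ~ G
x_compound = rng.poisson(lam)                         # x_i ~ F(. | rate_i)
x_direct = rng.negative_binomial(r, p, size=100_000)  # direct draws for comparison

# Both should agree: mean ~ r(1-p)/p = 4.5, variance ~ r(1-p)/p^2 = 11.25,
# i.e., the variance exceeds the mean, unlike a pure Poisson model.
print(x_compound.mean(), x_compound.var())
print(x_direct.mean(), x_direct.var())
```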

Bayesian inference

Besides ubiquitous marginal distributions that may be seen as special cases of compound distributions, in Bayesian inference, compound distributions arise when, in the notation above, F represents the distribution of future observations and G is the posterior distribution of the parameters of F, given the information in a set of observed data. This gives a posterior predictive distribution. Correspondingly, for the prior predictive distribution, F is the distribution of a new data point while G is the prior distribution of the parameters.
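For a standard conjugate illustration (the posterior moments $\mu_n$ and $\sigma_n^2$ are introduced here for exposition, not taken from the text above): if a new observation satisfies $\tilde{x} \mid \theta \sim N(\theta, \sigma^2)$ with $\sigma^2$ known, and the posterior for $\theta$ given the observed data is $N(\mu_n, \sigma_n^2)$, then compounding the two gives the posterior predictive distribution

$$\tilde{x} \mid \text{data} \sim N(\mu_n, \sigma^2 + \sigma_n^2),$$

whose variance adds the observation noise to the remaining parameter uncertainty.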

Convolution

Convolution of probability distributions (to derive the probability distribution of sums of random variables) may also be seen as a special case of compounding; here the sum's distribution essentially results from considering one summand as a random location parameter for the other summand. [1]
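Written out for densities: if $Z = X + Y$ with independent summands having densities $f$ and $g$, then

$$p_Z(z) = \int f(z - \theta)\, g(\theta)\, d\theta,$$

which is exactly the compound of the location family $f(\cdot - \theta)$ with mixing density $g$.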

Computation

Compound distributions derived from exponential family distributions often have a closed form. If analytical integration is not possible, numerical methods may be necessary.

Compound distributions may relatively easily be investigated using Monte Carlo methods, i.e., by generating random samples. It is often easy to generate random numbers from the distributions $p(\theta)$ as well as $p(x \mid \theta)$, and to then draw $\theta$ followed by $x \mid \theta$ (ancestral sampling) to generate samples from $p(x)$.
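As a sketch of this two-stage scheme (the helper names below are hypothetical, assuming NumPy):

```python
# Two-stage sampling from a compound distribution H: draw theta from G, then
# x given theta from F; sample_G and sample_F_given are hypothetical helpers.
import numpy as np

def sample_compound(sample_G, sample_F_given, n, rng):
    theta = sample_G(n, rng)           # theta_i ~ G
    return sample_F_given(theta, rng)  # x_i ~ F(. | theta_i), so x_i ~ H marginally

rng = np.random.default_rng(1)
# Example: a normal mean that is itself normally distributed (mu = 0, sigma = 2
# for G; tau = 1 for F); by the mean/variance result above, Var(X) ~ 2^2 + 1^2 = 5.
x = sample_compound(lambda n, r: r.normal(0.0, 2.0, n),
                    lambda th, r: r.normal(th, 1.0),
                    100_000, rng)
print(x.mean(), x.var())  # ~ 0.0 and ~ 5.0
```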

A compound distribution may usually also be approximated to a sufficient degree by a mixture distribution using a finite number of mixture components, allowing one to derive an approximate density, distribution function, etc. [1]
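In the simplest such scheme, one selects support points $\theta_1, \ldots, \theta_K$ with weights $w_1, \ldots, w_K$ chosen to approximate $G$, and uses the finite mixture

$$p_H(x) \approx \sum_{k=1}^{K} w_k\, p_F(x \mid \theta_k).$$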

Parameter estimation (maximum-likelihood or maximum-a-posteriori estimation) within a compound distribution model may sometimes be simplified by utilizing the EM-algorithm. [2]

Examples

Well-known instances include the gamma-Poisson mixture (a Poisson distribution whose rate is gamma-distributed, yielding the negative binomial distribution), the beta-binomial distribution (a binomial distribution whose success probability is beta-distributed), and Student's t-distribution (a normal distribution whose variance is inverse-gamma-distributed).

Similar terms

The notion of "compound distribution" as used e.g. in the definition of a Compound Poisson distribution or Compound Poisson process is different from the definition found in this article. The meaning in this article corresponds to what is used in e.g. Bayesian hierarchical modeling.

The special case of compound probability distributions in which the parametrized distribution is the Poisson distribution is also called a mixed Poisson distribution.

See also

  Mixture distribution
  Mixed Poisson distribution
  Marginal distribution


References

  1. Röver, C.; Friede, T. (2017). "Discrete approximation of a mixture distribution via restricted divergence". Journal of Computational and Graphical Statistics. 26 (1): 217–222. arXiv:1602.04060. doi:10.1080/10618600.2016.1276840.
  2. Gelman, A.; Carlin, J. B.; Stern, H.; Rubin, D. B. (1997). "9.5 Finding marginal posterior modes using EM and related algorithms". Bayesian Data Analysis (1st ed.). Boca Raton: Chapman & Hall / CRC. p. 276.
  3. Lee, S.X.; McLachlan, G.J. (2019). "Scale mixture distribution". Wiley StatsRef: Statistics Reference Online. doi:10.1002/9781118445112.stat08201.
  4. Gneiting, T. (1997). "Normal scale mixtures and dual probability densities". Journal of Statistical Computation and Simulation. 59 (4): 375–384. doi:10.1080/00949659708811867.
  5. Mood, A. M.; Graybill, F. A.; Boes, D. C. (1974). Introduction to the Theory of Statistics (3rd ed.). New York: McGraw-Hill.
  6. Johnson, N. L.; Kemp, A. W.; Kotz, S. (2005). "6.2.2". Univariate Discrete Distributions (3rd ed.). New York: Wiley. p. 253.
  7. Gelman, A.; Carlin, J. B.; Stern, H.; Dunson, D. B.; Vehtari, A.; Rubin, D. B. (2014). Bayesian Data Analysis (3rd ed.). Boca Raton: Chapman & Hall / CRC.
  8. Lawless, J.F. (1987). "Negative binomial and mixed Poisson regression". The Canadian Journal of Statistics. 15 (3): 209–225. doi:10.2307/3314912. JSTOR 3314912.
  9. Teich, M. C.; Diament, P. (1989). "Multiply stochastic representations for K distributions and their Poisson transforms". Journal of the Optical Society of America A. 6 (1): 80–91. Bibcode:1989JOSAA...6...80T. CiteSeerX 10.1.1.64.596. doi:10.1364/JOSAA.6.000080.
  10. Johnson, N. L.; Kotz, S.; Balakrishnan, N. (1994). "20 Pareto distributions". Continuous Univariate Distributions. Vol. 1 (2nd ed.). New York: Wiley. p. 573.
  11. Dubey, S. D. (1970). "Compound gamma, beta and F distributions". Metrika. 16: 27–31. doi:10.1007/BF02613934.
