Natural exponential family

In probability and statistics, a natural exponential family (NEF) is a class of probability distributions that is a special case of an exponential family (EF).

Definition

Univariate case

The natural exponential families (NEF) are a subset of the exponential families. A NEF is an exponential family in which the natural parameter η and the natural statistic T(x) are both the identity. A distribution in an exponential family with parameter θ can be written with probability density function (PDF)

    f(x | θ) = h(x) exp(η(θ) T(x) − A(θ)),

where h(x) and A(θ) are known functions. A distribution in a natural exponential family with parameter θ can thus be written with PDF

    f(x | θ) = h(x) exp(θx − A(θ)).

[Note that slightly different notation is used by the originator of the NEF, Carl Morris: [1] Morris uses ω instead of η and ψ instead of A.]
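As a concrete illustration of the form above, here is a short Python sketch (the function names are ours, not a standard API) that evaluates the NEF density f(x | θ) = h(x) exp(θx − A(θ)) for the normal distribution with known variance σ² = 1, where h(x) = exp(−x²/2)/√(2π), A(θ) = θ²/2, and the natural parameter θ equals the mean μ:

```python
import math

def nef_pdf(x, theta, h, A):
    """Density of a natural exponential family: f(x | theta) = h(x) * exp(theta*x - A(theta))."""
    return h(x) * math.exp(theta * x - A(theta))

# Normal distribution with known variance sigma^2 = 1 viewed as an NEF:
# h(x) = exp(-x^2/2)/sqrt(2*pi), A(theta) = theta^2/2, theta = mu.
h_normal = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
A_normal = lambda theta: theta * theta / 2

mu, x = 1.5, 0.7
nef_value = nef_pdf(x, mu, h_normal, A_normal)
# Ordinary N(mu, 1) pdf for comparison; the two agree because
# h(x)*exp(theta*x - theta^2/2) = exp(-(x - theta)^2/2)/sqrt(2*pi).
direct_value = math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)
```

The same `nef_pdf` helper works for any NEF once h and A are supplied, which is the point of the shared functional form.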

General multivariate case

Suppose that x ∈ R^p. Then a natural exponential family of order p has density or mass function of the form

    f(x | θ) = h(x) exp(θᵀx − A(θ)),

where in this case the parameter θ ∈ R^p.

Moment and cumulant generating functions

A member of a natural exponential family has moment generating function (MGF) of the form

    M_X(t) = exp(A(θ + t) − A(θ)).

The cumulant generating function is by definition the logarithm of the MGF, so it is

    K_X(t) = A(θ + t) − A(θ).

Examples

The five most important univariate cases are:

  1. the normal distribution with known variance,
  2. the Poisson distribution,
  3. the gamma distribution with known shape parameter α (or k),
  4. the binomial distribution with known number of trials n,
  5. the negative binomial distribution with known parameter r.

These five examples (Poisson, binomial, negative binomial, normal, and gamma) are a special subset of NEF, called NEF with quadratic variance function (NEF-QVF), because the variance can be written as a quadratic function of the mean. NEF-QVF are discussed below.

Distributions such as the exponential, Bernoulli, and geometric distributions are special cases of the above five distributions. For example, the Bernoulli distribution is a binomial distribution with n = 1 trial, the exponential distribution is a gamma distribution with shape parameter α = 1 (or k = 1 ), and the geometric distribution is a special case of the negative binomial distribution.

Some exponential family distributions are not NEF. The lognormal and beta distributions are in the exponential family, but not the natural exponential family. The gamma distribution with two parameters is an exponential family but not a NEF, and the chi-squared distribution is a special case of the gamma distribution with fixed scale parameter and thus is also an exponential family but not a NEF (note that only a gamma distribution with fixed shape parameter is a NEF).

The inverse Gaussian distribution is a NEF with a cubic variance function.

The parameterization of most of the above distributions has been written differently from the parameterization commonly used in textbooks and the above linked pages. For example, the above parameterization differs from the parameterization in the linked article in the Poisson case. The two parameterizations are related by θ = log(λ), where λ is the mean parameter, so that the density may be written as

    f(x | θ) = (1/x!) exp(θx − e^θ)

for θ ∈ R, so

    h(x) = 1/x!  and  A(θ) = e^θ.
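The equivalence of the two Poisson parameterizations can be checked directly; the following sketch (helper names are ours) compares the textbook pmf λ^x e^(−λ)/x! with the NEF form h(x) exp(θx − A(θ)) under θ = log(λ):

```python
import math

def poisson_pmf_standard(x, lam):
    """Textbook parameterization: P(X = x) = lam^x * exp(-lam) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def poisson_pmf_nef(x, theta):
    """NEF parameterization: h(x) = 1/x!, A(theta) = exp(theta)."""
    return (1 / math.factorial(x)) * math.exp(theta * x - math.exp(theta))

lam = 4.2
theta = math.log(lam)  # the two parameterizations are related by theta = log(lam)
values = [(poisson_pmf_standard(x, lam), poisson_pmf_nef(x, theta)) for x in range(10)]
```

The pairs agree to floating-point precision, since λ^x e^(−λ) = exp(x log λ − λ) term by term.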

This alternative parameterization can greatly simplify calculations in mathematical statistics. For example, in Bayesian inference, a posterior probability distribution is calculated as the product of two distributions. Normally this calculation requires writing out the probability density functions (PDFs) and integrating; with the above parameterization, however, that calculation can be avoided. Instead, relationships between distributions can be abstracted due to the properties of the NEF described below.

An example of the multivariate case is the multinomial distribution with known number of trials.

Properties

The properties of the natural exponential family can be used to simplify calculations involving these distributions.

Univariate case

  1. Natural exponential families (NEF) are closed under convolution. [2] Given independent identically distributed (iid) X_1, …, X_n with distribution from an NEF, the sum Y = X_1 + ⋯ + X_n has a distribution in an NEF, although not necessarily the original NEF. This follows from the properties of the cumulant generating function.
  2. The variance function for random variables with an NEF distribution can be written in terms of the mean: Var(X) = V(μ). [2]
  3. The first two moments of a NEF distribution uniquely specify the distribution within that family of distributions. [2]
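Property 1 can be checked numerically for the Poisson case: convolving the pmf of a Poisson(λ) with itself reproduces the pmf of a Poisson(2λ) (here the sum stays in the same family; in general only membership in some NEF is guaranteed). A sketch with ad hoc helper names:

```python
import math

def poisson_pmf(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

def convolve_pmf(p, q):
    """pmf of X + Y for independent nonnegative-integer X, Y given as probability lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

lam = 1.7
n_terms = 40  # enough support that truncation error is negligible here
p = [poisson_pmf(k, lam) for k in range(n_terms)]
summed = convolve_pmf(p, p)                        # pmf of X1 + X2
target = [poisson_pmf(k, 2 * lam) for k in range(n_terms)]
max_err = max(abs(summed[k] - target[k]) for k in range(n_terms))
```

For every k below the truncation point, the convolution sum is exact, so `max_err` reflects only floating-point roundoff.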

Multivariate case

In the multivariate case, the mean vector and covariance matrix are[citation needed]

    E[X] = ∇A(θ)  and  Cov[X] = ∇∇ᵀA(θ),

where ∇ is the gradient and ∇∇ᵀ is the Hessian matrix.
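These identities can be checked numerically. For the multinomial distribution with known number of trials n (keeping the last category implicit), the log-partition function is A(θ) = n log(1 + Σᵢ e^(θᵢ)) and the mean of category i is n e^(θᵢ)/(1 + Σⱼ e^(θⱼ)). The following finite-difference sketch (helper names are ours, for illustration only) verifies that the gradient of A recovers the mean vector:

```python
import math

n_trials = 10

def A(theta):
    """Log-partition function of a multinomial NEF (last category implicit)."""
    return n_trials * math.log(1 + sum(math.exp(t) for t in theta))

def analytic_mean(theta):
    denom = 1 + sum(math.exp(t) for t in theta)
    return [n_trials * math.exp(t) / denom for t in theta]

def grad_fd(f, theta, eps=1e-6):
    """Central finite-difference gradient of f at theta."""
    g = []
    for i in range(len(theta)):
        up = list(theta); up[i] += eps
        dn = list(theta); dn[i] -= eps
        g.append((f(up) - f(dn)) / (2 * eps))
    return g

theta = [0.2, -0.5]
mean_fd = grad_fd(A, theta)       # numerical gradient of A
mean_exact = analytic_mean(theta) # E[X] component-wise
```

The same finite-difference idea applied twice would recover the covariance matrix from the Hessian of A.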

Natural exponential families with quadratic variance functions (NEF-QVF)

A special case of the natural exponential families is the subset with quadratic variance functions. Six NEFs have quadratic variance functions (QVF), in which the variance of the distribution can be written as a quadratic function of the mean; these are called NEF-QVF. The properties of these distributions were first described by Carl Morris. [3]

The six NEF-QVFs

The six NEF-QVFs are listed here in order of increasing complexity of the relationship between variance and mean.

  1. The normal distribution with fixed variance is NEF-QVF because the variance is constant. The variance can be written Var(X) = σ², so variance is a degree-0 function of the mean.
  2. The Poisson distribution is NEF-QVF because all Poisson distributions have variance equal to the mean, Var(X) = μ, so variance is a linear function of the mean.
  3. The gamma distribution is NEF-QVF because the mean of the gamma distribution is μ = rλ and the variance of the gamma distribution is rλ² = μ²/r, so the variance is a quadratic function of the mean.
  4. The binomial distribution is NEF-QVF because the mean is μ = np and the variance is np(1 − p), which can be written in terms of the mean as Var(X) = −μ²/n + μ.

  5. The negative binomial distribution is NEF-QVF because the mean is μ = rp/(1 − p) and the variance is Var(X) = μ²/r + μ.
  6. The (not very famous) distribution generated by the generalized[clarification needed] hyperbolic secant distribution (NEF-GHS) has[citation needed] Var(X) = μ²/r + r and μ ∈ R.
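The quadratic variance functions above can be verified by direct arithmetic; e.g., for the binomial, V(μ) = μ − μ²/n should equal np(1 − p), and for the gamma, V(μ) = μ²/r should equal rλ². A quick sketch:

```python
# Binomial with known number of trials n and success probability p
n, p = 12, 0.3
mu_binomial = n * p
var_binomial = n * p * (1 - p)                       # exact binomial variance
qvf_binomial = mu_binomial - mu_binomial ** 2 / n    # V(mu) = -mu^2/n + mu

# Gamma with known shape r and scale lam
r, lam = 5.0, 2.0
mu_gamma = r * lam
var_gamma = r * lam ** 2                             # exact gamma variance
qvf_gamma = mu_gamma ** 2 / r                        # V(mu) = mu^2/r
```

Expanding the binomial case, μ − μ²/n = np − np² = np(1 − p), confirming the quadratic form.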

Properties of NEF-QVF

The properties of NEF-QVF can simplify calculations that use these distributions.

  1. Natural exponential families with quadratic variance functions (NEF-QVF) are closed under convolutions of a linear transformation. [4] That is, a convolution of a linear transformation of an NEF-QVF is also an NEF-QVF, although not necessarily the original one. Given independent identically distributed (iid) X_1, …, X_n with distribution from an NEF-QVF, let

        Y = Σ_i (X_i − b)/c

     be the convolution of a linear transformation of X. The mean of Y is μ* = n(μ − b)/c. The variance of Y can be written in terms of the variance function of the original NEF-QVF. If the original NEF-QVF had variance function

        Var(X) = v0 + v1 μ + v2 μ²,

     then the new NEF-QVF has variance function

        Var(Y) = v0* + v1* μ* + v2* (μ*)²,

     where

        v0* = n(v0 + b v1 + b² v2)/c²,
        v1* = (v1 + 2b v2)/c,
        v2* = v2/n.

  2. Let X and Y be independent NEF with the same parameter θ and let Z = X + Y. Then the conditional distribution of X given Z has quadratic variance in Z if and only if X and Y are NEF-QVF. Examples of such conditional distributions are the normal, binomial, beta, hypergeometric and geometric distributions, which are not all NEF-QVF. [1]
  3. NEF-QVF have conjugate prior distributions on μ in the Pearson system of distributions (also called the Pearson distribution, although the Pearson system is actually a family of distributions rather than a single distribution). Examples of conjugate prior distributions of NEF-QVF distributions are the normal, gamma, reciprocal gamma, beta, F-, and t-distributions. Again, these conjugate priors are not all NEF-QVF. [1]
  4. If X | μ has an NEF-QVF distribution and μ has a conjugate prior distribution, then the marginal distributions are well-known distributions. [1] These properties, together with the above notation, can simplify calculations in mathematical statistics that would normally require complicated integration and calculus.
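The closure-under-linear-transformation property can be sanity-checked arithmetically. Starting from a Poisson NEF-QVF, whose variance function is V(μ) = μ (so v0 = 0, v1 = 1, v2 = 0), the variance of Y = Σᵢ(Xᵢ − b)/c computed directly as n·Var(X)/c² should match the transformed variance function evaluated at μ* = n(μ − b)/c. The coefficient formulas in the comments follow from substituting μ = cμ*/n + b into n·V(μ)/c²:

```python
# Poisson(lam) as an NEF-QVF: V(mu) = mu, so v0 = 0, v1 = 1, v2 = 0.
lam = 2.5
v0, v1, v2 = 0.0, 1.0, 0.0
n, b, c = 4, 1.0, 2.0

mu = lam                          # Poisson mean
mu_star = n * (mu - b) / c        # mean of Y = sum_i (X_i - b)/c

# Coefficients of the transformed variance function, obtained by
# substituting mu = c*mu_star/n + b into n*V(mu)/c^2:
v0s = n * (v0 + b * v1 + b * b * v2) / c ** 2
v1s = (v1 + 2 * b * v2) / c
v2s = v2 / n

var_direct = n * (v0 + v1 * mu + v2 * mu * mu) / c ** 2   # n * Var(X) / c^2
var_qvf = v0s + v1s * mu_star + v2s * mu_star ** 2
```

With these numbers both routes give Var(Y) = 2.5, and the agreement holds for any choice of n, b, c by construction of the substitution.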


References

  1. Morris, Carl (2006). "Natural exponential families". Encyclopedia of Statistical Sciences.
  2. Morris, Carl N. (1983). "Natural Exponential Families with Quadratic Variance Functions: Statistical Theory". The Annals of Statistics. 11 (2): 515–529. doi:10.1214/aos/1176346158.
  3. Morris, Carl (1982). "Natural Exponential Families with Quadratic Variance Functions". The Annals of Statistics. 10 (1): 65–80. doi:10.1214/aos/1176345690.
  4. Morris, Carl; Lock, Kari F. (2009). "Unifying the Named Natural Exponential Families and Their Relatives". The American Statistician. 63 (3): 247–253. doi:10.1198/tast.2009.08145. S2CID 7095121.