Probability-generating function

In probability theory, the probability generating function of a discrete random variable is a power series representation (the generating function) of the probability mass function of the random variable. Probability generating functions are often employed for their succinct description of the sequence of probabilities Pr(X = i) in the probability mass function for a random variable X, and to make available the well-developed theory of power series with non-negative coefficients.

Definition

Univariate case

If X is a discrete random variable taking values in the non-negative integers {0, 1, ...}, then the probability generating function of X is defined as [1]

$G(z) = \operatorname{E}\!\left(z^{X}\right) = \sum_{x=0}^{\infty} p(x) z^{x},$

where p is the probability mass function of X. Note that the subscripted notations $G_X$ and $p_X$ are often used to emphasize that these pertain to a particular random variable X, and to its distribution. The power series converges absolutely at least for all complex numbers z with $|z| \leq 1$; in many examples the radius of convergence is larger.
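
As a concrete illustration (an assumed example, not part of the original text), the following minimal Python sketch evaluates the probability generating function of a fair six-sided die, whose probability mass function is uniform on {1, ..., 6}, by summing the power series directly:

```python
import numpy as np

# PMF of a fair six-sided die: Pr(X = k) = 1/6 for k = 1, ..., 6 (assumed example)
support = np.arange(1, 7)
pmf = np.full(6, 1 / 6)

def pgf(z):
    """Evaluate G(z) = E[z^X] = sum_k Pr(X = k) * z**k directly from the PMF."""
    return np.sum(pmf * z ** support)

print(pgf(1.0))   # 1.0, since the probabilities sum to one
print(pgf(0.5))   # (1/6) * (0.5 + 0.5**2 + ... + 0.5**6)
```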

Multivariate case

If $X = (X_1, \ldots, X_d)$ is a discrete random variable taking values in the d-dimensional non-negative integer lattice $\{0, 1, \ldots\}^{d}$, then the probability generating function of X is defined as

$G(z) = G(z_1, \ldots, z_d) = \operatorname{E}\!\left(z_1^{X_1} \cdots z_d^{X_d}\right) = \sum_{x_1, \ldots, x_d = 0}^{\infty} p(x_1, \ldots, x_d)\, z_1^{x_1} \cdots z_d^{x_d},$

where p is the probability mass function of X. The power series converges absolutely at least for all complex vectors $z = (z_1, \ldots, z_d)$ with $\max\{|z_1|, \ldots, |z_d|\} \leq 1$.
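
For a small illustration of the multivariate case (an assumed example, not from the article), the sketch below evaluates the joint probability generating function of two independent Bernoulli components by the double sum and checks that it factorises into the product of the marginal generating functions:

```python
import itertools

# Joint PMF of X = (X1, X2) with independent Bernoulli(p1) and Bernoulli(p2)
# components (assumed example): Pr(X1 = i, X2 = j) = Pr(X1 = i) * Pr(X2 = j)
p1, p2 = 0.3, 0.6

def joint_pmf(i, j):
    return (p1 if i else 1 - p1) * (p2 if j else 1 - p2)

def pgf(z1, z2):
    """G(z1, z2) = sum over the support of Pr(X1 = i, X2 = j) * z1**i * z2**j."""
    return sum(joint_pmf(i, j) * z1**i * z2**j
               for i, j in itertools.product((0, 1), repeat=2))

z1, z2 = 0.4, 0.9
print(pgf(z1, z2))                              # direct double sum
print((1 - p1 + p1 * z1) * (1 - p2 + p2 * z2))  # same value: PGF factorises for independent components
```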

Properties

Power series

Probability generating functions obey all the rules of power series with non-negative coefficients. In particular, $G(1^-) = 1$, where $G(1^-) = \lim_{z \to 1^-} G(z)$ from below, since the probabilities must sum to one. So the radius of convergence of any probability generating function must be at least 1, by Abel's theorem for power series with non-negative coefficients.

Probabilities and expectations

The following properties allow the derivation of various basic quantities related to X:

  1. The probability mass function of X is recovered by taking derivatives of G,
    $p(k) = \Pr(X = k) = \frac{G^{(k)}(0)}{k!}.$
  2. It follows from Property 1 that if random variables X and Y have probability-generating functions that are equal, $G_X = G_Y$, then $p_X = p_Y$. That is, if X and Y have identical probability-generating functions, then they have identical distributions.
  3. The normalization of the probability mass function can be expressed in terms of the generating function by
    $\operatorname{E}[1] = G(1^-) = \sum_{i=0}^{\infty} p(i) = 1.$
    The expectation of X is given by
    $\operatorname{E}[X] = G'(1^-).$
    More generally, the kth factorial moment $\operatorname{E}[X(X-1)\cdots(X-k+1)]$ of X is given by
    $\operatorname{E}\!\left[\frac{X!}{(X-k)!}\right] = G^{(k)}(1^-), \quad k \geq 0.$
    So the variance of X is given by
    $\operatorname{Var}(X) = G''(1^-) + G'(1^-) - \left(G'(1^-)\right)^{2}.$
    Finally, the kth raw moment of X is given by
    $\operatorname{E}[X^{k}] = \left(z \frac{\partial}{\partial z}\right)^{k} G(z)\,\Big|_{z=1^-}.$
    (A numerical check of these identities is sketched after this list.)
  4. $G_X(e^{t}) = M_X(t)$, where X is a random variable, $G_X(t)$ is the probability generating function (of X) and $M_X(t)$ is the moment-generating function (of X).
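
As an illustration of these properties (an assumed example, not from the original text), here is a minimal sympy sketch for a Poisson random variable with probability generating function $G(z) = e^{\lambda(z-1)}$; it recovers a point probability from derivatives at 0, and the mean and variance from derivatives at 1:

```python
import sympy as sp

z, lam = sp.symbols('z lam', positive=True)
G = sp.exp(lam * (z - 1))  # PGF of a Poisson(lam) random variable (assumed example)

# Property 1: Pr(X = 3) = G'''(0) / 3!
p3 = sp.diff(G, z, 3).subs(z, 0) / sp.factorial(3)

# Property 3: E[X] = G'(1),  Var(X) = G''(1) + G'(1) - G'(1)**2
mean = sp.diff(G, z).subs(z, 1)
var = sp.diff(G, z, 2).subs(z, 1) + mean - mean**2

print(sp.simplify(p3))    # lam**3*exp(-lam)/6
print(sp.simplify(mean))  # lam
print(sp.simplify(var))   # lam
```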

Functions of independent random variables

Probability generating functions are particularly useful for dealing with functions of independent random variables. For example:

If $X_1, X_2, \ldots, X_N$ is a sequence of independent (and not necessarily identically distributed) random variables, and
$S_N = \sum_{i=1}^{N} a_i X_i,$
where the $a_i$ are constant natural numbers, then the probability generating function is given by
$G_{S_N}(z) = \operatorname{E}\!\left(z^{S_N}\right) = \operatorname{E}\!\left(z^{\sum_{i=1}^{N} a_i X_i}\right) = G_{X_1}\!\left(z^{a_1}\right) G_{X_2}\!\left(z^{a_2}\right) \cdots G_{X_N}\!\left(z^{a_N}\right).$

For example, if
$S_N = \sum_{i=1}^{N} X_i,$
then the probability generating function, $G_{S_N}(z)$, is given by
$G_{S_N}(z) = G_{X_1}(z) G_{X_2}(z) \cdots G_{X_N}(z).$
It also follows that the probability generating function of the difference of two independent random variables $S = X_1 - X_2$ is
$G_S(z) = G_{X_1}(z)\, G_{X_2}(1/z).$

Suppose that N is also an independent, discrete random variable taking values on the non-negative integers, with probability generating function $G_N$. If the $X_1, X_2, \ldots, X_N$ are independent and identically distributed with common probability generating function $G_X$, then
$G_{S_N}(z) = G_N\!\left(G_X(z)\right).$
This can be seen, using the law of total expectation, as follows:
$G_{S_N}(z) = \operatorname{E}\!\left(z^{S_N}\right) = \operatorname{E}\!\left(\operatorname{E}\!\left(z^{\sum_{i=1}^{N} X_i} \mid N\right)\right) = \operatorname{E}\!\left(\left(G_X(z)\right)^{N}\right) = G_N\!\left(G_X(z)\right).$
This last fact is useful in the study of Galton–Watson processes and compound Poisson processes.

When the $X_i$ are independent but not identically distributed, and N has probability mass function $f_i = \Pr(N = i)$, the corresponding identity is
$G_{S_N}(z) = \sum_{i \geq 1} f_i \prod_{k=1}^{i} G_{X_k}(z).$
For identically distributed $X_i$ this simplifies to the identity stated before. The general case is sometimes useful to obtain a decomposition of $S_N$ by means of generating functions.
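
To illustrate the identity $G_{S_N}(z) = G_N(G_X(z))$ numerically (an assumed example, not from the article), the sketch below compares a Monte Carlo estimate of $\operatorname{E}[z^{S_N}]$ for a Poisson number of i.i.d. Bernoulli summands with the composed generating functions; the composition equals $e^{\lambda p (z-1)}$, the generating function of a Poisson($\lambda p$) variable:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, p, z = 3.0, 0.4, 0.7          # assumed parameters and evaluation point

# S_N = X_1 + ... + X_N with N ~ Poisson(lam) and X_i ~ Bernoulli(p) i.i.d.
N = rng.poisson(lam, size=200_000)
S = rng.binomial(N, p)             # a Binomial(N, p) draw is the sum of N Bernoulli(p) draws

G_X = (1 - p) + p * z                  # PGF of Bernoulli(p) evaluated at z
G_N = lambda s: np.exp(lam * (s - 1))  # PGF of Poisson(lam)

print(np.mean(z ** S))             # Monte Carlo estimate of E[z**S_N]
print(G_N(G_X))                    # G_N(G_X(z)) = exp(lam * p * (z - 1))
```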

Examples

The probability generating function of a binomial random variable, the number of successes in n trials with probability p of success in each trial, is
$G(z) = \left[(1 - p) + pz\right]^{n}.$
Note that this is the n-fold product of the probability generating function of a Bernoulli random variable with parameter p, namely $(1 - p) + pz$. So the probability generating function of a fair coin is
$G(z) = 1/2 + z/2.$

The probability generating function of a negative binomial random variable on $\{0, 1, 2, \ldots\}$, the number of failures until the rth success with probability of success p in each trial, is
$G(z) = \left(\frac{p}{1 - (1 - p)z}\right)^{r},$
which converges for $|z| < \frac{1}{1 - p}$. Note that this is the r-fold product of the probability generating function of a geometric random variable with parameter $1 - p$ on $\{0, 1, 2, \ldots\}$.
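
As a check of the binomial example (an assumed illustration, not from the article), the sympy sketch below expands the n-fold product of the Bernoulli generating function and confirms that the coefficient of $z^{k}$ is the binomial probability $\binom{n}{k} p^{k} (1-p)^{n-k}$:

```python
import sympy as sp

# Recover binomial probabilities from the PGF G(z) = ((1 - p) + p*z)**n (assumed example, n = 4)
z, p = sp.symbols('z p', positive=True)
n = 4
G = ((1 - p) + p * z) ** n          # n-fold product of the Bernoulli PGF (1 - p) + p*z
coeffs = sp.Poly(sp.expand(G), z).all_coeffs()[::-1]   # coefficient of z**k is Pr(X = k)
for k, c in enumerate(coeffs):
    # each difference simplifies to 0, i.e. the coefficient matches binomial(n, k) * p**k * (1-p)**(n-k)
    print(k, sp.simplify(c - sp.binomial(n, k) * p**k * (1 - p)**(n - k)))
```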

The probability generating function is an example of a generating function of a sequence: see also formal power series. It is equivalent to, and sometimes called, the z-transform of the probability mass function.

Other generating functions of random variables include the moment-generating function, the characteristic function and the cumulant generating function. The probability generating function is also equivalent to the factorial moment generating function, which as $\operatorname{E}\!\left[z^{X}\right]$ can also be considered for continuous and other random variables.

Notes

References