Probability-generating function

Last updated September 22, 2024

In probability theory, the probability generating function of a discrete random variable is a power series representation (the generating function) of the probability mass function of the random variable. Probability generating functions are often employed for their succinct description of the sequence of probabilities Pr(X = i) in the probability mass function for a random variable X, and to make available the well-developed theory of power series with non-negative coefficients.

Definition

Univariate case

If X is a discrete random variable taking values x in the non-negative integers {0,1, ...}, then the probability generating function of X is defined as ^[1]

G(z)=\operatorname {E} (z^{X})=\sum _{x=0}^{\infty }p(x)z^{x},

where $p$ is the probability mass function of $X$. The subscripted notations $G_{X}$ and $p_{X}$ are often used to emphasize that these pertain to a particular random variable $X$ and to its distribution. The power series converges absolutely at least for all complex numbers $z$ with $|z|\leq 1$; in many examples the radius of convergence is larger.
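As an illustration of the definition, the following minimal sketch evaluates $G(z)$ directly from a finitely supported probability mass function. The function name `pgf` and the list layout (`pmf[x]` holds $\Pr(X=x)$) are assumptions made for this example only.

```python
from fractions import Fraction

def pgf(pmf, z):
    """Evaluate G(z) = sum_x p(x) z^x for a finitely supported pmf.

    pmf is a list with pmf[x] = Pr(X = x); this layout is an assumption
    of the sketch, not notation taken from the article.
    """
    return sum(p * z**x for x, p in enumerate(pmf))

# A fair six-sided die: faces 1..6, so Pr(X = 0) = 0.
die = [Fraction(0)] + [Fraction(1, 6)] * 6

print(pgf(die, 1))               # the probabilities sum to one
print(pgf(die, Fraction(1, 2)))  # G(1/2) as an exact fraction
```

Exact rational arithmetic is used so that the normalization $G(1)=1$ can be checked without rounding error.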

Multivariate case

If $X=(X_{1},\ldots ,X_{d})$ is a discrete random variable taking values $(x_{1},\ldots ,x_{d})$ in the $d$-dimensional non-negative integer lattice $\{0,1,\ldots \}^{d}$, then the probability generating function of $X$ is defined as

G(z)=G(z_{1},\ldots ,z_{d})=\operatorname {E} {\bigl (}z_{1}^{X_{1}}\cdots z_{d}^{X_{d}}{\bigr )}=\sum _{x_{1},\ldots ,x_{d}=0}^{\infty }p(x_{1},\ldots ,x_{d})z_{1}^{x_{1}}\cdots z_{d}^{x_{d}},

where $p$ is the probability mass function of $X$. The power series converges absolutely at least for all complex vectors $z=(z_{1},\ldots ,z_{d})\in \mathbb {C} ^{d}$ with $\max\{|z_{1}|,\ldots ,|z_{d}|\}\leq 1$.
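The multivariate definition can be sketched in the same way. Here the joint probability mass function is stored as a dictionary from integer tuples to probabilities; the name `pgf_multi` and this storage layout are assumptions of the example.

```python
from fractions import Fraction
from math import prod

def pgf_multi(pmf, z):
    """Evaluate G(z_1, ..., z_d) = sum_x p(x) z_1^{x_1} ... z_d^{x_d}.

    pmf maps tuples (x_1, ..., x_d) to probabilities (an assumed layout);
    z is a tuple of the same length d.
    """
    return sum(p * prod(zi**xi for zi, xi in zip(z, x))
               for x, p in pmf.items())

# X_1 is a fair coin and X_2 = 1 - X_1, so the components are dependent.
joint = {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 2)}

# For this pmf, G(z_1, z_2) = (z_1 + z_2) / 2.
print(pgf_multi(joint, (1, 1)))
print(pgf_multi(joint, (Fraction(1, 2), Fraction(1, 4))))
```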

Properties

Power series

Probability generating functions obey all the rules of power series with non-negative coefficients. In particular, $G(1^{-})=1$ , where $G(1^{-})=\lim _{x\to 1,x<1}G(x)$ , x approaching 1 from below, since the probabilities must sum to one. So the radius of convergence of any probability generating function must be at least 1, by Abel's theorem for power series with non-negative coefficients.

Probabilities and expectations

The following properties allow the derivation of various basic quantities related to $X$ :

1. The probability mass function of $X$ is recovered by taking derivatives of $G$,
   $p(k)=\operatorname {Pr} (X=k)={\frac {G^{(k)}(0)}{k!}}.$
2. It follows from Property 1 that if random variables $X$ and $Y$ have probability-generating functions that are equal, $G_{X}=G_{Y}$, then $p_{X}=p_{Y}$. That is, if $X$ and $Y$ have identical probability-generating functions, then they have identical distributions.
3. The normalization of the probability mass function can be expressed in terms of the generating function by
   $\operatorname {E} [1]=G(1^{-})=\sum _{i=0}^{\infty }p(i)=1.$
4. The expectation of $X$ is given by
   $\operatorname {E} [X]=G'(1^{-}).$
5. More generally, the $k^{\text{th}}$ factorial moment, $\operatorname {E} [X(X-1)\cdots (X-k+1)]$, of $X$ is given by
   $\operatorname {E} \left[{\frac {X!}{(X-k)!}}\right]=G^{(k)}(1^{-}),\quad k\geq 0.$
   So the variance of $X$ is given by
   $\operatorname {Var} (X)=G''(1^{-})+G'(1^{-})-\left[G'(1^{-})\right]^{2}.$
   Finally, the $k^{\text{th}}$ raw moment of $X$ is given by
   $\operatorname {E} [X^{k}]=\left(z{\frac {\partial }{\partial z}}\right)^{k}G(z){\Big |}_{z=1^{-}}$
6. $G_{X}(e^{t})=M_{X}(t)$, where $X$ is a random variable, $G_{X}(t)$ is the probability generating function (of $X$) and $M_{X}(t)$ is the moment-generating function (of $X$).
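The derivative formulas above can be checked on a finitely supported distribution, where $G$ is a polynomial and $G(1^{-})=G(1)$. The sketch below stores $G$ as a list of coefficients (index = power of $z$); all function names are hypothetical helpers introduced for this example.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """Coefficients of G'(z), given those of G(z) (list index = power of z)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, z):
    """Evaluate the polynomial with the given coefficients at z."""
    return sum(c * z**k for k, c in enumerate(coeffs))

def kth_derivative_at(coeffs, k, z):
    """G^(k)(z) for a polynomial G."""
    for _ in range(k):
        coeffs = derivative(coeffs)
    return evaluate(coeffs, z)

def raw_moment(coeffs, k):
    """E[X^k]: apply the operator z d/dz k times, then set z = 1."""
    for _ in range(k):
        coeffs = [j * c for j, c in enumerate(coeffs)]
    return evaluate(coeffs, 1)

# Fair six-sided die: the coefficient of z^x is Pr(X = x).
die = [Fraction(0)] + [Fraction(1, 6)] * 6

p3 = kth_derivative_at(die, 3, 0) / factorial(3)   # Pr(X = 3)
mean = kth_derivative_at(die, 1, 1)                # E[X] = G'(1)
var = kth_derivative_at(die, 2, 1) + mean - mean**2
second_moment = raw_moment(die, 2)                 # E[X^2]

print(p3, mean, var, second_moment)
```

The variance computed from $G''(1)+G'(1)-[G'(1)]^{2}$ agrees with `second_moment - mean**2`, which gives a quick consistency check between the factorial-moment and raw-moment formulas.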

Functions of independent random variables

Probability generating functions are particularly useful for dealing with functions of independent random variables. For example:

If $X_{i},i=1,2,\cdots ,N$ is a sequence of independent (and not necessarily identically distributed) random variables that take on natural-number values, and

S_{N}=\sum _{i=1}^{N}a_{i}X_{i},

where the $a_{i}$ are constant natural numbers, then the probability generating function is given by

G_{S_{N}}(z)=\operatorname {E} (z^{S_{N}})=\operatorname {E} \left(z^{\sum _{i=1}^{N}a_{i}X_{i}}\right)=G_{X_{1}}(z^{a_{1}})G_{X_{2}}(z^{a_{2}})\cdots G_{X_{N}}(z^{a_{N}}).

In particular, if $X$ and $Y$ are independent random variables:

G_{X+Y}(z)=G_{X}(z)\cdot G_{Y}(z)

and

G_{X-Y}(z)=G_{X}(z)\cdot G_{Y}(1/z).
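For finitely supported variables, the product rule for sums says that multiplying the coefficient lists of two probability generating functions yields the probability mass function of the sum, and replacing $z$ by $z^{a}$ spreads the coefficients out by a factor of $a$. A minimal sketch, with `poly_mul` and `dilate` as hypothetical helper names:

```python
from fractions import Fraction

def poly_mul(a, b):
    """Coefficients of the product of two polynomials (index = power of z)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def dilate(a, k):
    """Coefficients of G(z^k), given those of G(z), for an integer k >= 1."""
    out = [Fraction(0)] * (k * (len(a) - 1) + 1)
    for i, ai in enumerate(a):
        out[k * i] = ai
    return out

coin = [Fraction(1, 2), Fraction(1, 2)]        # Bernoulli(1/2)
die = [Fraction(0)] + [Fraction(1, 6)] * 6     # fair six-sided die

two_dice = poly_mul(die, die)                  # pmf of X + Y for two dice
weighted = poly_mul(coin, dilate(coin, 2))     # pmf of X_1 + 2 X_2

print(two_dice[7])   # probability that two dice sum to 7
print(weighted)      # X_1 + 2 X_2 is uniform on {0, 1, 2, 3}
```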

In the above, the number $N$ of independent random variables in the sequence is fixed. Now assume that $N$ is a discrete random variable taking values in the non-negative integers, which is independent of the $X_{i}$, and consider its probability generating function $G_{N}$. If the $X_{i}$ are not only independent but also identically distributed with common probability generating function $G_{X}=G_{X_{i}}$, then

G_{S_{N}}(z)=G_{N}(G_{X}(z)).

This can be seen, using the law of total expectation, as follows:

{\begin{aligned}G_{S_{N}}(z)&=\operatorname {E} (z^{S_{N}})=\operatorname {E} (z^{\sum _{i=1}^{N}X_{i}})\\[4pt]&=\operatorname {E} {\big (}\operatorname {E} (z^{\sum _{i=1}^{N}X_{i}}\mid N){\big )}=\operatorname {E} {\big (}(G_{X}(z))^{N}{\big )}=G_{N}(G_{X}(z)).\end{aligned}}

This last fact is useful in the study of Galton–Watson processes and compound Poisson processes.
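The composition identity can be illustrated with polynomial coefficient lists. In the sketch below, $N$ is a fair die roll and the $X_{i}$ are fair coin flips, so $S_{N}$ counts the heads in a random number of tosses; this choice of distributions and the helper names are assumptions of the example.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Coefficients of the product of two polynomials (index = power of z)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def compose(outer, inner):
    """Coefficients of outer(inner(z)), computed with Horner's rule.

    The result may carry a trailing zero coefficient, which is harmless.
    """
    result = [Fraction(0)]
    for c in reversed(outer):
        result = poly_mul(result, inner)
        result[0] += c
    return result

G_N = [Fraction(0)] + [Fraction(1, 6)] * 6   # N is a fair die roll
G_X = [Fraction(1, 2), Fraction(1, 2)]       # each X_i is a fair coin

G_S = compose(G_N, G_X)                      # pmf of S_N = X_1 + ... + X_N
mean_S = sum(k * c for k, c in enumerate(G_S))

print(G_S[0])    # Pr(S_N = 0), which equals G_N(1/2)
print(mean_S)    # E[S_N], which equals E[N] * E[X]
```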

When the $X_{i}$ are not assumed to be identically distributed (but are still independent and independent of $N$), we have

G_{S_{N}}(z)=\sum _{n\geq 0}f_{n}\prod _{i=1}^{n}G_{X_{i}}(z),

where $f_{n}=\Pr(N=n)$ and the empty product for $n=0$ equals 1. For identically distributed $X_{i}$, this simplifies to the identity stated before, but the general case is sometimes useful to obtain a decomposition of $S_{N}$ by means of generating functions.
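The general case can be sketched by accumulating the running product of the individual generating functions. The sketch includes the $n=0$ term, whose empty product equals 1, so that the coefficients sum to one when $\Pr(N=0)>0$. The function name `random_sum_pgf` and the example distributions are assumptions.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Coefficients of the product of two polynomials (index = power of z)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def random_sum_pgf(f, pgfs):
    """Coefficients of sum_n f[n] * prod_{i=1}^{n} G_{X_i}(z).

    f[n] = Pr(N = n); pgfs[i] holds the coefficients of G_{X_{i+1}}.
    Requires len(pgfs) >= len(f) - 1.
    """
    total = []
    running = [Fraction(1)]          # empty product for n = 0
    for n, fn in enumerate(f):
        if n > 0:
            running = poly_mul(running, pgfs[n - 1])
        total += [Fraction(0)] * (len(running) - len(total))
        for k, c in enumerate(running):
            total[k] += fn * c
    return total

f = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]   # N takes values 0, 1, 2
G_X1 = [Fraction(1, 2), Fraction(1, 2)]                # Bernoulli(1/2)
G_X2 = [Fraction(2, 3), Fraction(1, 3)]                # Bernoulli(1/3)

G_S = random_sum_pgf(f, [G_X1, G_X2])
print(G_S)
```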

Examples

The probability generating function of an almost surely constant random variable, i.e. one with $\Pr(X=c)=1$ and $\Pr(X\neq c)=0$, is

G(z)=z^{c}.

The probability generating function of a binomial random variable, the number of successes in $n$ trials, with probability $p$ of success in each trial, is

G(z)=\left[(1-p)+pz\right]^{n}.

Note: it is the $n$-fold product of the probability generating function of a Bernoulli random variable with parameter $p$.

So the probability generating function of a fair coin is

G(z)=1/2+z/2.
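The $n$-fold product description of the binomial generating function can be verified by multiplying the Bernoulli polynomial $(1-p)+pz$ by itself $n$ times and comparing the coefficients with the binomial probabilities. The helper names and the parameter values $n=5$, $p=1/3$ are chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def poly_mul(a, b):
    """Coefficients of the product of two polynomials (index = power of z)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def binomial_pgf_coeffs(n, p):
    """Coefficients of [(1 - p) + p z]^n, built as an n-fold product."""
    bernoulli = [1 - p, p]
    out = [Fraction(1)]
    for _ in range(n):
        out = poly_mul(out, bernoulli)
    return out

n, p = 5, Fraction(1, 3)
from_product = binomial_pgf_coeffs(n, p)
from_formula = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

print(from_product == from_formula)
```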

The probability generating function of a negative binomial random variable on $\{0,1,2,\ldots \}$, the number of failures until the $r^{\text{th}}$ success with probability of success in each trial $p$, is

G(z)=\left({\frac {p}{1-(1-p)z}}\right)^{r},

which converges for $|z|<{\frac {1}{1-p}}$.

Note that this is the $r$-fold product of the probability generating function of a geometric random variable with parameter $1-p$ on $\{0,1,2,\ldots \}$.
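The closed form can be compared numerically with a truncated power series, including at a point with $|z|>1$ that still lies inside the radius of convergence $1/(1-p)$. The sketch assumes the standard pmf $\Pr(X=k)={\binom {k+r-1}{k}}p^{r}(1-p)^{k}$ for the number of failures; the truncation at 400 terms and the parameter values are arbitrary choices.

```python
from math import comb, isclose

def negbin_pmf(k, r, p):
    """Pr(X = k): k failures before the r-th success, success probability p."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

def negbin_pgf_series(z, r, p, terms=400):
    """Truncated power series sum_k Pr(X = k) z^k."""
    return sum(negbin_pmf(k, r, p) * z**k for k in range(terms))

def negbin_pgf_closed(z, r, p):
    """Closed form (p / (1 - (1 - p) z))^r, valid for |z| < 1 / (1 - p)."""
    return (p / (1 - (1 - p) * z))**r

r, p = 3, 0.4   # radius of convergence is 1 / 0.6, about 1.667
for z in (0.5, 1.0, 1.2):
    series = negbin_pgf_series(z, r, p)
    closed = negbin_pgf_closed(z, r, p)
    print(z, isclose(series, closed, rel_tol=1e-9))
```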

The probability generating function of a Poisson random variable with rate parameter $\lambda$ is

G(z)=e^{\lambda (z-1)}.
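As a worked example, the expectation and variance formulas $\operatorname {E} [X]=G'(1^{-})$ and $\operatorname {Var} (X)=G''(1^{-})+G'(1^{-})-[G'(1^{-})]^{2}$ can be applied to the Poisson generating function. Since this $G$ is entire, the one-sided limits at 1 are ordinary values:

```latex
\begin{aligned}
G'(z) &= \lambda e^{\lambda (z-1)}, & \operatorname{E}[X] &= G'(1^{-}) = \lambda,\\
G''(z) &= \lambda^{2} e^{\lambda (z-1)}, & G''(1^{-}) &= \lambda^{2},\\
\operatorname{Var}(X) &= G''(1^{-}) + G'(1^{-}) - \left[G'(1^{-})\right]^{2} = \lambda^{2} + \lambda - \lambda^{2} = \lambda.
\end{aligned}
```

This recovers the familiar fact that a Poisson random variable has mean and variance both equal to $\lambda$.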

Related concepts

The probability generating function is an example of a generating function of a sequence: see also formal power series. It is equivalent to, and sometimes called, the z-transform of the probability mass function.

Other generating functions of random variables include the moment-generating function, the characteristic function and the cumulant generating function. The probability generating function is also equivalent to the factorial moment generating function, which as $\operatorname {E} \left[z^{X}\right]$ can also be considered for continuous and other random variables.

Notes

[1] http://www.am.qub.ac.uk/users/g.gribakin/sor/Chap3.pdf

References

Johnson, N.L.; Kotz, S.; Kemp, A.W. (1993) Univariate Discrete distributions (2nd edition). Wiley. ISBN 0-471-54897-9 (Section 1.B9)


