Geometric distribution

Geometric
	Probability mass function
	Cumulative distribution function
Parameters	success probability (real)
Support	k trials where
PMF
CDF	for ,; for
Mean
Median	; (not unique if Contents Definition ; Properties ; Memorylessness ; Moments and cumulants ; Summary statistics ; Entropy and Fisher's Information ; Entropy (Geometric Distribution, Failures Before Success) ; Fisher's Information (Geometric Distribution, Failures Before Success) ; Entropy (Geometric Distribution, Trials Until Success) ; Fisher's Information (Geometric Distribution, Trials Until Success) ; General properties ; Related distributions ; Statistical inference ; Method of moments ; Maximum likelihood estimation ; Bayesian inference ; Random variate generation ; Applications ; See also ; References ; is an integer)
Mode
Variance
Skewness
Excess kurtosis
Entropy
MGF	; for
CF
PGF
Fisher information

Last updated October 30, 2024

In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions:

The probability distribution of the number $X$ of Bernoulli trials needed to get one success, supported on $\mathbb {N} =\{1,2,3,\ldots \}$ ;
The probability distribution of the number $Y=X-1$ of failures before the first success, supported on $\mathbb {N} _{0}=\{0,1,2,\ldots \}$ .

These two different geometric distributions should not be confused with each other. Often, the name shifted geometric distribution is adopted for the former one (distribution of $X$ ); however, to avoid ambiguity, it is considered wise to indicate which is intended, by mentioning the support explicitly.

The geometric distribution gives the probability that the first occurrence of success requires $k$ independent trials, each with success probability $p$ . If the probability of success on each trial is $p$ , then the probability that the $k$ -th trial is the first success is

\Pr(X=k)=(1-p)^{k-1}p

for $k=1,2,3,4,\dots$

The above form of the geometric distribution is used for modeling the number of trials up to and including the first success. By contrast, the following form of the geometric distribution is used for modeling the number of failures until the first success:

\Pr(Y=k)=\Pr(X=k+1)=(1-p)^{k}p

for $k=0,1,2,3,\dots$

The geometric distribution gets its name because its probabilities follow a geometric sequence. It is sometimes called the Furry distribution after Wendell H. Furry.^[1]^: 210

Definition

The geometric distribution is the discrete probability distribution that describes when the first success in an infinite sequence of independent and identically distributed Bernoulli trials occurs. Its probability mass function depends on its parameterization and support. When supported on $\mathbb {N}$ , the probability mass function is $P(X=k)=(1-p)^{k-1}p$ where $k=1,2,3,\dotsc$ is the number of trials and $p$ is the probability of success in each trial.^[2]^{: 260–261}

The support may also be $\mathbb {N} _{0}$ , defining $Y=X-1$ . This alters the probability mass function into $P(Y=k)=(1-p)^{k}p$ where $k=0,1,2,\dotsc$ is the number of failures before the first success.^[3]^: 66

An alternative parameterization of the distribution gives the probability mass function $P(Y=k)=\left({\frac {P}{Q}}\right)^{k}\left(1-{\frac {P}{Q}}\right)$ where $P={\frac {1-p}{p}}$ and $Q={\frac {1}{p}}$ .^[1]^{: 208–209}

An example of a geometric distribution arises from rolling a six-sided dieuntil a "1" appears. Each roll is independent with a $1/6$ chance of success. The number of rolls needed follows a geometric distribution with $p=1/6$ .

Properties

Memorylessness

The geometric distribution is the only memoryless discrete probability distribution.^[4] It is the discrete version of the same property found in the exponential distribution.^[1]^: 228 The property asserts that the number of previously failed trials does not affect the number of future trials needed for a success.

Because there are two definitions of the geometric distribution, there are also two definitions of memorylessness for discrete random variables.^[5] Expressed in terms of conditional probability, the two definitions are $\Pr(X>m+n\mid X>n)=\Pr(X>m),$

and $\Pr(Y>m+n\mid Y\geq n)=\Pr(Y>m),$

where $m$ and $n$ are natural numbers, $X$ is a geometrically distributed random variable defined over $\mathbb {N}$ , and $Y$ is a geometrically distributed random variable defined over $\mathbb {N} _{0}$ . Note that these definitions are not equivalent for discrete random variables; $Y$ does not satisfy the first equation and $X$ does not satisfy the second.

Moments and cumulants

The expected value and variance of a geometrically distributed random variable $X$ defined over $\mathbb {N}$ is^[2]^: 261 $\operatorname {E} (X)={\frac {1}{p}},\qquad \operatorname {var} (X)={\frac {1-p}{p^{2}}}.$ With a geometrically distributed random variable $Y$ defined over $\mathbb {N} _{0}$ , the expected value changes into $\operatorname {E} (Y)={\frac {1-p}{p}},$ while the variance stays the same.^[6]^{: 114–115}

For example, when rolling a six-sided die until landing on a "1", the average number of rolls needed is ${\frac {1}{1/6}}=6$ and the average number of failures is ${\frac {1-1/6}{1/6}}=5$ .

The moment generating function of the geometric distribution when defined over $\mathbb {N}$ and $\mathbb {N} _{0}$ respectively is^[7]^[6]^: 114 ${\begin{aligned}M_{X}(t)&={\frac {pe^{t}}{1-(1-p)e^{t}}}\\M_{Y}(t)&={\frac {p}{1-(1-p)e^{t}}},t<-\ln(1-p)\end{aligned}}$ The moments for the number of failures before the first success are given by

{\begin{aligned}\mathrm {E} (Y^{n})&{}=\sum _{k=0}^{\infty }(1-p)^{k}p\cdot k^{n}\\&{}=p\operatorname {Li} _{-n}(1-p)&({\text{for }}n\neq 0)\end{aligned}}

where $\operatorname {Li} _{-n}(1-p)$ is the polylogarithm function.^[8]

The cumulant generating function of the geometric distribution defined over $\mathbb {N} _{0}$ is^[1]^: 216 $K(t)=\ln p-\ln(1-(1-p)e^{t})$ The cumulants $\kappa _{r}$ satisfy the recursion $\kappa _{r+1}=q{\frac {\delta \kappa _{r}}{\delta q}},r=1,2,\dotsc$ where $q=1-p$ , when defined over $\mathbb {N} _{0}$ .^[1]^: 216

Proof of expected value

Consider the expected value $\mathrm {E} (X)$ of X as above, i.e. the average number of trials until a success. On the first trial, we either succeed with probability $p$ , or we fail with probability $1-p$ . If we fail the remaining mean number of trials until a success is identical to the original mean. This follows from the fact that all trials are independent. From this we get the formula:

{\begin{aligned}\operatorname {\mathrm {E} } (X)&{}=p\mathrm {E} [X|X=1]+(1-p)\mathrm {E} [X|X>1]\\&{}=p\mathrm {E} [X|X=1]+(1-p)(1+\mathrm {E} [X-1|X>1])\\&{}=p\cdot 1+(1-p)\cdot (1+\mathrm {E} [X]),\end{aligned}}

which, if solved for $\mathrm {E} (X)$ , gives:^{[ citation needed ]}

\operatorname {E} (X)={\frac {1}{p}}.

The expected number of failures $Y$ can be found from the linearity of expectation, $\mathrm {E} (Y)=\mathrm {E} (X-1)=\mathrm {E} (X)-1={\frac {1}{p}}-1={\frac {1-p}{p}}$ . It can also be shown in the following way:^{[ citation needed ]}

{\begin{aligned}\operatorname {E} (Y)&{}=\sum _{k=0}^{\infty }(1-p)^{k}p\cdot k\\&{}=p\sum _{k=0}^{\infty }(1-p)^{k}k\\&{}=p(1-p)\sum _{k=0}^{\infty }(1-p)^{k-1}\cdot k\\&{}=p(1-p)\left[{\frac {d}{dp}}\left(-\sum _{k=0}^{\infty }(1-p)^{k}\right)\right]\\&{}=p(1-p){\frac {d}{dp}}\left(-{\frac {1}{p}}\right)\\&{}={\frac {1-p}{p}}.\end{aligned}}

The interchange of summation and differentiation is justified by the fact that convergent power series converge uniformly on compact subsets of the set of points where they converge.

Summary statistics

The mean of the geometric distribution is its expected value which is, as previously discussed in § Moments and cumulants, ${\frac {1}{p}}$ or ${\frac {1-p}{p}}$ when defined over $\mathbb {N}$ or $\mathbb {N} _{0}$ respectively.

The median of the geometric distribution is $\left\lceil -{\frac {\log 2}{\log(1-p)}}\right\rceil$ when defined over $\mathbb {N}$ ^[9] and $\left\lfloor -{\frac {\log 2}{\log(1-p)}}\right\rfloor$ when defined over $\mathbb {N} _{0}$ .^[3]^: 69

The mode of the geometric distribution is the first value in the support set. This is 1 when defined over $\mathbb {N}$ and 0 when defined over $\mathbb {N} _{0}$ .^[3]^: 69

The skewness of the geometric distribution is ${\frac {2-p}{\sqrt {1-p}}}$ .^[6]^: 115

The kurtosis of the geometric distribution is $9+{\frac {p^{2}}{1-p}}$ .^[6]^: 115 The excess kurtosis of a distribution is the difference between its kurtosis and the kurtosis of a normal distribution, $3$ .^[10]^: 217 Therefore, the excess kurtosis of the geometric distribution is $6+{\frac {p^{2}}{1-p}}$ . Since ${\frac {p^{2}}{1-p}}\geq 0$ , the excess kurtosis is always positive so the distribution is leptokurtic.^[3]^: 69 In other words, the tail of a geometric distribution decays faster than a Gaussian.^[10]^: 217

Entropy and Fisher's Information

Entropy (Geometric Distribution, Failures Before Success)

Entropy is a measure of uncertainty in a probability distribution. For the geometric distribution that models the number of failures before the first success, the probability mass function is:

P(X=k)=(1-p)^{k}p,\quad k=0,1,2,\dots

The entropy $H(X)$ for this distribution is defined as:

{\begin{aligned}H(X)&=-\sum _{k=0}^{\infty }P(X=k)\ln P(X=k)\\&=-\sum _{k=0}^{\infty }(1-p)^{k}p\ln \left((1-p)^{k}p\right)\\&=-\sum _{k=0}^{\infty }(1-p)^{k}p\left[k\ln(1-p)+\ln p\right]\\&=-\log p-{\frac {1-p}{p}}\log(1-p)\end{aligned}}

The entropy increases as the probability $p$ decreases, reflecting greater uncertainty as success becomes rarer.

Fisher's Information (Geometric Distribution, Failures Before Success)

Fisher information measures the amount of information that an observable random variable $X$ carries about an unknown parameter $p$ . For the geometric distribution (failures before the first success), the Fisher information with respect to $p$ is given by:

I(p)={\frac {1}{p^{2}(1-p)}}

Proof:

The Likelihood Function for a geometric random variable $X$ is:

L(p;X)=(1-p)^{X}p

The Log-Likelihood Function is:

\ln L(p;X)=X\ln(1-p)+\ln p

The Score Function (first derivative of the log-likelihood w.r.t. $p$ ) is:

{\frac {\partial }{\partial p}}\ln L(p;X)={\frac {1}{p}}-{\frac {X}{1-p}}

The second derivative of the log-likelihood function is:

{\frac {\partial ^{2}}{\partial p^{2}}}\ln L(p;X)=-{\frac {1}{p^{2}}}-{\frac {X}{(1-p)^{2}}}

Fisher Information is calculated as the negative expected value of the second derivative:

{\begin{aligned}I(p)&=-E\left[{\frac {\partial ^{2}}{\partial p^{2}}}\ln L(p;X)\right]\\&=-\left(-{\frac {1}{p^{2}}}-{\frac {1-p}{p(1-p)^{2}}}\right)\\&={\frac {1}{p^{2}(1-p)}}\end{aligned}}

Fisher information increases as $p$ decreases, indicating that rarer successes provide more information about the parameter $p$ .

Entropy (Geometric Distribution, Trials Until Success)

For the geometric distribution modeling the number of trials until the first success, the probability mass function is:

P(X=k)=(1-p)^{k-1}p,\quad k=1,2,3,\dots

The entropy $H(X)$ for this distribution is given by:

{\begin{aligned}H(X)&=-\sum _{k=1}^{\infty }P(X=k)\ln P(X=k)\\&=-\sum _{k=1}^{\infty }(1-p)^{k-1}p\ln \left((1-p)^{k-1}p\right)\\&=-\sum _{k=1}^{\infty }(1-p)^{k-1}p\left[(k-1)\ln(1-p)+\ln p\right]\\&=-\log p+{\frac {1-p}{p}}\log(1-p)\end{aligned}}

Entropy increases as $p$ decreases, reflecting greater uncertainty as the probability of success in each trial becomes smaller.

Fisher's Information (Geometric Distribution, Trials Until Success)

Fisher information for the geometric distribution modeling the number of trials until the first success is given by:

I(p)={\frac {1}{p^{2}(1-p)}}

Proof:

The Likelihood Function for a geometric random variable $X$ is:

L(p;X)=(1-p)^{X-1}p

The Log-Likelihood Function is:

\ln L(p;X)=(X-1)\ln(1-p)+\ln p

The Score Function (first derivative of the log-likelihood w.r.t. $p$ ) is:

{\frac {\partial }{\partial p}}\ln L(p;X)={\frac {1}{p}}-{\frac {X-1}{1-p}}

The second derivative of the log-likelihood function is:

{\frac {\partial ^{2}}{\partial p^{2}}}\ln L(p;X)=-{\frac {1}{p^{2}}}-{\frac {X-1}{(1-p)^{2}}}

Fisher Information is calculated as the negative expected value of the second derivative:

{\begin{aligned}I(p)&=-E\left[{\frac {\partial ^{2}}{\partial p^{2}}}\ln L(p;X)\right]\\&=-\left(-{\frac {1}{p^{2}}}-{\frac {1-p}{p(1-p)^{2}}}\right)\\&={\frac {1}{p^{2}(1-p)}}\end{aligned}}

General properties

The probability generating functions of geometric random variables $X$ and $Y$ defined over $\mathbb {N}$ and $\mathbb {N} _{0}$ are, respectively,^[6]^{: 114–115}

{\begin{aligned}G_{X}(s)&={\frac {s\,p}{1-s\,(1-p)}},\\[10pt]G_{Y}(s)&={\frac {p}{1-s\,(1-p)}},\quad |s|<(1-p)^{-1}.\end{aligned}}

The characteristic function $\varphi (t)$ is equal to $G(e^{it})$ so the geometric distribution's characteristic function, when defined over $\mathbb {N}$ and $\mathbb {N} _{0}$ respectively, is^[11]^: 1630 ${\begin{aligned}\varphi _{X}(t)&={\frac {pe^{it}}{1-(1-p)e^{it}}},\\[10pt]\varphi _{Y}(t)&={\frac {p}{1-(1-p)e^{it}}}.\end{aligned}}$
The entropy of a geometric distribution with parameter $p$ is^[12] $-{\frac {p\log _{2}p+(1-p)\log _{2}(1-p)}{p}}$
Given a mean, the geometric distribution is the maximum entropy probability distribution of all discrete probability distributions. The corresponding continuous distribution is the exponential distribution.^[13]
The geometric distribution defined on $\mathbb {N} _{0}$ is infinitely divisible, that is, for any positive integer $n$ , there exist $n$ independent identically distributed random variables whose sum is also geometrically distributed. This is because the negative binomial distribution can be derived from a Poisson-stopped sum of logarithmic random variables.^[11]^{: 606–607}
The decimal digits of the geometrically distributed random variable Y are a sequence of independent (and not identically distributed) random variables.^{[ citation needed ]} For example, the hundreds digit D has this probability distribution:

\Pr(D=d)={q^{100d} \over 1+q^{100}+q^{200}+\cdots +q^{900}},

where q = 1 − p, and similarly for the other digits, and, more generally, similarly for numeral systems with other bases than 10. When the base is 2, this shows that a geometrically distributed random variable can be written as a sum of independent random variables whose probability distributions are indecomposable.

Golomb coding is the optimal prefix code ^{[ clarification needed ]} for the geometric discrete distribution.^[12]

Related distributions

The sum of $r$ independent geometric random variables with parameter $p$ is a negative binomial random variable with parameters $r$ and $p$ .^[14] The geometric distribution is a special case of the negative binomial distribution, with $r=1$ .

The geometric distribution is a special case of discrete compound Poisson distribution.^[11]^: 606
The minimum of $n$ geometric random variables with parameters $p_{1},\dotsc ,p_{n}$ is also geometrically distributed with parameter $1-\prod _{i=1}^{n}(1-p_{i})$ .^[15]

Suppose 0 < r < 1, and for k = 1, 2, 3, ... the random variable X_k has a Poisson distribution with expected value r^k/k. Then

\sum _{k=1}^{\infty }k\,X_{k}

has a geometric distribution taking values in

\mathbb {N} _{0}

, with expected value r/(1 − r).^{[ citation needed ]}

The exponential distribution is the continuous analogue of the geometric distribution. Applying the floor function to the exponential distribution with parameter $\lambda$ creates a geometric distribution with parameter $p=1-e^{-\lambda }$ defined over $\mathbb {N} _{0}$ .^[3]^: 74 This can be used to generate geometrically distributed random numbers as detailed in § Random variate generation.

If p = 1/n and X is geometrically distributed with parameter p, then the distribution of X/n approaches an exponential distribution with expected value 1 as n → ∞, since ${\begin{aligned}\Pr(X/n>a)=\Pr(X>na)&=(1-p)^{na}=\left(1-{\frac {1}{n}}\right)^{na}=\left[\left(1-{\frac {1}{n}}\right)^{n}\right]^{a}\\&\to [e^{-1}]^{a}=e^{-a}{\text{ as }}n\to \infty .\end{aligned}}$ More generally, if p = λ/n, where λ is a parameter, then as n→ ∞ the distribution of X/n approaches an exponential distribution with rate λ: $\Pr(X>nx)=\lim _{n\to \infty }(1-\lambda /n)^{nx}=e^{-\lambda x}$ therefore the distribution function of X/n converges to $1-e^{-\lambda x}$ , which is that of an exponential random variable.^{[ citation needed ]}
The index of dispersion of the geometric distribution is ${\frac {1}{p}}$ and its coefficient of variation is ${\frac {1}{\sqrt {1-p}}}$ . The distribution is overdispersed.^[1]^: 216

Statistical inference

The true parameter $p$ of an unknown geometric distribution can be inferred through estimators and conjugate distributions.

Method of moments

Provided they exist, the first $l$ moments of a probability distribution can be estimated from a sample $x_{1},\dotsc ,x_{n}$ using the formula $m_{i}={\frac {1}{n}}\sum _{j=1}^{n}x_{j}^{i}$ where $m_{i}$ is the $i$ th sample moment and $1\leq i\leq l$ .^[16]^{: 349–350} Estimating $\mathrm {E} (X)$ with $m_{1}$ gives the sample mean, denoted ${\bar {x}}$ . Substituting this estimate in the formula for the expected value of a geometric distribution and solving for $p$ gives the estimators ${\hat {p}}={\frac {1}{\bar {x}}}$ and ${\hat {p}}={\frac {1}{{\bar {x}}+1}}$ when supported on $\mathbb {N}$ and $\mathbb {N} _{0}$ respectively. These estimators are biased since $\mathrm {E} \left({\frac {1}{\bar {x}}}\right)>{\frac {1}{\mathrm {E} ({\bar {x}})}}=p$ as a result of Jensen's inequality.^[17]^: 53–54

Maximum likelihood estimation

The maximum likelihood estimator of $p$ is the value that maximizes the likelihood function given a sample.^[16]^: 308 By finding the zero of the derivative of the log-likelihood function when the distribution is defined over $\mathbb {N}$ , the maximum likelihood estimator can be found to be ${\hat {p}}={\frac {1}{\bar {x}}}$ , where ${\bar {x}}$ is the sample mean.^[18] If the domain is $\mathbb {N} _{0}$ , then the estimator shifts to ${\hat {p}}={\frac {1}{{\bar {x}}+1}}$ . As previously discussed in § Method of moments, these estimators are biased.

Regardless of the domain, the bias is equal to

b\equiv \operatorname {E} {\bigg [}\;({\hat {p}}_{\mathrm {mle} }-p)\;{\bigg ]}={\frac {p\,(1-p)}{n}}

which yields the bias-corrected maximum likelihood estimator,^{[ citation needed ]}

{\hat {p\,}}_{\text{mle}}^{*}={\hat {p\,}}_{\text{mle}}-{\hat {b\,}}

Bayesian inference

In Bayesian inference, the parameter $p$ is a random variable from a prior distribution with a posterior distribution calculated using Bayes' theorem after observing samples.^[17]^: 167 If a beta distribution is chosen as the prior distribution, then the posterior will also be a beta distribution and it is called the conjugate distribution. In particular, if a $\mathrm {Beta} (\alpha ,\beta )$ prior is selected, then the posterior, after observing samples $k_{1},\dotsc ,k_{n}\in \mathbb {N}$ , is^[19] $p\sim \mathrm {Beta} \left(\alpha +n,\ \beta +\sum _{i=1}^{n}(k_{i}-1)\right).\!$ Alternatively, if the samples are in $\mathbb {N} _{0}$ , the posterior distribution is^[20] $p\sim \mathrm {Beta} \left(\alpha +n,\beta +\sum _{i=1}^{n}k_{i}\right).$ Since the expected value of a $\mathrm {Beta} (\alpha ,\beta )$ distribution is ${\frac {\alpha }{\alpha +\beta }}$ ,^[11]^: 145 as $\alpha$ and $\beta$ approach zero, the posterior mean approaches its maximum likelihood estimate.

Random variate generation

The geometric distribution can be generated experimentally from i.i.d. standard uniform random variables by finding the first such random variable to be less than or equal to $p$ . However, the number of random variables needed is also geometrically distributed and the algorithm slows as $p$ decreases.^[21]^: 498

Random generation can be done in constant time by truncating exponential random numbers. An exponential random variable $E$ can become geometrically distributed with parameter $p$ through $\lceil -E/\log(1-p)\rceil$ . In turn, $E$ can be generated from a standard uniform random variable $U$ altering the formula into $\lceil \log(U)/\log(1-p)\rceil$ .^[21]^{: 499–500}^[22]

Applications

The geometric distribution is used in many disciplines. In queueing theory, the M/M/1 queue has a steady state following a geometric distribution.^[23] In stochastic processes, the Yule Furry process is geometrically distributed.^[24] The distribution also arises when modeling the lifetime of a device in discrete contexts.^[25] It has also been used to fit data including modeling patients spreading COVID-19.^[26]

Related Research Articles

In information theory, the entropy of a random variable quantifies the average level of uncertainty or information associated with the variable's potential states or possible outcomes. This measures the expected amount of information needed to describe the state of the variable, considering the distribution of probabilities across all potential states. Given a discrete random variable $, which takes values in the set and is distributed according to, the entropy is where denotes the sum over the variable's possible values. The choice of base for, the logarithm, varies for different applications. Base 2 gives the unit of bits, while base e gives "natural units" nat, and base 10 gives units of "dits", "bans", or "hartleys". An equivalent definition of entropy is the expected value of the self-information of a variable.$

In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable, is a function whose value at any given sample in the sample space can be interpreted as providing a relative likelihood that the value of the random variable would be equal to that sample. Probability density is the probability per unit length, in other words, while the absolute likelihood for a continuous random variable to take on any particular value is 0, the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would be close to one sample compared to the other sample.

<span class="mw-page-title-main">Negative binomial distribution</span> Probability distribution

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of successes occurs. For example, we can define rolling a 6 on some dice as a success, and rolling any other number as a failure, and ask how many failure rolls will occur before we see the third success. In such a case, the probability distribution of the number of failures that appear will be a negative binomial distribution.

In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the distance between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate; the distance parameter could be any meaningful mono-dimensional measure of the process, such as time between production errors, or length along a roll of fabric in the weaving manufacturing process. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being used for the analysis of Poisson point processes it is found in various other contexts.

The Pareto distribution, named after the Italian civil engineer, economist, and sociologist Vilfredo Pareto, is a power-law probability distribution that is used in description of social, quality control, scientific, geophysical, actuarial, and many other types of observable phenomena; the principle originally applied to describing the distribution of wealth in a society, fitting the trend that a large portion of wealth is held by a small fraction of the population. The Pareto principle or "80-20 rule" stating that 80% of outcomes are due to 20% of causes was named in honour of Pareto, but the concepts are distinct, and only Pareto distributions with shape value of log₄5 ≈ 1.16 precisely reflect it. Empirical observation has shown that this 80-20 distribution fits a wide range of cases, including natural phenomena and human activities.

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.

In probability theory, the law of large numbers (LLN) is a mathematical law that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists. More formally, the LLN states that given a sample of independent and identically distributed values, the sample mean converges to the true mean.

In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value 1 with probability $and the value 0 with probability . Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes-no question. Such questions lead to outcomes that are Boolean-valued: a single bit whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q . It can be used to represent a coin toss where 1 and 0 would represent "heads" and "tails", respectively, and p would be the probability of the coin landing on heads. In particular, unfair coins would have$

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] or in terms of two positive parameters, denoted by alpha (α) and beta (β), that appear as exponents of the variable and its complement to 1, respectively, and control the shape of the distribution.

In probability theory and statistics, the gamma distribution is a versatile two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. There are two equivalent parameterizations in common use:

With a shape parameter $k$ and a scale parameter $θ$
With a shape parameter $and an inverse scale parameter ⁠ ⁠, called a rate parameter.$

In statistics, the logistic model is a statistical model that models the log-odds of an event as a linear combination of one or more independent variables. In regression analysis, logistic regression estimates the parameters of a logistic model. In binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable or a continuous variable. The corresponding probability of the value labeled "1" can vary between 0 and 1, hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names. See § Background and § Definition for formal mathematics, and § Example for a worked example.

In numerical analysis and computational statistics, rejection sampling is a basic technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm" and is a type of exact simulation method. The method works for any distribution in $with a density.$

In information theory, the information content, self-information, surprisal, or Shannon information is a basic quantity derived from the probability of a particular event occurring from a random variable. It can be thought of as an alternative way of expressing probability, much like odds or log-odds, but which has particular mathematical advantages in the setting of information theory.

In probability theory, a compound Poisson distribution is the probability distribution of the sum of a number of independent identically-distributed random variables, where the number of terms to be added is itself a Poisson-distributed variable. The result can be either a continuous or a discrete distribution.

In mathematical statistics, the Kullback–Leibler (KL) divergence, denoted $, is a type of statistical distance: a measure of how one reference probability distribution P is different from a second probability distribution Q . Mathematically, it is defined as$

In probability theory and statistics, the generalized extreme value (GEV) distribution is a family of continuous probability distributions developed within extreme value theory to combine the Gumbel, Fréchet and Weibull families also known as type I, II and III extreme value distributions. By the extreme value theorem the GEV distribution is the only possible limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables. that a limit distribution needs to exist, which requires regularity conditions on the tail of the distribution. Despite this, the GEV distribution is often used as an approximation to model the maxima of long (finite) sequences of random variables.

In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, if nothing is known about a distribution except that it belongs to a certain class, then the distribution with the largest entropy should be chosen as the least-informative default. The motivation is twofold: first, maximizing entropy minimizes the amount of prior information built into the distribution; second, many physical systems tend to move towards maximal entropy configurations over time.

In probability theory and statistics, the Conway–Maxwell–Poisson distribution is a discrete probability distribution named after Richard W. Conway, William L. Maxwell, and Siméon Denis Poisson that generalizes the Poisson distribution by adding a parameter to model overdispersion and underdispersion. It is a member of the exponential family, has the Poisson distribution and geometric distribution as special cases and the Bernoulli distribution as a limiting case.

Beliefs depend on the available information. This idea is formalized in probability theory by conditioning. Conditional probabilities, conditional expectations, and conditional probability distributions are treated on three levels: discrete probabilities, probability density functions, and measure theory. Conditioning leads to a non-random result if the condition is completely specified; otherwise, if the condition is left random, the result of conditioning is also random.

In probability theory, a subgaussian distribution, the distribution of a subgaussian random variable, is a probability distribution with strong tail decay. More specifically, the tails of a subgaussian distribution are dominated by the tails of a Gaussian. This property gives subgaussian distributions their name.

References

1 2 3 4 5 6 Johnson, Norman L.; Kemp, Adrienne W.; Kotz, Samuel (2005-08-19). Univariate Discrete Distributions. Wiley Series in Probability and Statistics (1 ed.). Wiley. doi:10.1002/0471715816. ISBN 978-0-471-27246-5.
1 2 Nagel, Werner; Steyer, Rolf (2017-04-04). Probability and Conditional Expectation: Fundamentals for the Empirical Sciences. Wiley Series in Probability and Statistics (1st ed.). Wiley. doi:10.1002/9781119243496. ISBN 978-1-119-24352-6.
1 2 3 4 5 Chattamvelli, Rajan; Shanmugam, Ramalingam (2020). Discrete Distributions in Engineering and the Applied Sciences. Synthesis Lectures on Mathematics & Statistics. Cham: Springer International Publishing. doi:10.1007/978-3-031-02425-2. ISBN 978-3-031-01297-6.
↑ Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hendrik Paul; Meester, Ludolf Erwin (2005). A Modern Introduction to Probability and Statistics. Springer Texts in Statistics. London: Springer London. p. 50. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.
↑ Weisstein, Eric W. "Memoryless". mathworld.wolfram.com. Retrieved 2024-07-25.
1 2 3 4 5 Forbes, Catherine; Evans, Merran; Hastings, Nicholas; Peacock, Brian (2010-11-29). Statistical Distributions (1st ed.). Wiley. doi:10.1002/9780470627242. ISBN 978-0-470-39063-4.
↑ Bertsekas, Dimitri P.; Tsitsiklis, John N. (2008). Introduction to probability. Optimization and computation series (2nd ed.). Belmont: Athena Scientific. p. 235. ISBN 978-1-886529-23-6.
↑ Weisstein, Eric W. "Geometric Distribution". MathWorld . Retrieved 2024-07-13.
↑ Aggarwal, Charu C. (2024). Probability and Statistics for Machine Learning: A Textbook. Cham: Springer Nature Switzerland. p. 138. doi:10.1007/978-3-031-53282-5. ISBN 978-3-031-53281-8.
1 2 Chan, Stanley (2021). Introduction to Probability for Data Science (1st ed.). Michigan Publishing. ISBN 978-1-60785-747-1.
1 2 3 4 Lovric, Miodrag, ed. (2011). International Encyclopedia of Statistical Science (1st ed.). Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-642-04898-2. ISBN 978-3-642-04897-5.
1 2 Gallager, R.; van Voorhis, D. (March 1975). "Optimal source codes for geometrically distributed integer alphabets (Corresp.)". IEEE Transactions on Information Theory. 21 (2): 228–230. doi:10.1109/TIT.1975.1055357. ISSN 0018-9448.
↑ Lisman, J. H. C.; Zuylen, M. C. A. van (March 1972). "Note on the generation of most probable frequency distributions". Statistica Neerlandica . 26 (1): 19–23. doi:10.1111/j.1467-9574.1972.tb00152.x. ISSN 0039-0402.
↑ Pitman, Jim (1993). Probability. New York, NY: Springer New York. p. 372. doi:10.1007/978-1-4612-4374-8. ISBN 978-0-387-94594-1.
↑ Ciardo, Gianfranco; Leemis, Lawrence M.; Nicol, David (1 June 1995). "On the minimum of independent geometrically distributed random variables". Statistics & Probability Letters. 23 (4): 313–326. doi:10.1016/0167-7152(94)00130-Z. hdl: 2060/19940028569 . S2CID 1505801.
1 2 Evans, Michael; Rosenthal, Jeffrey (2023). Probability and Statistics: The Science of Uncertainty (2nd ed.). Macmillan Learning. ISBN 978-1429224628.
1 2 Held, Leonhard; Sabanés Bové, Daniel (2020). Likelihood and Bayesian Inference: With Applications in Biology and Medicine. Statistics for Biology and Health. Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-662-60792-3. ISBN 978-3-662-60791-6.
↑ Siegrist, Kyle (2020-05-05). "7.3: Maximum Likelihood". Statistics LibreTexts. Retrieved 2024-06-20.
↑ Fink, Daniel. "A Compendium of Conjugate Priors". CiteSeerX 10.1.1.157.5540 .
↑ "3. Conjugate families of distributions" (PDF). Archived (PDF) from the original on 2010-04-08.
1 2 Devroye, Luc (1986). Non-Uniform Random Variate Generation. New York, NY: Springer New York. doi:10.1007/978-1-4613-8643-8. ISBN 978-1-4613-8645-2.
↑ Knuth, Donald Ervin (1997). The Art of Computer Programming. Vol. 2 (3rd ed.). Reading, Mass: Addison-Wesley. p. 136. ISBN 978-0-201-89683-1.
↑ Daskin, Mark S. (2021). Bite-Sized Operations Management. Synthesis Lectures on Operations Research and Applications. Cham: Springer International Publishing. p. 127. doi:10.1007/978-3-031-02493-1. ISBN 978-3-031-01365-2.
↑ Madhira, Sivaprasad; Deshmukh, Shailaja (2023). Introduction to Stochastic Processes Using R. Singapore: Springer Nature Singapore. p. 449. doi:10.1007/978-981-99-5601-2. ISBN 978-981-99-5600-5.
↑ Gupta, Rakesh; Gupta, Shubham; Ali, Irfan (2023), Garg, Harish (ed.), "Some Discrete Parametric Markov–Chain System Models to Analyze Reliability", Advances in Reliability, Failure and Risk Analysis, Singapore: Springer Nature Singapore, pp. 305–306, doi:10.1007/978-981-19-9909-3_14, ISBN 978-981-19-9908-6 , retrieved 2024-07-13
↑ Polymenis, Athanase (2021-10-01). "An application of the geometric distribution for assessing the risk of infection with SARS-CoV-2 by location". Asian Journal of Medical Sciences. 12 (10): 8–11. doi: 10.3126/ajms.v12i10.38783 . ISSN 2091-0576.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[:8-1] 1 2 3 4 5 6 Johnson, Norman L.; Kemp, Adrienne W.; Kotz, Samuel (2005-08-19). Univariate Discrete Distributions. Wiley Series in Probability and Statistics (1 ed.). Wiley. doi:10.1002/0471715816. ISBN 978-0-471-27246-5.

[:1-2] 1 2 Nagel, Werner; Steyer, Rolf (2017-04-04). Probability and Conditional Expectation: Fundamentals for the Empirical Sciences. Wiley Series in Probability and Statistics (1st ed.). Wiley. doi:10.1002/9781119243496. ISBN 978-1-119-24352-6.

[:2-3] 1 2 3 4 5 Chattamvelli, Rajan; Shanmugam, Ramalingam (2020). Discrete Distributions in Engineering and the Applied Sciences. Synthesis Lectures on Mathematics & Statistics. Cham: Springer International Publishing. doi:10.1007/978-3-031-02425-2. ISBN 978-3-031-01297-6.

[4] Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hendrik Paul; Meester, Ludolf Erwin (2005). A Modern Introduction to Probability and Statistics. Springer Texts in Statistics. London: Springer London. p. 50. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.

[5] Weisstein, Eric W. "Memoryless". mathworld.wolfram.com. Retrieved 2024-07-25.

[:0-6] 1 2 3 4 5 Forbes, Catherine; Evans, Merran; Hastings, Nicholas; Peacock, Brian (2010-11-29). Statistical Distributions (1st ed.). Wiley. doi:10.1002/9780470627242. ISBN 978-0-470-39063-4.

[7] Bertsekas, Dimitri P.; Tsitsiklis, John N. (2008). Introduction to probability. Optimization and computation series (2nd ed.). Belmont: Athena Scientific. p. 235. ISBN 978-1-886529-23-6.

[8] Weisstein, Eric W. "Geometric Distribution". MathWorld . Retrieved 2024-07-13.

[9] Aggarwal, Charu C. (2024). Probability and Statistics for Machine Learning: A Textbook. Cham: Springer Nature Switzerland. p. 138. doi:10.1007/978-3-031-53282-5. ISBN 978-3-031-53281-8.

[:4-10] 1 2 Chan, Stanley (2021). Introduction to Probability for Data Science (1st ed.). Michigan Publishing. ISBN 978-1-60785-747-1.

[:9-11] 1 2 3 4 Lovric, Miodrag, ed. (2011). International Encyclopedia of Statistical Science (1st ed.). Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-642-04898-2. ISBN 978-3-642-04897-5.

[:7-12] 1 2 Gallager, R.; van Voorhis, D. (March 1975). "Optimal source codes for geometrically distributed integer alphabets (Corresp.)". IEEE Transactions on Information Theory. 21 (2): 228–230. doi:10.1109/TIT.1975.1055357. ISSN 0018-9448.

[13] Lisman, J. H. C.; Zuylen, M. C. A. van (March 1972). "Note on the generation of most probable frequency distributions". Statistica Neerlandica . 26 (1): 19–23. doi:10.1111/j.1467-9574.1972.tb00152.x. ISSN 0039-0402.

[14] Pitman, Jim (1993). Probability. New York, NY: Springer New York. p. 372. doi:10.1007/978-1-4612-4374-8. ISBN 978-0-387-94594-1.

[15] Ciardo, Gianfranco; Leemis, Lawrence M.; Nicol, David (1 June 1995). "On the minimum of independent geometrically distributed random variables". Statistics & Probability Letters. 23 (4): 313–326. doi:10.1016/0167-7152(94)00130-Z. hdl: 2060/19940028569 . S2CID 1505801.

[:5-16] 1 2 Evans, Michael; Rosenthal, Jeffrey (2023). Probability and Statistics: The Science of Uncertainty (2nd ed.). Macmillan Learning. ISBN 978-1429224628.

[:3-17] 1 2 Held, Leonhard; Sabanés Bové, Daniel (2020). Likelihood and Bayesian Inference: With Applications in Biology and Medicine. Statistics for Biology and Health. Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-662-60792-3. ISBN 978-3-662-60791-6.

[18] Siegrist, Kyle (2020-05-05). "7.3: Maximum Likelihood". Statistics LibreTexts. Retrieved 2024-06-20.

[19] Fink, Daniel. "A Compendium of Conjugate Priors". CiteSeerX 10.1.1.157.5540 .

[20] "3. Conjugate families of distributions" (PDF). Archived (PDF) from the original on 2010-04-08.

[:6-21] 1 2 Devroye, Luc (1986). Non-Uniform Random Variate Generation. New York, NY: Springer New York. doi:10.1007/978-1-4613-8643-8. ISBN 978-1-4613-8645-2.

[22] Knuth, Donald Ervin (1997). The Art of Computer Programming. Vol. 2 (3rd ed.). Reading, Mass: Addison-Wesley. p. 136. ISBN 978-0-201-89683-1.

[23] Daskin, Mark S. (2021). Bite-Sized Operations Management. Synthesis Lectures on Operations Research and Applications. Cham: Springer International Publishing. p. 127. doi:10.1007/978-3-031-02493-1. ISBN 978-3-031-01365-2.

[24] Madhira, Sivaprasad; Deshmukh, Shailaja (2023). Introduction to Stochastic Processes Using R. Singapore: Springer Nature Singapore. p. 449. doi:10.1007/978-981-99-5601-2. ISBN 978-981-99-5600-5.

[25] Gupta, Rakesh; Gupta, Shubham; Ali, Irfan (2023), Garg, Harish (ed.), "Some Discrete Parametric Markov–Chain System Models to Analyze Reliability", Advances in Reliability, Failure and Risk Analysis, Singapore: Springer Nature Singapore, pp. 305–306, doi:10.1007/978-981-19-9909-3_14, ISBN 978-981-19-9908-6 , retrieved 2024-07-13

[26] Polymenis, Athanase (2021-10-01). "An application of the geometric distribution for assessing the risk of infection with SARS-CoV-2 by location". Asian Journal of Medical Sciences. 12 (10): 8–11. doi: 10.3126/ajms.v12i10.38783 . ISSN 2091-0576.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

Probability mass function
Cumulative distribution function
Parameters	$0<p\leq 1$ success probability (real)	$0<p\leq 1$ success probability (real)
Support	k trials where $k\in \mathbb {N} =\{1,2,3,\dotsc \}$	k failures where $k\in \mathbb {N} _{0}=\{0,1,2,\dotsc \}$
PMF	$(1-p)^{k-1}p$	$(1-p)^{k}p$
CDF	$1-(1-p)^{\lfloor x\rfloor }$ for $x\geq 1$ , $0$ for $x<1$	$1-(1-p)^{\lfloor x\rfloor +1}$ for $x\geq 0$ , $0$ for $x<0$
Mean	${\frac {1}{p}}$	${\frac {1-p}{p}}$
Median	$\left\lceil {\frac {-1}{\log _{2}(1-p)}}\right\rceil$ (not unique if Contents Definition Properties Memorylessness Moments and cumulants Summary statistics Entropy and Fisher's Information Entropy (Geometric Distribution, Failures Before Success) Fisher's Information (Geometric Distribution, Failures Before Success) Entropy (Geometric Distribution, Trials Until Success) Fisher's Information (Geometric Distribution, Trials Until Success) General properties Related distributions Statistical inference Method of moments Maximum likelihood estimation Bayesian inference Random variate generation Applications See also References $-1/\log _{2}(1-p)$ is an integer)	$\left\lceil {\frac {-1}{\log _{2}(1-p)}}\right\rceil -1$ (not unique if $-1/\log _{2}(1-p)$ is an integer)
Mode	$1$	$0$
Variance	${\frac {1-p}{p^{2}}}$	${\frac {1-p}{p^{2}}}$
Skewness	${\frac {2-p}{\sqrt {1-p}}}$	${\frac {2-p}{\sqrt {1-p}}}$
Excess kurtosis	$6+{\frac {p^{2}}{1-p}}$	$6+{\frac {p^{2}}{1-p}}$
Entropy	${\tfrac {-(1-p)\log(1-p)-p\log p}{p}}$	${\tfrac {-(1-p)\log(1-p)-p\log p}{p}}$
MGF	${\frac {pe^{t}}{1-(1-p)e^{t}}},$ for $t<-\ln(1-p)$	${\frac {p}{1-(1-p)e^{t}}},$ for $t<-\ln(1-p)$
CF	${\frac {pe^{it}}{1-(1-p)e^{it}}}$	${\frac {p}{1-(1-p)e^{it}}}$
PGF	${\frac {pz}{1-(1-p)z}}$	${\frac {p}{1-(1-p)z}}$
Fisher information	${\tfrac {1}{p^{2}\cdot (1-p)}}$	${\tfrac {1}{p^{2}\cdot (1-p)}}$