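Figures: probability mass function; cumulative distribution function.

| | Number of trials | Number of failures |
|---|---|---|
| Parameters | $0 < p \le 1$ success probability (real) | $0 < p \le 1$ success probability (real) |
| Support | $k$ trials where $k \in \{1, 2, 3, \dots\}$ | $k$ failures where $k \in \{0, 1, 2, \dots\}$ |
| PMF | $(1-p)^{k-1} p$ | $(1-p)^{k} p$ |
| CDF | $1 - (1-p)^{\lfloor x \rfloor}$ for $x \ge 1$, $0$ for $x < 1$ | $1 - (1-p)^{\lfloor x \rfloor + 1}$ for $x \ge 0$, $0$ for $x < 0$ |
| Mean | $\frac{1}{p}$ | $\frac{1-p}{p}$ |
| Median | $\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil$ (not unique if $-1/\log_2(1-p)$ is an integer) | $\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil - 1$ (not unique if $-1/\log_2(1-p)$ is an integer) |
| Mode | $1$ | $0$ |
| Variance | $\frac{1-p}{p^2}$ | $\frac{1-p}{p^2}$ |
| Skewness | $\frac{2-p}{\sqrt{1-p}}$ | $\frac{2-p}{\sqrt{1-p}}$ |
| Excess kurtosis | $6 + \frac{p^2}{1-p}$ | $6 + \frac{p^2}{1-p}$ |
| Entropy | $\frac{-(1-p)\log(1-p) - p\log p}{p}$ | $\frac{-(1-p)\log(1-p) - p\log p}{p}$ |
| MGF | $\frac{p e^{t}}{1-(1-p)e^{t}}$ for $t < -\ln(1-p)$ | $\frac{p}{1-(1-p)e^{t}}$ for $t < -\ln(1-p)$ |
| CF | $\frac{p e^{it}}{1-(1-p)e^{it}}$ | $\frac{p}{1-(1-p)e^{it}}$ |
| PGF | $\frac{pz}{1-(1-p)z}$ | $\frac{p}{1-(1-p)z}$ |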
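In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions: the probability distribution of the number X of Bernoulli trials needed to get one success, supported on the set {1, 2, 3, ...}; or the probability distribution of the number Y = X − 1 of failures before the first success, supported on the set {0, 1, 2, ...}.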
Which of these is called the geometric distribution is a matter of convention and convenience.
These two geometric distributions should not be confused with each other. The name shifted geometric distribution is often adopted for the former (the distribution of the number of trials X); however, to avoid ambiguity, it is wise to state which one is intended by mentioning the support explicitly.
The geometric distribution gives the probability that the first occurrence of success requires k independent trials, each with success probability p. If the probability of success on each trial is p, then the probability that the k-th trial is the first success is
$$\Pr(X = k) = (1-p)^{k-1} p$$
for k = 1, 2, 3, ...
The above form of the geometric distribution is used for modeling the number of trials up to and including the first success. By contrast, the following form of the geometric distribution is used for modeling the number of failures until the first success:
$$\Pr(Y = k) = (1-p)^{k} p$$
for k = 0, 1, 2, 3, ...
In either case, the sequence of probabilities is a geometric sequence.
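As a minimal sketch of the two conventions, the following Python snippet assumes the standard forms (1 − p)^(k−1) p for the number of trials and (1 − p)^k p for the number of failures. The function names `pmf_trials` and `pmf_failures` are illustrative and not taken from any library. It also checks the geometric-sequence property: successive probabilities have the constant ratio 1 − p.

```python
def pmf_trials(k: int, p: float) -> float:
    """Pr(X = k): the first success occurs on trial k, for k = 1, 2, 3, ..."""
    if k < 1:
        return 0.0
    return (1.0 - p) ** (k - 1) * p


def pmf_failures(k: int, p: float) -> float:
    """Pr(Y = k): k failures occur before the first success, for k = 0, 1, 2, ..."""
    if k < 0:
        return 0.0
    return (1.0 - p) ** k * p


p = 0.25  # illustrative success probability
probs = [pmf_trials(k, p) for k in range(1, 6)]

# Ratio of each probability to the previous one; each equals 1 - p
# up to floating-point rounding, so the sequence is geometric.
ratios = [later / earlier for earlier, later in zip(probs, probs[1:])]
```

The two functions describe the same experiment shifted by one: `pmf_trials(k, p)` equals `pmf_failures(k - 1, p)`.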
For example, suppose an ordinary die is thrown repeatedly until the first time a "1" appears. The probability distribution of the number of times it is thrown is supported on the infinite set {1, 2, 3, ...} and is a geometric distribution with p = 1/6.
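The die example can be worked exactly with rational arithmetic. This is a small sketch that assumes a fair six-sided die, so p = 1/6; the helper names are hypothetical.

```python
from fractions import Fraction

p = Fraction(1, 6)  # assumed chance of rolling a "1" on a fair six-sided die


def prob_first_one_on_throw(k: int) -> Fraction:
    """Exact Pr(X = k) = (1 - p)^(k - 1) * p for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p


def prob_first_one_within(n: int) -> Fraction:
    """Exact Pr(X <= n), obtained by summing the first n probabilities."""
    return sum((prob_first_one_on_throw(k) for k in range(1, n + 1)), Fraction(0))


print(prob_first_one_on_throw(3))  # 25/216
print(prob_first_one_within(3))    # 91/216
```

The second value agrees with 1 − (5/6)^3, the probability that the first three throws are not all failures.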
The geometric distribution is denoted by Geo(p) where 0 < p ≤ 1. [1]
Consider a sequence of trials, where each trial has only two possible outcomes (designated failure and success). The probability of success is assumed to be the same for each trial. In such a sequence of trials, the geometric distribution is useful to model the number of failures before the first success since the experiment can have an indefinite number of trials until success, unlike the binomial distribution which has a set number of trials. The distribution gives the probability that there are zero failures before the first success, one failure before the first success, two failures before the first success, and so on. [2]
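The geometric distribution is an appropriate model if the following assumptions are true: the phenomenon being modeled is a sequence of independent trials; each trial has only two possible outcomes, often designated success or failure; and the probability of success, p, is the same for every trial. [3]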
If these conditions are true, then the geometric random variable Y is the count of the number of failures before the first success. The possible number of failures before the first success is 0, 1, 2, 3, and so on. In the graphs above, this formulation is shown on the right.
An alternative formulation is that the geometric random variable X is the total number of trials up to and including the first success, and the number of failures is X − 1. In the graphs above, this formulation is shown on the left.
The general formula to calculate the probability of k failures before the first success, where the probability of success is p and the probability of failure is q = 1 − p, is
$$\Pr(Y = k) = q^{k} p = (1-p)^{k} p$$
for k = 0, 1, 2, 3, ...
E1) A doctor is seeking an antidepressant for a newly diagnosed patient. Suppose that, of the available antidepressant drugs, the probability that any particular drug will be effective for a particular patient is p = 0.6. What is the probability that the first drug found to be effective for this patient is the first drug tried, the second drug tried, and so on? What is the expected number of drugs that will be tried to find one that is effective?
The probability that the first drug works: there are zero failures before the first success, so Y = 0. The probability Pr(zero failures before first success) is simply the probability that the first drug works,
$$\Pr(Y = 0) = q^{0} p = 0.4^{0} \times 0.6 = 0.6.$$
The probability that the first drug fails, but the second drug works: there is one failure before the first success, so Y = 1. The probability of this sequence of events is Pr(first drug fails) × Pr(second drug succeeds), which is given by
$$\Pr(Y = 1) = q^{1} p = 0.4 \times 0.6 = 0.24.$$
The probability that the first drug fails, the second drug fails, but the third drug works: there are two failures before the first success, so Y = 2. The probability of this sequence of events is Pr(first drug fails) × Pr(second drug fails) × Pr(third drug succeeds), which is given by
$$\Pr(Y = 2) = q^{2} p = 0.4^{2} \times 0.6 = 0.096.$$
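The arithmetic in example E1 can be reproduced with a few lines of Python. This is only a sketch: it uses p = 0.6 from the example and the failures-before-success formula q^k p, and the names are illustrative.

```python
p = 0.6      # probability that a given drug is effective (from example E1)
q = 1.0 - p  # probability that a given drug fails


def prob_failures_before_success(k: int) -> float:
    """Pr(Y = k) = q^k * p: k ineffective drugs, then an effective one."""
    return q ** k * p


# Probabilities that the first effective drug is the 1st, 2nd, or 3rd tried:
# approximately 0.6, 0.24 and 0.096.
first_three = [prob_failures_before_success(k) for k in range(3)]

# Expected number of failed drugs, and of drugs tried in total.
expected_failures = q / p         # about 0.67
expected_drugs_tried = 1.0 / p    # about 1.67
```

The last value answers the closing question of the example: on average fewer than two drugs need to be tried.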
E2) A newlywed couple plans to have children and will continue until the first girl. What is the probability that there are zero boys before the first girl, one boy before the first girl, two boys before the first girl, and so on?
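The probability of having a girl (success) is p = 0.5 and the probability of having a boy (failure) is q = 1 − p = 0.5.
The probability of no boys before the first girl is
$$\Pr(Y = 0) = q^{0} p = 0.5^{0} \times 0.5 = 0.5.$$
The probability of one boy before the first girl is
$$\Pr(Y = 1) = q^{1} p = 0.5 \times 0.5 = 0.25.$$
The probability of two boys before the first girl is
$$\Pr(Y = 2) = q^{2} p = 0.5^{2} \times 0.5 = 0.125,$$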
and so on.
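The expected value of the number of independent trials needed to get the first success, and the variance, of a geometrically distributed random variable X are:
$$\operatorname{E}(X) = \frac{1}{p}, \qquad \operatorname{var}(X) = \frac{1-p}{p^{2}}.$$
Similarly, the expected value and variance of the geometrically distributed random variable Y = X − 1 (see the definition of the distribution above) are:
$$\operatorname{E}(Y) = \operatorname{E}(X - 1) = \operatorname{E}(X) - 1 = \frac{1-p}{p}, \qquad \operatorname{var}(Y) = \frac{1-p}{p^{2}}.$$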
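These closed forms can be checked numerically by summing the series directly. The sketch below assumes the standard results E(X) = 1/p, E(Y) = (1 − p)/p and a common variance (1 − p)/p²; the function name and the truncation length are arbitrary choices.

```python
def truncated_mean_and_variance(p: float, first_value: int, terms: int = 5_000):
    """Approximate mean and variance by truncating the defining series.

    first_value is 1 for the number of trials X and 0 for the number of
    failures Y. The i-th term always carries probability (1 - p)^i * p.
    """
    mean = 0.0
    second_moment = 0.0
    for i in range(terms):
        value = first_value + i
        prob = (1.0 - p) ** i * p
        mean += value * prob
        second_moment += value * value * prob
    return mean, second_moment - mean ** 2


p = 0.3  # illustrative success probability
mean_x, var_x = truncated_mean_and_variance(p, first_value=1)
mean_y, var_y = truncated_mean_and_variance(p, first_value=0)
# mean_x is close to 1/p, mean_y to (1 - p)/p, and both variances
# are close to (1 - p)/p**2; shifting by one leaves the variance unchanged.
```

For very small p the tail decays slowly, so `terms` would need to be increased for the truncation to remain accurate.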
Consider the expected value of X as above, i.e. the average number of trials until a success. On the first trial we either succeed with probability p, or we fail with probability 1 − p. If we fail, the remaining mean number of trials until a success is identical to the original mean. This follows from the fact that all trials are independent. From this we get the formula:
$$\operatorname{E}(X) = p \cdot 1 + (1-p)\bigl(1 + \operatorname{E}(X)\bigr),$$
which, if solved for E(X), gives:
$$\operatorname{E}(X) = \frac{1}{p}.$$
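The algebra behind this step is short. As a worked sketch, write E for E(X) and take the recursion described above, in which a success ends the experiment after one trial and a failure costs one trial and restarts it:

```latex
\begin{aligned}
E  &= p \cdot 1 + (1 - p)(1 + E) \\
   &= 1 + (1 - p)E, \\
pE &= 1, \\
E  &= \frac{1}{p}.
\end{aligned}
```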
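That the expected value of Y as above is (1 − p)/p can be seen directly from E(Y) = E(X − 1) = E(X) − 1 = 1/p − 1 = (1 − p)/p, which follows from the linearity of expectation, or can be shown in the following way:
$$\begin{aligned}
\operatorname{E}(Y) &= \sum_{k=0}^{\infty} (1-p)^{k} p \cdot k \\
&= p(1-p) \sum_{k=0}^{\infty} k (1-p)^{k-1} \\
&= p(1-p) \left[ -\frac{d}{dp} \sum_{k=0}^{\infty} (1-p)^{k} \right] \\
&= p(1-p) \left[ -\frac{d}{dp} \frac{1}{p} \right] \\
&= p(1-p) \cdot \frac{1}{p^{2}} = \frac{1-p}{p}.
\end{aligned}$$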
The interchange of summation and differentiation is justified by the fact that convergent power series converge uniformly on compact subsets of the set of points where they converge.
Let μ = (1 − p)/p be the expected value of Y. Then the cumulants κ_n of the probability distribution of Y satisfy the recursion
$$\kappa_{n+1} = \mu(\mu + 1) \frac{d\kappa_{n}}{d\mu}.$$
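To see what the recursion produces, the sketch below iterates it symbolically. It assumes the form κ_{n+1} = μ(μ + 1) dκ_n/dμ with κ_1 = μ, and represents each cumulant as a polynomial in μ stored as a coefficient list (index = power of μ). The helper names are illustrative.

```python
def poly_derivative(coeffs):
    """Derivative of a polynomial given as [c0, c1, c2, ...]."""
    return [i * c for i, c in enumerate(coeffs)][1:] or [0]


def poly_multiply(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def cumulant_polynomials(n_max):
    """Return kappa_1 .. kappa_n_max as polynomials in mu."""
    mu_times_mu_plus_1 = [0, 1, 1]  # mu + mu^2
    kappa = [0, 1]                  # kappa_1 = mu
    result = [kappa]
    for _ in range(n_max - 1):
        kappa = poly_multiply(mu_times_mu_plus_1, poly_derivative(kappa))
        result.append(kappa)
    return result


for n, coeffs in enumerate(cumulant_polynomials(4), start=1):
    print(n, coeffs)
# 1 [0, 1]
# 2 [0, 1, 1]
# 3 [0, 1, 3, 2]
# 4 [0, 1, 7, 12, 6]
```

The second line reads κ_2 = μ + μ², which equals (1 − p)/p² when μ = (1 − p)/p, in agreement with the variance of Y.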
E3) A patient is waiting for a suitable matching kidney donor for a transplant. If the probability that a randomly selected donor is a suitable match is p = 0.1, what is the expected number of donors who will be tested before a matching donor is found?
With p = 0.1, the mean number of failures before the first success is E(Y) = (1 − p)/p =(1 − 0.1)/0.1 = 9.
For the alternative formulation, where X is the number of trials up to and including the first success, the expected value is E(X) = 1/p = 1/0.1 = 10.
For example E1 above, with p = 0.6, the mean number of failures before the first success is E(Y) = (1 − p)/p = (1 − 0.6)/0.6 ≈ 0.67.
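The expected values in example E3 can also be checked by simulation. The following is a sketch only: it models each donor as an independent trial with match probability p = 0.1, uses a fixed seed so the run is repeatable, and the function names are hypothetical.

```python
import random


def expected_failures(p: float) -> float:
    """E(Y) = (1 - p) / p: mean number of failures before the first success."""
    return (1.0 - p) / p


def expected_trials(p: float) -> float:
    """E(X) = 1 / p: mean number of trials up to and including the first success."""
    return 1.0 / p


def simulate_failures(p: float, rng: random.Random) -> int:
    """Draw independent trials until one succeeds; return the number of failures."""
    failures = 0
    while rng.random() >= p:  # a trial succeeds when the draw falls below p
        failures += 1
    return failures


rng = random.Random(0)  # fixed seed, chosen arbitrarily
p = 0.1                 # match probability from example E3
samples = [simulate_failures(p, rng) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)
# sample_mean should lie close to expected_failures(p), which is about 9.
```

Adding one to each simulated value gives the number of donors tested including the match, whose mean is close to `expected_trials(p)`.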
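The moments for the number of failures before the first success are given by
$$\operatorname{E}(Y^{n}) = \sum_{k=0}^{\infty} (1-p)^{k} p \cdot k^{n} = p \operatorname{Li}_{-n}(1-p) \quad \text{for } n = 1, 2, 3, \dots$$
where Li_{−n}(1 − p) is the polylogarithm function.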
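The polylogarithm of negative integer order has the series Li_{−n}(z) = Σ_{k≥1} k^n z^k for |z| < 1, so the moment formula can be evaluated by truncating that series. The sketch below assumes the relation E(Y^n) = p Li_{−n}(1 − p) for n ≥ 1; the function names and the truncation length are illustrative.

```python
def polylog_negative_order(n: int, z: float, terms: int = 5_000) -> float:
    """Truncated series for Li_{-n}(z) = sum over k >= 1 of k^n * z^k, |z| < 1."""
    return sum(k ** n * z ** k for k in range(1, terms + 1))


def raw_moment_failures(n: int, p: float) -> float:
    """Approximate E(Y^n) for the number of failures Y, for n = 1, 2, 3, ..."""
    return p * polylog_negative_order(n, 1.0 - p)


p = 0.2  # illustrative success probability
first_moment = raw_moment_failures(1, p)    # close to (1 - p)/p = 4
second_moment = raw_moment_failures(2, p)   # close to (1 - p)(2 - p)/p**2 = 36
variance = second_moment - first_moment ** 2  # close to (1 - p)/p**2 = 20
```

When p is very close to 0 the series converges slowly and `terms` must be raised accordingly.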
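More generally, if p = λ/n, where λ is a parameter, then as n → ∞ the distribution of X/n approaches an exponential distribution with rate λ. For any a ≥ 0,
$$\lim_{n \to \infty} \Pr(X/n > a) = \lim_{n \to \infty} (1-p)^{\lfloor na \rfloor} = \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{\lfloor na \rfloor} = e^{-\lambda a},$$
therefore the distribution function of X/n converges to 1 − e^{−λa}, which is that of an exponential random variable.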
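The convergence can be observed numerically. This sketch assumes X counts trials, so that Pr(X > m) = (1 − p)^m for integers m ≥ 0, and compares the tail of X/n with the exponential tail e^(−λa). The values of λ and a are arbitrary, and n must exceed λ so that p = λ/n is a valid probability.

```python
import math


def tail_scaled_geometric(a: float, lam: float, n: int) -> float:
    """Pr(X / n > a) for X geometric on {1, 2, ...} with p = lam / n."""
    p = lam / n
    return (1.0 - p) ** math.floor(n * a)


def tail_exponential(a: float, lam: float) -> float:
    """Pr(T > a) for T exponential with rate lam."""
    return math.exp(-lam * a)


lam, a = 2.0, 1.5  # illustrative rate and threshold
errors = [
    abs(tail_scaled_geometric(a, lam, n) - tail_exponential(a, lam))
    for n in (10, 100, 1_000, 1_000_000)
]
# The absolute error shrinks as n grows.
```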
For both variants of the geometric distribution, the parameter p can be estimated by equating the expected value with the sample mean. This is the method of moments, which in this case happens to yield maximum likelihood estimates of p. [9] [10]
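Specifically, for the first variant let k = k1, ..., kn be a sample where ki ≥ 1 for i = 1, ..., n. Then p can be estimated as
$$\hat{p} = \left( \frac{1}{n} \sum_{i=1}^{n} k_{i} \right)^{-1} = \frac{n}{\sum_{i=1}^{n} k_{i}}.$$
In Bayesian inference, the Beta distribution is the conjugate prior distribution for the parameter p. If this parameter is given a Beta(α, β) prior, then the posterior distribution is
$$p \sim \mathrm{Beta}\left( \alpha + n,\ \beta + \sum_{i=1}^{n} (k_{i} - 1) \right).$$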
The posterior mean E[p] approaches the maximum likelihood estimate as α and β approach zero.
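In the alternative case, let k1, ..., kn be a sample where ki ≥ 0 for i = 1, ..., n. Then p can be estimated as
$$\hat{p} = \left( 1 + \frac{1}{n} \sum_{i=1}^{n} k_{i} \right)^{-1} = \frac{n}{\sum_{i=1}^{n} k_{i} + n}.$$
The posterior distribution of p given a Beta(α, β) prior is [11]
$$p \sim \mathrm{Beta}\left( \alpha + n,\ \beta + \sum_{i=1}^{n} k_{i} \right).$$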
Again the posterior mean E[p] approaches the maximum likelihood estimate as α and β approach zero.
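Both variants reduce to the same bookkeeping: n observed successes and some total number of failures. The sketch below assumes the estimate p̂ = n / (n + total failures) and the conjugate update Beta(α + n, β + total failures). The function names and the sample data are hypothetical.

```python
from typing import Sequence, Tuple


def total_failures(sample: Sequence[int], counts_trials: bool) -> int:
    """Total failures in the sample.

    counts_trials=True means each observation is a number of trials (>= 1),
    so it contains one success; otherwise each observation is already a
    number of failures (>= 0).
    """
    if not sample:
        raise ValueError("sample must not be empty")
    return sum(sample) - len(sample) if counts_trials else sum(sample)


def estimate_p(sample: Sequence[int], counts_trials: bool) -> float:
    """Method-of-moments / maximum likelihood estimate of p."""
    n = len(sample)
    return n / (n + total_failures(sample, counts_trials))


def beta_posterior(sample: Sequence[int], counts_trials: bool,
                   alpha: float, beta: float) -> Tuple[float, float]:
    """Parameters of the Beta posterior for p given a Beta(alpha, beta) prior."""
    n = len(sample)
    return alpha + n, beta + total_failures(sample, counts_trials)


trials = [1, 3, 2, 1, 5]            # made-up data: trials until first success
failures = [k - 1 for k in trials]  # the same data expressed as failures

p_hat = estimate_p(trials, counts_trials=True)  # 5/12, about 0.417
post_a, post_b = beta_posterior(failures, counts_trials=False, alpha=1.0, beta=1.0)
posterior_mean = post_a / (post_a + post_b)     # 6/14, about 0.429
```

Shrinking α and β toward zero moves the posterior mean toward `p_hat`, as stated above.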
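For either estimate of p using maximum likelihood, the bias is, to first order in 1/n, equal to
$$b \equiv \operatorname{E}\left[ \hat{p}_{\mathrm{mle}} - p \right] = \frac{p(1-p)}{n},$$
which yields the bias-corrected maximum likelihood estimator
$$\hat{p}^{*}_{\mathrm{mle}} = \hat{p}_{\mathrm{mle}} - \hat{b}.$$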
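A minimal sketch of the correction follows. It assumes the first-order bias p(1 − p)/n and, as a further assumption, estimates that bias by plugging the maximum likelihood estimate into the formula. The function name and the numbers are illustrative.

```python
def bias_corrected_estimate(p_hat: float, n: int) -> float:
    """Subtract the plug-in bias estimate p_hat * (1 - p_hat) / n from p_hat."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    estimated_bias = p_hat * (1.0 - p_hat) / n  # assumed plug-in form of the bias
    return p_hat - estimated_bias


p_hat = 5.0 / 12.0  # hypothetical maximum likelihood estimate from n = 5 observations
corrected = bias_corrected_estimate(p_hat, n=5)  # 265/720, about 0.368
```

The correction always lowers the estimate and vanishes as n grows, since the estimated bias is of order 1/n.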
The R function dgeom(k, prob) calculates the probability that there are k failures before the first success, where the argument "prob" is the probability of success on each trial.
For example,
dgeom(0,0.6)=0.6
dgeom(1,0.6)=0.24
R uses the convention that k is the number of failures, so that the number of trials up to and including the first success is k + 1.
The following R code creates a graph of the geometric distribution from Y = 0 to 10, with p = 0.6.
Y=0:10
plot(Y,dgeom(Y,0.6),type="h",ylim=c(0,1),main="Geometric distribution for p=0.6",ylab="Pr(Y=Y)",xlab="Y=Number of failures before first success")
The geometric distribution, for the number of failures before the first success, is a special case of the negative binomial distribution, for the number of failures before s successes.
The Excel function NEGBINOMDIST(number_f, number_s, probability_s) calculates the probability of k = number_f failures before s = number_s successes, where p = probability_s is the probability of success on each trial. For the geometric distribution, let number_s = 1 success. [12]
For example,
=NEGBINOMDIST(0, 1, 0.6) = 0.6
=NEGBINOMDIST(1, 1, 0.6) = 0.24
Like R, Excel uses the convention that k is the number of failures, so that the number of trials up to and including the first success is k + 1.