Negative hypergeometric distribution

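Negative hypergeometric

Probability mass function: [figure: Negative hypergeometric pmf.png]
Cumulative distribution function: [figure: Negative hypergeometric cdf.png]

Parameters

- $N \in \{0, 1, 2, \dots\}$ - total number of elements
- $K \in \{0, 1, 2, \dots, N\}$ - total number of 'success' elements
- $r \in \{0, 1, 2, \dots, N-K\}$ - number of failures when experiment is stopped

Support: $k \in \{0, \dots, K\}$ - number of successes when experiment is stopped

PMF: $\dfrac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}}$

Mean: $\dfrac{rK}{N-K+1}$

Variance: $\dfrac{r(N+1)K}{(N-K+1)(N-K+2)}\left[1-\dfrac{r}{N-K+1}\right]$
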
In probability theory and statistics, the negative hypergeometric distribution describes probabilities when sampling from a finite population without replacement, where each sample can be classified into two mutually exclusive categories such as Pass/Fail or Employed/Unemployed. As random selections are made from the population, each subsequent draw shrinks the population, so the probability of success changes with each draw. Unlike the standard hypergeometric distribution, which describes the number of successes in a fixed sample size, in the negative hypergeometric distribution samples are drawn until $r$ failures have been found, and the distribution describes the probability of finding $k$ successes in such a sample. In other words, the negative hypergeometric distribution describes the likelihood of $k$ successes in a sample with exactly $r$ failures.

Definition

There are $N$ elements, of which $K$ are defined as "successes" and the rest are "failures".

Elements are drawn one after the other, without replacement, until $r$ failures are encountered. Then the drawing stops and the number $k$ of successes is counted. The negative hypergeometric distribution, $NHG_{N,K,r}(k)$, is the discrete distribution of this $k$. [1]
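The drawing procedure can be illustrated with a short simulation. The following is a minimal sketch, not a reference implementation: the function name `draw_until_r_failures` and the parameter values are hypothetical choices for illustration, and it assumes $1 \le r \le N - K$ so that the stopping condition is always reached.

```python
import random


def draw_until_r_failures(N, K, r, rng):
    """Draw without replacement until r failures are seen; return the number of successes.

    Assumes 1 <= r <= N - K (illustrative helper, not from the source).
    """
    # True marks a "success" element, False a "failure" element.
    population = [True] * K + [False] * (N - K)
    rng.shuffle(population)  # a random order is equivalent to drawing one by one
    successes = 0
    failures = 0
    for is_success in population:
        if is_success:
            successes += 1
        else:
            failures += 1
            if failures == r:
                break  # stop as soon as the r-th failure is drawn
    return successes


rng = random.Random(0)  # fixed seed so the run is repeatable
N, K, r = 20, 8, 3      # arbitrary example values
samples = [draw_until_r_failures(N, K, r, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
print(empirical_mean)   # expected to be near r*K/(N-K+1) = 24/13
```

With enough trials, the empirical mean should settle near $rK/(N-K+1)$, the expected value of the distribution.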

The negative hypergeometric distribution is a special case of the beta-binomial distribution [2] with parameters $\alpha = r$ and $\beta = N-K-r+1$ both being integers (and $n = K$).
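The beta-binomial connection can be checked numerically. This sketch assumes the standard beta-binomial pmf $\binom{n}{k} B(k+\alpha, n-k+\beta)/B(\alpha,\beta)$ with $\alpha = r$, $\beta = N-K-r+1$ and $n = K$. The helper names are hypothetical, and the beta function is evaluated only for positive integer arguments.

```python
from fractions import Fraction
from math import comb, factorial


def beta_int(a, b):
    """Beta function B(a, b) for positive integers a and b, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def beta_binomial_pmf(k, n, alpha, beta):
    """Beta-binomial pmf restricted to positive integer alpha and beta."""
    return comb(n, k) * beta_int(k + alpha, n - k + beta) / beta_int(alpha, beta)


def nhg_pmf(k, N, K, r):
    """Negative hypergeometric pmf: k successes before the r-th failure."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))


N, K, r = 10, 4, 2  # arbitrary example values with 1 <= r <= N - K
same_distribution = all(
    nhg_pmf(k, N, K, r) == beta_binomial_pmf(k, K, r, N - K - r + 1)
    for k in range(K + 1)
)
print(same_distribution)
```

Exact rational arithmetic is used so that the comparison is not affected by floating-point rounding.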

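The outcome requires that we observe $k$ successes in the first $(k+r-1)$ draws and that the $(k+r)\text{-th}$ draw is a failure. The probability of the former can be found by direct application of the hypergeometric distribution, $HG_{N,K,k+r-1}(k)$, and the probability of the latter is simply the number of failures remaining, $N-K-(r-1)$, divided by the size of the remaining population, $N-(k+r-1)$. The probability of having exactly $k$ successes up to the $r\text{-th}$ failure (i.e. the drawing stops as soon as the sample includes the predefined number of $r$ failures) is then the product of these two probabilities:

$$\frac{\binom{K}{k}\binom{N-K}{k+r-1-k}}{\binom{N}{k+r-1}} \cdot \frac{N-K-(r-1)}{N-(k+r-1)} = \frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}}.$$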
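The two-step argument (a hypergeometric probability for the first $k+r-1$ draws, multiplied by the probability that the next draw is a failure) can be compared against the closed form $\binom{k+r-1}{k}\binom{N-r-k}{K-k}/\binom{N}{K}$. The function names below are hypothetical and the parameter values are arbitrary; this is a sketch of the check, not a proof.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k successes in n draws without replacement from N elements, K successes)."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


def nhg_pmf_via_product(k, N, K, r):
    """k successes in the first k+r-1 draws, then a failure on draw k+r."""
    n = k + r - 1
    first_draws = hypergeom_pmf(k, N, K, n)
    # r-1 failures are already drawn, so N-K-(r-1) failures remain among N-n elements.
    next_is_failure = Fraction(N - K - (r - 1), N - n)
    return first_draws * next_is_failure


def nhg_pmf_closed_form(k, N, K, r):
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))


N, K, r = 10, 4, 2  # arbitrary example values with 1 <= r <= N - K
product_matches = all(
    nhg_pmf_via_product(k, N, K, r) == nhg_pmf_closed_form(k, N, K, r)
    for k in range(K + 1)
)
print(product_matches)
```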

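Therefore, a random variable $X$ follows the negative hypergeometric distribution if its probability mass function (pmf) is given by

$$f(k; N, K, r) \equiv \Pr(X = k) = \frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}} \quad \text{for } 0 \le k \le K,$$

where

- $N$ is the population size,
- $K$ is the number of success states in the population,
- $r$ is the number of failures,
- $k$ is the number of observed successes,
- $\binom{a}{b}$ is a binomial coefficient.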
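As a worked example, the pmf $\Pr(X=k) = \binom{k+r-1}{k}\binom{N-r-k}{K-k}/\binom{N}{K}$ can be tabulated with the standard library alone. The function name `nhg_pmf` and the values $N=10$, $K=4$, $r=2$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def nhg_pmf(k, N, K, r):
    """Probability of exactly k successes before the r-th failure (0 <= k <= K)."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))


N, K, r = 10, 4, 2  # arbitrary example values
table = [nhg_pmf(k, N, K, r) for k in range(K + 1)]
for k, p in enumerate(table):
    print(k, p)

total = sum(table)  # the probabilities over the support should add up to 1
print(total)
```

For these values the probabilities are 1/3, 1/3, 3/14, 2/21 and 1/42 for $k = 0, \dots, 4$, which sum to 1.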

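By design the probabilities sum to 1. However, in case we want to show it explicitly, we have:

$$\sum_{k=0}^{K} \Pr(X = k) = \sum_{k=0}^{K} \frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}} = \frac{1}{\binom{N}{K}} \sum_{k=0}^{K} \binom{k+r-1}{k}\binom{N-r-k}{K-k} = \frac{1}{\binom{N}{K}} \binom{N}{K} = 1,$$

where we have used that

$$\sum_{j=0}^{k} \binom{j+m}{j}\binom{n-m-j}{k-j} = \binom{n+1}{k},$$

which can be derived using the binomial identity

$$\binom{n}{k} = (-1)^{k} \binom{k-n-1}{k}$$

and the Chu–Vandermonde identity

$$\sum_{j=0}^{k} \binom{m}{j}\binom{n-m}{k-j} = \binom{n}{k},$$

which holds for any complex values $m$ and $n$ and any non-negative integer $k$.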
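The summation identity $\sum_{j=0}^{k} \binom{j+m}{j}\binom{n-m-j}{k-j} = \binom{n+1}{k}$ used for the normalization can be spot-checked over a grid of small integers. This sketch only covers non-negative integers with $k \le n - m$ (the case needed here, with $m = r-1$, $n = N-1$, $k = K$), and the function names are hypothetical.

```python
from math import comb


def identity_lhs(n, m, k):
    """Left-hand side: sum over j of C(j+m, j) * C(n-m-j, k-j)."""
    return sum(comb(j + m, j) * comb(n - m - j, k - j) for j in range(k + 1))


def identity_rhs(n, k):
    """Right-hand side: C(n+1, k)."""
    return comb(n + 1, k)


# Check every (n, m, k) with 0 <= m <= n < 12 and 0 <= k <= n - m.
identity_holds = all(
    identity_lhs(n, m, k) == identity_rhs(n, k)
    for n in range(12)
    for m in range(n + 1)
    for k in range(n - m + 1)
)
print(identity_holds)
```

A finite check like this does not replace the derivation, but it catches index slips when the identity is transcribed.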

Expectation

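When counting the number of successes $k$ before $r$ failures, the expected number of successes is $\frac{rK}{N-K+1}$ and can be derived as follows.

$$\begin{aligned}
E[X] &= \sum_{k=0}^{K} k \Pr(X = k) = \sum_{k=0}^{K} k \frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}} \\
&= \frac{r}{\binom{N}{K}} \left[ \sum_{k=0}^{K} \frac{k+r}{r} \binom{k+r-1}{r-1}\binom{N-r-k}{K-k} \right] - r \\
&= \frac{r}{\binom{N}{K}} \left[ \sum_{k=0}^{K} \binom{k+r}{r}\binom{N-r-k}{K-k} \right] - r \\
&= \frac{r}{\binom{N}{K}} \binom{N+1}{K} - r = \frac{rK}{N-K+1},
\end{aligned}$$

where we have used the relationship $\sum_{j=0}^{k} \binom{j+m}{j}\binom{n-m-j}{k-j} = \binom{n+1}{k}$, the same identity that shows the negative hypergeometric distribution is properly normalized.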
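The closed form for the mean, $rK/(N-K+1)$, can be compared with a direct summation of $k \Pr(X=k)$ over the support. The helper names and the parameter grid below are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def nhg_pmf(k, N, K, r):
    """Negative hypergeometric pmf: k successes before the r-th failure."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))


def nhg_mean(N, K, r):
    """Mean computed directly from the definition E[X] = sum of k * Pr(X = k)."""
    return sum(k * nhg_pmf(k, N, K, r) for k in range(K + 1))


def nhg_mean_closed_form(N, K, r):
    return Fraction(r * K, N - K + 1)


# Compare both expressions on a small grid with 1 <= r <= N - K.
mean_matches = all(
    nhg_mean(N, K, r) == nhg_mean_closed_form(N, K, r)
    for N in range(1, 12)
    for K in range(N)
    for r in range(1, N - K + 1)
)
print(mean_matches, nhg_mean(10, 4, 2))
```

For $N=10$, $K=4$, $r=2$ both expressions give 8/7.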

Variance

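The variance can be derived by the following calculation.

$$\begin{aligned}
E[X^2] &= \sum_{k=0}^{K} k^2 \Pr(X = k) = \left[ \sum_{k=0}^{K} (k+r)(k+r+1) \Pr(X = k) \right] - (2r+1)E[X] - r^2 - r \\
&= \frac{r(r+1)}{\binom{N}{K}} \left[ \sum_{k=0}^{K} \binom{k+r+1}{r+1}\binom{N+1-(r+1)-k}{K-k} \right] - (2r+1)E[X] - r^2 - r \\
&= \frac{r(r+1)}{\binom{N}{K}} \binom{N+2}{K} - (2r+1)E[X] - r^2 - r = \frac{rK(N-r+Kr+1)}{(N-K+1)(N-K+2)}.
\end{aligned}$$

Then the variance is

$$\operatorname{Var}[X] = E[X^2] - \left(E[X]\right)^2 = \frac{rK(N+1)(N-K-r+1)}{(N-K+1)^2 (N-K+2)}.$$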
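The second moment $\frac{rK(N-r+Kr+1)}{(N-K+1)(N-K+2)}$ and the variance $\frac{rK(N+1)(N-K-r+1)}{(N-K+1)^2(N-K+2)}$ can be checked the same way, by summing over the support with exact fractions. Function names and the example values $N=10$, $K=4$, $r=2$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def nhg_pmf(k, N, K, r):
    """Negative hypergeometric pmf: k successes before the r-th failure."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))


def nhg_moment(N, K, r, power):
    """Raw moment E[X**power] by direct summation over k = 0..K."""
    return sum(k**power * nhg_pmf(k, N, K, r) for k in range(K + 1))


def nhg_variance(N, K, r):
    """Variance from the definition Var[X] = E[X^2] - E[X]^2."""
    return nhg_moment(N, K, r, 2) - nhg_moment(N, K, r, 1) ** 2


N, K, r = 10, 4, 2  # arbitrary example values
second_moment = nhg_moment(N, K, r, 2)
variance = nhg_variance(N, K, r)

second_moment_closed = Fraction(r * K * (N - r + K * r + 1), (N - K + 1) * (N - K + 2))
variance_closed = Fraction(
    r * K * (N + 1) * (N - K - r + 1), (N - K + 1) ** 2 * (N - K + 2)
)
print(second_moment, second_moment_closed)
print(variance, variance_closed)
```

For these values the second moment is 17/7 and the variance is 55/49 under both routes.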

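If the drawing stops after a constant number $n$ of draws (regardless of the number of failures), then the number of successes has the hypergeometric distribution, $HG_{N,K,n}(k)$. The two functions, read as cumulative distribution functions, are related in the following way: [1]

$$NHG_{N,K,r}(k) = 1 - HG_{N,N-K,k+r}(r-1).$$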
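The link to the hypergeometric distribution can be made concrete through cumulative probabilities: at most $k$ successes occur before the $r$-th failure exactly when the first $k+r$ draws contain at least $r$ failures. The sketch below checks $\Pr(X \le k) = 1 - \Pr(\text{failures in } k+r \text{ draws} \le r-1)$. Reading the relation as a statement about cumulative distribution functions is an assumption of this example, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def nhg_cdf(k, N, K, r):
    """P(X <= k) for the number of successes X before the r-th failure."""
    return sum(
        Fraction(comb(j + r - 1, j) * comb(N - r - j, K - j), comb(N, K))
        for j in range(k + 1)
    )


def hypergeom_cdf(x, N, marked, n):
    """P(at most x marked elements in n draws without replacement)."""
    return sum(
        Fraction(comb(marked, j) * comb(N - marked, n - j), comb(N, n))
        for j in range(x + 1)
    )


N, K, r = 10, 4, 2  # arbitrary example values with 1 <= r <= N - K
relation_holds = all(
    # Failures play the role of the marked elements: N - K of them, k + r draws.
    nhg_cdf(k, N, K, r) == 1 - hypergeom_cdf(r - 1, N, N - K, k + r)
    for k in range(K + 1)
)
print(relation_holds)
```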

The negative hypergeometric distribution (like the hypergeometric distribution) deals with draws without replacement, so that the probability of success is different in each draw. In contrast, the negative binomial distribution (like the binomial distribution) deals with draws with replacement, so that the probability of success is the same in each draw and the trials are independent. The following table summarizes the four distributions related to drawing items:

| | With replacements | No replacements |
|---|---|---|
| # of successes in constant # of draws | binomial distribution | hypergeometric distribution |
| # of successes in constant # of failures | negative binomial distribution | negative hypergeometric distribution |

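Some authors [3] [4] define the negative hypergeometric distribution to be the number of draws required to get the $r$th failure. If we let $Y$ denote this number, then it is clear that $Y = X + r$, where $X$ is as defined above. Hence the PMF is

$$\Pr(Y = y) = \frac{\binom{y-1}{r-1}\binom{N-y}{N-K-r}}{\binom{N}{N-K}}.$$

If we denote the number of failures $N-K$ by $M$, we have

$$\Pr(Y = y) = \frac{\binom{y-1}{r-1}\binom{N-y}{M-r}}{\binom{N}{M}}.$$

The support of $Y$ is the set $\{r, r+1, \dots, N-M+r\}$. It is clear that:

$$E[Y] = E[X] + r = \frac{r(N+1)}{M+1}$$

and $\operatorname{Var}[X] = \operatorname{Var}[Y]$.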
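The alternative parameterization, which counts the total number of draws $Y = X + r$, can be checked against the success-count form. The sketch assumes the pmf $\Pr(Y=y) = \binom{y-1}{r-1}\binom{N-y}{M-r}/\binom{N}{M}$ with $M = N - K$ failures in the population; the function names and example values are hypothetical.

```python
from fractions import Fraction
from math import comb


def nhg_pmf(k, N, K, r):
    """P(X = k): k successes before the r-th failure."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))


def draws_pmf(y, N, M, r):
    """P(Y = y): the r-th failure occurs on draw y, with M failures in the population."""
    return Fraction(comb(y - 1, r - 1) * comb(N - y, M - r), comb(N, M))


N, K, r = 10, 4, 2  # arbitrary example values
M = N - K           # number of failures in the population
support = range(r, N - M + r + 1)

# Y = X + r, so P(Y = y) should equal P(X = y - r) on the whole support.
shift_matches = all(draws_pmf(y, N, M, r) == nhg_pmf(y - r, N, K, r) for y in support)
mean_draws = sum(y * draws_pmf(y, N, M, r) for y in support)
print(shift_matches, mean_draws, Fraction(r * (N + 1), M + 1))
```

For $N=10$, $M=6$, $r=2$ the mean number of draws is 22/7, which equals $E[X] + r = 8/7 + 2$.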

References

  1. Negative hypergeometric distribution in Encyclopedia of Math.
  2. Johnson, Norman L.; Kemp, Adrienne W.; Kotz, Samuel (2005). Univariate Discrete Distributions. Wiley. ISBN 0-471-27246-9. §6.2.2 (pp. 253–254).
  3. Rohatgi, Vijay K.; Saleh, A. K. Md. Ehsanes (2015). An Introduction to Probability and Statistics. John Wiley & Sons.
  4. Khan, R. A. (1994). A note on the generating function of a negative hypergeometric distribution. Sankhya: The Indian Journal of Statistics B, 56(3), 309–313.