Coupon collector's problem

Figure: Graph of the number of coupons, n, vs. the expected number of trials (i.e., time) needed to collect them all, E(T).

In probability theory, the coupon collector's problem is the mathematical analysis of "collect all coupons and win" contests. It asks the following question: if each box of a given product (e.g., breakfast cereals) contains a coupon, and there are n different types of coupons, what is the probability that more than t boxes need to be bought to collect all n coupons? An alternative statement is: given n coupons, how many coupons do you expect you need to draw with replacement before having drawn each coupon at least once? The mathematical analysis of the problem reveals that the expected number of trials needed grows as $\Theta(n \log n)$. [a] For example, when n = 50 it takes about 225 [b] trials on average to collect all 50 coupons.
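The n = 50 figure can be checked empirically. The following is a minimal simulation sketch, assuming uniformly distributed coupon types; the function names, the number of runs and the seed are illustrative choices and do not come from the article.

```python
import random


def draws_to_collect_all(n, rng):
    """Draw uniformly with replacement from n coupon types until all are seen."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # each type is equally likely (assumption)
        draws += 1
    return draws


def estimate_expected_draws(n, runs, seed=0):
    """Average the number of draws over independent simulated collections."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    return sum(draws_to_collect_all(n, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    # Should land near the article's value of about 225 for n = 50.
    print(estimate_expected_draws(50, runs=4000))
```

With a few thousand runs the estimate typically falls within a draw or two of the exact value, since the standard deviation of a single collection for n = 50 is roughly 62 draws.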

Solution

Via generating functions

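By definition of Stirling numbers of the second kind, the probability that exactly T draws are needed is

$$\frac{S(T-1,\, n-1)\, n!}{n^T}.$$

By manipulating the generating function of the Stirling numbers, we can explicitly calculate all moments of T:

$$f_k(x) := \sum_{T} S(T-1,\, n-1)\, T^k x^{T-1}, \qquad f_0(x) = \prod_{r=1}^{n-1} \frac{x}{1 - r x}, \qquad f_{k+1}(x) = D_x \big( x f_k(x) \big).$$

In general, the k-th moment is $(n-1)!\, \big( (D_x x)^k f_0(x) \big) \big|_{x=1/n}$, where $D_x$ is the derivative operator $d/dx$. For example, the 0th moment is

$$\sum_{T} \frac{S(T-1,\, n-1)\, n!}{n^T} = (n-1)!\, f_0(1/n) = (n-1)! \prod_{r=1}^{n-1} \frac{1/n}{1 - r/n} = 1,$$

and the 1st moment is $(n-1)!\, \big( D_x x f_0(x) \big) \big|_{x=1/n}$, which can be explicitly evaluated to $n H_n$, etc.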
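The Stirling-number formula $P(T = t) = n!\, S(t-1, n-1) / n^t$ can be evaluated directly. The sketch below is one possible way to do so with exact rational arithmetic; the helper names are hypothetical, and the Stirling numbers are built from the standard recurrence $S(m, j) = j\, S(m-1, j) + S(m-1, j-1)$.

```python
from fractions import Fraction
from math import factorial


def stirling2_column(max_m, k):
    """Return [S(0,k), S(1,k), ..., S(max_m,k)], Stirling numbers of the 2nd kind."""
    row = [1] + [0] * k  # S(0, j) for j = 0..k
    column = [row[k]]
    for _ in range(max_m):
        new = [0] * (k + 1)
        for j in range(1, k + 1):
            new[j] = j * row[j] + row[j - 1]  # S(m,j) = j*S(m-1,j) + S(m-1,j-1)
        row = new
        column.append(row[k])
    return column


def pmf_draws(n, max_t):
    """Exact P(T = t) = n! * S(t-1, n-1) / n**t for t = 1..max_t."""
    s = stirling2_column(max_t - 1, n - 1)
    return {t: Fraction(factorial(n) * s[t - 1], n ** t)
            for t in range(1, max_t + 1)}


if __name__ == "__main__":
    pmf = pmf_draws(5, 400)
    print(float(sum(pmf.values())))                   # close to 1 (0th moment)
    print(float(sum(t * p for t, p in pmf.items())))  # close to 5 * H_5 (1st moment)
```

Truncating the sums at a finite `max_t` leaves out only a geometrically small tail, so the partial sums approximate the 0th and 1st moments closely.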

Calculating the expectation

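Let time T be the number of draws needed to collect all n coupons, and let $t_i$ be the time to collect the i-th coupon after i − 1 coupons have been collected. Then $T = t_1 + \cdots + t_n$. Think of T and $t_i$ as random variables. Observe that the probability of collecting a new coupon is $p_i = \frac{n-(i-1)}{n} = \frac{n-i+1}{n}$. Therefore, $t_i$ has a geometric distribution with expectation $\frac{1}{p_i} = \frac{n}{n-i+1}$. By the linearity of expectations we have:

$$\begin{aligned} E(T) &= E(t_1) + \cdots + E(t_n) = \frac{1}{p_1} + \cdots + \frac{1}{p_n} \\ &= \frac{n}{n} + \frac{n}{n-1} + \cdots + \frac{n}{1} = n \left( \frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n} \right) = n H_n. \end{aligned}$$

Here $H_n$ is the n-th harmonic number. Using the asymptotics of the harmonic numbers, we obtain:

$$E(T) = n H_n = n \log n + \gamma n + \frac{1}{2} + O(1/n),$$

where $\gamma \approx 0.5772156649$ is the Euler–Mascheroni constant.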
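As a numerical illustration, the exact expectation $n H_n$ can be compared with the asymptotic approximation $n \log n + \gamma n + 1/2$. This is a small sketch with hypothetical function names; the value of the constant is hard-coded.

```python
from math import log

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_draws(n):
    """Exact expectation E(T) = n * H_n for n equally likely coupon types."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def expected_draws_asymptotic(n):
    """Approximation n*log(n) + gamma*n + 1/2, with error of order 1/n."""
    return n * log(n) + EULER_GAMMA * n + 0.5


if __name__ == "__main__":
    for n in (10, 50, 1000):
        print(n, expected_draws(n), expected_draws_asymptotic(n))
```

For n = 50 the two values agree to about two decimal places (224.9603 versus 224.9619).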

Using the Markov inequality to bound the desired probability:

$$P(T \ge c\, n H_n) \le \frac{1}{c}.$$

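The above can be modified slightly to handle the case when we've already collected some of the coupons. Let k be the number of coupons already collected, and let $T_k$ be the number of further draws needed to collect the rest; then:

$$E(T_k) = E(t_{k+1}) + \cdots + E(t_n) = n \left( \frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n-k} \right) = n H_{n-k}.$$

And when k = 0 we get the original result.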
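The partially completed case, where the expected number of further draws is $n H_{n-k}$, can be sketched with exact fractions. The function names here are illustrative only.

```python
from fractions import Fraction


def harmonic(m):
    """Exact m-th harmonic number H_m (with H_0 = 0)."""
    return sum((Fraction(1, j) for j in range(1, m + 1)), Fraction(0))


def expected_remaining_draws(n, k):
    """Expected further draws when k of the n coupon types are already held."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return n * harmonic(n - k)


if __name__ == "__main__":
    print(expected_remaining_draws(50, 0))   # the full problem, 50 * H_50
    print(expected_remaining_draws(50, 49))  # one type missing: 50 draws on average
```

The last case shows why the final coupons dominate the cost: with a single type missing, each draw succeeds with probability 1/n.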

Calculating the variance

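Using the independence of the random variables $t_i$, each geometric with success probability $p_i = \frac{n-i+1}{n}$, we obtain:

$$\begin{aligned} \operatorname{Var}(T) &= \operatorname{Var}(t_1) + \cdots + \operatorname{Var}(t_n) = \frac{1-p_1}{p_1^2} + \cdots + \frac{1-p_n}{p_n^2} \\ &< \frac{n^2}{n^2} + \frac{n^2}{(n-1)^2} + \cdots + \frac{n^2}{1^2} = n^2 \left( \frac{1}{1^2} + \frac{1}{2^2} + \cdots + \frac{1}{n^2} \right) < \frac{\pi^2}{6} n^2, \end{aligned}$$

since $\frac{\pi^2}{6} = \frac{1}{1^2} + \frac{1}{2^2} + \cdots$ (see Basel problem).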
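To see how tight the Basel bound is, the exact variance $\sum_i (1-p_i)/p_i^2$ with $p_i = (n-i+1)/n$ can be computed and compared with $\pi^2 n^2 / 6$. A minimal sketch, with hypothetical function names:

```python
from math import pi


def variance_of_draws(n):
    """Exact Var(T): sum of the variances of the independent geometric stages."""
    total = 0.0
    for i in range(1, n + 1):
        p = (n - i + 1) / n        # chance that a draw yields the i-th new coupon
        total += (1 - p) / p ** 2  # variance of a geometric variable
    return total


def basel_bound(n):
    """Upper bound pi^2 * n^2 / 6 obtained from the Basel problem."""
    return pi ** 2 * n ** 2 / 6


if __name__ == "__main__":
    for n in (2, 50, 1000):
        print(n, variance_of_draws(n), basel_bound(n))
```

The ratio of the exact variance to the bound approaches 1 as n grows, because the omitted term $n H_n$ is of lower order than $n^2$.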

Bound the desired probability using the Chebyshev inequality:

$$P\left( |T - n H_n| \ge c n \right) \le \frac{\pi^2}{6 c^2}.$$

Tail estimates

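A stronger tail estimate for the upper tail can be obtained as follows. Let $Z_i^r$ denote the event that the $i$-th coupon was not picked in the first $r$ trials. Then

$$P\left[ Z_i^r \right] = \left( 1 - \frac{1}{n} \right)^r \le e^{-r/n}.$$

Thus, for $r = \beta n \log n$, we have $P\left[ Z_i^r \right] \le e^{-(\beta n \log n)/n} = n^{-\beta}$. Via a union bound over the $n$ coupons, we obtain

$$P\left[ T > \beta n \log n \right] = P\left[ \bigcup_i Z_i^{\beta n \log n} \right] \le n \cdot P\left[ Z_1^{\beta n \log n} \right] \le n^{-\beta + 1}.$$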
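The bounds in this section can be compared with the exact tail probability, which follows from inclusion–exclusion over the coupons still missing: $P(T > t) = \sum_{j=1}^{n} (-1)^{j+1} \binom{n}{j} (1 - j/n)^t$. The sketch below is illustrative; the function names are hypothetical. The Chebyshev bound is used in the one-sided form $\operatorname{Var}(T) / (t - E(T))^2$, which assumes $t > E(T)$, and the union bound is written as $n e^{-t/n}$, which equals $n^{1-\beta}$ when $t = \beta n \log n$.

```python
from fractions import Fraction
from math import comb, exp


def exact_tail(n, t):
    """P(T > t) by inclusion-exclusion over the sets of coupons still missing."""
    total = Fraction(0)
    for j in range(1, n + 1):
        total += (-1) ** (j + 1) * comb(n, j) * Fraction(n - j, n) ** t
    return float(total)


def tail_bounds(n, t):
    """Markov, Chebyshev and union upper bounds on P(T > t); needs t > E(T)."""
    mean = sum(n / k for k in range(1, n + 1))
    var = sum(n * n / (k * k) - n / k for k in range(1, n + 1))
    if t <= mean:
        raise ValueError("the Chebyshev form used here assumes t > E(T)")
    return {
        "markov": mean / t,
        "chebyshev": var / (t - mean) ** 2,
        "union": n * exp(-t / n),
    }


if __name__ == "__main__":
    print(exact_tail(50, 400))
    print(tail_bounds(50, 400))
```

Exact fractions are used in `exact_tail` because the alternating sum suffers from heavy cancellation in floating point.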

Extensions and generalizations

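Paul Erdős and Alfréd Rényi proved a limit theorem for the distribution of T, which extends the bounds above:

$$P(T < n \log n + c n) \to e^{-e^{-c}}, \quad \text{as } n \to \infty,$$

which is a Gumbel distribution. A simple proof by martingales is in the next section.

Donald J. Newman and Lawrence Shepp generalized the coupon collector's problem to the case where m copies of each coupon need to be collected. Let $T_m$ be the first time m copies of each coupon are collected. They showed that the expectation in this case satisfies

$$E(T_m) = n \log n + (m-1)\, n \log\log n + O(n), \quad \text{as } n \to \infty.$$

Here m is fixed. When m = 1 we get the earlier formula for the expectation.

In the general case of a nonuniform probability distribution, where coupon i is drawn with probability $p_i$, Flajolet et al. [2] give

$$E(T) = \int_0^\infty \left( 1 - \prod_{i=1}^{m} \left( 1 - e^{-p_i t} \right) \right) dt.$$

This is equal to

$$E(T) = \sum_{q=0}^{m-1} (-1)^{m-1-q} \sum_{|J| = q} \frac{1}{1 - P_J},$$

where m denotes the number of coupons to be collected and $P_J$ denotes the probability of getting any coupon in the set of coupons J.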
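For a nonuniform distribution, the subset-sum form $E(T) = \sum_{q=0}^{m-1} (-1)^{m-1-q} \sum_{|J|=q} 1/(1 - P_J)$ can be evaluated directly when m is small. The following sketch assumes the probabilities are positive, sum to 1, and are passed as exact values (`Fraction`, `int` or strings such as `"1/3"`); the function name is hypothetical. The running time grows as $2^m$.

```python
from fractions import Fraction
from itertools import combinations


def expected_draws_nonuniform(probs):
    """E(T) for coupon probabilities `probs`, summing over all proper subsets J."""
    probs = [Fraction(p) for p in probs]
    if any(p <= 0 for p in probs) or sum(probs) != 1:
        raise ValueError("probabilities must be positive and sum to 1")
    m = len(probs)
    total = Fraction(0)
    for q in range(m):  # subset sizes 0 .. m-1
        sign = Fraction((-1) ** (m - 1 - q))
        for subset in combinations(probs, q):
            p_j = sum(subset, Fraction(0))  # P_J: chance of drawing a coupon in J
            total += sign / (1 - p_j)
    return total


if __name__ == "__main__":
    print(expected_draws_nonuniform(["1/3", "2/3"]))
    print(expected_draws_nonuniform([Fraction(1, 4)] * 4))  # uniform case, 4 * H_4
```

In the uniform case the result reduces to $n H_n$, which gives a convenient sanity check.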

Martingales

This section is based on Kan (2005). [3]

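Define a discrete random process $N(0), N(1), \dots$ by letting $N(t)$ be the number of coupons not yet seen after $t$ draws. The random process is just a sequence generated by a Markov chain with states $n, n-1, \dots, 1, 0$ and transition probabilities

$$p_{i \to i-1} = \frac{i}{n}, \qquad p_{i \to i} = 1 - \frac{i}{n}.$$

Now define $M(t) := N(t) \left( \frac{n}{n-1} \right)^t$; then it is a martingale, since

$$E[M(t+1) \mid M(t)] = \left( \frac{n}{n-1} \right)^{t+1} E[N(t+1) \mid N(t)] = \left( \frac{n}{n-1} \right)^{t+1} \left( N(t) - \frac{N(t)}{n} \right) = M(t).$$

Consequently, we have $E[N(t)] = n \left( \frac{n-1}{n} \right)^t$. In particular, we have a limit law $\lim_{n \to \infty} E[N(n \log n + c n)] = e^{-c}$ for any constant $c$. This suggests to us a limit law for $T$.

More generally, each $\left( \frac{n}{n-k} \right)^t N(t) (N(t) - 1) \cdots (N(t) - k + 1)$ is a martingale process, which allows us to calculate all moments of $N(t)$. For example,

$$E[N(t)^2] = n(n-1) \left( \frac{n-2}{n} \right)^t + n \left( \frac{n-1}{n} \right)^t,$$

giving another limit law $\lim_{n \to \infty} E[N(n \log n + c n)^2] = e^{-2c} + e^{-c}$. More generally,

$$\lim_{n \to \infty} E\big[ N(n \log n + c n) \big( N(n \log n + c n) - 1 \big) \cdots \big( N(n \log n + c n) - k + 1 \big) \big] = e^{-kc},$$

meaning that $N(n \log n + c n)$ has all moments converging to constants, so it converges to some probability distribution on $0, 1, 2, \dots$.

Let $N$ be the random variable with the limit distribution. We have

$$\sum_{j=0}^{\infty} \Pr(N = j) = 1, \qquad \sum_{j=0}^{\infty} \Pr(N = j)\, j = e^{-c}, \qquad \sum_{j=0}^{\infty} \Pr(N = j)\, j(j-1) = e^{-2c}, \qquad \dots$$

By introducing a new variable $s$, we can sum up both sides explicitly:

$$\sum_{j=0}^{\infty} \Pr(N = j) \left( 1 + j s + \frac{j(j-1)}{2!} s^2 + \cdots \right) = 1 + e^{-c} s + \frac{e^{-2c}}{2!} s^2 + \cdots,$$

giving $\sum_{j=0}^{\infty} \Pr(N = j) (1 + s)^j = e^{e^{-c} s}$.

At the $s \to -1$ limit, we have $\Pr(N = 0) = e^{-e^{-c}}$, which is precisely what the Gumbel limit law for $T$ states, since $N = 0$ means that every coupon has been seen.

By taking the derivative multiple times, we find that $\Pr(N = j) = e^{-e^{-c}} \frac{e^{-jc}}{j!}$, which is a Poisson distribution with mean $e^{-c}$.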
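The limit statements can be checked numerically for a finite n by propagating the exact distribution of $N(t)$ through the Markov chain, whose transitions are $i \to i-1$ with probability $i/n$ and $i \to i$ with probability $1 - i/n$. This is a sketch under those assumptions with hypothetical function names; it compares the exact distribution at $t \approx n \log n + cn$ with a Poisson distribution of mean $e^{-c}$, so agreement is only approximate for moderate n.

```python
from math import exp, factorial, log


def unseen_distribution(n, t):
    """Exact distribution of N(t): dist[i] = P(i coupon types unseen after t draws)."""
    dist = [0.0] * (n + 1)
    dist[n] = 1.0  # N(0) = n: nothing has been seen yet
    for _ in range(t):
        new = [0.0] * (n + 1)
        for i in range(n + 1):
            new[i] += dist[i] * (1 - i / n)    # draw repeats a seen type
            if i > 0:
                new[i - 1] += dist[i] * i / n  # draw reveals a new type
        dist = new
    return dist


def poisson_pmf(lam, j):
    """Poisson probability of the value j with mean lam."""
    return exp(-lam) * lam ** j / factorial(j)


if __name__ == "__main__":
    n, c = 200, 0.5
    t = round(n * log(n) + c * n)
    dist = unseen_distribution(n, t)
    print(sum(i * p for i, p in enumerate(dist)), n * (1 - 1 / n) ** t)  # E[N(t)]
    for j in range(4):
        print(j, dist[j], poisson_pmf(exp(-c), j))
```

The first printed pair checks the martingale identity $E[N(t)] = n (1 - 1/n)^t$, and `dist[0]` is the exact value of $P(T \le t)$, to be compared with the Gumbel value $e^{-e^{-c}}$.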

Notes

  a. Here and throughout this article, "log" refers to the natural logarithm rather than a logarithm to some other base. The use of Θ here invokes big O notation.
  b. E(50) = 50(1 + 1/2 + 1/3 + ... + 1/50) = 224.9603, the expected number of trials to collect all 50 coupons. The approximation $n \log n + \gamma n + 1/2$ for this expected number gives in this case $50 \log 50 + 50\gamma + 1/2 \approx 195.6011 + 28.8608 + 0.5 = 224.9619$.

References

  1. Mitzenmacher, Michael; Upfal, Eli (2017). Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis (2nd ed.). Cambridge, United Kingdom. Theorem 5.13. ISBN 978-1-107-15488-9. OCLC 960841613.
  2. Flajolet, Philippe; Gardy, Danièle; Thimonier, Loÿs (1992). "Birthday paradox, coupon collectors, caching algorithms and self-organizing search". Discrete Applied Mathematics. 39 (3): 207–229. CiteSeerX 10.1.1.217.5965. doi:10.1016/0166-218x(92)90177-c.
  3. Kan, N. D. (2005-05-01). "Martingale approach to the coupon collection problem". Journal of Mathematical Sciences. 127 (1): 1737–1744. doi:10.1007/s10958-005-0134-y. ISSN 1573-8795.