Lottery mathematics

Lottery mathematics is used to calculate probabilities of winning or losing a lottery game. It is based primarily on combinatorics, particularly the twelvefold way and combinations without replacement.

Choosing 6 from 49

In a typical 6/49 game, each player chooses six distinct numbers from a range of 1–49. If the six numbers on a ticket match the numbers drawn by the lottery, the ticket holder is a jackpot winner—regardless of the order of the numbers. The probability of this happening is 1 in 13,983,816.

The chance of winning can be demonstrated as follows: The first number drawn has a 1 in 49 chance of matching. When the draw comes to the second number, there are now only 48 balls left in the bag, because the balls are drawn without replacement. So there is now a 1 in 48 chance of predicting this number.

Thus for each of the 49 ways of choosing the first number there are 48 different ways of choosing the second. This means that the probability of correctly predicting 2 numbers drawn from 49 in the correct order is 1 in 49 × 48. On drawing the third number there are only 47 ways of choosing the number; but we could have arrived at this point in any of 49 × 48 ways, so the chance of correctly predicting 3 numbers drawn from 49, again in the correct order, is 1 in 49 × 48 × 47. This continues until the sixth number has been drawn, giving the final calculation, 49 × 48 × 47 × 46 × 45 × 44, which can also be written as $\frac{49!}{(49-6)!}$, or 49 factorial divided by 43 factorial, or FACT(49)/FACT(43), or simply PERM(49,6).

608281864034267560872252163321295376887552831379210240000000000 / 60415263063373835637355132068513997507264512000000000 = 10068347520

This works out to 10,068,347,520, which is much bigger than the ~14 million stated above.

PERM(49,6) = 10,068,347,520 and 49 nPr 6 = 10,068,347,520.

However, the order of the 6 numbers is not significant for the payout. That is, if a ticket has the numbers 1, 2, 3, 4, 5, and 6, it wins as long as all the numbers 1 through 6 are drawn, no matter what order they come out in. Accordingly, given any combination of 6 numbers, there are 6 × 5 × 4 × 3 × 2 × 1 = 6! or 720 orders in which they can be drawn. Dividing 10,068,347,520 by 720 gives 13,983,816, also written as $\frac{49!}{6!\,(49-6)!}$, or COMBIN(49,6), or 49 nCr 6, or more generally as

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!},$$

where n is the number of alternatives and k is the number of choices. Further information is available at binomial coefficient and multinomial coefficient.

This function is called the combination function, COMBIN(n,k). For the rest of this article, we will use the notation $\binom{n}{k}$. "Combination" means the group of numbers selected, irrespective of the order in which they are drawn. A combination of numbers is usually presented in ascending order. If a 7th number is drawn, the reserve or bonus, it is presented at the end.
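The step from ordered draws to unordered combinations can be checked numerically. The following is a minimal sketch using Python's standard `math` module (`math.perm` and `math.comb` need Python 3.8 or later); the variable names are illustrative only.

```python
import math

# Ordered draws: 49 * 48 * 47 * 46 * 45 * 44, i.e. 49! / 43!
ordered = math.perm(49, 6)
assert ordered == math.factorial(49) // math.factorial(43)

# Every set of six numbers can come out of the machine in 6! different orders
orders_per_combination = math.factorial(6)

# Dividing out the order leaves the number of distinct tickets
unordered = ordered // orders_per_combination

print(ordered)                 # 10068347520
print(orders_per_combination)  # 720
print(unordered)               # 13983816
print(unordered == math.comb(49, 6))  # True
```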

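An alternative method of calculating the odds is to note that the probability of the first ball corresponding to one of the six chosen is 6/49; the probability of the second ball corresponding to one of the remaining five chosen is 5/48; and so on. This yields a final formula of

$$\frac{6}{49} \times \frac{5}{48} \times \frac{4}{47} \times \frac{3}{46} \times \frac{2}{45} \times \frac{1}{44} = \frac{1}{\binom{49}{6}} = \frac{1}{13{,}983{,}816}$$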
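The ball-by-ball argument can also be checked with exact rational arithmetic. This is a small sketch; the function name `jackpot_probability` and its parameters are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def jackpot_probability(pool: int = 49, picks: int = 6) -> Fraction:
    """Multiply the chance that each successive ball is one of the still-unmatched picks."""
    prob = Fraction(1)
    for i in range(picks):
        # After i matches, (picks - i) chosen numbers remain among (pool - i) balls
        prob *= Fraction(picks - i, pool - i)
    return prob

p = jackpot_probability()
print(p)  # 1/13983816
```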

A 7th ball is often drawn as a reserve ball; in the past it served only as a second chance to get 5 + 1 numbers correct with 6 numbers played.

Odds of getting other possibilities in choosing 6 from 49

One must divide the number of combinations producing the given result by the total number of possible combinations (for example, $\binom{49}{6} = 13{,}983{,}816$). The numerator equates to the number of ways to select the winning numbers multiplied by the number of ways to select the losing numbers.

For a score of n (for example, if 3 choices match three of the 6 balls drawn, then n = 3), $\binom{6}{n}$ describes the number of ways of selecting n winning numbers from the 6 winning numbers. This means that there are 6 - n losing numbers, which are chosen from the 43 losing numbers in $\binom{43}{6-n}$ ways. The total number of combinations giving that result is, as stated above, the first number multiplied by the second. The expression is therefore $\binom{6}{n}\binom{43}{6-n} \big/ \binom{49}{6}$.

This can be written in a general form for all lotteries as:

$$\frac{\binom{K}{B}\binom{N-K}{K-B}}{\binom{N}{K}}$$

where $N$ is the number of balls in the lottery, $K$ is the number of balls in a single ticket, and $B$ is the number of matching balls for a winning ticket.

The generalisation of this formula is called the hypergeometric distribution.

This gives the following results:

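| Score | Calculation | Exact probability | Approximate decimal probability | Approximate 1/probability |
|---|---|---|---|---|
| 0 | $\binom{6}{0}\binom{43}{6} \big/ \binom{49}{6}$ | 435,461/998,844 | 0.436 | 2.2938 |
| 1 | $\binom{6}{1}\binom{43}{5} \big/ \binom{49}{6}$ | 68,757/166,474 | 0.413 | 2.4212 |
| 2 | $\binom{6}{2}\binom{43}{4} \big/ \binom{49}{6}$ | 44,075/332,948 | 0.132 | 7.5541 |
| 3 | $\binom{6}{3}\binom{43}{3} \big/ \binom{49}{6}$ | 8,815/499,422 | 0.0177 | 56.66 |
| 4 | $\binom{6}{4}\binom{43}{2} \big/ \binom{49}{6}$ | 645/665,896 | 0.000969 | 1,032.4 |
| 5 | $\binom{6}{5}\binom{43}{1} \big/ \binom{49}{6}$ | 43/2,330,636 | 0.0000184 | 54,200.8 |
| 6 | $\binom{6}{6}\binom{43}{0} \big/ \binom{49}{6}$ | 1/13,983,816 | 0.0000000715 | 13,983,816 |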
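The table can be regenerated from the general formula. The sketch below assumes nothing beyond the formula itself; `match_probability` and its argument names are hypothetical helpers, not part of any library.

```python
from fractions import Fraction
from math import comb

def match_probability(n_balls: int, ticket_size: int, matches: int) -> Fraction:
    """Probability that exactly `matches` of the ticket's numbers are among the balls drawn."""
    winning_ways = comb(ticket_size, matches)                       # which ticket numbers hit
    losing_ways = comb(n_balls - ticket_size, ticket_size - matches)  # which undrawn-by-ticket balls fill the rest
    return Fraction(winning_ways * losing_ways, comb(n_balls, ticket_size))

# Columns: score, exact probability, decimal probability, 1/probability
for score in range(7):
    p = match_probability(49, 6, score)
    print(f"{score}  {p}  {float(p):.3g}  {float(1 / p):,.2f}")

# The seven scores are exhaustive and mutually exclusive, so they sum to 1
total = sum(match_probability(49, 6, s) for s in range(7))
print(total)  # 1
```

Using `Fraction` keeps the exact column exact; the decimal columns are only rounded at print time.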

When a 7th number is drawn as a bonus number, there are 49!/(6! × 1! × 42!) = COMBIN(49,6) × COMBIN(49 - 6, 1) = 601,304,088 different possible drawing results.

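| Score | Calculation | Exact probability | Approximate decimal probability | Approximate 1/probability |
|---|---|---|---|---|
| 5 + 0 | $\binom{6}{5}\binom{43}{1} \cdot \frac{42}{43} \big/ \binom{49}{6}$ | 252/13,983,816 | 0.0000180208 | 55,491.33 |
| 5 + 1 | $\binom{6}{5}\binom{43}{1} \cdot \frac{1}{43} \big/ \binom{49}{6}$ | 6/13,983,816 | 0.000000429 | 2,330,636 |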
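The two bonus-ball rows can be cross-checked by counting complete drawing results (six main balls plus one bonus ball) rather than six-number combinations. This is a sketch under the assumption that the bonus ball is drawn from the 43 balls left after the main draw; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

POOL, PICKS = 49, 6
remaining = POOL - PICKS                      # 43 balls left for the bonus draw

# Every main combination can be followed by any of the 43 remaining balls
total_results = comb(POOL, PICKS) * remaining
print(total_results)  # 601304088

# Five ticket numbers among the main balls (6 ways), sixth main ball is one of
# the 43 numbers not on the ticket, and the bonus ball IS the unmatched ticket number
five_plus_bonus = comb(PICKS, 5) * remaining * 1

# Same, but the bonus ball is any of the other 42 remaining balls
five_no_bonus = comb(PICKS, 5) * remaining * (remaining - 1)

print(Fraction(five_plus_bonus, total_results))  # 1/2330636
print(Fraction(five_no_bonus, total_results))    # 3/166474
```

Both fractions reduce to the values in the table: 6/13,983,816 and 252/13,983,816.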

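Summing the rows for scores 3 to 6 in the first table gives a probability of 260,624/13,983,816 for a score of 3 of 6 or better, so you would expect such a score once in around 53.66 drawings. Note that it takes a "3 if 6" wheel of 163 combinations to be sure of at least one 3/6 score.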

The value of 1/p changes when several distinct combinations are played together. Such play is mostly about winning something, not just the jackpot.

Ensuring to win the jackpot

There is only one known way to ensure winning the jackpot: buying at least one lottery ticket for every possible number combination. For example, one has to buy 13,983,816 different tickets to be sure of winning the jackpot in a 6/49 game.

Lottery organizations have laws, rules and safeguards in place to prevent gamblers from executing such an operation. Further, winning the jackpot by buying every possible combination does not by itself guarantee breaking even or making a profit.

If $p$ is the probability of winning, $c_t$ the cost of a ticket, $c_l$ the cost of obtaining a ticket (e.g. including the logistics), and $c_0$ the one-time costs of the operation (such as setting up and conducting it), then the jackpot $m_j$ should contain at least

$$m_j \geq c_0 + \frac{c_t + c_l}{p}$$

to have a chance to at least break even.

The above theoretical "chance to break even" point is slightly offset by the sum $\sum_i m_i$ of the minor wins also included in all the lottery tickets:

$$m_j \geq c_0 + \frac{c_t + c_l}{p} - \sum_i m_i$$

Still, even if the above relation is satisfied, it does not guarantee breaking even. The payout depends on the number of winning tickets $n_x$ for each of the prizes, resulting in the relation

$$\frac{m_j}{n_j} \geq c_0 + \frac{c_t + c_l}{p} - \sum_i \frac{m_i}{n_i}$$

In probably the only known successful operations [1], the threshold to execute an operation was set, for unknown reasons, at three times the cost of the tickets alone, i.e.

$$m_j \geq 3 \times \frac{c_t}{p}$$

This does not, however, eliminate the risk of making no profit. The success of the operations still depended on a bit of luck. In addition, in one operation the logistics failed and not all combinations could be obtained. This added the risk of not winning the jackpot at all.
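The break-even reasoning in this section can be sketched in a few lines. Everything numeric below is hypothetical: the ticket price, handling cost and fixed cost are made-up figures, and `break_even_jackpot` is an illustrative helper, not a description of how any real operation was planned.

```python
from fractions import Fraction
from math import comb

def break_even_jackpot(p, ticket_cost, handling_cost, fixed_cost,
                       minor_wins=0, jackpot_winners=1):
    """Smallest jackpot at which buying every combination can break even.

    p               probability that a single ticket wins the jackpot
    ticket_cost     price of one ticket
    handling_cost   per-ticket cost of obtaining it (logistics)
    fixed_cost      one-time cost of the whole operation
    minor_wins      total expected to come back from the lesser prizes
    jackpot_winners number of tickets sharing the jackpot (assumed known here)
    """
    required_share = fixed_cost + (ticket_cost + handling_cost) / p - minor_wins
    return required_share * jackpot_winners

p = Fraction(1, comb(49, 6))

# Hypothetical figures: 1 unit per ticket, 0.1 handling, 100,000 fixed cost
alone = break_even_jackpot(p, 1, Fraction(1, 10), 100_000)
shared = break_even_jackpot(p, 1, Fraction(1, 10), 100_000, jackpot_winners=2)
rule_of_three = 3 * 1 / p   # "three times the cost of the tickets alone"

print(float(alone))    # 15482197.6
print(float(shared))   # 30964395.2
print(rule_of_three)   # 41951448
```

The comparison shows why a margin above the bare ticket cost matters: a single co-winner doubles the jackpot needed to break even.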

Powerballs and bonus balls

Many lotteries have a Powerball (or "bonus ball"). If the powerball is drawn from a pool of numbers different from the main lottery, the odds are multiplied by the number of powerballs. For example, in the 6 from 49 lottery, given 10 powerball numbers, the odds of getting a score of 3 and the powerball would be 1 in 56.66 × 10, or 1 in 566.6 (the probability would be divided by 10, to give an exact value of $\frac{8815}{4994220}$). Another example of such a game is Mega Millions, albeit with different jackpot odds.

Where more than 1 powerball is drawn from a separate pool of balls to the main lottery (for example, in the EuroMillions game), the odds of the different possible powerball matching scores are calculated using the method shown in the "other scores" section above (in other words, the powerballs are like a mini-lottery in their own right), and then multiplied by the odds of achieving the required main-lottery score.
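Treating the powerballs as a mini-lottery means multiplying two independent match probabilities. The sketch below assumes a format of 5 main numbers from 50 plus 2 extra numbers from 12, which is how EuroMillions is commonly described but is not stated in this article; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def match_probability(pool: int, picks: int, matches: int) -> Fraction:
    """Probability of exactly `matches` hits when `picks` numbers are drawn from `pool`."""
    return Fraction(comb(picks, matches) * comb(pool - picks, picks - matches),
                    comb(pool, picks))

def two_pool_probability(main, bonus) -> Fraction:
    """Each argument is a (pool, picks, matches) tuple; the two draws are independent."""
    return match_probability(*main) * match_probability(*bonus)

# Assumed format: 5 from 50 main balls, 2 from 12 bonus balls
jackpot = two_pool_probability((50, 5, 5), (12, 2, 2))
print(jackpot)  # 1/139838160

# Five main numbers and exactly one of the two bonus numbers
second_tier = two_pool_probability((50, 5, 5), (12, 2, 1))
print(second_tier)
```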

If the powerball is drawn from the same pool of numbers as the main lottery, then, for a given target score, the number of winning combinations includes the powerball. For games based on the Canadian lottery (such as the lottery of the United Kingdom), after the 6 main balls are drawn, an extra ball is drawn from the same pool of balls, and this becomes the powerball (or "bonus ball"). An extra prize is given for matching 5 balls and the bonus ball. As described in the "other scores" section above, the number of ways one can obtain a score of 5 from a single ticket is $\binom{6}{5}\binom{43}{1} = 258$. Since the number of remaining balls is 43, and the ticket has 1 unmatched number remaining, 1/43 of these 258 combinations will match the next ball drawn (the powerball), leaving 258/43 = 6 ways of achieving it. Therefore, the odds of getting a score of 5 and the powerball are $\frac{6}{\binom{49}{6}} = \frac{1}{2{,}330{,}636}$.

Of the 258 combinations that match 5 of the main 6 balls, in 42/43 of them the remaining number will not match the powerball, giving odds of $\frac{258 \times 42/43}{\binom{49}{6}} = \frac{252}{13{,}983{,}816} = \frac{3}{166{,}474} \approx 1.802 \times 10^{-5}$ for obtaining a score of 5 without matching the powerball.

Using the same principle, the odds of getting a score of 2 and the powerball are $\binom{6}{2}\binom{43}{4} = 1{,}851{,}150$ for the score of 2 multiplied by the probability of one of the remaining four numbers matching the bonus ball, which is 4/43. Since $1{,}851{,}150 \times \frac{4}{43} = 172{,}200$, the probability of obtaining the score of 2 and the bonus ball is $\frac{172{,}200}{13{,}983{,}816} = \frac{1025}{83{,}237} \approx 1.231\%$, approximate decimal odds of 1 in 81.2.

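The general formula for $B$ matching balls in an $N$ choose $K$ lottery with one bonus ball from the $N$ pool of balls is:

$$\frac{K-B}{N-K} \cdot \frac{\binom{K}{B}\binom{N-K}{K-B}}{\binom{N}{K}}$$

The general formula for $B$ matching balls in an $N$ choose $K$ lottery with zero bonus ball from the $N$ pool of balls is:

$$\frac{N-K-(K-B)}{N-K} \cdot \frac{\binom{K}{B}\binom{N-K}{K-B}}{\binom{N}{K}}$$

The general formula for $B$ matching balls in an $N$ choose $K$ lottery with one bonus ball from a separate pool of $P$ balls is:

$$\frac{1}{P} \cdot \frac{\binom{K}{B}\binom{N-K}{K-B}}{\binom{N}{K}}$$

The general formula for $B$ matching balls in an $N$ choose $K$ lottery with no bonus ball from a separate pool of $P$ balls is:

$$\frac{P-1}{P} \cdot \frac{\binom{K}{B}\binom{N-K}{K-B}}{\binom{N}{K}}$$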
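The four cases can be written as small functions and checked against the worked figures in this section. This is a sketch only: the function names are hypothetical, and the parameters follow the convention of N balls in the lottery, K numbers per ticket, B matching balls and P balls in a separate bonus pool.

```python
from fractions import Fraction
from math import comb

def match_probability(N: int, K: int, B: int) -> Fraction:
    """Exactly B of the ticket's K numbers among the K main balls drawn from N."""
    return Fraction(comb(K, B) * comb(N - K, K - B), comb(N, K))

def same_pool_with_bonus(N, K, B):
    # One of the K - B unmatched ticket numbers must be the bonus ball,
    # which is drawn from the N - K balls left after the main draw
    return Fraction(K - B, N - K) * match_probability(N, K, B)

def same_pool_without_bonus(N, K, B):
    # The bonus ball is one of the remaining balls that is NOT on the ticket
    return Fraction(N - K - (K - B), N - K) * match_probability(N, K, B)

def separate_pool_with_bonus(N, K, B, P):
    return Fraction(1, P) * match_probability(N, K, B)

def separate_pool_without_bonus(N, K, B, P):
    return Fraction(P - 1, P) * match_probability(N, K, B)

print(same_pool_with_bonus(49, 6, 5))         # 1/2330636
print(same_pool_without_bonus(49, 6, 5))      # 3/166474
print(same_pool_with_bonus(49, 6, 2))         # 1025/83237
print(separate_pool_with_bonus(49, 6, 3, 10) == Fraction(8815, 4994220))  # True
```

In each pair, the "with" and "without" cases add up to the plain probability of B matches, which is a quick sanity check on the two prefactors.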

Minimum number of tickets for a match

It is a hard (and often open) problem to calculate the minimum number of tickets one needs to purchase to guarantee that at least one of these tickets matches at least 2 numbers. In the 5-from-90 lotto, the minimum number of tickets that can guarantee a ticket with at least 2 matches is 100. [2]

Information theoretic results

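As a discrete probability space, the probability of any particular lottery outcome is atomic, meaning it is greater than zero. Therefore, the probability of any event is the sum of probabilities of the outcomes of the event. This makes it easy to calculate quantities of interest from information theory. For example, the information content of any event $E$ is easy to calculate, by the formula

$$\operatorname{I}(E) := -\log_2 P(E)$$

In particular, the information content of outcome $x$ of discrete random variable $X$ is

$$\operatorname{I}_X(x) := -\log_2 p_X(x) = \log_2 \frac{1}{p_X(x)}$$

For example, winning in the example § Choosing 6 from 49 above is a Bernoulli-distributed random variable $X$ with a 1/13,983,816 chance of winning ("success"). We write $X \sim \mathrm{Bernoulli}(p)$ with $p = \frac{1}{13{,}983{,}816}$ and $q = 1 - p = \frac{13{,}983{,}815}{13{,}983{,}816}$. The information content of winning is

$$\operatorname{I}_X(\text{win}) = -\log_2 p = \log_2 13{,}983{,}816 \approx 23.737$$

shannons or bits of information. (See units of information for further explanation of terminology.) The information content of losing is

$$\operatorname{I}_X(\text{lose}) = -\log_2 q \approx 1.0317 \times 10^{-7} \text{ shannons}.$$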

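The information entropy of a lottery probability distribution is also easy to calculate as the expected value of the information content.

$$\mathrm{H}(X) = \sum_x -p_X(x) \log_2 p_X(x) = \sum_x p_X(x) \operatorname{I}_X(x) = \operatorname{E}[\operatorname{I}_X(X)]$$

Oftentimes the random variable of interest in the lottery is a Bernoulli trial. In this case, the Bernoulli entropy function may be used. Using $X$ to represent winning the 6-of-49 lottery, with win probability $p = \frac{1}{13{,}983{,}816}$ and $q = 1 - p$, the Shannon entropy of 6-of-49 above is

$$\mathrm{H}(X) = -p \log_2 p - q \log_2 q \approx 1.8006 \times 10^{-6} \text{ shannons}.$$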
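The information content and entropy figures for the 6-of-49 jackpot can be reproduced in floating point. This is a minimal sketch; it uses `math.log1p` for the losing outcome because $q$ is extremely close to 1, which is a numerical choice made here rather than anything the derivation requires.

```python
from math import comb, log, log1p

p = 1 / comb(49, 6)   # probability of winning the jackpot
q = 1 - p             # probability of losing

# Information content in shannons (bits): -log2 of the outcome's probability
info_win = -log(p) / log(2)
info_lose = -log1p(-p) / log(2)   # log1p(-p) == ln(1 - p), accurate for tiny p

# Bernoulli entropy: expected information content over the two outcomes
entropy = p * info_win + q * info_lose

print(f"{info_win:.3f}")   # 23.737
print(f"{info_lose:.4e}")  # 1.0317e-07
print(f"{entropy:.4e}")    # 1.8006e-06
```

Almost all of the entropy comes from the rare winning outcome: it carries about 23.7 bits but occurs so seldom that the expected information per draw is tiny.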


References

  1. The man who won the lottery 14 times
  2. Z. Füredi, G. J. Székely, and Z. Zubor (1996). "On the lottery problem". Journal of Combinatorial Designs. 4 (1): 5–10. doi:10.1002/(sici)1520-6610(1996)4:1<5::aid-jcd2>3.3.co;2-w.