This article includes a list of general references, but it lacks sufficient corresponding inline citations .(January 2010) |
In statistics, the question of checking whether a coin is fair is one whose importance lies, firstly, in providing a simple problem on which to illustrate basic ideas of statistical inference and, secondly, in providing a simple problem that can be used to compare various competing methods of statistical inference, including decision theory. The practical problem of checking whether a coin is fair might be considered as easily solved by performing a sufficiently large number of trials, but statistics and probability theory can provide guidance on two types of question; specifically those of how many trials to undertake and of the accuracy of an estimate of the probability of turning up heads, derived from a given sample of trials.
A fair coin is an idealized randomizing device with two states (usually named "heads" and "tails") which are equally likely to occur. It is based on the coin flip used widely in sports and other situations where it is required to give two parties the same chance of winning. Either a specially designed chip or more usually a simple currency coin is used, although the latter might be slightly "unfair" due to an asymmetrical weight distribution, which might cause one state to occur more frequently than the other, giving one party an unfair advantage. [1] So it might be necessary to test experimentally whether the coin is in fact "fair" – that is, whether the probability of the coin's falling on either side when it is tossed is exactly 50%. It is of course impossible to rule out arbitrarily small deviations from fairness such as might be expected to affect only one flip in a lifetime of flipping; also it is always possible for an unfair (or "biased") coin to happen to turn up exactly 10 heads in 20 flips. Therefore, any fairness test must only establish a certain degree of confidence in a certain degree of fairness (a certain maximum bias). In more rigorous terminology, the problem is of determining the parameters of a Bernoulli process, given only a limited sample of Bernoulli trials.
This article describes experimental procedures for determining whether a coin is fair or unfair. There are many statistical methods for analyzing such an experimental procedure. This article illustrates two of them.
Both methods prescribe an experiment (or trial) in which the coin is tossed many times and the result of each toss is recorded. The results can then be analysed statistically to decide whether the coin is "fair" or "probably not fair".
An important difference between these two approaches is that the first approach gives some weight to one's prior experience of tossing coins, while the second does not. The question of how much weight to give to prior experience, depending on the quality (credibility) of that experience, is discussed under credibility theory.
One method is to calculate the posterior probability density function of Bayesian probability theory.
A test is performed by tossing the coin N times and noting the observed numbers of heads, h, and tails, t. The symbols H and T represent more generalised variables expressing the numbers of heads and tails respectively that might have been observed in the experiment. Thus N = H + T = h + t.
Next, let r be the actual probability of obtaining heads in a single toss of the coin. This is the property of the coin which is being investigated. Using Bayes' theorem, the posterior probability density of r conditional on h and t is expressed as follows:
where g(r) represents the prior probability density distribution of r, which lies in the range 0 to 1.
The prior probability density distribution summarizes what is known about the distribution of r in the absence of any observation. We will assume that the prior distribution of r is uniform over the interval [0, 1]. That is, g(r) = 1. (In practice, it would be more appropriate to assume a prior distribution which is much more heavily weighted in the region around 0.5, to reflect our experience with real coins.)
The probability of obtaining h heads in N tosses of a coin with a probability of heads equal to r is given by the binomial distribution:
Substituting this into the previous formula:
This is in fact a beta distribution (the conjugate prior for the binomial distribution), whose denominator can be expressed in terms of the beta function:
As a uniform prior distribution has been assumed, and because h and t are integers, this can also be written in terms of factorials:
For example, let N = 10, h = 7, i.e. the coin is tossed 10 times and 7 heads are obtained:
The graph on the right shows the probability density function of r given that 7 heads were obtained in 10 tosses. (Note: r is the probability of obtaining heads when tossing the same coin once.)
The probability for an unbiased coin (defined for this purpose as one whose probability of coming down heads is somewhere between 45% and 55%)
is small when compared with the alternative hypothesis (a biased coin). However, it is not small enough to cause us to believe that the coin has a significant bias. This probability is slightly higher than our presupposition of the probability that the coin was fair corresponding to the uniform prior distribution, which was 10%. Using a prior distribution that reflects our prior knowledge of what a coin is and how it acts, the posterior distribution would not favor the hypothesis of bias. However the number of trials in this example (10 tosses) is very small, and with more trials the choice of prior distribution would be somewhat less relevant.)
With the uniform prior, the posterior probability distribution f(r | H = 7,T = 3) achieves its peak at r = h / (h + t) = 0.7; this value is called the maximum a posteriori (MAP) estimate of r. Also with the uniform prior, the expected value of r under the posterior distribution is
The best estimator for the actual value is the estimator . This estimator has a margin of error (E) where at a particular confidence level. |
Using this approach, to decide the number of times the coin should be tossed, two parameters are required:
Z value | Confidence level | Comment |
---|---|---|
0.6745 | gives 50.000% level of confidence | Half |
1.0000 | gives 68.269% level of confidence | One std dev |
1.6449 | gives 90.000% level of confidence | "One nine" |
1.9599 | gives 95.000% level of confidence | 95 percent |
2.0000 | gives 95.450% level of confidence | Two std dev |
2.5759 | gives 99.000% level of confidence | "Two nines" |
3.0000 | gives 99.730% level of confidence | Three std dev |
3.2905 | gives 99.900% level of confidence | "Three nines" |
3.8906 | gives 99.990% level of confidence | "Four nines" |
4.0000 | gives 99.993% level of confidence | Four std dev |
4.4172 | gives 99.999% level of confidence | "Five nines" |
where n is the number of trials (which was denoted by N in the previous section).
This standard error function of p has a maximum at . Further, in the case of a coin being tossed, it is likely that p will be not far from 0.5, so it is reasonable to take p=0.5 in the following:
And hence the value of maximum error (E) is given by
Solving for the required number of coin tosses, n,
1. If a maximum error of 0.01 is desired, how many times should the coin be tossed?
2. If the coin is tossed 10000 times, what is the maximum error of the estimator on the value of (the actual probability of obtaining heads in a coin toss)?
3. The coin is tossed 12000 times with a result of 5961 heads (and 6039 tails). What interval does the value of (the true probability of obtaining heads) lie within if a confidence level of 99.999% is desired?
Now find the value of Z corresponding to 99.999% level of confidence.
Now calculate E
The interval which contains r is thus:
Other approaches to the question of checking whether a coin is fair are available using decision theory, whose application would require the formulation of a loss function or utility function which describes the consequences of making a given decision. An approach that avoids requiring either a loss function or a prior probability (as in the Bayesian approach) is that of "acceptance sampling". [2]
The above mathematical analysis for determining if a coin is fair can also be applied to other uses. For example:
In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success or failure. A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes is called a Bernoulli process; for a single trial, i.e., n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the binomial test of statistical significance.
A random variable is a mathematical formalization of a quantity or object which depends on random events. The term 'random variable' in its mathematical definition refers to neither randomness nor variability but instead is a mathematical function in which
A likelihood function measures how well a statistical model explains observed data by calculating the probability of seeing that data under different parameter values of the model. It is constructed from the joint probability distribution of the random variable that (presumably) generated the observations. When evaluated on the actual data points, it becomes a function solely of the model parameters.
In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the distance between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate; the distance parameter could be any meaningful mono-dimensional measure of the process, such as time between production errors, or length along a roll of fabric in the weaving manufacturing process. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being used for the analysis of Poisson point processes it is found in various other contexts.
Bayesian inference is a method of statistical inference in which Bayes' theorem is used to calculate a probability of a hypothesis, given prior evidence, and update it as more information becomes available. Fundamentally, Bayesian inference uses a prior distribution to estimate posterior probabilities. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".
In probability and statistics, a Bernoulli process is a finite or infinite sequence of binary random variables, so it is a discrete-time stochastic process that takes only two values, canonically 0 and 1. The component Bernoulli variablesXi are identically distributed and independent. Prosaically, a Bernoulli process is a repeated coin flipping, possibly with an unfair coin. Every variable Xi in the sequence is associated with a Bernoulli trial or experiment. They all have the same Bernoulli distribution. Much of what can be said about the Bernoulli process can also be generalized to more than two outcomes ; this generalization is known as the Bernoulli scheme.
In probability theory and statistics, Student's t distribution is a continuous probability distribution that generalizes the standard normal distribution. Like the latter, it is symmetric around zero and bell-shaped.
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.
In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] or in terms of two positive parameters, denoted by alpha (α) and beta (β), that appear as exponents of the variable and its complement to 1, respectively, and control the shape of the distribution.
In probability theory and statistics, the Rayleigh distribution is a continuous probability distribution for nonnegative-valued random variables. Up to rescaling, it coincides with the chi distribution with two degrees of freedom. The distribution is named after Lord Rayleigh.
In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, including the enabling of the user to calculate expectations, covariances using differentiation based on some useful algebraic properties, as well as for generality, as exponential families are in a sense very natural sets of distributions to consider. The term exponential class is sometimes used in place of "exponential family", or the older term Koopman–Darmois family. Sometimes loosely referred to as "the" exponential family, this class of distributions is distinct because they all possess a variety of desirable properties, most importantly the existence of a sufficient statistic.
A prior probability distribution of an uncertain quantity, often simply called the prior, is its assumed probability distribution before some evidence is taken into account. For example, the prior could be the probability distribution representing the relative proportions of voters who will vote for a particular politician in a future election. The unknown quantity may be a parameter of the model or a latent variable rather than an observable variable.
In probability theory, the multinomial distribution is a generalization of the binomial distribution. For example, it models the probability of counts for each side of a k-sided die rolled n times. For n independent trials each of which leads to a success for exactly one of k categories, with each category having a given fixed success probability, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories.
In Bayesian statistics, the Jeffreys prior is a non-informative prior distribution for a parameter space. Named after Sir Harold Jeffreys, its density function is proportional to the square root of the determinant of the Fisher information matrix:
The relative risk (RR) or risk ratio is the ratio of the probability of an outcome in an exposed group to the probability of an outcome in an unexposed group. Together with risk difference and odds ratio, relative risk measures the association between the exposure and the outcome.
In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments. In other words, a binomial proportion confidence interval is an interval estimate of a success probability when only the number of experiments and the number of successes are known.
Bootstrapping is a procedure for estimating the distribution of an estimator by resampling one's data or a model estimated from the data. Bootstrapping assigns measures of accuracy to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.
In probability theory and statistics, a sequence of independent Bernoulli trials with probability 1/2 of success on each trial is metaphorically called a fair coin. One for which the probability is not 1/2 is called a biased or unfair coin. In theoretical studies, the assumption that a coin is fair is often made by referring to an ideal coin.
In statistics, additive smoothing, also called Laplace smoothing or Lidstone smoothing, is a technique used to smooth count data, eliminating issues caused by certain values having 0 occurrences. Given a set of observation counts from a -dimensional multinomial distribution with trials, a "smoothed" version of the counts gives the estimator
In statistical inference, the concept of a confidence distribution (CD) has often been loosely referred to as a distribution function on the parameter space that can represent confidence intervals of all levels for a parameter of interest. Historically, it has typically been constructed by inverting the upper limits of lower sided confidence intervals of all levels, and it was also commonly associated with a fiducial interpretation, although it is a purely frequentist concept. A confidence distribution is NOT a probability distribution function of the parameter of interest, but may still be a function useful for making inferences.