Fair coin

A fair coin, when tossed, should have an equal chance of landing either side up

In probability theory and statistics, a sequence of independent Bernoulli trials with probability 1/2 of success on each trial is metaphorically called a fair coin. One for which the probability is not 1/2 is called a biased or unfair coin. In theoretical studies, the assumption that a coin is fair is often made by referring to an ideal coin.

John Edmund Kerrich performed experiments in coin flipping and found that a coin made from a wooden disk about the size of a crown and coated on one side with lead landed heads (wooden side up) 679 times out of 1000. [1] In this experiment the coin was tossed by balancing it on the forefinger, flipping it using the thumb so that it spun through the air for about a foot before landing on a flat cloth spread over a table. Edwin Thompson Jaynes claimed that when a coin is caught in the hand, instead of being allowed to bounce, the physical bias in the coin is insignificant compared to the method of the toss, where with sufficient practice a coin can be made to land heads 100% of the time. [2] Exploring the problem of checking whether a coin is fair is a well-established pedagogical tool in teaching statistics.

Probability space definition

In probability theory, a fair coin is defined as a probability space $(\Omega, \mathcal{F}, P)$, which is in turn defined by the sample space, event space, and probability measure. Using $H$ for heads and $T$ for tails, the sample space of a coin is defined as:

$$\Omega = \{H, T\}$$

The event space for a coin includes all sets of outcomes from the sample space which can be assigned a probability, which is the full power set $2^{\Omega}$. Thus, the event space is defined as:

$$\mathcal{F} = \{\{\}, \{H\}, \{T\}, \{H, T\}\}$$

$\{\}$ is the event where neither outcome happens (which is impossible and can therefore be assigned probability 0), and $\{H, T\}$ is the event where either outcome happens (which is guaranteed and can be assigned probability 1). Because the coin is fair, each single outcome has probability 0.5. The probability measure is then defined by the function:

$$P(\{\}) = 0, \quad P(\{H\}) = 0.5, \quad P(\{T\}) = 0.5, \quad P(\{H, T\}) = 1$$

So the full probability space which defines a fair coin is the triplet $(\Omega, \mathcal{F}, P)$ as defined above. Note that this is not a random variable, because heads and tails do not have inherent numerical values like those on a fair two-valued die. A random variable adds the additional structure of assigning a numerical value to each outcome. Common choices are $(H, T) \to (1, 0)$ or $(H, T) \to (1, -1)$.
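As an illustration, the triplet can be written out in a few lines of Python. This is a minimal sketch rather than a standard API: the names `OMEGA`, `power_set`, `prob` and `X` are chosen here for the example, and exact fractions are used so that the probabilities 0, 1/2 and 1 can be compared without rounding.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space of a single toss (illustrative names, not from a library).
OMEGA = frozenset({"H", "T"})


def power_set(outcomes):
    """Return every subset of `outcomes` as a frozenset (the event space)."""
    items = sorted(outcomes)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(subset) for subset in subsets]


def prob(event):
    """Fair-coin measure: each outcome in the event carries weight 1/2."""
    return Fraction(len(event), len(OMEGA))


EVENTS = power_set(OMEGA)

# A random variable assigns a number to each outcome, here (H, T) -> (1, 0).
X = {"H": 1, "T": 0}
expected_X = sum(X[outcome] * prob(frozenset({outcome})) for outcome in OMEGA)

for event in EVENTS:
    print(sorted(event), prob(event))
```

The dictionary `X` is what turns the coin into a random variable; the probability space itself only needs `OMEGA`, `EVENTS` and `prob`.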

Role in statistical teaching and theory

The probabilistic and statistical properties of coin-tossing games are often used as examples in both introductory and advanced textbooks, and these are mainly based on assuming that a coin is fair or "ideal". For example, Feller uses this basis both to introduce the idea of random walks and to develop tests for homogeneity within a sequence of observations by looking at the properties of the runs of identical values within a sequence. [3] The latter leads on to a runs test. A time series consisting of the results of tossing a fair coin is called a Bernoulli process.
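The two ideas mentioned above, random walks and runs, are easy to compute for a recorded sequence of tosses. The sketch below is illustrative only: the helper names `random_walk` and `count_runs` are assumptions of this example, and heads is mapped to a step of +1 and tails to a step of -1.

```python
from itertools import accumulate, groupby


def random_walk(tosses):
    """Positions of a walk that steps +1 on 'H' and -1 on 'T'."""
    steps = [1 if toss == "H" else -1 for toss in tosses]
    return list(accumulate(steps))


def count_runs(tosses):
    """Number of maximal blocks of identical consecutive results."""
    return sum(1 for _ in groupby(tosses))


sequence = "HHTHTT"
print(random_walk(sequence))  # [1, 2, 1, 2, 1, 0]
print(count_runs(sequence))   # 4
```

For a fair coin the expected number of runs in n tosses is (n + 1)/2, which is the kind of reference value a runs test compares against.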

Fair results from a biased coin

If a cheat has altered a coin to prefer one side over another (a biased coin), the coin can still be used for fair results by changing the game slightly. John von Neumann gave the following procedure: [4]

  1. Toss the coin twice.
  2. If the results match, start over, forgetting both results.
  3. If the results differ, use the first result, forgetting the second.

The reason this process produces a fair result is that the probability of getting heads and then tails must be the same as the probability of getting tails and then heads (both equal $P(H)(1 - P(H))$), as the coin is not changing its bias between flips and the two flips are independent. This works only if getting one result on a trial does not change the bias on subsequent trials, which is the case for most non-malleable coins (but not for processes such as the Pólya urn). By discarding the outcomes of two heads and two tails and repeating the procedure, the coin flipper is left with only two possible outcomes, which have equal probability. This procedure only works if the tosses are paired properly; if part of a pair is reused in another pair, the fairness may be ruined. Also, the coin must not be so biased that one side has a probability of zero.
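The three steps of von Neumann's procedure can be sketched in Python as follows. The function name `von_neumann_flip`, the simulated coin `biased_toss` and its bias of 0.8 are assumptions made for this example; the sketch also assumes independent tosses with a fixed bias strictly between 0 and 1, since otherwise the loop may never end.

```python
import random


def von_neumann_flip(toss):
    """Return (result, tosses_used) using von Neumann's pairing procedure.

    `toss` is any zero-argument callable returning "H" or "T"; it is assumed
    to be independent between calls with a fixed bias strictly between 0 and 1.
    """
    tosses_used = 0
    while True:
        first, second = toss(), toss()   # step 1: toss twice
        tosses_used += 2
        if first != second:              # step 3: results differ, keep the first
            return first, tosses_used
        # step 2: results match, so discard both and start over


rng = random.Random(0)  # seeded only to make the demonstration repeatable


def biased_toss():
    """Simulated coin with P(H) = 0.8 (an arbitrary illustrative bias)."""
    return "H" if rng.random() < 0.8 else "T"


results = [von_neumann_flip(biased_toss)[0] for _ in range(20000)]
heads_fraction = results.count("H") / len(results)
print(heads_fraction)  # expected to be close to 0.5
```

Each call consumes a fresh pair of tosses, which reflects the requirement that no toss be reused across pairs.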

This method may be extended by also considering sequences of four tosses. That is, if the coin is flipped twice but the results match, and the coin is flipped twice again but the results match now for the opposite side, then the first result can be used. This is because HHTT and TTHH are equally likely. This can be extended to any multiple of 2.

The expected number of flips in the $n$-th game, $E(F_n)$, is not hard to calculate. First notice that in step 3, whichever of the events $HT$ or $TH$ occurred, the coin has been flipped twice, so $E(F_n \mid HT, TH) = 2$. In step 2 ($TT$ or $HH$) the procedure must also be repeated, so there are 2 flips plus the expected number of flips of the next game, that is, $E(F_n \mid TT, HH) = 2 + E(F_{n+1})$. Because the procedure starts over from scratch, the expected number of flips of the next game is the same as that of the previous game or any other game, so it does not depend on $n$; thus $E(F) = E(F_n) = E(F_{n+1})$. Hence, by the law of total expectation:

$$\begin{aligned} E(F) &= E(F_n \mid TT, HH)\,P(TT, HH) + E(F_n \mid HT, TH)\,P(HT, TH) \\ &= (2 + E(F))\,P(TT, HH) + 2\,P(HT, TH) \\ &= 2 + E(F)\,\bigl(1 - 2P(H)(1 - P(H))\bigr) \end{aligned}$$

Solving for $E(F)$ gives

$$E(F) = \frac{1}{P(H)(1 - P(H))}$$

Graph of $\frac{1}{P(H)(1 - P(H))}$: the further $P(H)$ is from $0.5$, the greater the expected number of flips before a successful result.

The more biased our coin is, the more likely it is that we will have to perform a greater number of trials before a fair result.
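To make the growth concrete, the expected number of tosses $\frac{1}{P(H)(1 - P(H))}$ can be evaluated for a few biases and cross-checked against a direct sum over the number of rounds. The function names below are hypothetical and the cut-off of 2000 rounds is an arbitrary choice for this sketch.

```python
from fractions import Fraction


def expected_tosses(p_heads):
    """Closed form 1 / (P(H) * (1 - P(H))) for von Neumann's procedure."""
    return 1 / (p_heads * (1 - p_heads))


def expected_tosses_series(p_heads, max_rounds=2000):
    """Partial sum over k of 2k * P(rounds 1..k-1 match) * P(round k differs)."""
    differ = 2 * p_heads * (1 - p_heads)   # P(HT or TH) in one round
    match = 1 - differ                     # P(HH or TT) in one round
    return sum(2 * k * match ** (k - 1) * differ for k in range(1, max_rounds + 1))


for p in (Fraction(1, 2), Fraction(7, 10), Fraction(9, 10)):
    print(p, expected_tosses(p), expected_tosses_series(float(p)))
```

With $P(H) = 0.5$ the procedure needs 4 tosses on average, while with $P(H) = 0.9$ it needs $100/9 \approx 11.1$.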

A better algorithm when P(H) is known

Suppose that the bias $b := P(H)$ is known. In this section, we provide a simple algorithm [5] that improves the expected number of coin tosses. The algorithm utilizes an ideal probability $p$, which is updated in steps 2 and 3 below. We first consider an algorithm to generate an arbitrary coin with bias $p$. To get a fair coin, the algorithm first sets $p = 0.5$ and then executes the following steps.

  1. Toss the biased coin, and let $X \in \{H, T\}$ be the result.
  2. If $p \geq b$, use $H$ if the flip result is $H$. Otherwise, replace $p$ with $\frac{p - b}{1 - b}$ and go back to step 1.
  3. Otherwise ($p < b$), use $T$ if the flip result is $T$. Otherwise, replace $p$ with $\frac{p}{b}$ and go back to step 1.

Note that the above algorithm does not reach the optimal expected number of coin tosses, which is $1/H(b)$, where $H(b)$ is the binary entropy function. There are algorithms that reach this optimal value in expectation. However, those algorithms are more sophisticated than the one shown above.

The above algorithm uses $\frac{1}{2b(1 - b)}$ biased coin flips in expectation, which is exactly half the expected number for von Neumann's trick.
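The procedure can be sketched in Python as below, writing `b` for the known bias $P(H)$ of the physical coin and `p` for the target probability of heads (0.5 for a fair result). The function name `flip_with_known_bias`, the simulated coin and the bias of 0.8 are assumptions of this example, and floating-point arithmetic makes the result approximate rather than exact.

```python
import random


def flip_with_known_bias(toss_is_heads, b, p=0.5):
    """Return (result, tosses_used) for a simulated coin with P(H) = p.

    toss_is_heads: zero-argument callable, True with probability b.
    b: known bias P(H) of the physical coin, with 0 < b < 1.
    p: target probability of heads; 0.5 gives a fair result.
    """
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie strictly between 0 and 1")
    tosses_used = 0
    while True:
        heads = toss_is_heads()          # step 1: toss the biased coin
        tosses_used += 1
        if p >= b:                       # step 2
            if heads:
                return "H", tosses_used
            p = (p - b) / (1 - b)
        else:                            # step 3
            if not heads:
                return "T", tosses_used
            p = p / b


rng = random.Random(1)  # seeded only to make the demonstration repeatable
b = 0.8
outcomes = [flip_with_known_bias(lambda: rng.random() < b, b) for _ in range(40000)]
heads_fraction = sum(1 for result, _ in outcomes if result == "H") / len(outcomes)
mean_tosses = sum(count for _, count in outcomes) / len(outcomes)
print(heads_fraction, mean_tosses, 1 / (2 * b * (1 - b)))
```

In this simulation the fraction of heads should be near 0.5 and the mean number of tosses near $\frac{1}{2b(1 - b)} = 3.125$, compared with 6.25 for von Neumann's procedure at the same bias.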

Analysis

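The correctness of the above algorithm is an exercise in conditional expectation. Writing $b = P(H)$ for the bias and $p$ for the target probability, and conditioning on the first toss: when $p \geq b$, heads is returned with probability $b + (1 - b) \cdot \frac{p - b}{1 - b} = p$, and when $p < b$, heads is returned with probability $b \cdot \frac{p}{b} = p$. We now analyze the expected number of coin flips.

Given the bias $b = P(H)$ and the current value of $p$, one can define a function $f_b(p)$ that represents the expected number of coin tosses before a result is returned. The recurrence relation of $f_b(p)$ can be described as follows.

$$f_b(p) = \begin{cases} 1 + b\, f_b\!\left(\frac{p}{b}\right) & \text{if } p < b \\ 1 + (1 - b)\, f_b\!\left(\frac{p - b}{1 - b}\right) & \text{if } p \geq b \end{cases}$$

This solves to the following function:

$$f_b(p) = \frac{b + (1 - 2b)\,p}{b(1 - b)}$$

When $p = 0.5$, the expected number of coin flips is $f_b(0.5) = \frac{1}{2b(1 - b)}$, as desired.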
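The solution of the recurrence can be checked mechanically. The sketch below assumes the closed form $f_b(p) = \frac{b + (1 - 2b)p}{b(1 - b)}$, with $b$ the bias and $p$ the target probability, and substitutes it into one step of the recurrence on a grid of exact rational values; the function names and the grid are arbitrary choices for this example.

```python
from fractions import Fraction


def f_closed(b, p):
    """Candidate closed form (b + (1 - 2b) p) / (b (1 - b))."""
    return (b + (1 - 2 * b) * p) / (b * (1 - b))


def recurrence_rhs(b, p):
    """One step of the recurrence, evaluated with the closed form inside."""
    if p < b:
        return 1 + b * f_closed(b, p / b)
    return 1 + (1 - b) * f_closed(b, (p - b) / (1 - b))


biases = [Fraction(k, 10) for k in range(1, 10)]
targets = [Fraction(k, 20) for k in range(0, 21)]
mismatches = [
    (b, p) for b in biases for p in targets
    if f_closed(b, p) != recurrence_rhs(b, p)
]
print(mismatches)  # an empty list means the closed form satisfies the recurrence
```

This is a consistency check on sample points rather than a proof; the same substitution carried out symbolically shows that both branches reduce to the closed form.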

Remark

The idea of this algorithm can be extended to generating any biased coin with a specified probability.

References

  1. Kerrich, John Edmund (1946). An experimental introduction to the theory of probability. E. Munksgaard.
  2. Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge, UK: Cambridge University Press. p. 318. ISBN 9780521592710. "anyone familiar with the law of conservation of angular momentum can, after some practice, cheat at the usual coin-toss game and call his shots with 100 per cent accuracy. You can obtain any frequency of heads you want; and the bias of the coin has no influence at all on the results!"
  3. Feller, W. (1968). An Introduction to Probability Theory and Its Applications. Wiley. ISBN 978-0-471-25708-0.
  4. von Neumann, John (1951). "Various techniques used in connection with random digits". National Bureau of Standards Applied Math Series. 12: 36.
  5. Henry Tsai, 2024 April 12.
