Galton board

[Figure: A Galton box demonstrated]

The Galton board, also known as the Galton box, the quincunx, or the bean machine, is a device invented by Francis Galton [1] to demonstrate the central limit theorem, in particular that with a sufficient sample size the binomial distribution approximates a normal distribution. Among its applications, it afforded insight into regression to the mean, or "reversion to mediocrity".


Description

The Galton board consists of a vertical board with interleaved rows of pegs. Beads are dropped from the top and, when the device is level, bounce either left or right as they hit the pegs. Eventually they are collected into bins at the bottom, where the heights of the accumulated bead columns approximate a bell curve. Overlaying Pascal's triangle onto the pegs shows the number of different paths that can be taken to reach each bin. [2]
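The process described above is straightforward to simulate. The following minimal Python sketch (the function name simulate_galton_board and the parameter values are illustrative assumptions, not taken from any source) drops beads through rows of pegs, flipping left or right at each peg, and prints a crude histogram of the resulting bins; with enough beads the printed columns trace out the familiar bell shape.

```python
import random
from collections import Counter

def simulate_galton_board(n_rows=12, n_beads=10_000, p_right=0.5):
    """Drop n_beads beads through n_rows of pegs; at each peg a bead
    bounces right with probability p_right. Return bead counts per bin."""
    bins = Counter()
    for _ in range(n_beads):
        # The bin index equals the number of rightward bounces.
        rights = sum(random.random() < p_right for _ in range(n_rows))
        bins[rights] += 1
    return bins

counts = simulate_galton_board()
for k in range(13):  # bins 0 .. n_rows
    print(f"bin {k:2d}: {'#' * (counts[k] // 100)}")
```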

Large-scale working models of this device created by Charles and Ray Eames can be seen in the Mathematica: A World of Numbers... and Beyond exhibits permanently on view at the Boston Museum of Science, the New York Hall of Science, and the Henry Ford Museum. [3] The Ford Museum machine was displayed at the IBM Pavilion during the 1964–65 New York World's Fair and later appeared at the Pacific Science Center in Seattle. [4] [5] Another large-scale version is displayed in the lobby of Index Fund Advisors in Irvine, California. [6]

Boards can be constructed for other distributions by changing the shape of the pins or biasing them towards one direction; even bimodal boards are possible. [7] A board for the log-normal distribution (common in many natural processes, particularly biological ones) was constructed by Jacobus Kapteyn while studying and popularizing the statistics of the log-normal, in order to help visualize it and demonstrate its plausibility; it uses isosceles triangles of varying widths to 'multiply' the distance the bead travels, rather than fixed-size steps that would 'sum'. [8] As of 1963, it was preserved at the University of Groningen. [9] There is also an improved log-normal machine that uses skewed triangles whose right sides are longer, which avoids shifting the median of the beads to the left. [10]
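The distinction between steps that 'sum' and steps that 'multiply' can be illustrated numerically. The sketch below is a rough illustration under assumed step sizes, not a model of Kapteyn's actual machine: additive ±1 steps produce a roughly symmetric, normal-like spread, while multiplicative ×c / ÷c steps produce a right-skewed, log-normal-like spread whose mean exceeds its median.

```python
import random
import statistics

def additive_walk(n_steps=12, step=1.0):
    # Sum of random +/- step increments: approximately normal for many steps.
    return sum(random.choice([-step, step]) for _ in range(n_steps))

def multiplicative_walk(n_steps=12, factor=1.1):
    # Product of random *factor or /factor increments: approximately log-normal.
    x = 1.0
    for _ in range(n_steps):
        x *= factor if random.random() < 0.5 else 1.0 / factor
    return x

sums = [additive_walk() for _ in range(10_000)]
products = [multiplicative_walk() for _ in range(10_000)]
print("additive:       mean %6.3f  median %6.3f"
      % (statistics.mean(sums), statistics.median(sums)))
print("multiplicative: mean %6.3f  median %6.3f"
      % (statistics.mean(products), statistics.median(products)))
```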

Distribution of the beads

If a bead bounces to the right k times on its way down (and to the left on the remaining pegs), it ends up in the kth bin counting from the left. Denoting the number of rows of pegs in a Galton board by n, the number of paths to the kth bin at the bottom is given by the binomial coefficient $\binom{n}{k}$. Note that the leftmost bin is the 0-bin, next to it is the 1-bin, and the furthest to the right is the n-bin, making the total number of bins equal to n + 1 (each row needs no more pegs than its row number: the first row has 1 peg, the second 2 pegs, and so on until the nth row with n pegs, which feeds the n + 1 bins). If the probability of bouncing right at a peg is p (which equals 0.5 on an unbiased, level machine), the probability that the bead ends up in the kth bin is

$$\binom{n}{k} p^k (1-p)^{n-k}.$$

This is the probability mass function of a binomial distribution: the number of rows corresponds to the number of trials n, and the probability p of bouncing right at each peg is the binomial's success probability.
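As a concrete check of the formula above, the following short Python sketch (the helper name bin_probabilities is purely illustrative) computes the number of paths and the binomial probability for each bin of a board with n = 10 rows and p = 0.5:

```python
from math import comb

def bin_probabilities(n_rows, p_right=0.5):
    """Probability that a bead lands in bin k, for k = 0 .. n_rows."""
    return [comb(n_rows, k) * p_right**k * (1 - p_right)**(n_rows - k)
            for k in range(n_rows + 1)]

n = 10
for k, pr in enumerate(bin_probabilities(n)):
    print(f"bin {k:2d}: paths = {comb(n, k):4d}, probability = {pr:.4f}")
```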

According to the central limit theorem (more specifically, the de Moivre–Laplace theorem), the binomial distribution approximates the normal distribution provided that the number of rows and the number of beads are both large. Varying the number of rows changes the standard deviation, and hence the width, of the resulting bell-shaped curve in the bins.
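In symbols, the de Moivre–Laplace theorem states that for large n, and for k within a few standard deviations of np, the binomial bin probabilities are well approximated by a Gaussian density:

$$\binom{n}{k} p^k (1-p)^{n-k} \;\approx\; \frac{1}{\sqrt{2\pi n p (1-p)}}\,\exp\!\left(-\frac{(k - np)^2}{2 n p (1-p)}\right),$$

so the bead counts are centred at $np$ with standard deviation $\sqrt{np(1-p)}$, which grows with the number of rows n.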

Another interpretation, more accurate from a physical point of view, is based on entropy: the energy carried by each falling bead is finite, and the collisions at each peg are effectively chaotic (which side a bead will fall to cannot be predicted in advance), yet the mean and variance of each bead's final position are constrained to be finite (the beads never leave the box). The Gaussian shape then arises because the normal distribution is the maximum entropy probability distribution for a continuous variable with a given mean and variance. The emergence of the normal distribution can be interpreted as meaning that all the information each bead carries about the path it has travelled is lost through its collisions on the way down.
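The maximum entropy property referred to here can be stated precisely: among all continuous densities $f$ with fixed mean $\mu$ and variance $\sigma^2$, the differential entropy

$$h(f) = -\int_{-\infty}^{\infty} f(x)\,\ln f(x)\,dx$$

is maximized by the normal density $\mathcal{N}(\mu, \sigma^2)$, for which $h = \tfrac{1}{2}\ln\!\left(2\pi e \sigma^2\right)$.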


History

[Figure: The quincunx, as drawn by Francis Galton]

In his 1889 book Natural Inheritance, Francis Galton wrote:

Order in Apparent Chaos: I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the Law of Frequency of Error. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshalled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along. [1] :66

Games

Several games have been developed using the idea of pins changing the route of balls or other objects, such as bagatelle, pachinko, and the Plinko pricing game on the television show The Price Is Right.


References

  1. Galton, Sir Francis (1894). Natural Inheritance. Macmillan. ISBN 978-1297895982.
  2. "The Galton Board". www.galtonboard.com. Four Pines Publishing, Inc. Retrieved 2018-03-06.
  3. "Henry Ford museum acquires Eames' Mathematica exhibit". Auction Central News. LiveAuctioneers. 20 March 2015. Retrieved 2018-03-06.
  4. "Pavilions & Attractions - IBM - Page Six". New York World's Fair. Retrieved 22 December 2011.
  5. "Mathematica Exhibition from the Office of Charles and Ray Eames Opens inside Henry Ford Museum of American Innovation, Sept. 23" (press release). Henry Ford Museum of American Innovation. September 21, 2017.
  6. "IFA.tv - From Chaos to Order on the Galton Board -- A Random Walker". YouTube. 23 December 2009. Retrieved 2018-03-06. Archived at Ghostarchive and the Wayback Machine.
  7. Brehmer et al. (2018). "Mining gold from implicit models to improve likelihood-free inference": "Simulator Mining Example".
  8. Kapteyn (1903). Skew Frequency Curves in Biology and Statistics, vol. 1; Kapteyn & van Uven (1916). Skew Frequency Curves in Biology and Statistics, vol. 2.
  9. Aitchison & Brown (1963). The Lognormal Distribution, with Special Reference to its Uses in Economics. Archived 2019-08-02 at the Wayback Machine.
  10. Limpert et al. (2001). "Log-normal Distributions across the Sciences: Keys and Clues".