Benford's law, also known as the Newcomb–Benford law, the law of anomalous numbers, or the first-digit law, is an observation that in many real-life sets of numerical data, the leading digit is likely to be small. [1] In sets that obey the law, the number 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time. Uniformly distributed digits would each occur about 11.1% of the time. [2] Benford's law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on.
The graph to the right shows Benford's law for base 10, one of infinitely many cases of a generalized law regarding numbers expressed in arbitrary (integer) bases, which rules out the possibility that the phenomenon might be an artifact of the base-10 number system. Further generalizations published in 1995 [3] included analogous statements for both the nth leading digit and the joint distribution of the leading n digits, the latter of which leads to a corollary wherein the significant digits are shown to be a statistically dependent quantity.
It has been shown that this result applies to a wide variety of data sets, including electricity bills, street addresses, stock prices, house prices, population numbers, death rates, lengths of rivers, and physical and mathematical constants. [4] Like other general principles about natural data—for example, the fact that many data sets are well approximated by a normal distribution—there are illustrative examples and explanations that cover many of the cases where Benford's law applies, though there are many other cases where Benford's law applies that resist simple explanations. [5] [6] Benford's law tends to be most accurate when values are distributed across multiple orders of magnitude, especially if the process generating the numbers is described by a power law (which is common in nature).
The law is named after physicist Frank Benford, who stated it in 1938 in an article titled "The Law of Anomalous Numbers", [7] although it had been previously stated by Simon Newcomb in 1881. [8] [9]
The law is similar in concept, though not identical in distribution, to Zipf's law.
A set of numbers is said to satisfy Benford's law if the leading digit d (d ∈ {1, ..., 9}) occurs with probability [10]

P(d) = log10(d + 1) − log10(d) = log10(1 + 1/d).
The leading digits in such a set thus have the following distribution:
| d | P(d) |
|---|---|
| 1 | 30.1% |
| 2 | 17.6% |
| 3 | 12.5% |
| 4 | 9.7% |
| 5 | 7.9% |
| 6 | 6.7% |
| 7 | 5.8% |
| 8 | 5.1% |
| 9 | 4.6% |
The quantity P(d) is proportional to the space between d and d + 1 on a logarithmic scale. Therefore, this is the distribution expected if the logarithms of the numbers (but not the numbers themselves) are uniformly and randomly distributed.
For example, a number x, constrained to lie between 1 and 10, starts with the digit 1 if 1 ≤ x < 2, and starts with the digit 9 if 9 ≤ x < 10. Therefore, x starts with the digit 1 if log 1 ≤ log x < log 2, or starts with 9 if log 9 ≤ log x < log 10. The interval [log 1, log 2] is much wider than the interval [log 9, log 10] (0.30 and 0.05 respectively); therefore if log x is uniformly and randomly distributed, it is much more likely to fall into the wider interval than the narrower interval, i.e. more likely to start with 1 than with 9; the probabilities are proportional to the interval widths, giving the equation above (as well as the generalization to other bases besides decimal).
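To make this concrete, here is a minimal Python sketch (the span of six orders of magnitude and the sample size are arbitrary choices) that draws numbers whose base-10 logarithms are uniformly distributed and tallies their leading digits against the Benford frequencies:

```python
import random
from collections import Counter
from math import log10

# Draw numbers whose base-10 logarithm is uniform over several orders of
# magnitude (here 10^0 to 10^6); their leading digits should follow Benford's law.
random.seed(0)
samples = [10 ** random.uniform(0, 6) for _ in range(100_000)]

# The first character of the scientific-notation form is the leading digit.
counts = Counter(int(f"{x:e}"[0]) for x in samples)

for d in range(1, 10):
    print(f"{d}: observed {counts[d] / len(samples):.3f}, "
          f"Benford {log10(1 + 1 / d):.3f}")
```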
Benford's law is sometimes stated in a stronger form, asserting that the fractional part of the logarithm of data is typically close to uniformly distributed between 0 and 1; from this, the main claim about the distribution of first digits can be derived. [5]
An extension of Benford's law predicts the distribution of first digits in other bases besides decimal; in fact, any base b ≥ 2. The general form is [12]

P(d) = logb(d + 1) − logb(d) = logb(1 + 1/d)

for the leading digits d ∈ {1, ..., b − 1}.
For b = 2 or 1 (the binary and unary number systems), Benford's law is true but trivial: all binary and unary numbers (except for 0 or the empty string) start with the digit 1. (On the other hand, the generalization of Benford's law to second and later digits is not trivial, even for binary numbers. [13])
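A small sketch of this base-b form (the function name and the set of bases are illustrative, not taken from the cited source) checks that the probabilities over all possible leading digits sum to 1 in any base, and that base 2 is the trivial case noted above:

```python
from math import log

def benford_prob(d: int, base: int = 10) -> float:
    """Benford probability that the leading digit is d when numbers are written in `base`."""
    if not 1 <= d < base:
        raise ValueError("leading digit d must satisfy 1 <= d <= base - 1")
    return log(1 + 1 / d, base)

# The probabilities over the possible leading digits 1 .. base-1 sum to 1 for any base >= 2;
# in base 2 the single possible digit 1 has probability 1.
for base in (2, 8, 10, 16):
    total = sum(benford_prob(d, base) for d in range(1, base))
    print(base, round(total, 12))
```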
Examining a list of the heights of the 58 tallest structures in the world by category shows that 1 is by far the most common leading digit, irrespective of the unit of measurement (see "scale invariance" below):
| Leading digit | Count (metres) | Share (metres) | Count (feet) | Share (feet) | Per Benford's law |
|---|---|---|---|---|---|
| 1 | 23 | 39.7% | 15 | 25.9% | 30.1% |
| 2 | 12 | 20.7% | 8 | 13.8% | 17.6% |
| 3 | 6 | 10.3% | 5 | 8.6% | 12.5% |
| 4 | 5 | 8.6% | 7 | 12.1% | 9.7% |
| 5 | 2 | 3.4% | 9 | 15.5% | 7.9% |
| 6 | 5 | 8.6% | 4 | 6.9% | 6.7% |
| 7 | 1 | 1.7% | 3 | 5.2% | 5.8% |
| 8 | 4 | 6.9% | 6 | 10.3% | 5.1% |
| 9 | 0 | 0% | 1 | 1.7% | 4.6% |
Another example is the leading digit of 2n. The sequence of the first 96 leading digits (1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1, ... (sequence A008952 in the OEIS )) exhibits closer adherence to Benford’s law than is expected for random sequences of the same length, because it is derived from a geometric sequence. [14]
| Leading digit | Count | Share | Per Benford's law |
|---|---|---|---|
| 1 | 29 | 30.2% | 30.1% |
| 2 | 17 | 17.7% | 17.6% |
| 3 | 12 | 12.5% | 12.5% |
| 4 | 10 | 10.4% | 9.7% |
| 5 | 7 | 7.3% | 7.9% |
| 6 | 6 | 6.3% | 6.7% |
| 7 | 5 | 5.2% | 5.8% |
| 8 | 5 | 5.2% | 5.1% |
| 9 | 5 | 5.2% | 4.6% |
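The counts in the table above can be reproduced with a few lines of Python (a sketch, not taken from the cited analysis):

```python
from collections import Counter
from math import log10

# Leading digits of 2**n for the first 96 powers (n = 0 .. 95), as in the table above.
counts = Counter(int(str(2 ** n)[0]) for n in range(96))

for d in range(1, 10):
    print(f"{d}: count {counts[d]:2d}, share {counts[d] / 96:6.1%}, "
          f"Benford {log10(1 + 1 / d):6.1%}")
```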
The discovery of Benford's law goes back to 1881, when the Canadian-American astronomer Simon Newcomb noticed that in logarithm tables the earlier pages (that started with 1) were much more worn than the other pages. [8] Newcomb's published result is the first known instance of this observation and includes a distribution on the second digit as well. Newcomb proposed a law that the probability of a single number N being the first digit of a number was equal to log(N + 1) − log(N).
The phenomenon was again noted in 1938 by the physicist Frank Benford, [7] who tested it on data from 20 different domains and was credited for it. His data set included the surface areas of 335 rivers, the sizes of 3259 US populations, 104 physical constants, 1800 molecular weights, 5000 entries from a mathematical handbook, 308 numbers contained in an issue of Reader's Digest , the street addresses of the first 342 persons listed in American Men of Science and 418 death rates. The total number of observations used in the paper was 20,229. This discovery was later named after Benford (making it an example of Stigler's law).
In 1995, Ted Hill proved the result about mixed distributions mentioned below. [15] [16]
Benford's law tends to apply most accurately to data that span several orders of magnitude. As a rule of thumb, the more orders of magnitude that the data evenly covers, the more accurately Benford's law applies. For instance, one can expect that Benford's law would apply to a list of numbers representing the populations of United Kingdom settlements. But if a "settlement" is defined as a village with population between 300 and 999, then Benford's law will not apply. [17] [18]
Consider the probability distributions shown below, referenced to a log scale. In each case, the total area in red is the relative probability that the first digit is 1, and the total area in blue is the relative probability that the first digit is 8. For the first distribution, the size of the areas of red and blue are approximately proportional to the widths of each red and blue bar. Therefore, the numbers drawn from this distribution will approximately follow Benford's law. On the other hand, for the second distribution, the ratio of the areas of red and blue is very different from the ratio of the widths of each red and blue bar. Rather, the relative areas of red and blue are determined more by the heights of the bars than the widths. Accordingly, the first digits in this distribution do not satisfy Benford's law at all. [18]
Thus, real-world distributions that span several orders of magnitude rather uniformly (e.g., stock-market prices and populations of villages, towns, and cities) are likely to satisfy Benford's law very accurately. On the other hand, a distribution mostly or entirely within one order of magnitude (e.g., IQ scores or heights of human adults) is unlikely to satisfy Benford's law very accurately, if at all. [17] [18] However, the difference between applicable and inapplicable regimes is not a sharp cut-off: as the distribution gets narrower, the deviations from Benford's law increase gradually.
(This discussion is not a full explanation of Benford's law, because it has not explained why data sets are so often encountered that, when plotted as a probability distribution of the logarithm of the variable, are relatively uniform over several orders of magnitude. [19] )
In 1970 Wolfgang Krieger proved what is now called the Krieger generator theorem. [20] [21] The Krieger generator theorem might be viewed as a justification for the assumption in the Kafri ball-and-box model that, in a given base with a fixed set of digits 0, 1, ..., n, ..., b − 1, digit n is equivalent to a Kafri box containing n non-interacting balls. Other scientists and statisticians have suggested entropy-related explanations for Benford's law. [22] [23] [10] [24]
Many real-world examples of Benford's law arise from multiplicative fluctuations. [25] For example, if a stock price starts at $100, and then each day it gets multiplied by a randomly chosen factor between 0.99 and 1.01, then over an extended period the probability distribution of its price satisfies Benford's law with higher and higher accuracy.
The reason is that the logarithm of the stock price is undergoing a random walk, so over time its probability distribution will get more and more broad and smooth (see above). [25] (More technically, the central limit theorem says that multiplying more and more random variables will create a log-normal distribution with larger and larger variance, so eventually it covers many orders of magnitude almost uniformly.) To be sure of approximate agreement with Benford's law, the distribution has to be approximately invariant when scaled up by any factor up to 10; a log-normally distributed data set with wide dispersion would have this approximate property.
Unlike multiplicative fluctuations, additive fluctuations do not lead to Benford's law: They lead instead to normal probability distributions (again by the central limit theorem), which do not satisfy Benford's law. By contrast, that hypothetical stock price described above can be written as the product of many random variables (i.e. the price change factor for each day), so is likely to follow Benford's law quite well.
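A simulation along these lines can be sketched as follows; the daily factor range is widened relative to the 0.99–1.01 example above (an illustrative choice) so that a few thousand steps suffice to spread the prices over several orders of magnitude:

```python
import numpy as np

# Simulate many independent multiplicative random walks ("prices"). Each starts at 100
# and is multiplied every step by a random factor; after enough steps the values spread
# over several orders of magnitude and their first digits approach Benford's law.
rng = np.random.default_rng(1)
n_walks, n_steps = 5_000, 2_000

log_prices = np.full(n_walks, np.log10(100.0))
for _ in range(n_steps):
    log_prices += np.log10(rng.uniform(0.8, 1.2, size=n_walks))

# The leading digit depends only on the fractional part of log10(price).
leading = (10 ** (log_prices % 1.0)).astype(int)
observed = np.bincount(leading, minlength=10)[1:10] / n_walks

for d, share in enumerate(observed, start=1):
    print(f"{d}: observed {share:.3f}, Benford {np.log10(1 + 1 / d):.3f}")
```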
Anton Formann provided an alternative explanation by directing attention to the interrelation between the distribution of the significant digits and the distribution of the observed variable. He showed in a simulation study that long right-tailed distributions of a random variable are compatible with the Newcomb–Benford law, and that for distributions of the ratio of two random variables the fit generally improves. [26] For numbers drawn from certain distributions (IQ scores, human heights) Benford's law fails to hold because these variates obey a normal distribution, which is known not to satisfy Benford's law, [9] since normal distributions cannot span several orders of magnitude and the fractional parts of their logarithms are not (even approximately) uniformly distributed. However, if one "mixes" numbers from those distributions, for example, by taking numbers from newspaper articles, Benford's law reappears. This can also be proven mathematically: if one repeatedly "randomly" chooses a probability distribution (from an uncorrelated set) and then randomly chooses a number according to that distribution, the resulting list of numbers will obey Benford's law. [15] [27] A similar probabilistic explanation for the appearance of Benford's law in everyday-life numbers has been advanced by showing that it arises naturally when one considers mixtures of uniform distributions. [28]
In a list of lengths, the distribution of first digits of numbers in the list may be generally similar regardless of whether all the lengths are expressed in metres, yards, feet, inches, etc. The same applies to monetary units.
This is not always the case. For example, the height of adult humans almost always starts with a 1 or 2 when measured in metres and almost always starts with 4, 5, 6, or 7 when measured in feet. But in a list of lengths spread evenly over many orders of magnitude—for example, a list of 1000 lengths mentioned in scientific papers that includes the measurements of molecules, bacteria, plants, and galaxies—it is reasonable to expect the distribution of first digits to be the same no matter whether the lengths are written in metres or in feet.
When the distribution of the first digits of a data set is scale-invariant (independent of the units that the data are expressed in), it is always given by Benford's law. [29] [30]
For example, the first (non-zero) digit on the aforementioned list of lengths should have the same distribution whether the unit of measurement is feet or yards. But there are three feet in a yard, so the probability that the first digit of a length in yards is 1 must be the same as the probability that the first digit of a length in feet is 3, 4, or 5; similarly, the probability that the first digit of a length in yards is 2 must be the same as the probability that the first digit of a length in feet is 6, 7, or 8. Applying this to all possible measurement scales gives the logarithmic distribution of Benford's law.
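The feet/yards example can be checked directly from the Benford probabilities; the following minimal sketch verifies that the two probabilities coincide:

```python
from math import isclose, log10

# Under Benford's law, a length leads with 1 when measured in yards exactly when the
# same length leads with 3, 4, or 5 when measured in feet (1 yard = 3 feet), and the
# two probabilities agree.
p_yards_lead_1 = log10(1 + 1 / 1)                             # significand in [1, 2) yards
p_feet_lead_3_4_5 = sum(log10(1 + 1 / d) for d in (3, 4, 5))  # significand in [3, 6) feet

print(p_yards_lead_1, p_feet_lead_3_4_5)
assert isclose(p_yards_lead_1, p_feet_lead_3_4_5)
```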
Benford's law for first digits is invariant under change of the number base. There are also conditions and proofs of sum invariance, inverse invariance, and addition and subtraction invariance. [31] [32]
In 1972, Hal Varian suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted in support of public planning decisions. Based on the plausible assumption that people who fabricate figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford's law ought to show up any anomalous results. [33]
In the United States, evidence based on Benford's law has been admitted in criminal cases at the federal, state, and local levels. [34]
Walter Mebane, a political scientist and statistician at the University of Michigan, was the first to apply the second-digit Benford's law-test (2BL-test) in election forensics. [35] Such analysis is considered a simple, though not foolproof, method of identifying irregularities in election results. [36] Scientific consensus to support the applicability of Benford's law to elections has not been reached in the literature. A 2011 study by the political scientists Joseph Deckert, Mikhail Myagkov, and Peter C. Ordeshook argued that Benford's law is problematic and misleading as a statistical indicator of election fraud. [37] Their method was criticized by Mebane in a response, though he agreed that there are many caveats to the application of Benford's law to election data. [38]
Benford's law has been used as evidence of fraud in the 2009 Iranian elections. [39] An analysis by Mebane found that the second digits in vote counts for President Mahmoud Ahmadinejad, the winner of the election, tended to differ significantly from the expectations of Benford's law, and that the ballot boxes with very few invalid ballots had a greater influence on the results, suggesting widespread ballot stuffing. [40] Another study used bootstrap simulations to find that the candidate Mehdi Karroubi received almost twice as many vote counts beginning with the digit 7 as would be expected according to Benford's law, [41] while an analysis from Columbia University concluded that the probability that a fair election would produce both too few non-adjacent digits and the suspicious deviations in last-digit frequencies found in the 2009 Iranian presidential election is less than 0.5 percent. [42] Benford's law has also been applied for forensic auditing and fraud detection on data from the 2003 California gubernatorial election, [43] the 2000 and 2004 United States presidential elections, [44] and the 2009 German federal election; [45] the Benford's law test was found to be "worth taking seriously as a statistical test for fraud," although it "is not sensitive to distortions we know significantly affected many votes." [44]
Benford's law has also been misapplied to claim election fraud. When applying the law to Joe Biden's election returns for Chicago, Milwaukee, and other localities in the 2020 United States presidential election, the distribution of the first digit did not follow Benford's law. The misapplication was a result of looking at data that was tightly bound in range, which violates the assumption inherent in Benford's law that the range of the data be large. The first digit test was applied to precinct-level data, but because precincts rarely receive more than a few thousand votes or fewer than several dozen, Benford's law cannot be expected to apply. According to Mebane, "It is widely understood that the first digits of precinct vote counts are not useful for trying to diagnose election frauds." [46] [47]
Similarly, the macroeconomic data the Greek government reported to the European Union before entering the eurozone was shown to be probably fraudulent using Benford's law, albeit years after the country joined. [48] [49]
Researchers have used Benford's law to detect psychological pricing patterns in a Europe-wide study of consumer product prices before and after the euro was introduced in 2002. [50] The idea was that, without psychological pricing, the first two or three digits of prices should follow Benford's law. Consequently, if the distribution of digits deviates from Benford's law (for example, featuring a lot of 9s), it suggests that merchants may have used psychological pricing.
When the euro replaced local currencies in 2002, for a brief period the price of goods in euros was simply converted from the price in the local currency. As it is essentially impossible to use psychological pricing simultaneously on both the price in euros and the price in the local currency, psychological pricing would be disrupted during the transition period even if it had previously been present. It could only be re-established once consumers had become used to prices in a single currency again, this time the euro.
As the researchers expected, the distribution of the first price digit followed Benford's law, while the distribution of the second and third digits deviated significantly from Benford's law before the introduction, deviated less during the introduction, and deviated more again after the introduction.
The number of open reading frames and their relationship to genome size differs between eukaryotes and prokaryotes with the former showing a log-linear relationship and the latter a linear relationship. Benford's law has been used to test this observation with an excellent fit to the data in both cases. [51]
A test of regression coefficients in published papers showed agreement with Benford's law. [52] As a comparison group, subjects were asked to fabricate statistical estimates. The fabricated results conformed to Benford's law on first digits but failed to obey Benford's law on second digits.
Testing the number of published scientific papers of all registered researchers in Slovenia's national database was shown to strongly conform to Benford's law. [53] Moreover, the authors were grouped by scientific field, and tests indicate natural sciences exhibit greater conformity than social sciences.
Although the chi-squared test has been used to test for compliance with Benford's law, it has low statistical power when used with small samples.
The Kolmogorov–Smirnov test and the Kuiper test are more powerful when the sample size is small, particularly when Stephens's corrective factor is used. [54] These tests may be unduly conservative when applied to discrete distributions. Values for the Benford test have been generated by Morrow. [55] The critical values of the test statistics are shown below:
| Test | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|
| Kuiper | 1.191 | 1.321 | 1.579 |
| Kolmogorov–Smirnov | 1.012 | 1.148 | 1.420 |
These critical values provide the minimum test statistic values required to reject the hypothesis of compliance with Benford's law at the given significance levels.
Two alternative tests specific to this law have been published. First, the max (m) statistic [56] is given by

m = √N · max over i = 1, ..., 9 of |Pr(X has FSD = i) − log10(1 + 1/i)|.
The leading factor does not appear in the original formula by Leemis; [56] it was added by Morrow in a later paper. [55]
Secondly, the distance (d) statistic [57] is given by

d = √( N · Σ over i = 1, ..., 9 of [Pr(X has FSD = i) − log10(1 + 1/i)]² ),
where FSD is the first significant digit and N is the sample size. Morrow has determined the critical values for both these statistics, which are shown below: [55]
| Statistic | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|
| Leemis's m | 0.851 | 0.967 | 1.212 |
| Cho & Gaines's d | 1.212 | 1.330 | 1.569 |
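A direct implementation of the two statistics as reconstructed above might look like the following Python sketch (the helper names are illustrative), using the powers of 2 as a sample that conforms closely to the law:

```python
from collections import Counter
from math import log10, sqrt

def first_digits(data):
    """First significant digits of the non-zero values in data."""
    return [int(f"{abs(x):e}"[0]) for x in data if x != 0]

def leemis_m(data):
    """Leemis's max statistic, with the sqrt(N) factor added by Morrow."""
    digits = first_digits(data)
    n = len(digits)
    counts = Counter(digits)
    return sqrt(n) * max(abs(counts[d] / n - log10(1 + 1 / d)) for d in range(1, 10))

def cho_gaines_d(data):
    """Cho & Gaines's distance statistic, in Morrow's normalisation."""
    digits = first_digits(data)
    n = len(digits)
    counts = Counter(digits)
    return sqrt(n * sum((counts[d] / n - log10(1 + 1 / d)) ** 2 for d in range(1, 10)))

# Example: the first 500 powers of 2, which conform closely to Benford's law.
# Compare the results against the critical values tabulated above
# (e.g. reject conformity at the 5% level if m > 0.967 or d > 1.330).
sample = [2 ** k for k in range(500)]
print(leemis_m(sample), cho_gaines_d(sample))
```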
Morrow has also shown that for any random variable X (with a continuous PDF) divided by its standard deviation (σ), a value A can be found such that the distribution of the first significant digit of the random variable (X/σ)^A will differ from Benford's law by less than ε > 0. [55] The value of A depends on the value of ε and the distribution of the random variable.
A method of accounting fraud detection based on bootstrapping and regression has been proposed. [58]
If the goal is to conclude agreement with Benford's law rather than disagreement, then the goodness-of-fit tests mentioned above are inappropriate. In this case, specific tests for equivalence should be applied. An empirical distribution is called equivalent to Benford's law if a distance (for example, the total variation distance or the usual Euclidean distance) between the probability mass functions is sufficiently small. This method of testing with application to Benford's law is described in Ostrovski. [59]
Some well-known infinite integer sequences provably satisfy Benford's law exactly (in the asymptotic limit as more and more terms of the sequence are included). Among these are the Fibonacci numbers, [60] [61] the factorials, [62] the powers of 2, [63] [14] and the powers of almost any other number. [63]
Likewise, some continuous processes satisfy Benford's law exactly (in the asymptotic limit as the process continues through time). One is an exponential growth or decay process: If a quantity is exponentially increasing or decreasing in time, then the percentage of time that it has each first digit satisfies Benford's law asymptotically (i.e. increasing accuracy as the process continues through time).
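This can be illustrated with a short sketch that samples an exponential growth curve (the starting value and growth rate are arbitrary choices) at evenly spaced times and tallies the leading digit at each sample:

```python
import numpy as np

# Sample an exponentially growing quantity at evenly spaced times and measure the
# fraction of time spent on each leading digit; the time span covers several
# orders of magnitude, so the shares should be close to Benford's law.
t = np.linspace(0.0, 50.0, 1_000_000)
quantity = 3.7 * np.exp(0.23 * t)

leading = (10 ** (np.log10(quantity) % 1.0)).astype(int)
time_shares = np.bincount(leading, minlength=10)[1:10] / leading.size

print("observed:", np.round(time_shares, 3))
print("Benford: ", np.round(np.log10(1 + 1 / np.arange(1, 10)), 3))
```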
The square roots and reciprocals of successive natural numbers do not obey this law. [64] Prime numbers in a finite range follow a generalized Benford's law that approaches uniformity as the size of the range approaches infinity. [65] Lists of local telephone numbers violate Benford's law. [66] Benford's law is violated by the populations of all places with a population of at least 2500 individuals from five US states according to the 1960 and 1970 censuses, where only 19% began with digit 1 but 20% began with digit 2, because truncation at 2500 introduces statistical bias. [64] The terminal digits in pathology reports violate Benford's law due to rounding. [67]
Distributions that do not span several orders of magnitude will not follow Benford's law. Examples include height, weight, and IQ scores. [9] [68]
A number of criteria, applicable particularly to accounting data, have been suggested where Benford's law can be expected to apply. [69]
Mathematically, Benford’s law applies if the distribution being tested fits the "Benford’s law compliance theorem". [17] The derivation says that Benford's law is followed if the Fourier transform of the logarithm of the probability density function is zero for all integer values. Most notably, this is satisfied if the Fourier transform is zero (or negligible) for n ≥ 1. This is satisfied if the distribution is wide (since wide distribution implies a narrow Fourier transform). Smith summarizes thus (p. 716):
Benford's law is followed by distributions that are wide compared with unit distance along the logarithmic scale. Likewise, the law is not followed by distributions that are narrow compared with unit distance … If the distribution is wide compared with unit distance on the log axis, it means that the spread in the set of numbers being examined is much greater than ten.
In short, Benford’s law requires that the numbers in the distribution being measured have a spread across at least an order of magnitude.
Benford's law was empirically tested against the numbers (up to the 10th digit) generated by a number of important distributions, including the uniform distribution, the exponential distribution, the normal distribution, and others. [9]
The uniform distribution, as might be expected, does not obey Benford's law. In contrast, the ratio distribution of two uniform distributions is well-described by Benford's law.
Neither the normal distribution nor the ratio distribution of two normal distributions (the Cauchy distribution) obeys Benford's law. Although the half-normal distribution does not obey Benford's law, the ratio distribution of two half-normal distributions does. Neither the right-truncated normal distribution nor the ratio distribution of two right-truncated normal distributions is well described by Benford's law. This is not surprising, as these distributions are weighted towards larger numbers.
Benford's law also describes the exponential distribution and the ratio distribution of two exponential distributions well. The fit of the chi-squared distribution depends on the degrees of freedom (df), with good agreement for df = 1 and decreasing agreement as the df increases. The F-distribution is fitted well for low degrees of freedom. With increasing dfs the fit decreases, but much more slowly than for the chi-squared distribution. The fit of the log-normal distribution depends on the mean and the variance of the distribution. The variance has a much greater effect on the fit than does the mean. Larger values of both parameters result in better agreement with the law. The ratio of two log-normal distributions is itself log-normal, so this distribution was not examined.
Other distributions that have been examined include the Muth distribution, Gompertz distribution, Weibull distribution, gamma distribution, log-logistic distribution and the exponential power distribution, all of which show reasonable agreement with the law. [56] [70] The Gumbel distribution – whose density increases with the value of the random variable – does not show agreement with this law. [70]
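The following sketch (with illustrative sample sizes and seed) compares the empirical first-digit shares of a uniform distribution, the ratio of two uniform distributions, and an exponential distribution against the Benford frequencies:

```python
import numpy as np

rng = np.random.default_rng(0)

def first_digit_shares(x):
    """Empirical first-digit shares of the finite, strictly positive samples in x."""
    x = x[np.isfinite(x) & (x > 0)]
    leading = (10 ** (np.log10(x) % 1.0)).astype(int)
    return np.bincount(leading, minlength=10)[1:10] / leading.size

n = 200_000
uniform = rng.uniform(0.0, 1.0, n)
ratio_of_uniforms = rng.uniform(0.0, 1.0, n) / rng.uniform(0.0, 1.0, n)
exponential = rng.exponential(scale=1.0, size=n)

for name, sample in [("uniform", uniform),
                     ("uniform / uniform", ratio_of_uniforms),
                     ("exponential", exponential)]:
    print(f"{name:18s}", np.round(first_digit_shares(sample), 3))
print(f"{'Benford':18s}", np.round(np.log10(1 + 1 / np.arange(1, 10)), 3))
```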
It is possible to extend the law to digits beyond the first. [71] In particular, for any given number of digits, the probability of encountering a number starting with the string of digits n of that length – discarding leading zeros – is given by

P(n) = log10(n + 1) − log10(n) = log10(1 + 1/n).
Thus, the probability that a number starts with the digits 3, 1, 4 (some examples are 3.14, 3.142, π, 314280.7, and 0.00314005) is log10(1 + 1/314) ≈ 0.00138, as in the box with the log-log graph on the right.
This result can be used to find the probability that a particular digit occurs at a given position within a number. For instance, the probability that a "2" is encountered as the second digit is [71]

log10(1 + 1/12) + log10(1 + 1/22) + ⋯ + log10(1 + 1/92) ≈ 0.109.
And the probability that d (d = 0, 1, ..., 9) is encountered as the n-th (n > 1) digit is the sum of log10(1 + 1/(10k + d)) over all (n − 1)-digit prefixes k, i.e. over k = 10^(n−2), ..., 10^(n−1) − 1.
The distribution of the n-th digit, as n increases, rapidly approaches a uniform distribution with 10% for each of the ten digits, as shown below. [71] Four digits is often enough to assume a uniform distribution of 10% as "0" appears 10.0176% of the time in the fourth digit, while "9" appears 9.9824% of the time.
| Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1st | — | 30.1% | 17.6% | 12.5% | 9.7% | 7.9% | 6.7% | 5.8% | 5.1% | 4.6% |
| 2nd | 12.0% | 11.4% | 10.9% | 10.4% | 10.0% | 9.7% | 9.3% | 9.0% | 8.8% | 8.5% |
| 3rd | 10.2% | 10.1% | 10.1% | 10.1% | 10.0% | 10.0% | 9.9% | 9.9% | 9.9% | 9.8% |
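The table above can be reproduced from the digit-position formula; a minimal sketch (the function name is illustrative):

```python
from math import log10

def digit_position_prob(d: int, n: int) -> float:
    """Probability under Benford's law that digit d (0-9) occupies the n-th significant position."""
    if n == 1:
        return log10(1 + 1 / d) if d != 0 else 0.0  # 0 cannot be a leading digit
    # Sum over all possible strings of the preceding n-1 digits (leading digit nonzero).
    return sum(log10(1 + 1 / (10 * k + d))
               for k in range(10 ** (n - 2), 10 ** (n - 1)))

# Reproduce the table above for the first three significant digit positions.
for n in (1, 2, 3):
    print(n, [f"{digit_position_prob(d, n):.1%}" for d in range(10)])
```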
The average and moments of random variables for the digits 1 to 9 following this law have been calculated. [72]
For the two-digit distribution according to Benford's law these values are also known. [73]
A table of the exact probabilities for the joint occurrence of the first two digits according to Benford's law is available, [73] as is the population correlation between the first and second digits: [73] ρ = 0.0561.
Benford's law has appeared as a plot device in some twenty-first century popular entertainment.