King effect

Last updated
Rank-ordering of the population of countries follows a stretched exponential distribution except in the cases of the two "kings": China and India. Rank order countries.png
Rank-ordering of the population of countries follows a stretched exponential distribution except in the cases of the two "kings": China and India.

In statistics, economics, and econophysics, the king effect is the phenomenon in which the top one or two members of a ranked set show up as clear outliers. These top one or two members are unexpectedly large because they do not conform to the statistical distribution or rank-distribution which the remainder of the set obeys.

Distributions typically followed include the power-law distribution, [2] that is a basis for the stretched exponential function, [1] [3] and parabolic fractal distribution. The King effect has been observed in the distribution of:

Note, however, that the king effect is not limited to outliers with a positive evaluation attached to their rank: for rankings on an undesirable attribute, there may exist a pauper effect, with a similar detachment of extremely ranked data points from the reasonably distributed portion of the data set.

See also

Related Research Articles

<span class="mw-page-title-main">Interquartile range</span> Measure of statistical dispersion

In descriptive statistics, the interquartile range (IQR) is a measure of statistical dispersion, which is the spread of the data. The IQR may also be called the midspread, middle 50%, fourth spread, or H‑spread. It is defined as the difference between the 75th and 25th percentiles of the data. To calculate the IQR, the data set is divided into quartiles, or four rank-ordered even parts via linear interpolation. These quartiles are denoted by Q1 (also called the lower quartile), Q2 (the median), and Q3 (also called the upper quartile). The lower quartile corresponds with the 25th percentile and the upper quartile corresponds with the 75th percentile, so IQR = Q3 − Q1.

<span class="mw-page-title-main">Power law</span> Functional relationship between two quantities

In statistics, a power law is a functional relationship between two quantities, where a relative change in one quantity results in a relative change in the other quantity proportional to a power of the change, independent of the initial size of those quantities: one quantity varies as a power of another. For instance, considering the area of a square in terms of the length of its side, if the length is doubled, the area is multiplied by a factor of four.

In statistics, a quartile is a type of quantile which divides the number of data points into four parts, or quarters, of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a form of order statistic. The three main quartiles are as follows:

<span class="mw-page-title-main">Benford's law</span> Observation that in many real-life datasets, the leading digit is likely to be small

Benford's law, also known as the Newcomb–Benford law, the law of anomalous numbers, or the first-digit law, is an observation that in many real-life sets of numerical data, the leading digit is likely to be small. In sets that obey the law, the number 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time. If the digits were distributed uniformly, they would each occur about 11.1% of the time. Benford's law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on.

<span class="mw-page-title-main">Zipf's law</span> Probability distribution

Zipf's law is an empirical law that often holds, approximately, when a list of measured values is sorted in decreasing order. It states that the value of the nth entry is inversely proportional to n.

<span class="mw-page-title-main">Outlier</span> Observation far apart from others in statistics and data science

In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are sometimes excluded from the data set. An outlier can be an indication of exciting possibility, but can also cause serious problems in statistical analyses.

Econophysics is a non-orthodox interdisciplinary research field, applying theories and methods originally developed by physicists in order to solve problems in economics, usually those including uncertainty or stochastic processes and nonlinear dynamics. Some of its application to the study of financial markets has also been termed statistical finance referring to its roots in statistical physics. Econophysics is closely related to social physics.

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming.

The Mahalanobis distance is a measure of the distance between a point P and a distribution D, introduced by P. C. Mahalanobis in 1936. Mahalanobis's definition was prompted by the problem of identifying the similarities of skulls based on measurements in 1927.

In probability and statistics, the parabolic fractal distribution is a type of discrete probability distribution in which the logarithm of the frequency or size of entities in a population is a quadratic polynomial of the logarithm of the rank. This can markedly improve the fit over a simple power-law relationship.

<span class="mw-page-title-main">Primate city</span> Disproportionately largest city in its country or region

A primate city is a city that is the largest in its country, province, state, or region, and disproportionately larger than any others in the urban hierarchy. A primate city distribution is a rank-size distribution that has one very large city with many much smaller cities and towns, and no intermediate-sized urban centers: a king effect, visible as an outlier on an otherwise linear graph, when the rest of the data fit a power law or stretched exponential function.

In statistical theory, Chauvenet's criterion is a means of assessing whether one piece of experimental data — an outlier — from a set of observations, is likely to be spurious.

In robust statistics, robust regression seeks to overcome some limitations of traditional regression analysis. A regression analysis models the relationship between one or more independent variables and a dependent variable. Standard types of regression, such as ordinary least squares, have favourable properties if their underlying assumptions are true, but can give misleading results otherwise. Robust regression methods are designed to limit the effect that violations of assumptions by the underlying data-generating process have on regression estimates.

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly.

<span class="mw-page-title-main">Anscombe's quartet</span> Four data sets with the same descriptive statistics, yet very different distributions

Anscombe's quartet comprises four data sets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data when analyzing it, and the effect of outliers and other influential observations on statistical properties. He described the article as being intended to counter the impression among statisticians that "numerical calculations are exact, but graphs are rough."

<span class="mw-page-title-main">Rank–size distribution</span>

Rank–size distribution is the distribution of size by rank, in decreasing order of size. For example, if a data set consists of items of sizes 5, 100, 5, and 8, the rank-size distribution is 100, 8, 5, 5. This is also known as the rank–frequency distribution, when the source data are from a frequency distribution. These are particularly of interest when the data vary significantly in scales, such as city size or word frequency. These distributions frequently follow a power law distribution, or less well-known ones such as a stretched exponential function or parabolic fractal distribution, at least approximately for certain ranges of ranks; see below.

<span class="mw-page-title-main">Plot (graphics)</span>

A plot is a graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables. The plot can be drawn by hand or by a computer. In the past, sometimes mechanical or electronic plotters were used. Graphs are a visual representation of the relationship between variables, which are very useful for humans who can then quickly derive an understanding which may not have come from lists of values. Given a scale or ruler, graphs can also be used to read off the value of an unknown variable plotted as a function of a known one, but this can also be done with data presented in tabular form. Graphs of functions are used in mathematics, sciences, engineering, technology, finance, and other areas.

<span class="mw-page-title-main">Didier Sornette</span>

Didier Sornette is a French researcher studying subjects including complex systems and risk management. He is Professor on the Chair of Entrepreneurial Risks at the Swiss Federal Institute of Technology Zurich and is also a professor of the Swiss Finance Institute, He was previously a Professor of Geophysics at UCLA, Los Angeles California (1996–2006) and a Research Professor at the French National Centre for Scientific Research (1981–2006).

<span class="mw-page-title-main">Dragon king theory</span> Event that is both extremely large in effect and of unique origins

Dragon king (DK) is a double metaphor for an event that is both extremely large in size or effect (a "king") and born of unique origins (a "dragon") relative to its peers (other events from the same system). DK events are generated by or correspond to mechanisms such as positive feedback, tipping points, bifurcations, and phase transitions, that tend to occur in nonlinear and complex systems, and serve to amplify DK events to extreme levels. By understanding and monitoring these dynamics, some predictability of such events may be obtained.

References

  1. 1 2 3 4 "Stretched exponential distributions in nature and economy: "fat tails" with characteristic scales", J. Laherrère and D. Sornette
  2. Jayadev, Arjun (2008). "A power law tail in India's wealth distribution: Evidence from survey data". Physica A: Statistical Mechanics and Its Applications. 387: 270–276. doi:10.1016/j.physa.2007.08.049.
  3. "The individual success of musicians, like that of physicists, follows a stretch exponential", J.A. Davies