In statistics, economics, and econophysics, the king effect is the phenomenon in which the top one or two members of a ranked set show up as clear outliers. These top one or two members are unexpectedly large because they do not conform to the statistical distribution or rank-distribution which the remainder of the set obeys.
Distributions typically followed include the power-law distribution,[2] which is a basis for the stretched exponential function,[1][3] and the parabolic fractal distribution. The king effect has been observed in the distribution of:
Note, however, that the king effect is not limited to outliers with a positive evaluation attached to their rank: for rankings on an undesirable attribute, there may exist a pauper effect, with a similar detachment of extremely ranked data points from the reasonably distributed portion of the data set.[citation needed]
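One way to make the effect visible is to fit a power law to every rank except the top one and compare the observed top value with the value the fit extrapolates to rank 1. The following is a minimal sketch under stated assumptions: the function names `fit_power_law` and `king_ratio`, the least-squares fit in log-log space, and the synthetic data are all illustrative choices, not a method taken from the cited literature.

```python
import math

def fit_power_law(ranks, sizes):
    """Least-squares fit of log(size) = intercept - exponent * log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, -slope

def king_ratio(sizes, skip=1):
    """Observed top size divided by the size extrapolated to rank 1.

    The fit uses only the ranks after the first `skip` entries, so the
    suspected king does not influence the fitted line.
    """
    ordered = sorted(sizes, reverse=True)
    ranks = list(range(1, len(ordered) + 1))
    intercept, _exponent = fit_power_law(ranks[skip:], ordered[skip:])
    predicted_top = math.exp(intercept)  # log(rank 1) is 0
    return ordered[0] / predicted_top

# Synthetic data (an assumption): sizes follow 1000 / rank,
# except that the top entry is inflated by a factor of three.
sizes = [1000.0 / r for r in range(1, 21)]
sizes[0] *= 3
ratio = king_ratio(sizes)
```

A ratio close to 1 would indicate that the top entry sits on the fitted line; in this synthetic example the ratio is about 3, reflecting the inflation applied to the top entry. On real data the threshold for calling an entry a king is a judgment call.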
In statistics, a power law is a functional relationship between two quantities, where a relative change in one quantity results in a relative change in the other quantity proportional to the change raised to a constant exponent: one quantity varies as a power of another. The change is independent of the initial size of those quantities.
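The independence from initial size can be checked numerically. The sketch below assumes a power law of the form f(x) = a·x^k with arbitrarily chosen illustrative values for a and k; scaling x by a fixed factor should change f by the same ratio regardless of the starting x.

```python
def power_law(x, coefficient, exponent):
    """Evaluate coefficient * x ** exponent."""
    return coefficient * x ** exponent

def relative_change(x, factor, coefficient=2.0, exponent=-1.5):
    """Ratio f(factor * x) / f(x) for the power law above."""
    return power_law(factor * x, coefficient, exponent) / power_law(x, coefficient, exponent)

# The ratio is the same at every starting point: factor ** exponent.
ratios = [relative_change(x, factor=2.0) for x in (1.0, 10.0, 1000.0)]
expected = 2.0 ** -1.5
```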
Benford's law, also known as the Newcomb–Benford law, the law of anomalous numbers, or the first-digit law, is an observation that in many real-life sets of numerical data, the leading digit is likely to be small. In sets that obey the law, the number 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time. Uniformly distributed digits would each occur about 11.1% of the time. Benford's law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on.
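The percentages quoted above follow from the standard statement of the law, in which leading digit d occurs with probability log10(1 + 1/d). The sketch below evaluates that formula and compares it with an illustrative data set, the first thousand powers of two, which is chosen here as an assumption because it is easy to generate.

```python
import math

def benford_probability(digit):
    """Expected share of leading digit `digit` (1-9) under Benford's law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

# Illustrative data set (an assumption): the powers of two 2**1 .. 2**1000.
powers = [2 ** k for k in range(1, 1001)]
observed_ones = sum(1 for p in powers if leading_digit(p) == 1) / len(powers)
expected_ones = benford_probability(1)
```

The nine probabilities sum to 1, the value for digit 1 is close to 0.301, and the value for digit 9 is below 0.05, in line with the figures in the text.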
Zipf's law is an empirical law stating that when a list of measured values is sorted in decreasing order, the value of the nth entry is often approximately inversely proportional to n.
The Pareto distribution, named after the Italian civil engineer, economist, and sociologist Vilfredo Pareto, is a power-law probability distribution that is used to describe social, quality control, scientific, geophysical, actuarial, and many other types of observable phenomena; the principle originally applied to describing the distribution of wealth in a society, fitting the trend that a large portion of wealth is held by a small fraction of the population. The Pareto principle or "80-20 rule", stating that 80% of outcomes are due to 20% of causes, was named in honour of Pareto, but the concepts are distinct, and only Pareto distributions with a shape value of log₄ 5 ≈ 1.16 precisely reflect it. Empirical observation has shown that this 80-20 distribution fits a wide range of cases, including natural phenomena and human activities.
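The link between the shape value and the 80-20 rule can be verified with a short calculation. For a Pareto distribution with shape α greater than 1, the share of the total held by the top fraction p of the population is p^(1 − 1/α), a standard result from its Lorenz curve. The sketch below, with the illustrative function name `top_share`, evaluates this for α = log₄ 5.

```python
import math

def top_share(fraction, shape):
    """Share of the total held by the top `fraction` of a Pareto population.

    Valid for shape > 1, where the distribution has a finite mean.
    """
    if shape <= 1:
        raise ValueError("shape must exceed 1 for a finite mean")
    return fraction ** (1 - 1 / shape)

shape = math.log(5, 4)         # logarithm of 5 to base 4
share = top_share(0.2, shape)  # share held by the top 20%
```

With this shape the top 20% hold 80% of the total; other shape values give a different split, which is why the distribution and the principle are distinct concepts.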
In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement, an indication of novel data, or the result of experimental error; the latter are sometimes excluded from the data set. An outlier can indicate an exciting possibility, but can also cause serious problems in statistical analyses.
Nonparametric statistics is a type of statistical analysis that makes minimal assumptions about the underlying distribution of the data being studied. Often these models are infinite-dimensional, rather than finite-dimensional, as in parametric statistics. Nonparametric statistics can be used for descriptive statistics or statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are evidently violated.
Econophysics is a non-orthodox interdisciplinary research field, applying theories and methods originally developed by physicists in order to solve problems in economics, usually those including uncertainty or stochastic processes and nonlinear dynamics. Some of its application to the study of financial markets has also been termed statistical finance referring to its roots in statistical physics. Econophysics is closely related to social physics.
The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming. The curse generally refers to issues that arise when the number of datapoints is small relative to the intrinsic dimension of the data.
In probability and statistics, the parabolic fractal distribution is a type of discrete probability distribution in which the logarithm of the frequency or size of entities in a population is a quadratic polynomial of the logarithm of the rank. This can markedly improve the fit over a simple power-law relationship.
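The quadratic relationship can be written as log(size) = log_top − b·log(rank) − c·(log(rank))². The sketch below uses that form; the parameter names `log_top`, `b`, and `c`, their sign convention, and the numeric values are assumptions made for illustration. Setting c to zero recovers a simple power law, which appears as a straight line on log-log axes.

```python
import math

def parabolic_fractal_log_size(rank, log_top, b, c):
    """log(size) as a quadratic polynomial in log(rank)."""
    log_rank = math.log(rank)
    return log_top - b * log_rank - c * log_rank ** 2

def log_log_curvature(log_top, b, c):
    """Second difference of log(size) at ranks 1, 10, 100.

    These ranks are equally spaced in log(rank), so a pure power law
    (c = 0) gives zero and a positive c gives a negative value.
    """
    y1, y10, y100 = (parabolic_fractal_log_size(r, log_top, b, c) for r in (1, 10, 100))
    return y100 - 2 * y10 + y1

straight = log_log_curvature(log_top=10.0, b=1.0, c=0.0)
curved = log_log_curvature(log_top=10.0, b=1.0, c=0.1)
```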
In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method. It was first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. Most often, it is used for classification, as a k-NN classifier, the output of which is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
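The plurality vote can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and a small hand-made training set; the function name `knn_classify` and the data are hypothetical, and ties in the vote are resolved arbitrarily by whichever label was counted first.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Classify `query` by plurality vote among its k nearest training points.

    `training` is a list of (point, label) pairs, where each point is a
    tuple of floats with the same length as `query`.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _point, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-class training set with one cluster near the origin
# and another near (1, 1).
training = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"),
]
label = knn_classify(training, (0.15, 0.15), k=3)
```

Sorting the whole training set is adequate for small examples; larger data sets typically rely on spatial index structures instead.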
In robust statistics, robust regression seeks to overcome some limitations of traditional regression analysis. A regression analysis models the relationship between one or more independent variables and a dependent variable. Standard types of regression, such as ordinary least squares, have favourable properties if their underlying assumptions are true, but can give misleading results otherwise. Robust regression methods are designed to limit the effect that violations of assumptions by the underlying data-generating process have on regression estimates.
Robust statistics are statistics that maintain their properties even if the underlying distributional assumptions are incorrect. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly.
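The sensitivity of a non-robust estimator to a single outlier can be seen by comparing the mean with the median. The sketch below uses made-up measurements and Python's standard `statistics` module; the specific numbers are illustrative assumptions.

```python
import statistics

# Five hypothetical measurements, then the same set with one gross outlier.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
contaminated = clean + [1000.0]

# How far each location estimate moves when the outlier is added.
mean_shift = statistics.mean(contaminated) - statistics.mean(clean)
median_shift = statistics.median(contaminated) - statistics.median(clean)
```

The mean moves by well over 100 units while the median moves by less than 0.1, which is the behavior that motivates robust estimators of location.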
The stretched exponential function is obtained by inserting a fractional power law into the exponential function. In most applications, it is meaningful only for arguments t between 0 and +∞. With β = 1, the usual exponential function is recovered. With a stretching exponent β between 0 and 1, the graph of log f versus t is characteristically stretched, hence the name of the function. The compressed exponential function has less practical importance, with the notable exception of β = 2, which gives the normal distribution.
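In its simplest form the function is f(t) = exp(−t^β). The sketch below assumes that unscaled form, with no time constant or amplitude, and checks the special case β = 1 against the ordinary exponential.

```python
import math

def stretched_exponential(t, beta):
    """Evaluate exp(-(t ** beta)) for t >= 0."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return math.exp(-(t ** beta))

ordinary = stretched_exponential(2.0, beta=1.0)    # same as exp(-2)
stretched = stretched_exponential(4.0, beta=0.5)   # exp(-sqrt(4))
unstretched = stretched_exponential(4.0, beta=1.0)
```

For t greater than 1, a β below 1 makes the function decay more slowly than the ordinary exponential, so `stretched` is larger than `unstretched` here.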
Rank–size distribution is the distribution of size by rank, in decreasing order of size. For example, if a data set consists of items of sizes 5, 100, 5, and 8, the rank-size distribution is 100, 8, 5, 5. This is also known as the rank–frequency distribution, when the source data are from a frequency distribution. These are particularly of interest when the data vary significantly in scales, such as city size or word frequency. These distributions frequently follow a power law distribution, or less well-known ones such as a stretched exponential function or parabolic fractal distribution, at least approximately for certain ranges of ranks; see below.
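The worked example in the text can be reproduced directly. The sketch below, with the illustrative function name `rank_size`, sorts the sizes in decreasing order and pairs each with its rank, starting at 1.

```python
def rank_size(sizes):
    """Return (rank, size) pairs with sizes in decreasing order."""
    ordered = sorted(sizes, reverse=True)
    return list(enumerate(ordered, start=1))

pairs = rank_size([5, 100, 5, 8])
sizes_by_rank = [size for _rank, size in pairs]
```

Equal sizes receive consecutive ranks in this sketch; other tie-handling conventions are possible.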
Superstatistics is a branch of statistical mechanics or statistical physics devoted to the study of non-linear and non-equilibrium systems. It is characterized by using the superposition of multiple differing statistical models to achieve the desired non-linearity. In terms of ordinary statistical ideas, this is equivalent to compounding the distributions of random variables and it may be considered a simple case of a doubly stochastic model.
In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal, gamma and inverse Gaussian distributions, the purely discrete scaled Poisson distribution, and the class of compound Poisson–gamma distributions which have positive mass at zero, but are otherwise continuous. Tweedie distributions are a special case of exponential dispersion models and are often used as distributions for generalized linear models.
Didier Sornette is a French researcher studying subjects including complex systems and risk management. He is Professor on the Chair of Entrepreneurial Risks at the Swiss Federal Institute of Technology Zurich and is also a professor of the Swiss Finance Institute. He was previously a Professor of Geophysics at UCLA in Los Angeles, California (1996–2006) and a Research Professor at the French National Centre for Scientific Research (1981–2006).
Dragon king (DK) is a double metaphor for an event that is both extremely large in size or effect and born of unique origins relative to its peers. DK events are generated by or correspond to mechanisms such as positive feedback, tipping points, bifurcations, and phase transitions, which tend to occur in nonlinear and complex systems and serve to amplify DK events to extreme levels. By understanding and monitoring these dynamics, some predictability of such events may be obtained.
Bent Jørgensen was a Danish statistician from the University of Southern Denmark whose research was focused on two related topics in statistics: dispersion models and the analysis of non-normal correlated data.
The Kaniadakis Gaussian distribution is a probability distribution which arises as a generalization of the Gaussian distribution from the maximization of the Kaniadakis entropy under appropriate constraints. It is one example of a Kaniadakis κ-distribution. The κ-Gaussian distribution has been applied successfully to describe several complex systems in economics, geophysics, and astrophysics, among many others.