Generalized entropy index

[Image: South Africa Inequality (Generalized Entropy Measure, Parameter 2)]

The generalized entropy index has been proposed as a measure of income inequality in a population. [1] It is derived from information theory as a measure of redundancy in data. In information theory, redundancy can be interpreted as non-randomness or compressibility; the same interpretation carries over to this index. The interpretation of biodiversity as entropy has also been proposed, leading to the use of the generalized entropy index to quantify biodiversity. [2]

Formula

The formula for the generalized entropy index for real values of $\alpha$ is:

$$GE(\alpha) = \begin{cases} \dfrac{1}{N\alpha(\alpha-1)}\sum_{i=1}^{N}\left[\left(\dfrac{y_i}{\bar{y}}\right)^{\alpha}-1\right], & \alpha \neq 0, 1 \\[6pt] \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{y_i}{\bar{y}}\ln\dfrac{y_i}{\bar{y}}, & \alpha = 1 \\[6pt] -\dfrac{1}{N}\sum_{i=1}^{N}\ln\dfrac{y_i}{\bar{y}}, & \alpha = 0 \end{cases}$$

where $N$ is the number of cases (e.g., households or families), $y_i$ is the income for case $i$, $\bar{y}$ is the mean income, and $\alpha$ is a parameter which regulates the weight given to distances between incomes at different parts of the income distribution. For large $\alpha$ the index is especially sensitive to the existence of large incomes, whereas for small $\alpha$ the index is especially sensitive to the existence of small incomes.
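As a minimal sketch (not from the source; the function name `ge_index` and its signature are assumptions for illustration), the piecewise definition translates directly into NumPy:

```python
import numpy as np

def ge_index(incomes, alpha):
    """Generalized entropy index GE(alpha) of an array of positive incomes."""
    y = np.asarray(incomes, dtype=float)
    ratio = y / y.mean()                      # y_i / y-bar
    if alpha == 0:                            # mean log deviation
        return -np.mean(np.log(ratio))
    if alpha == 1:                            # Theil index
        return np.mean(ratio * np.log(ratio))
    # general case: mean of (ratio^alpha - 1), scaled by 1 / (alpha * (alpha - 1))
    return np.mean(ratio ** alpha - 1) / (alpha * (alpha - 1))
```

The $\alpha = 0$ and $\alpha = 1$ branches are the limits of the general case, so they must be handled separately to avoid division by zero.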

An Atkinson index for any inequality aversion parameter $\varepsilon$ can be derived from a generalized entropy index under the restriction that $\varepsilon = 1 - \alpha$, i.e. an Atkinson index with high inequality aversion is derived from a GE index with small $\alpha$. Moreover, the generalized entropy index is the unique class of inequality measures that is a monotone transformation of the Atkinson index and additively decomposable. Many popular indices, including the Gini index, do not satisfy additive decomposability. [1] [3]

The formula for deriving an Atkinson index with inequality aversion parameter $\varepsilon$ under the restriction $\varepsilon = 1 - \alpha$ is given by:

$$A_\varepsilon = \begin{cases} 1-\left[\alpha(\alpha-1)\,GE(\alpha)+1\right]^{1/\alpha}, & \alpha = 1-\varepsilon \neq 0 \\[6pt] 1-e^{-GE(0)}, & \varepsilon = 1 \end{cases}$$
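This relationship can be checked numerically. The sketch below is illustrative only: the `atkinson` helper and the sample incomes are assumptions, and `ge_index` is the hypothetical function from the Formula section.

```python
import numpy as np

def atkinson(incomes, eps):
    """Atkinson index computed directly from its own definition."""
    y = np.asarray(incomes, dtype=float)
    if eps == 1:
        # 1 - (geometric mean / arithmetic mean)
        return 1 - np.exp(np.mean(np.log(y))) / y.mean()
    return 1 - np.mean((y / y.mean()) ** (1 - eps)) ** (1 / (1 - eps))

incomes = [20_000, 30_000, 50_000, 80_000, 120_000]  # assumed sample data
eps = 0.5
alpha = 1 - eps
ge = ge_index(incomes, alpha)  # sketch from the Formula section
a_from_ge = 1 - (alpha * (alpha - 1) * ge + 1) ** (1 / alpha)
print(np.isclose(a_from_ge, atkinson(incomes, eps)))  # True
```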
Note that the generalized entropy index has several income inequality metrics as special cases. For example, GE(0) is the mean log deviation, GE(1) is the Theil index, and GE(2) is half the squared coefficient of variation.
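These identities can also be verified numerically; the sketch below (again assuming the hypothetical `ge_index` from above) checks each special case against an independently computed value. The coefficient of variation here uses the population standard deviation, matching the simple mean in the formula.

```python
import numpy as np

y = np.array([20_000, 30_000, 50_000, 80_000, 120_000], dtype=float)
ratio = y / y.mean()

mld = -np.mean(np.log(ratio))               # mean log deviation
theil = np.mean(ratio * np.log(ratio))      # Theil index
half_cv2 = 0.5 * (y.std() / y.mean()) ** 2  # half the squared coefficient of variation

print(np.isclose(ge_index(y, 0), mld))       # True
print(np.isclose(ge_index(y, 1), theil))     # True
print(np.isclose(ge_index(y, 2), half_cv2))  # True
```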

See also

  * Atkinson index
  * Income inequality metrics
  * Mean log deviation
  * Rényi entropy
  * Theil index

References

  1. Shorrocks, A. F. (1980). "The Class of Additively Decomposable Inequality Measures". Econometrica. 48 (3): 613–625. doi:10.2307/1913126. JSTOR 1913126.
  2. Pielou, E.C. (December 1966). "The measurement of diversity in different types of biological collections". Journal of Theoretical Biology. 13: 131–144. Bibcode:1966JThBi..13..131P. doi:10.1016/0022-5193(66)90013-0.
  3. Jenkins, Stephen. "Calculating Income Distribution Indices from Micro-data" (PDF). National Tax Journal. University of Oregon.