Lorenz curve

Last updated February 26, 2024

In economics, the Lorenz curve is a graphical representation of the distribution of income or of wealth. It was developed by Max O. Lorenz in 1905 for representing inequality of the wealth distribution.

The curve is a graph showing the proportion of overall income or wealth held by the bottom x% of the people, although this is not rigorously true for a finite population (see below). It is often used to represent income distribution, where it shows, for the bottom x% of households, what percentage (y%) of the total income they receive. The percentage of households is plotted on the x-axis and the percentage of income on the y-axis. It can also be used to show the distribution of assets. In such use, many economists consider it to be a measure of social inequality.

The concept is useful in describing inequality among the sizes of individuals in ecology^[1] and in studies of biodiversity, where the cumulative proportion of species is plotted against the cumulative proportion of individuals.^[2] It is also useful in business modeling: e.g., in consumer finance, to measure the actual percentage y% of delinquencies attributable to the x% of people with the worst risk scores. Lorenz curves have also been applied to epidemiology and public health, e.g., to measure pandemic inequality as the distribution of national cumulative incidence (y%) generated by the population residing in areas (x%) ranked with respect to their local epidemic attack rate.^[3]

Explanation

Data from 2005.

Points on the Lorenz curve represent statements such as, "the bottom 20% of all households have 10% of the total income."

A perfectly equal income distribution would be one in which every person has the same income. In this case, the bottom N% of society would always have N% of the income. This can be depicted by the straight line y = x, called the "line of perfect equality."

By contrast, a perfectly unequal distribution would be one in which one person has all the income and everyone else has none. In that case, the curve would be at y = 0% for all x < 100%, and y = 100% when x = 100%. This curve is called the "line of perfect inequality."

The Gini coefficient is the ratio of the area between the line of perfect equality and the observed Lorenz curve to the area between the line of perfect equality and the line of perfect inequality. The higher the coefficient, the more unequal the distribution is. In the diagram on the right, this is given by the ratio A/(A+B), where A and B are the areas of regions as marked in the diagram.
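The ratio A/(A+B) can be sketched numerically as follows. This is a minimal illustration, not a reference implementation: the function name `gini_from_lorenz` and the sample points are hypothetical, and it assumes a non-negative variable, so that A + B is the triangle of area 1/2 under the line of perfect equality.

```python
def gini_from_lorenz(points):
    """Gini coefficient from Lorenz points (F, L), sorted by F from (0, 0) to (1, 1).

    Assumes a non-negative variable, so A + B = 1/2.
    """
    # Area B under the piecewise linear Lorenz curve (trapezoidal rule).
    area_b = 0.0
    for (f0, l0), (f1, l1) in zip(points, points[1:]):
        area_b += (f1 - f0) * (l0 + l1) / 2.0
    # Area A lies between the line of perfect equality and the curve.
    area_a = 0.5 - area_b
    return area_a / 0.5


# Hypothetical example curves.
equal = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]   # line of perfect equality
skewed = [(0.0, 0.0), (0.5, 0.1), (1.0, 1.0)]  # bottom 50% hold 10%

print(gini_from_lorenz(equal))
print(gini_from_lorenz(skewed))
```

For the second curve, B = 0.025 + 0.275 = 0.3, so A = 0.2 and the coefficient is 0.4.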

Definition and calculation

The Lorenz curve is a probability plot (a P–P plot) comparing the distribution of a variable against a hypothetical uniform distribution of that variable. It can usually be represented by a function L(F), where F, the cumulative portion of the population, is represented by the horizontal axis, and L, the cumulative portion of the total wealth or income, is represented by the vertical axis.

The curve L need not be a smoothly increasing function of F. For wealth distributions, there may be oligarchies or people with negative wealth, for instance.^[4]

For a discrete distribution of Y given by values y₁, ..., y_n in non-decreasing order (y_i ≤ y_{i+1}) and their probabilities $f(y_{j}):=\Pr(Y=y_{j})$, the Lorenz curve is the continuous piecewise linear function connecting the points (F_i, L_i), i = 0 to n, where F₀ = 0, L₀ = 0, and for i = 1 to n:

$$\begin{aligned}F_{i}&:=\sum_{j=1}^{i}f(y_{j})\\S_{i}&:=\sum_{j=1}^{i}f(y_{j})\,y_{j}\\L_{i}&:=\frac{S_{i}}{S_{n}}\end{aligned}$$

When all y_i are equally probable, each with probability 1/n, this simplifies to

$$\begin{aligned}F_{i}&=\frac{i}{n}\\S_{i}&=\frac{1}{n}\sum_{j=1}^{i}y_{j}\\L_{i}&=\frac{S_{i}}{S_{n}}\end{aligned}$$
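The discrete construction above can be sketched in a few lines of Python. The function name `lorenz_points` and the sample incomes are hypothetical; when no probabilities are given, the sketch assumes the equally probable case 1/n.

```python
def lorenz_points(values, probs=None):
    """Return the points (F_i, L_i), i = 0..n, of a discrete Lorenz curve.

    values: the y_j; probs: the f(y_j), defaulting to 1/n each.
    """
    n = len(values)
    if probs is None:
        probs = [1.0 / n] * n
    # Sort by value so that y_1 <= y_2 <= ... <= y_n.
    pairs = sorted(zip(values, probs))
    total = sum(p * y for y, p in pairs)  # S_n
    if total == 0:
        raise ValueError("Lorenz curve is undefined when the mean is zero")
    points = [(0.0, 0.0)]  # (F_0, L_0)
    F = S = 0.0
    for y, p in pairs:
        F += p       # F_i: cumulative probability
        S += p * y   # S_i: cumulative probability-weighted value
        points.append((F, S / total))
    return points


# Hypothetical incomes of four equally weighted households.
pts = lorenz_points([40, 10, 30, 20])
print(pts)
```

With these values, S₁ to S₄ are 2.5, 7.5, 15 and 25, so the curve passes through (0.25, 0.1), (0.5, 0.3), (0.75, 0.6) and (1, 1).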

For a continuous distribution with the probability density function f and the cumulative distribution function F, the Lorenz curve L is given by:

$$L(F(x))=\frac{\int_{-\infty}^{x}t\,f(t)\,dt}{\int_{-\infty}^{\infty}t\,f(t)\,dt}=\frac{\int_{-\infty}^{x}t\,f(t)\,dt}{\mu}$$

where $\mu$ denotes the mean. The Lorenz curve L(F) may then be plotted as a function parametric in x: L(x) vs. F(x). In other contexts, the quantity computed here is known as the length-biased (or size-biased) distribution; it also has an important role in renewal theory.
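As a worked illustration of the continuous formula, consider an exponential distribution with rate $\lambda$, chosen here only as an example. Its density is $f(t)=\lambda e^{-\lambda t}$ for $t\geq 0$, with $F(x)=1-e^{-\lambda x}$ and $\mu=1/\lambda$. Integrating by parts:

$$\int_{0}^{x}t\,\lambda e^{-\lambda t}\,dt=\frac{1}{\lambda}\left(1-e^{-\lambda x}(1+\lambda x)\right),\qquad L(F(x))=1-e^{-\lambda x}(1+\lambda x)$$

Substituting $e^{-\lambda x}=1-F$ and $\lambda x=-\ln(1-F)$ eliminates the parameter x:

$$L(F)=F+(1-F)\ln(1-F)$$

The result does not depend on $\lambda$, as expected from the invariance of the Lorenz curve under positive scaling.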

Alternatively, for a cumulative distribution function F(x) with inverse x(F), the Lorenz curve L(F) is directly given by:

$$L(F)=\frac{\int_{0}^{F}x(F_{1})\,dF_{1}}{\int_{0}^{1}x(F_{1})\,dF_{1}}$$

The inverse x(F) may not exist because the cumulative distribution function has intervals of constant values. However, the previous formula can still apply by generalizing the definition of x(F):

$$x(F_{1})=\inf\{y:F(y)\geq F_{1}\}$$

where $\inf$ denotes the infimum.

For an example of a Lorenz curve, see Pareto distribution.
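The quantile-based formula can be checked numerically against the Pareto case. The sketch below is illustrative only: the function names, the shape value α = 3 and the step count are assumptions, and the midpoint rule is just one simple way to approximate the integrals. It relies on the standard Pareto quantile function x(F) = x_min (1 − F)^(−1/α) and the closed-form Lorenz curve L(F) = 1 − (1 − F)^(1 − 1/α), valid for α > 1.

```python
def lorenz_from_quantile(quantile, F, steps=100_000):
    """Approximate L(F) = int_0^F x(u) du / int_0^1 x(u) du (midpoint rule)."""
    def integral(upper):
        h = upper / steps
        # Midpoints avoid evaluating the quantile function at u = 1,
        # where it may be unbounded.
        return h * sum(quantile((k + 0.5) * h) for k in range(steps))
    return integral(F) / integral(1.0)


ALPHA = 3.0  # hypothetical shape parameter; must exceed 1 for a finite mean


def pareto_quantile(u, x_min=1.0, alpha=ALPHA):
    """Inverse CDF x(F) of a Pareto distribution."""
    return x_min * (1.0 - u) ** (-1.0 / alpha)


def pareto_lorenz_closed_form(F, alpha=ALPHA):
    """Closed-form Lorenz curve of a Pareto distribution with alpha > 1."""
    return 1.0 - (1.0 - F) ** (1.0 - 1.0 / alpha)


numeric = lorenz_from_quantile(pareto_quantile, 0.8)
exact = pareto_lorenz_closed_form(0.8)
print(numeric, exact)
```

The two values agree only approximately, because the integrand is unbounded near F = 1 and the midpoint rule converges slowly there.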

Properties

A Lorenz curve always starts at (0,0) and ends at (1,1).

The Lorenz curve is not defined if the mean of the probability distribution is zero or infinite.

The Lorenz curve for a probability distribution is a continuous function. However, Lorenz curves representing discontinuous functions can be constructed as the limit of Lorenz curves of probability distributions, the line of perfect inequality being an example.

The information in a Lorenz curve may be summarized by the Gini coefficient and the Lorenz asymmetry coefficient.^[1]

The Lorenz curve cannot rise above the line of perfect equality.

A Lorenz curve that never falls beneath a second Lorenz curve and runs above it at least once has Lorenz dominance over the second one.^[5]
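The dominance condition can be sketched as a simple comparison. The function name `lorenz_dominates`, the tolerance and the example curves are hypothetical, and the sketch assumes both curves are piecewise linear and sampled at the same F grid, so that comparing the grid values is sufficient.

```python
def lorenz_dominates(first, second, tol=1e-12):
    """True if `first` has Lorenz dominance over `second`.

    Both arguments are lists of L-values on the same F grid.
    """
    never_below = all(a >= b - tol for a, b in zip(first, second))
    above_somewhere = any(a > b + tol for a, b in zip(first, second))
    return never_below and above_somewhere


# Hypothetical curves sampled at F = 0, 0.25, 0.5, 0.75, 1.
curve_a = [0.0, 0.20, 0.45, 0.70, 1.0]
curve_b = [0.0, 0.10, 0.30, 0.60, 1.0]
curve_c = [0.0, 0.15, 0.25, 0.70, 1.0]  # crosses curve_b

print(lorenz_dominates(curve_a, curve_b))  # True
print(lorenz_dominates(curve_b, curve_a))  # False
print(lorenz_dominates(curve_c, curve_b))  # False: the curves cross
```

When two curves cross, neither dominates the other, so Lorenz dominance is only a partial ordering.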

If the variable being measured cannot take negative values, the Lorenz curve:

cannot sink below the line of perfect inequality,
is increasing.

Note, however, that a Lorenz curve for net worth would start out by going negative, because some people have a negative net worth due to debt.

The Lorenz curve is invariant under positive scaling. If X is a random variable, for any positive number c the random variable cX has the same Lorenz curve as X.

The Lorenz curve is flipped twice, once about F = 0.5 and once about L = 0.5, by negation. If X is a random variable with Lorenz curve L_X(F), then −X has the Lorenz curve:

$$L_{-X}(F)=1-L_{X}(1-F)$$

The Lorenz curve is changed by translations so that the equality gap F − L(F) changes in proportion to the ratio of the original and translated means. If X is a random variable with a Lorenz curve L_X(F) and mean μ_X, then for any constant c ≠ −μ_X, X + c has a Lorenz curve defined by:

$$F-L_{X+c}(F)=\frac{\mu_{X}}{\mu_{X}+c}\,(F-L_{X}(F))$$
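The scaling, negation and translation properties can be verified on a small sample. This is a minimal sketch with hypothetical names and data. It assumes an equally weighted sample, so the curve is evaluated only at the grid points F_i = i/n, where 1 − F_i is again a grid point.

```python
def lorenz_values(sample):
    """Return [L_0, ..., L_n] at F_i = i/n for an equally weighted sample."""
    ys = sorted(sample)
    total = sum(ys)
    values, running = [0.0], 0.0
    for y in ys:
        running += y
        values.append(running / total)
    return values


x = [1.0, 2.0, 4.0, 9.0]  # hypothetical sample
n = len(x)
mu = sum(x) / n
F = [i / n for i in range(n + 1)]
L = lorenz_values(x)

# Positive scaling: cX has the same Lorenz curve as X.
L_scaled = lorenz_values([3.0 * v for v in x])
scaling_ok = all(abs(a - b) < 1e-12 for a, b in zip(L_scaled, L))

# Negation: L_{-X}(F) = 1 - L_X(1 - F); on the grid, 1 - F_i = F_{n-i}.
L_neg = lorenz_values([-v for v in x])
negation_ok = all(abs(L_neg[i] - (1.0 - L[n - i])) < 1e-12 for i in range(n + 1))

# Translation: F - L_{X+c}(F) = mu / (mu + c) * (F - L_X(F)).
c = 5.0
L_shift = lorenz_values([v + c for v in x])
translation_ok = all(
    abs((F[i] - L_shift[i]) - mu / (mu + c) * (F[i] - L[i])) < 1e-12
    for i in range(n + 1)
)

print(scaling_ok, negation_ok, translation_ok)
```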

For a cumulative distribution function F(x) with mean μ and (generalized) inverse x(F), the following hold for any F with 0 < F < 1:

If the Lorenz curve is differentiable: ${\frac {dL(F)}{dF}}={\frac {x(F)}{\mu }}$
If the Lorenz curve is twice differentiable, then the probability density function f(x) exists at that point and: ${\frac {d^{2}L(F)}{dF^{2}}}={\frac {1}{\mu \,f(x(F))}}$
If L(F) is continuously differentiable, then the tangent of L(F) is parallel to the line of perfect equality at the point F(μ). This is also the point at which the equality gap F − L(F), the vertical distance between the Lorenz curve and the line of perfect equality, is greatest. The size of the gap is equal to half of the relative mean absolute deviation: $F(\mu )-L(F(\mu ))={\frac {\text{mean absolute deviation}}{2\,\mu }}$
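The relation between the largest equality gap and the mean absolute deviation also holds for an equally weighted sample, where the gap is maximized over the grid points F_i = i/n. The sketch below uses hypothetical function names and data and assumes a positive mean.

```python
def max_equality_gap(sample):
    """Largest value of F_i - L_i over the points of the empirical Lorenz curve."""
    ys = sorted(sample)
    n, total = len(ys), sum(ys)
    best, running = 0.0, 0.0
    for i, y in enumerate(ys, start=1):
        running += y
        best = max(best, i / n - running / total)
    return best


def half_relative_mad(sample):
    """Mean absolute deviation about the mean, divided by twice the mean."""
    n = len(sample)
    mu = sum(sample) / n
    mad = sum(abs(y - mu) for y in sample) / n
    return mad / (2.0 * mu)


x = [1.0, 2.0, 4.0, 9.0]  # hypothetical sample with mean 4
print(max_equality_gap(x), half_relative_mad(x))
```

Here the mean absolute deviation is (3 + 2 + 0 + 5)/4 = 2.5, so both quantities equal 2.5/8 = 0.3125.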


References

[1] Damgaard, Christian; Weiner, Jacob (2000). "Describing inequality in plant size or fecundity". Ecology. 81 (4): 1139–1142. doi:10.1890/0012-9658(2000)081[1139:DIIPSO]2.0.CO;2.
[2] Wittebolle, Lieven; et al. (2009). "Initial community evenness favours functionality under selective stress". Nature. 458 (7238): 623–626. Bibcode:2009Natur.458..623W. doi:10.1038/nature07840. PMID 19270679. S2CID 4419280.
[3] Nguyen, Quang D.; Chang, Sheryl L.; Jamerlan, Christina M.; Prokopenko, Mikhail (2023). "Measuring unequal distribution of pandemic severity across census years, variants of concern and interventions". Population Health Metrics. 21 (17): 17. doi:10.1186/s12963-023-00318-6. PMC 10613397. PMID 37899455.
[4] Li, Jie; Boghosian, Bruce M.; Li, Chengli (14 February 2018). "The Affine Wealth Model: An agent-based model of asset exchange that allows for negative-wealth agents and its empirical validation". arXiv:1604.02370v2.
[5] Bishop, John A.; Formby, John P.; Smith, W. James (1991). "Lorenz Dominance and Welfare: Changes in the U.S. Distribution of Income, 1967-1986". The Review of Economics and Statistics. 73 (1): 134–139. doi:10.2307/2109695. ISSN 0034-6535. JSTOR 2109695.

External links

WIID: World Income Inequality Database, a source of information on inequality, collected by WIDER (World Institute for Development Economics Research, part of United Nations University). Archived 2011-03-13 at the Wayback Machine.
glcurve: Stata module to plot Lorenz curve (type "findit glcurve" or "ssc install glcurve" in Stata prompt to install)
Free add-on to STATA to compute inequality and poverty measures
Free Online Software (Calculator) computes the Gini Coefficient, plots the Lorenz curve, and computes many other measures of concentration for any dataset
Free Calculator: Online and downloadable scripts (Python and Lua) for Atkinson, Gini, and Hoover inequalities
Users of the R data analysis software can install the "ineq" package which allows for computation of a variety of inequality indices including Gini, Atkinson, Theil.
A MATLAB Inequality Package, including code for computing Gini, Atkinson, and Theil indexes and for plotting the Lorenz curve. Many examples are available. Archived 2008-10-04 at the Wayback Machine.
A complete handout about the Lorenz curve covering various applications, including an Excel spreadsheet that graphs Lorenz curves and calculates Gini coefficients as well as coefficients of variation.
LORENZ 3.0 is a Mathematica notebook that draws sample Lorenz curves and calculates Gini coefficients and Lorenz asymmetry coefficients from data in an Excel sheet.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

