Mean absolute difference

The mean absolute difference (univariate) is a measure of statistical dispersion equal to the average absolute difference of two independent values drawn from a probability distribution. A related statistic is the relative mean absolute difference, which is the mean absolute difference divided by the arithmetic mean, and is equal to twice the Gini coefficient. The mean absolute difference is also known as the absolute mean difference (not to be confused with the absolute value of the mean signed difference) and the Gini mean difference (GMD). [1] The mean absolute difference is sometimes denoted by Δ or as MD.

Definition

The mean absolute difference is defined as the "average" or "mean", formally the expected value, of the absolute difference of two random variables X and Y independently and identically distributed with the same (unknown) distribution, henceforth called Q:

$$\mathrm{MD} := E[\,|X - Y|\,].$$

Calculation

Specifically, in the discrete case, for a distribution with probability mass function $f$ over the values $y_i$ with nonzero probability,

$$\mathrm{MD} = \sum_{i} \sum_{j} f(y_i)\, f(y_j)\, |y_i - y_j|.$$

In the continuous case, for a distribution with probability density function $f$,

$$\mathrm{MD} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x)\, f(y)\, |x - y| \, dx \, dy.$$

An alternative form of the equation, in terms of the cumulative distribution function $F$, is

$$\mathrm{MD} = 2 \int_{-\infty}^{\infty} F(x)\,\bigl(1 - F(x)\bigr)\, dx.$$
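To make these formulas concrete, here is a minimal Python sketch, assuming NumPy and SciPy are available; the helper names mad_discrete and mad_from_cdf are ad hoc, not standard library functions. It evaluates the double-sum form for a finite discrete distribution and the cumulative-distribution form by numerical integration.

```python
import numpy as np
from scipy import integrate, stats

def mad_discrete(values, probs):
    """MD = sum_i sum_j f(y_i) f(y_j) |y_i - y_j| for a finite discrete distribution."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return float(probs @ np.abs(values[:, None] - values[None, :]) @ probs)

def mad_from_cdf(cdf):
    """Alternative form: MD = 2 * integral of F(x)(1 - F(x)) dx over the real line."""
    value, _ = integrate.quad(lambda x: cdf(x) * (1.0 - cdf(x)), -np.inf, np.inf)
    return 2.0 * value

# Two-point distribution on {0, 1} with P(1) = 0.3: MD = 2 p (1 - p) = 0.42
print(mad_discrete([0, 1], [0.7, 0.3]))
# Standard normal: MD = 2 / sqrt(pi), roughly 1.1284
print(mad_from_cdf(stats.norm.cdf))
```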

Relative mean absolute difference

When the probability distribution has a finite and nonzero arithmetic mean AM, the relative mean absolute difference, sometimes denoted by Δ or RMD, is defined by

$$\mathrm{RMD} = \frac{\mathrm{MD}}{\mathrm{AM}}.$$

The relative mean absolute difference quantifies the mean absolute difference in comparison to the size of the mean and is a dimensionless quantity. The relative mean absolute difference is equal to twice the Gini coefficient which is defined in terms of the Lorenz curve. This relationship gives complementary perspectives to both the relative mean absolute difference and the Gini coefficient, including alternative ways of calculating their values.
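The factor-of-two link to the Gini coefficient can be checked numerically. The sketch below, assuming NumPy and using illustrative function names, computes the plug-in relative mean absolute difference of a small income-like sample and compares it with twice the plug-in Gini coefficient obtained from the usual rank-based formula.

```python
import numpy as np

def rmd_plugin(y):
    """Plug-in relative mean absolute difference: mean of |y_i - y_j| over all
    ordered pairs (including i = j), divided by the arithmetic mean."""
    y = np.asarray(y, dtype=float)
    md = np.abs(y[:, None] - y[None, :]).mean()
    return md / y.mean()

def gini_sorted(y):
    """Plug-in Gini coefficient via the rank-based formula on sorted values."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    ranks = np.arange(1, n + 1)
    return 2.0 * (ranks * y).sum() / (n * y.sum()) - (n + 1.0) / n

incomes = np.array([12.0, 15.0, 20.0, 40.0, 100.0])
print(rmd_plugin(incomes), 2 * gini_sorted(incomes))  # equal up to rounding
```

The two printed values agree because the rank-based Gini formula is an algebraic rearrangement of the same pairwise absolute-difference sum.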

Properties

The mean absolute difference is invariant to translations and negation, and varies proportionally to positive scaling. That is to say, if X is a random variable and c is a constant:

$$\mathrm{MD}(X + c) = \mathrm{MD}(X),$$
$$\mathrm{MD}(-X) = \mathrm{MD}(X),$$
$$\mathrm{MD}(cX) = c\,\mathrm{MD}(X) \quad \text{for } c \ge 0.$$

The relative mean absolute difference is invariant to positive scaling, commutes with negation, and varies under translation in proportion to the ratio of the original and translated arithmetic means. That is to say, if X is a random variable and c is a constant:

$$\mathrm{RMD}(cX) = \mathrm{RMD}(X) \quad \text{for } c > 0,$$
$$\mathrm{RMD}(-X) = -\mathrm{RMD}(X),$$
$$\mathrm{RMD}(X + c) = \mathrm{RMD}(X)\,\frac{E[X]}{E[X] + c} \quad \text{for } E[X] + c \ne 0.$$

If a random variable has a positive mean, then its relative mean absolute difference will always be greater than or equal to zero. If, additionally, the random variable can only take on values that are greater than or equal to zero, then its relative mean absolute difference will be less than 2.
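These identities can be verified on a small discrete distribution; the following sketch, assuming NumPy and using the ad hoc helpers md and rmd, checks each one numerically.

```python
import numpy as np

def md(values, probs):
    """Mean absolute difference of a finite discrete distribution."""
    v = np.asarray(values, dtype=float)
    p = np.asarray(probs, dtype=float)
    return float(p @ np.abs(v[:, None] - v[None, :]) @ p)

def rmd(values, probs):
    """Relative mean absolute difference: MD divided by the arithmetic mean."""
    return md(values, probs) / float(np.dot(probs, values))

v, p, c = np.array([1.0, 2.0, 5.0]), np.array([0.5, 0.3, 0.2]), 3.0
mean_v = float(np.dot(p, v))

print(np.isclose(md(v + c, p), md(v, p)))        # MD: translation invariance
print(np.isclose(md(-v, p), md(v, p)))           # MD: negation invariance
print(np.isclose(md(c * v, p), c * md(v, p)))    # MD: positive scaling
print(np.isclose(rmd(c * v, p), rmd(v, p)))      # RMD: scale invariance
print(np.isclose(rmd(-v, p), -rmd(v, p)))        # RMD: sign flips under negation
print(np.isclose(rmd(v + c, p), rmd(v, p) * mean_v / (mean_v + c)))  # translation rule
```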

Compared to standard deviation

The mean absolute difference is twice the L-scale (the second L-moment), while the standard deviation is the square root of the variance about the mean (the second conventional central moment). The differences between L-moments and conventional moments are first seen in comparing the mean absolute difference and the standard deviation (the first L-moment and first conventional moment are both the mean).

Both the standard deviation and the mean absolute difference measure dispersion—how spread out are the values of a population or the probabilities of a distribution. The mean absolute difference is not defined in terms of a specific measure of central tendency, whereas the standard deviation is defined in terms of the deviation from the arithmetic mean. Because the standard deviation squares its differences, it tends to give more weight to larger differences and less weight to smaller differences compared to the mean absolute difference. When the arithmetic mean is finite, the mean absolute difference will also be finite, even when the standard deviation is infinite. See the examples for some specific comparisons.
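As a rough illustration of that weighting, the sketch below (NumPy assumed; the helper name is illustrative) adds a single large value to a small sample and prints both statistics before and after.

```python
import numpy as np

def mean_abs_diff(y):
    """Sample mean absolute difference over all ordered pairs (plug-in form)."""
    y = np.asarray(y, dtype=float)
    return np.abs(y[:, None] - y[None, :]).mean()

base = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
with_outlier = np.append(base, 100.0)

for data in (base, with_outlier):
    print(f"std = {data.std():7.2f}   MD = {mean_abs_diff(data):7.2f}")
# The single large value inflates both statistics, but the standard deviation
# grows by a larger factor because squaring amplifies the biggest differences.
```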

The recently introduced distance standard deviation plays a similar role to the mean absolute difference, but the distance standard deviation works with centered distances. See also E-statistics.

Sample estimators

For a random sample S from a random variable X, consisting of n values yi, the statistic

$$\mathrm{MD}(S) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} |y_i - y_j|}{n(n-1)}$$

is a consistent and unbiased estimator of MD(X). The statistic

$$\mathrm{RMD}(S) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} |y_i - y_j|}{(n-1)\sum_{i=1}^{n} y_i}$$

is a consistent estimator of RMD(X), but is not, in general, unbiased.
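A direct NumPy translation of these two estimators might look as follows; the function names md_unbiased and rmd_estimate are illustrative rather than standard.

```python
import numpy as np

def md_unbiased(y):
    """Unbiased estimator of MD(X): sum of |y_i - y_j| over all ordered pairs,
    divided by n(n - 1). Diagonal terms are zero, so they do not contribute."""
    y = np.asarray(y, dtype=float)
    n = y.size
    total = np.abs(y[:, None] - y[None, :]).sum()
    return total / (n * (n - 1))

def rmd_estimate(y):
    """Consistent (but generally biased) estimator of RMD(X):
    the MD estimate divided by the sample mean."""
    return md_unbiased(y) / np.mean(y)

sample = np.array([3.0, 7.0, 7.0, 19.0])
print(md_unbiased(sample), rmd_estimate(sample))
```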

Confidence intervals for RMD(X) can be calculated using bootstrap sampling techniques.
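As one possible implementation, a percentile bootstrap for RMD(X) can be sketched as follows, assuming NumPy; the resample count, confidence level, and percentile method are illustrative choices rather than prescriptions.

```python
import numpy as np

def rmd_estimate(y):
    """Plug-in estimate of the relative mean absolute difference."""
    y = np.asarray(y, dtype=float)
    n = y.size
    md = np.abs(y[:, None] - y[None, :]).sum() / (n * (n - 1))
    return md / y.mean()

def rmd_bootstrap_ci(y, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap confidence interval for RMD(X)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    stats = np.array([rmd_estimate(rng.choice(y, size=y.size, replace=True))
                      for _ in range(n_boot)])
    alpha = 1.0 - level
    return np.quantile(stats, [alpha / 2, 1.0 - alpha / 2])

data = np.random.default_rng(1).exponential(scale=2.0, size=60)
print(rmd_bootstrap_ci(data))   # population RMD of any exponential distribution is 1
```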

There does not exist, in general, an unbiased estimator for RMD(X), in part because of the difficulty of finding an unbiased estimate of the reciprocal of the mean. For example, even where the sample is known to be taken from a random variable X(p) for an unknown p, and X(p) − 1 has the Bernoulli distribution, so that Pr(X(p) = 1) = 1 − p and Pr(X(p) = 2) = p, then

RMD(X(p)) = 2p(1 − p)/(1 + p),

since MD(X(p)) = 2p(1 − p) and the arithmetic mean is 1 + p.

But the expected value of any estimator R(S) of RMD(X(p)) will be of the form: [citation needed]

$$E\bigl(R(S)\bigr) = \sum_{i=0}^{n} p^{i} (1 - p)^{n-i}\, r_i,$$

where the ri are constants. So E(R(S)) can never equal RMD(X(p)) for all p between 0 and 1.

Examples

Examples of mean absolute difference and relative mean absolute difference

Distribution | Parameters | Mean | Standard deviation | Mean absolute difference | Relative mean absolute difference
Continuous uniform | $a$; $b$ | $\frac{a+b}{2}$ | $\frac{b-a}{\sqrt{12}}$ | $\frac{b-a}{3}$ | $\frac{2(b-a)}{3(a+b)}$
Normal | $\mu$; $\sigma$ | $\mu$ | $\sigma$ | $\frac{2\sigma}{\sqrt{\pi}}$ | $\frac{2\sigma}{\mu\sqrt{\pi}}$ (undefined when $\mu = 0$)
Exponential | $\lambda$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda}$ | $1$
Pareto | $k > 1$; $x_m$ | $\frac{k x_m}{k-1}$ | $\frac{x_m}{k-1}\sqrt{\frac{k}{k-2}}$ (for $k > 2$) | $\frac{2 k x_m}{(k-1)(2k-1)}$ | $\frac{2}{2k-1}$
Gamma | $k$; $\theta$ | $k\theta$ | $\sqrt{k}\,\theta$ | $\frac{2\theta}{B(k, \frac{1}{2})}$ | $\frac{2}{k\, B(k, \frac{1}{2})}$
Bernoulli | $p$ | $p$ | $\sqrt{p(1-p)}$ | $2p(1-p)$ | $2(1-p)$
Student's t, 2 d.f. | $\nu = 2$ | $0$ | $\infty$ | $\frac{\pi}{\sqrt{2}}$ | undefined

$B$ is the Beta function.
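Two of the table entries can be spot-checked by Monte Carlo simulation of E|X − Y|; the following sketch, with an arbitrary seed and sample size, compares simulated values with the closed forms for the normal and continuous uniform distributions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

def mc_md(draw):
    """Monte Carlo estimate of MD = E|X - Y| from independent pairs of draws."""
    return np.abs(draw(n) - draw(n)).mean()

sigma, (a, b) = 2.0, (1.0, 5.0)
print(mc_md(lambda m: rng.normal(0.0, sigma, m)), 2 * sigma / np.sqrt(np.pi))  # ~2.257
print(mc_md(lambda m: rng.uniform(a, b, m)), (b - a) / 3)                      # ~1.333
```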

References

  1. Yitzhaki, Shlomo (2003). "Gini's Mean Difference: A Superior Measure of Variability for Non-Normal Distributions" (PDF). Metron International Journal of Statistics. Springer Verlag. 61 (2): 285–316.
