This article needs additional citations for verification .(December 2010) |

In statistics, **dispersion** (also called **variability**, **scatter**, or **spread**) is the extent to which a distribution is stretched or squeezed.^{ [1] } Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range. For instance, when the variance of data in a set is large, the data is widely scattered. On the other hand, when the variance is small, the data in the set is clustered.

Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions.

A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the same and increases as the data become more diverse.

Most measures of dispersion have the same units as the quantity being measured. In other words, if the measurements are in metres or seconds, so is the measure of dispersion. Examples of dispersion measures include:

- Standard deviation
- Interquartile range (IQR)
- Range
- Mean absolute difference (also known as Gini mean absolute difference)
- Median absolute deviation (MAD)
- Average absolute deviation (or simply called average deviation)
- Distance standard deviation

These are frequently used (together with scale factors) as estimators of scale parameters, in which capacity they are called **estimates of scale.** Robust measures of scale are those unaffected by a small number of outliers, and include the IQR and MAD.

All the above measures of statistical dispersion have the useful property that they are *location-invariant* and *linear in scale*. This means that if a random variable *X* has a dispersion of *S _{X}* then a linear transformation

Other measures of dispersion are ** dimensionless **. In other words, they have no units even if the variable itself has units. These include:

- Coefficient of variation
- Quartile coefficient of dispersion
- Relative mean difference, equal to twice the Gini coefficient
- Entropy: While the entropy of a discrete variable is location-invariant and scale-independent, and therefore not a measure of dispersion in the above sense, the entropy of a continuous variable is location invariant and additive in scale: If
*Hz*is the entropy of continuous variable*z*and z*=ax+b*, then*Hz=Hx+log(a)*.

There are other measures of dispersion:

- Variance (the square of the standard deviation) – location-invariant but not linear in scale.
- Variance-to-mean ratio – mostly used for count data when the term coefficient of dispersion is used and when this ratio is dimensionless, as count data are themselves dimensionless, not otherwise.

Some measures of dispersion have specialized purposes. The Allan variance can be used for applications where the noise disrupts convergence.^{ [2] } The Hadamard variance can be used to counteract linear frequency drift sensitivity.^{ [3] }

For categorical variables, it is less common to measure dispersion by a single number; see qualitative variation. One measure that does so is the discrete entropy.

In the physical sciences, such variability may result from random measurement errors: instrument measurements are often not perfectly precise, i.e., reproducible, and there is additional inter-rater variability in interpreting and reporting the measured results. One may assume that the quantity being measured is stable, and that the variation between measurements is due to observational error. A system of a large number of particles is characterized by the mean values of a relatively few number of macroscopic quantities such as temperature, energy, and density. The standard deviation is an important measure in fluctuation theory, which explains many physical phenomena, including why the sky is blue.^{ [4] }

In the biological sciences, the quantity being measured is seldom unchanging and stable, and the variation observed might additionally be *intrinsic* to the phenomenon: It may be due to *inter-individual variability*, that is, distinct members of a population differing from each other. Also, it may be due to * intra-individual variability*, that is, one and the same subject differing in tests taken at different times or in other differing conditions. Such types of variability are also seen in the arena of manufactured products; even there, the meticulous scientist finds variation.

In economics, finance, and other disciplines, regression analysis attempts to explain the dispersion of a dependent variable, generally measured by its variance, using one or more independent variables each of which itself has positive dispersion. The fraction of variance explained is called the coefficient of determination.

A mean-preserving spread (MPS) is a change from one probability distribution A to another probability distribution B, where B is formed by spreading out one or more portions of A's probability density function while leaving the mean (the expected value) unchanged.^{ [5] } The concept of a mean-preserving spread provides a partial ordering of probability distributions according to their dispersions: of two probability distributions, one may be ranked as having more dispersion than the other, or alternatively neither may be ranked as having more dispersion.

Wikimedia Commons has media related to Dispersion (statistics) . |

In statistics, a **central tendency** is a central or typical value for a probability distribution. It may also be called a **center** or **location** of the distribution. Colloquially, measures of central tendency are often called *averages.* The term *central tendency* dates from the late 1920s.

In descriptive statistics, the **interquartile range** (**IQR**), also called the **midspread**, **middle 50%**, or **H‑spread**, is a measure of statistical dispersion, being equal to the difference between 75th and 25th percentiles, or between upper and lower quartiles, IQR = *Q*_{3} − *Q*_{1}. In other words, the IQR is the first quartile subtracted from the third quartile; these quartiles can be clearly seen on a box plot on the data. It is a trimmed estimator, defined as the 25% trimmed range, and is a commonly used robust measure of scale.

In descriptive statistics, **summary statistics** are used to summarize a set of observations, in order to communicate the largest amount of information as simply as possible. Statisticians commonly try to describe the observations in

In probability theory and statistics, **variance** is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. Variance is an important tool in the sciences, where statistical analysis of data is common. The variance is the square of the standard deviation, the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by , , , , or .

In probability theory and statistics, a **standardized moment** of a probability distribution is a moment that is normalized. The normalization is typically a division by an expression of the standard deviation which renders the moment scale invariant. This has the advantage that such normalized moments differ only in other properties than variability, facilitating e.g. comparison of shape of different probability distributions.

In statistics, as opposed to its general use in mathematics, a **parameter** is any measured quantity of a statistical population that summarises or describes an aspect of the population, such as a mean or a standard deviation. If a population exactly follows a known and defined distribution, for example the normal distribution, then a small set of parameters can be measured which completely describes the population, and can be considered to define a probability distribution for the purposes of extracting samples from this population.

In statistics and optimization, **errors** and **residuals** are two closely related and easily confused measures of the deviation of an observed value of an element of a statistical sample from its "theoretical value". The **error** of an observed value is the deviation of the observed value from the (unobservable) *true* value of a quantity of interest, and the **residual** of an observed value is the difference between the observed value and the *estimated* value of the quantity of interest. The distinction is most important in regression analysis, where the concepts are sometimes called the **regression errors** and **regression residuals** and where they lead to the concept of studentized residuals.

The **standard error** (**SE**) of a statistic is the standard deviation of its sampling distribution or an estimate of that standard deviation. If the statistic is the sample mean, it is called the **standard error of the mean** (**SEM**).

In probability theory and statistics, the **coefficient of variation** (**CV**), also known as **relative standard deviation** (**RSD**), is a standardized measure of dispersion of a probability distribution or frequency distribution. It is often expressed as a percentage, and is defined as the ratio of the standard deviation to the mean . The CV or RSD is widely used in analytical chemistry to express the precision and repeatability of an assay. It is also commonly used in fields such as engineering or physics when doing quality assurance studies and ANOVA gauge R&R. In addition, CV is utilized by economists and investors in economic models.

The following is a glossary of terms used in the mathematical sciences statistics and probability.

The **mean absolute difference** (univariate) is a measure of statistical dispersion equal to the average absolute difference of two independent values drawn from a probability distribution. A related statistic is the **relative mean absolute difference**, which is the mean absolute difference divided by the arithmetic mean, and equal to twice the Gini coefficient. The mean absolute difference is also known as the **absolute mean difference** and the **Gini mean difference** (GMD). The mean absolute difference is sometimes denoted by Δ or as MD.

The **root-mean-square deviation** (**RMSD**) or **root-mean-square error** (**RMSE**) is a frequently used measure of the differences between values predicted by a model or an estimator and the values observed. The RMSD represents the square root of the second sample moment of the differences between predicted values and observed values or the quadratic mean of these differences. These deviations are called *residuals* when the calculations are performed over the data sample that was used for estimation and are called *errors* when computed out-of-sample. The RMSD serves to aggregate the magnitudes of the errors in predictions for various data points into a single measure of predictive power. RMSD is a measure of accuracy, to compare forecasting errors of different models for a particular dataset and not between datasets, as it is scale-dependent.

In mathematics and statistics, **deviation** is a measure of difference between the observed value of a variable and some other value, often that variable's mean. The sign of the deviation reports the direction of that difference. The magnitude of the value indicates the size of the difference.

In probability theory and statistics, the **index of dispersion**, **dispersion index,****coefficient of dispersion,****relative variance**, or **variance-to-mean ratio (VMR)**, like the coefficient of variation, is a normalized measure of the dispersion of a probability distribution: it is a measure used to quantify whether a set of observed occurrences are clustered or dispersed compared to a standard statistical model.

In statistics, an **L-estimator** is an estimator which is a linear combination of order statistics of the measurements. This can be as little as a single point, as in the median, or as many as all points, as in the mean.

In statistics, a **trimmed estimator** is an estimator derived from another estimator by excluding some of the extreme values, a process called truncation. This is generally done to obtain a more robust statistic, and the extreme values are considered outliers. Trimmed estimators also often have higher efficiency for mixture distributions and heavy-tailed distributions than the corresponding untrimmed estimator, at the cost of lower efficiency for other distributions, such as the normal distribution.

In statistics, **robust measures of scale** are methods that quantify the statistical dispersion in a sample of numerical data while resisting outliers. The most common such robust statistics are the *interquartile range* (IQR) and the *median absolute deviation* (MAD). These are contrasted with conventional or non-robust measures of scale, such as sample variance or standard deviation, which are greatly influenced by outliers.

In statistics, **linear regression** is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called *simple linear regression*; for more than one, the process is called **multiple linear regression**. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

**Univariate** is a term commonly used in statistics to describe a type of data which consists of observations on only a single characteristic or attribute. A simple example of univariate data would be the salaries of workers in industry. Like all the other data, univariate data can be visualized using graphs, images or other analysis tools after the data is measured, collected, reported, and analyzed.

- ↑ NIST/SEMATECH e-Handbook of Statistical Methods. "1.3.6.4. Location and Scale Parameters".
*www.itl.nist.gov*. U.S. Department of Commerce. - ↑ "Allan Variance -- Overview by David W. Allan".
*www.allanstime.com*. Retrieved 2021-09-16. - ↑ "Hadamard Variance".
*www.wriley.com*. Retrieved 2021-09-16. - ↑ McQuarrie, Donald A. (1976).
*Statistical Mechanics*. NY: Harper & Row. ISBN 0-06-044366-9. - ↑ Rothschild, Michael; Stiglitz, Joseph (1970). "Increasing risk I: A definition".
*Journal of Economic Theory*.**2**(3): 225–243. doi:10.1016/0022-0531(70)90038-4.

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.