Interdecile range

Last updated

In statistics, the interdecile range is the difference between the first and the ninth deciles (10% and 90%). The interdecile range is a measure of statistical dispersion of the values in a set of data, similar to the range and the interquartile range, and can be computed from the (non-parametric) seven-number summary.

Despite its simplicity, the interdecile range of a sample drawn from a normal distribution can be divided by 2.56 to give a reasonably efficient estimator [ citation needed ] of the standard deviation of a normal distribution. This is derived from the fact that the lower (respectively upper) decile of a normal distribution with arbitrary variance is equal to the mean minus (respectively, plus) 1.28 times the standard deviation.

A more efficient estimator is given by instead taking the 7% trimmed range (the difference between the 7th and 93rd percentiles) and dividing by 3 (corresponding to 86% of the data falling within ±1.5 standard deviations of the mean in a normal distribution); this yields an estimator having about 65% efficiency. [1] Analogous measures of location are given by the median, midhinge, and trimean (or statistics based on nearby points).

See also

Related Research Articles

In statistics, a central tendency is a central or typical value for a probability distribution. It may also be called a center or location of the distribution. Colloquially, measures of central tendency are often called averages. The term central tendency dates from the late 1920s.

In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule, the quantity of interest and its result are distinguished.

Interquartile range measure of statistical dispersion

In descriptive statistics, the interquartile range (IQR), also called the midspread, middle 50%, or H‑spread, is a measure of statistical dispersion, being equal to the difference between 75th and 25th percentiles, or between upper and lower quartiles, IQR = Q3 − Q1. In other words, the IQR is the first quartile subtracted from the third quartile; these quartiles can be clearly seen on a box plot on the data. It is a trimmed estimator, defined as the 25% trimmed range, and is a commonly used robust measure of scale.

Quantile cutpoint dividing a set of observations into equal sized groups

In statistics and probability quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created. Thus quartiles are the three cut points that will divide a dataset into four equal-sized groups. Common quantiles have special names: for instance quartile, decile. The groups created are termed halves, thirds, quarters, etc., though sometimes the terms for the quantile are used for the groups created, rather than for the cut points.

Summary statistics

In descriptive statistics, summary statistics are used to summarize a set of observations, in order to communicate the largest amount of information as simply as possible. Statisticians commonly try to describe the observations in

Standard deviation Measure of the amount of variation or dispersion of a set of values

In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

The average absolute deviation, or mean absolute deviation (MAD), of a data set is the average of the absolute deviations from a central point. It is a summary statistic of statistical dispersion or variability. In the general form, the central point can be the mean, median, mode, or the result of any other measure of central tendency or any random data point related to the given data set. The absolute values of the differences between the data points and their central tendency are totaled and divided by the number of data points.

Errors and residuals

In statistics and optimization, errors and residuals are two closely related and easily confused measures of the deviation of an observed value of an element of a statistical sample from its "theoretical value". The error of an observed value is the deviation of the observed value from the (unobservable) true value of a quantity of interest, and the residual of an observed value is the difference between the observed value and the estimated value of the quantity of interest. The distinction is most important in regression analysis, where the concepts are sometimes called the regression errors and regression residuals and where they lead to the concept of studentized residuals.

In probability theory and statistics, the coefficient of variation (CV), also known as relative standard deviation (RSD), is a standardized measure of dispersion of a probability distribution or frequency distribution. It is often expressed as a percentage, and is defined as the ratio of the standard deviation to the mean . The CV or RSD is widely used in analytical chemistry to express the precision and repeatability of an assay. It is also commonly used in fields such as engineering or physics when doing quality assurance studies and ANOVA gauge R&R. In addition, CV is utilized by economists and investors in economic models.

The following is a glossary of terms used in the mathematical sciences statistics and probability.

In statistics, the mid-range or mid-extreme of a set of statistical data values is the arithmetic mean of the maximum and minimum values in a data set, defined as:

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard-deviations; under this model, non-robust methods like a t-test work poorly.

In mathematics and statistics, deviation is a measure of difference between the observed value of a variable and some other value, often that variable's mean. The sign of the deviation reports the direction of that difference. The magnitude of the value indicates the size of the difference.

In statistics the trimean (TM), or Tukey's trimean, is a measure of a probability distribution's location defined as a weighted average of the distribution's median and its two quartiles:

In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample.

L-estimator

In statistics, an L-estimator is an estimator which is an L-statistic – a linear combination of order statistics of the measurements. This can be as little as a single point, as in the median, or as many as all points, as in the mean.

In statistics, a trimmed estimator is an estimator derived from another estimator by excluding some of the extreme values, a process called truncation. This is generally done to obtain a more robust statistic, and the extreme values are considered outliers. Trimmed estimators also often have higher efficiency for mixture distributions and heavy-tailed distributions than the corresponding untrimmed estimator, at the cost of lower efficiency for other distributions, such as the normal distribution.

Sample maximum and minimum

In statistics, the sample maximum and sample minimum, also called the largest observation and smallest observation, are the values of the greatest and least elements of a sample. They are basic summary statistics, used in descriptive statistics such as the five-number summary and Bowley's seven-figure summary and the associated box plot.

In statistics, a robust measure of scale is a robust statistic that quantifies the statistical dispersion in a set of numerical data. The most common such statistics are the interquartile range (IQR) and the median absolute deviation (MAD). These are contrasted with conventional measures of scale, such as sample variance or sample standard deviation, which are non-robust, meaning greatly influenced by outliers.

Statistical dispersion general term for the value describing how spread out the data are

In statistics, dispersion is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range.

References

  1. Evans 1955, Appendix G: Inefficient statistics, pp. 902–904.