
In descriptive statistics, the **interquartile range** (**IQR**), also called the **midspread**, **middle 50%**, or **H‑spread**, is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, or between the upper and lower quartiles:^{ [1] }^{ [2] } IQR = *Q*_{3} − *Q*_{1}. In other words, the IQR is the third quartile minus the first quartile; these quartiles can be clearly seen on a box plot of the data. It is a trimmed estimator, defined as the 25% trimmed range, and is a commonly used robust measure of scale.


The IQR is a measure of variability, based on dividing a data set into quartiles. Quartiles divide a rank-ordered data set into four equal parts. The values that separate parts are called the first, second, and third quartiles; and they are denoted by Q1, Q2, and Q3, respectively.

Unlike the total range, the interquartile range has a breakdown point of 25%,^{ [3] } and is thus often preferred to the total range.

The IQR is used to build box plots, simple graphical representations of a probability distribution.

The IQR is used in businesses as a marker for their income rates.

For a symmetric distribution (where the median equals the midhinge, the average of the first and third quartiles), half the IQR equals the median absolute deviation (MAD).

The median is the corresponding measure of central tendency.

The IQR can be used to identify outliers (see below).

The quartile deviation or semi-interquartile range is defined as half the IQR.^{ [4] }^{ [5] }

The IQR of a set of values is calculated as the difference between the upper and lower quartiles, Q_{3} and Q_{1}. Each quartile is a median^{ [6] } calculated as follows.

Given an even number 2*n* or odd number 2*n* + 1 of values:

- first quartile *Q*_{1} = median of the *n* smallest values
- third quartile *Q*_{3} = median of the *n* largest values^{ [6] }

The *second quartile Q _{2}* is the same as the ordinary median.

The following table has 13 rows and follows the rules for an odd number of entries.

i | x[i] | Median | Quartile
---|---|---|---
1 | 7 | |
2 | 7 | |
3 | 31 | | Q_{1} = 31 (median of the lower half, rows 1 to 6)
4 | 31 | |
5 | 47 | |
6 | 75 | |
7 | 87 | Q_{2} = 87 (median of the whole table) |
8 | 115 | |
9 | 116 | |
10 | 119 | | Q_{3} = 119 (median of the upper half, rows 8 to 13)
11 | 119 | |
12 | 155 | |
13 | 177 | |

For the data in this table the interquartile range is IQR = Q_{3} − Q_{1} = 119 − 31 = 88.
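As a sketch (not from the article), the median-of-halves rule above can be implemented in a few lines of plain Python; the data are the 13 values from the table:

```python
# Sketch of the median-of-halves quartile rule, using only built-in Python.
def median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def interquartile_range(data):
    s = sorted(data)
    half = len(s) // 2            # the halves exclude the median when the count is odd
    q1 = median(s[:half])         # median of the n smallest values
    q3 = median(s[-half:])        # median of the n largest values
    return q3 - q1

data = [7, 7, 31, 31, 47, 75, 87, 115, 116, 119, 119, 155, 177]
interquartile_range(data)  # -> 88.0 (Q1 = 31, Q3 = 119)
```

Note that other quartile conventions (for example, the interpolating methods used by statistical software) can give slightly different values on the same data.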

```
                            +−−−−−+−+
    *           |−−−−−−−−−−−|     | |−−−−−−−−−−−|
                            +−−−−−+−+
+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+   number line
0   1   2   3   4   5   6   7   8   9  10  11  12
```

For the data set in this box plot:

- lower (first) quartile *Q*_{1} = 7
- median (second quartile) *Q*_{2} = 8.5
- upper (third) quartile *Q*_{3} = 9
- interquartile range IQR = *Q*_{3} − *Q*_{1} = 2
- lower 1.5 IQR whisker = *Q*_{1} − 1.5 × IQR = 7 − 3 = 4 (if there is no data point at 4, then the lowest point greater than 4)
- upper 1.5 IQR whisker = *Q*_{3} + 1.5 × IQR = 9 + 3 = 12 (if there is no data point at 12, then the highest point less than 12)

This means the 1.5 IQR whiskers can be uneven in length.
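The whisker arithmetic above is trivial to reproduce; a minimal sketch using the quartiles listed for this box plot:

```python
# Whisker fences for the box plot above, per the 1.5*IQR rule.
q1, q3 = 7, 9
iqr = q3 - q1                  # 2
lower_fence = q1 - 1.5 * iqr   # 4.0
upper_fence = q3 + 1.5 * iqr   # 12.0
```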

The interquartile range of a continuous distribution can be calculated by integrating the probability density function (which yields the cumulative distribution function; any other means of calculating the CDF will also work). The lower quartile, *Q*_{1}, is a number such that the integral of the PDF from −∞ to *Q*_{1} equals 0.25, while the upper quartile, *Q*_{3}, is a number such that the integral from −∞ to *Q*_{3} equals 0.75. In terms of the CDF, the quartiles can be defined as

*Q*_{1} = CDF^{−1}(0.25) and *Q*_{3} = CDF^{−1}(0.75),

where CDF^{−1} is the quantile function.

The interquartile range and median of some common distributions are shown below.

Distribution | Median | IQR
---|---|---
Normal | μ | 2 Φ^{−1}(0.75)σ ≈ 1.349σ ≈ (27/20)σ
Laplace | μ | 2b ln(2) ≈ 1.386b
Cauchy | μ | 2γ
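These closed-form IQR values can be checked numerically. The sketch below uses only the Python standard library; the Laplace and Cauchy quantile functions are written out by hand from their standard closed forms (they are not part of this article):

```python
import math
from statistics import NormalDist

# Normal(mu, sigma): IQR = 2 * Phi^{-1}(0.75) * sigma
normal_iqr = NormalDist(0, 1).inv_cdf(0.75) - NormalDist(0, 1).inv_cdf(0.25)
# normal_iqr ~ 1.349

# Laplace(mu, b): quantile(p) = mu - b * sign(p - 1/2) * ln(1 - 2|p - 1/2|)
def laplace_quantile(p, mu=0.0, b=1.0):
    return mu - b * math.copysign(1.0, p - 0.5) * math.log(1.0 - 2.0 * abs(p - 0.5))

laplace_iqr = laplace_quantile(0.75) - laplace_quantile(0.25)
# laplace_iqr ~ 1.386 == 2 * ln(2)

# Cauchy(mu, gamma): quantile(p) = mu + gamma * tan(pi * (p - 1/2))
def cauchy_quantile(p, mu=0.0, gamma=1.0):
    return mu + gamma * math.tan(math.pi * (p - 0.5))

cauchy_iqr = cauchy_quantile(0.75) - cauchy_quantile(0.25)
# cauchy_iqr ~ 2.0 == 2 * gamma
```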

The IQR, mean, and standard deviation of a population *P* can be used in a simple test of whether or not *P* is normally distributed, or Gaussian. If *P* is normally distributed, then the standard score of the first quartile, *z*_{1}, is −0.67, and the standard score of the third quartile, *z*_{3}, is +0.67. Given *mean* = *X̄* and *standard deviation* = σ for *P*, if *P* is normally distributed, the first quartile

*Q*_{1} = (σ *z*_{1}) + *X̄* = *X̄* − 0.67σ

and the third quartile

*Q*_{3} = (σ *z*_{3}) + *X̄* = *X̄* + 0.67σ.

If the actual values of the first or third quartiles differ substantially from these calculated values, *P* is not normally distributed. However, a distribution can be trivially perturbed so that its Q1 and Q3 standard scores stay at −0.67 and +0.67 while it is not normally distributed, so the above test would produce a false positive. A better test of normality, such as a Q–Q plot, would be indicated here.
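As an illustration only (the helper name and the tolerance threshold are this sketch's assumptions, not the article's), the quartile check can be coded with the standard library; as noted above, it can only rule normality out, never confirm it:

```python
from statistics import NormalDist, mean, stdev, quantiles

def quartiles_look_normal(data, tol=0.1):
    """Return False when the sample's quartile z-scores sit far from the
    -0.67/+0.67 expected under normality (tol is an arbitrary threshold)."""
    q1, _, q3 = quantiles(data, n=4)
    xbar, s = mean(data), stdev(data)
    return abs((q1 - xbar) / s + 0.67) < tol and abs((q3 - xbar) / s - 0.67) < tol

# Deterministic normal-shaped sample: equally spaced normal quantiles
normal_sample = [NormalDist().inv_cdf((i + 0.5) / 1000) for i in range(1000)]
# Uniform-shaped sample: its quartile z-scores are about -0.87/+0.87
uniform_sample = [i / 1000 for i in range(1000)]

quartiles_look_normal(normal_sample)   # -> True
quartiles_look_normal(uniform_sample)  # -> False
```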

The interquartile range is often used to find outliers in data. Outliers here are defined as observations that fall below Q1 − 1.5 IQR or above Q3 + 1.5 IQR. In a boxplot, the highest and lowest occurring value within this limit are indicated by *whiskers* of the box (frequently with an additional bar at the end of the whisker) and any outliers as individual points.
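A minimal sketch of this outlier rule in Python, using the standard library's `statistics.quantiles` (whose default exclusive method may differ slightly from the median-of-halves convention described earlier):

```python
from statistics import quantiles

# Flag observations outside the 1.5*IQR fences described above.
def iqr_outliers(data):
    q1, _, q3 = quantiles(data, n=4)
    spread = q3 - q1
    lo, hi = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [x for x in data if x < lo or x > hi]

iqr_outliers([52, 54, 55, 56, 57, 58, 120])  # -> [120]
```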

In statistics, a **central tendency** is a central or typical value for a probability distribution. It may also be called a **center** or **location** of the distribution. Colloquially, measures of central tendency are often called *averages.* The term *central tendency* dates from the late 1920s.

A **histogram** is an approximate representation of the distribution of numerical data. It was first introduced by Karl Pearson. To construct a histogram, the first step is to "bin" the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent and are often of equal size.

In statistics, a **quartile** is a type of quantile which divides the number of data points into four parts, or *quarters*, of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a form of order statistic. The three main quartiles are as follows:

- the first quartile (*Q*_{1}), also the 25th percentile
- the second quartile (*Q*_{2}), also the median or 50th percentile
- the third quartile (*Q*_{3}), also the 75th percentile

In statistics and probability, **quantiles** are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created. Common quantiles have special names, such as quartiles, deciles, and percentiles. The groups created are termed halves, thirds, quarters, etc., though sometimes the terms for the quantile are used for the groups created, rather than for the cut points.

In probability theory and statistics, **skewness** is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined.

The **interquartile mean (IQM)** is a statistical measure of central tendency based on the truncated mean of the interquartile range. The IQM is very similar to the scoring method used in sports that are evaluated by a panel of judges: *discard the lowest and the highest scores; calculate the mean value of the remaining scores*.

In descriptive statistics, a **box plot** or **boxplot** is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending from the boxes (*whiskers*) indicating variability outside the upper and lower quartiles, hence the terms **box-and-whisker plot** and **box-and-whisker diagram**. Outliers may be plotted as individual points. Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions of the underlying statistical distribution. The spacings between the different parts of the box indicate the degree of dispersion (spread) and skewness in the data, and show outliers. In addition to the points themselves, they allow one to visually estimate various L-estimators, notably the interquartile range, midhinge, range, mid-range, and trimean. Box plots can be drawn either horizontally or vertically. Box plots received their name from the box in the middle.

The **five-number summary** is a set of descriptive statistics that provides information about a dataset. It consists of the five most important sample percentiles:

- the sample minimum
*(smallest observation)* - the lower quartile or
*first quartile* - the median
- the upper quartile or
*third quartile* - the sample maximum

The **average absolute deviation**, or **mean absolute deviation** (**MAD**), of a data set is the average of the absolute deviations from a central point. It is a summary statistic of statistical dispersion or variability. In the general form, the central point can be a mean, median, mode, or the result of any other measure of central tendency or any random data point related to the given data set. The absolute values of the differences between the data points and their central tendency are totaled and divided by the number of data points.

In probability theory and statistics, the **coefficient of variation** (**CV**), also known as **relative standard deviation** (**RSD**), is a standardized measure of dispersion of a probability distribution or frequency distribution. It is often expressed as a percentage, and is defined as the ratio of the standard deviation to the mean. The CV or RSD is widely used in analytical chemistry to express the precision and repeatability of an assay. It is also commonly used in fields such as engineering or physics when doing quality assurance studies and ANOVA gauge R&R. In addition, CV is utilized by economists and investors in economic models.

The following is a glossary of terms used in the mathematical sciences of statistics and probability.

**Robust statistics** are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly.

The **root-mean-square deviation** (**RMSD**) or **root-mean-square error** (**RMSE**) is a frequently used measure of the differences between values predicted by a model or an estimator and the values observed. The RMSD represents the square root of the second sample moment of the differences between predicted values and observed values or the quadratic mean of these differences. These deviations are called *residuals* when the calculations are performed over the data sample that was used for estimation and are called *errors* when computed out-of-sample. The RMSD serves to aggregate the magnitudes of the errors in predictions for various times into a single measure of predictive power. RMSD is a measure of accuracy, to compare forecasting errors of different models for a particular dataset and not between datasets, as it is scale-dependent.

In statistics, the **median absolute deviation** (**MAD**) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample.

In statistics, the **quartile coefficient of dispersion** is a descriptive statistic which measures dispersion and which is used to make comparisons within and between data sets. Since it is based on quantile information, it is less sensitive to outliers than measures such as the coefficient of variation. As such, it is one of several robust measures of scale.

In statistics, the **midhinge** is the average of the first and third quartiles and is thus a measure of location. Equivalently, it is the 25% trimmed mid-range or 25% midsummary; it is an L-estimator.

In statistics, an **L-estimator** is an estimator which is an L-statistic – a linear combination of order statistics of the measurements. This can be as little as a single point, as in the median, or as many as all points, as in the mean.

In statistics, a **trimmed estimator** is an estimator derived from another estimator by excluding some of the extreme values, a process called truncation. This is generally done to obtain a more robust statistic, and the extreme values are considered outliers. Trimmed estimators also often have higher efficiency for mixture distributions and heavy-tailed distributions than the corresponding untrimmed estimator, at the cost of lower efficiency for other distributions, such as the normal distribution.

In statistics, a **robust measure of scale** is a robust statistic that quantifies the statistical dispersion in a set of numerical data. The most common such statistics are the interquartile range (IQR) and the median absolute deviation (MAD). These are contrasted with conventional measures of scale, such as sample variance or sample standard deviation, which are non-robust, meaning greatly influenced by outliers.

In statistics, **dispersion** is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range.

1. Upton, Graham; Cook, Ian (1996). *Understanding Statistics*. Oxford University Press. p. 55. ISBN 0-19-914391-9.
2. Zwillinger, D.; Kokoska, S. (2000). *CRC Standard Probability and Statistics Tables and Formulae*. CRC Press. p. 18. ISBN 1-58488-059-7.
3. Rousseeuw, Peter J.; Croux, Christophe (1992). "Explicit Scale Estimators with High Breakdown Point" (PDF). In Y. Dodge (ed.), *L1-Statistical Analysis and Related Methods*. Amsterdam: North-Holland. pp. 77–92.
4. Yule, G. Udny (1911). *An Introduction to the Theory of Statistics*. Charles Griffin and Company. pp. 147–148.
5. Weisstein, Eric W. "Quartile Deviation". *MathWorld*.
6. Westergren, Bertil (1988). *Beta [beta] mathematics handbook: concepts, theorems, methods, algorithms, formulas, graphs, tables*. Studentlitteratur. p. 348. ISBN 91-44-25051-7. OCLC 18454776.

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
