In descriptive statistics, the **interquartile range** (**IQR**) is a measure of statistical dispersion, which is the spread of the data.^{ [1] } The IQR may also be called the **midspread**, **middle 50%**, or **H‑spread.** It is defined as the difference between the 75th and 25th percentiles of the data.^{ [2] }^{ [3] }^{ [4] } To calculate the IQR, the data set is divided into quartiles, or four rank-ordered even parts via linear interpolation.^{ [1] } These quartiles are denoted by *Q*_{1} (also called the lower quartile), *Q*_{2} (the median), and *Q*_{3} (also called the upper quartile). The lower quartile corresponds with the 25th percentile and the upper quartile corresponds with the 75th percentile, so IQR = *Q*_{3} − *Q*_{1}.^{ [1] }

- Use
- Algorithm
- Discrete Variables
- Continuous Variables
- Examples
- Data set in a table
- Data set in a plain-text box plot
- Distributions
- Outliers
- See also
- References
- External links

The IQR is an example of a trimmed estimator, defined as the 25% trimmed range, which improves the accuracy of dataset statistics by discarding outlying points that contribute little information.^{ [5] } It is also used as a robust measure of scale.^{ [5] } It can be clearly visualized as the box on a box plot.^{ [1] }

The primary use of the IQR is to represent the difference between the upper and lower quartiles of a data set. This can be used as an indicator for variability of the dataset.^{ [1] }

It is also used to build box plots, which are a graphical representation of a probability distribution. In the box plot, the IQR is the height of the box itself, and each whisker extends at most 1.5*IQR beyond the box.^{ [1] } Any data point located outside of the whiskers is referred to as an outlier (see below).^{ [1] }

The IQR is often preferred to the total range as a measure of variability because it has a breakdown point of 25%, whereas the total range has a breakdown point of 0%. The median absolute deviation (MAD) is more robust still, with a breakdown point of 50%.^{ [6] }
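The robustness of the IQR relative to the total range can be illustrated with a small sketch. It assumes the median-of-halves quartile rule and a made-up sample of the integers 1 to 12; both are illustrative choices, not taken from the cited sources. Corrupting two of the twelve values (about 17%, below the 25% breakdown point) changes the total range drastically but leaves the IQR unchanged.

```python
from statistics import median

def iqr(values):
    """IQR via the median-of-halves rule (an illustrative choice of quartile method).

    Assumes at least two values.
    """
    data = sorted(values)
    n = len(data) // 2                      # size of each half
    return median(data[-n:]) - median(data[:n])

def total_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

clean = list(range(1, 13))                  # hypothetical sample: 1, 2, ..., 12
corrupted = clean[:-2] + [10**6, 10**6]     # two of twelve values replaced by gross errors

print(total_range(clean), iqr(clean))
print(total_range(corrupted), iqr(corrupted))
```

The second line of output shows a much larger range but the same IQR as the first, since the corrupted values lie above the third quartile and never enter its calculation.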

The IQR has been practically used in a number of recent studies. Some of these uses include:

- Sampling for Design Space Exploration^{ [7] }
- Predicting Stock Returns^{ [8] }
- Image Denoising^{ [9] }

The IQR of a set of values is calculated as the difference between the upper and lower quartiles, *Q*_{3} and *Q*_{1}. Each quartile is a median calculated as follows.

Given an even 2*n* or odd 2*n*+1 number of values:

- first quartile *Q*_{1} = median of the *n* smallest values;
- third quartile *Q*_{3} = median of the *n* largest values.^{ [10] }

The second quartile *Q*_{2} is the same as the ordinary median.
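The rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `quartiles` and `iqr` are chosen here for convenience, and other quartile conventions (such as interpolation-based ones used by many libraries) can give different results on the same data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    For 2n or 2n+1 values, Q1 is the median of the n smallest values and
    Q3 is the median of the n largest; with an odd count the middle value
    belongs to neither half.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    data = sorted(values)
    n = len(data) // 2
    lower = data[:n]       # the n smallest values
    upper = data[-n:]      # the n largest values
    return median(lower), median(data), median(upper)

def iqr(values):
    """Interquartile range Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

# Odd number of values (2n+1 = 13)
odd_sample = [7, 7, 31, 31, 47, 75, 87, 115, 116, 119, 119, 155, 177]
print(quartiles(odd_sample), iqr(odd_sample))

# Even number of values (2n = 8)
even_sample = [1, 2, 3, 4, 5, 6, 7, 8]
print(quartiles(even_sample), iqr(even_sample))
```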

The interquartile range of a continuous distribution can be calculated by integrating the probability density function over specific intervals. The lower quartile, *Q*_{1}, is a number such that integral of the PDF from -∞ to *Q*_{1} equals 0.25, while the upper quartile, *Q*_{3}, is such a number that the integral from -∞ to *Q*_{3} equals 0.75.^{ [1] }

In terms of the CDF, the quartiles can be defined as *Q*_{1} = CDF^{−1}(0.25) and *Q*_{3} = CDF^{−1}(0.75), where CDF^{−1} is the quantile function.^{ [1] }
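As a short sketch of the quantile-function definition, the example below uses `statistics.NormalDist` from the Python standard library, whose `inv_cdf` method plays the role of CDF^{−1}. The helper name `iqr_from_quantile` and the parameters μ = 10, σ = 2 are arbitrary choices for illustration.

```python
from statistics import NormalDist

def iqr_from_quantile(quantile):
    """IQR of a continuous distribution, given its quantile function CDF^{-1}."""
    return quantile(0.75) - quantile(0.25)

dist = NormalDist(mu=10.0, sigma=2.0)   # hypothetical parameters
q1 = dist.inv_cdf(0.25)
q3 = dist.inv_cdf(0.75)

# Evaluating the CDF at each quartile recovers the defining probabilities.
print(dist.cdf(q1), dist.cdf(q3))       # close to 0.25 and 0.75

# For a normal distribution the IQR is about 1.349 standard deviations.
print(iqr_from_quantile(dist.inv_cdf) / dist.stdev)
```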

The following table has 13 rows, and follows the rules for the odd number of entries.

i | x[i] | Median | Quartile |
---|---|---|---|
1 | 7 | | Q_{1} = 31 (median of the lower half of the values, rows 1 to 6) |
2 | 7 | | |
3 | 31 | | |
4 | 31 | | |
5 | 47 | | |
6 | 75 | | |
7 | 87 | Q_{2} = 87 (median of whole table) | |
8 | 115 | | Q_{3} = 119 (median of the upper half of the values, rows 8 to 13) |
9 | 116 | | |
10 | 119 | | |
11 | 119 | | |
12 | 155 | | |
13 | 177 | | |

For the data in this table the interquartile range is IQR = *Q*_{3}−*Q*_{1} = 119 - 31 = 88.

                                 +−−−−−+−+
                   * |−−−−−−−−−−−|     | |−−−−−−−−−−−|
                                 +−−−−−+−+

     +−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+   number line
     0   1   2   3   4   5   6   7   8   9   10  11  12

For the data set in this box plot:

- lower (first) quartile *Q*_{1} = 7
- median (second quartile) *Q*_{2} = 8.5
- upper (third) quartile *Q*_{3} = 9
- interquartile range, IQR = *Q*_{3} − *Q*_{1} = 2
- lower 1.5*IQR whisker = *Q*_{1} − 1.5 * IQR = 7 − 3 = 4. (If there is no data point at 4, then the lowest point greater than 4.)
- upper 1.5*IQR whisker = *Q*_{3} + 1.5 * IQR = 9 + 3 = 12. (If there is no data point at 12, then the highest point less than 12.)

This means the 1.5*IQR whiskers can be of unequal length. The median, minimum, maximum, and the first and third quartiles constitute the five-number summary.^{ [1] }^{ [11] }
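The whisker rule described above can be sketched as follows. The data list is hypothetical (the values behind the plot are not given in the text); it is merely chosen to be compatible with *Q*_{1} = 7 and *Q*_{3} = 9, which are passed in explicitly. The function name `whisker_ends` is likewise an illustrative choice.

```python
def whisker_ends(data, q1, q3, k=1.5):
    """Return (lower whisker end, upper whisker end, outliers).

    Each whisker ends at the most extreme data point that still lies
    within k * IQR of the box; points beyond that are outliers.
    """
    iqr = q3 - q1
    low_limit = q1 - k * iqr
    high_limit = q3 + k * iqr
    inside = [x for x in data if low_limit <= x <= high_limit]
    outliers = sorted(x for x in data if x < low_limit or x > high_limit)
    return min(inside), max(inside), outliers

# Hypothetical data set, assumed for illustration only
data = [1, 5, 7, 7, 8, 9, 9, 9, 10, 12]
low, high, outliers = whisker_ends(data, q1=7, q3=9)
print(low, high, outliers)
```

With this sample there is no data point at 4, so the lower whisker stops at 5, while the upper whisker reaches 12. The whiskers therefore have lengths 2 and 3, and the value 1 is drawn as an outlier.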

The interquartile range and median of some common distributions are shown below:

Distribution | Median | IQR |
---|---|---|
Normal | μ | 2 Φ^{−1}(0.75)σ ≈ 1.349σ ≈ (27/20)σ |
Laplace | μ | 2b ln(2) ≈ 1.386b |
Cauchy | μ | 2γ |
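The entries in this table can be checked numerically from the quantile function of each distribution. The sketch below uses `statistics.NormalDist` for the normal case and the standard closed-form quantile functions of the Laplace and Cauchy distributions; the helper names and parameter values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def laplace_quantile(p, mu, b):
    """Quantile function of the Laplace distribution with location mu and scale b."""
    return mu - b * math.copysign(1.0, p - 0.5) * math.log(1.0 - 2.0 * abs(p - 0.5))

def cauchy_quantile(p, mu, gamma):
    """Quantile function of the Cauchy distribution with location mu and scale gamma."""
    return mu + gamma * math.tan(math.pi * (p - 0.5))

def iqr_from_quantile(quantile):
    """IQR as the difference between the 0.75 and 0.25 quantiles."""
    return quantile(0.75) - quantile(0.25)

mu, sigma, b, gamma = 3.0, 2.0, 1.5, 0.5   # hypothetical parameters

normal_iqr = iqr_from_quantile(NormalDist(mu, sigma).inv_cdf)
laplace_iqr = iqr_from_quantile(lambda p: laplace_quantile(p, mu, b))
cauchy_iqr = iqr_from_quantile(lambda p: cauchy_quantile(p, mu, gamma))

print(normal_iqr / sigma)    # close to 1.349
print(laplace_iqr / b)       # close to 2 ln 2, about 1.386
print(cauchy_iqr / gamma)    # close to 2
```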

If both the median and mean of a distribution fall inside the interquartile range, the distribution is considered to be reasonably symmetrical.^{ [12] }

The interquartile range is often used to find outliers in data. A fence is used to identify and categorize types of outliers from the data, or on a box plot.^{ [13] } There are four relevant fences:

- Lower Inner Fence: *Q*_{1} − 1.5 * IQR
- Upper Inner Fence: *Q*_{3} + 1.5 * IQR
- Lower Outer Fence: *Q*_{1} − 3 * IQR
- Upper Outer Fence: *Q*_{3} + 3 * IQR

Any data points that fall between the inner and outer fences are called mild outliers. Points that fall beyond the outer fences are called extreme outliers.^{ [13] }
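A minimal sketch of this classification is shown below. The function name `classify` and its return labels are illustrative, and the treatment of points lying exactly on a fence (counted here as not beyond it) is an assumption, since the text does not specify it. The quartiles *Q*_{1} = 31 and *Q*_{3} = 119 are those of the table example earlier in the article.

```python
def classify(x, q1, q3):
    """Label a point as 'extreme', 'mild', or 'none' using the four fences."""
    iqr = q3 - q1
    lower_inner, upper_inner = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    lower_outer, upper_outer = q1 - 3.0 * iqr, q3 + 3.0 * iqr
    if x < lower_outer or x > upper_outer:
        return "extreme"   # beyond an outer fence
    if x < lower_inner or x > upper_inner:
        return "mild"      # between an inner and an outer fence
    return "none"          # within the inner fences

# With Q1 = 31 and Q3 = 119 the IQR is 88, so the inner fences are
# -101 and 251 and the outer fences are -233 and 383.
for value in (100, 300, 400, -150):
    print(value, classify(value, q1=31, q3=119))
```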

1. Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hendrik Paul; Meester, Ludolf Erwin (2005). *A Modern Introduction to Probability and Statistics*. Springer Texts in Statistics. London: Springer London. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.
2. Upton, Graham; Cook, Ian (1996). *Understanding Statistics*. Oxford University Press. p. 55. ISBN 0-19-914391-9.
3. Zwillinger, D.; Kokoska, S. (2000). *CRC Standard Probability and Statistics Tables and Formulae*. CRC Press. p. 18. ISBN 1-58488-059-7.
4. Ross, Sheldon (2010). *Introductory Statistics*. Burlington, MA: Elsevier. pp. 103–104. ISBN 978-0-12-374388-6.
5. Kaltenbach, Hans-Michael (2012). *A concise guide to statistics*. Heidelberg: Springer. ISBN 978-3-642-23502-3. OCLC 763157853.
6. Rousseeuw, Peter J.; Croux, Christophe (1992). Y. Dodge (ed.). "Explicit Scale Estimators with High Breakdown Point" (PDF). *L1-Statistical Analysis and Related Methods*. Amsterdam: North-Holland. pp. 77–92.
7. Zhang, Yiming; Kim, Nam H.; Haftka, Raphael T. (2019-11-20). "General-Surrogate Adaptive Sampling Using Interquartile Range for Design Space Exploration". *Journal of Mechanical Design*. **142** (5). doi:10.1115/1.4044432. ISSN 1050-0472.
8. Dai, Zhifeng; Chang, Xiaomin (2021). "Predicting Stock Return with Economic Constraint: Can Interquartile Range Truncate the Outliers?".
9. Ajil, Jassim, Firas (2013-02-05). *Image Denoising Using Interquartile Range Filter with Local Averaging*. OCLC 1106182050.
10. Westergren, Bertil (1988). *Beta [beta] mathematics handbook: concepts, theorems, methods, algorithms, formulas, graphs, tables*. Studentlitteratur. p. 348. ISBN 9144250517. OCLC 18454776.
11. Tukey, J. W. (1977). *Exploratory Data Analysis*. Reading: Addison-Wesley.
12. Whitley, Elise; Ball, Jonathan (2002). "Statistics review 1: Presenting and summarising data". *Critical Care*. **6** (1): 66–71. ISSN 1364-8535. PMID 11940268.
13. "NIST/SEMATECH e-Handbook of Statistical Methods". *www.itl.nist.gov*. doi:10.18434/m32189. Retrieved 2021-12-14.

- Media related to Interquartile range at Wikimedia Commons

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
