# Tolerance interval

A tolerance interval (TI) is a statistical interval within which, with some confidence level, a specified proportion of a sampled population falls. "More specifically, a 100×p%/100×(1−α) tolerance interval provides limits within which at least a certain proportion (p) of the population falls with a given level of confidence (1−α)." [1] "A (p, 1−α) tolerance interval (TI) based on a sample is constructed so that it would include at least a proportion p of the sampled population with confidence 1−α; such a TI is usually referred to as p-content − (1−α) coverage TI." [2] "A (p, 1−α) upper tolerance limit (TL) is simply a 1−α upper confidence limit for the 100 p percentile of the population." [2]

## Calculation

One-sided normal tolerance intervals have an exact solution in terms of the sample mean and sample variance based on the noncentral t-distribution. [3] Two-sided normal tolerance intervals can be obtained based on the noncentral chi-squared distribution. [3]
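The exact one-sided construction can be sketched as follows. For a normal sample of size n, the upper tolerance limit is x̄ + k·s, where k is obtained from a noncentral t quantile with noncentrality parameter z_p·√n. This is a minimal illustration using SciPy, not the full machinery of a tolerance package:

```python
# Sketch: exact one-sided (p, 1-alpha) normal tolerance factor via the
# noncentral t-distribution. The upper tolerance limit is xbar + k*s.
from math import sqrt

from scipy import stats


def k_one_sided(n, p=0.95, conf=0.95):
    """Exact one-sided tolerance factor for a normal sample of size n."""
    delta = stats.norm.ppf(p) * sqrt(n)            # noncentrality parameter z_p * sqrt(n)
    return stats.nct.ppf(conf, df=n - 1, nc=delta) / sqrt(n)


# For n = 10, p = 0.90, confidence 0.95, published tables give k near 2.35.
print(round(k_one_sided(10, p=0.90, conf=0.95), 3))
```

Raising the confidence level (or the content proportion p) widens the interval, i.e. increases k.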

## Relation to other intervals

"In the parameters-known case, a 95% tolerance interval and a 95% prediction interval are the same." [4] If we knew a population's exact parameters, we would be able to compute a range within which a certain proportion of the population falls. For example, if we know a population is normally distributed with mean ${\displaystyle \mu }$ and standard deviation ${\displaystyle \sigma }$, then the interval ${\displaystyle \mu \pm 1.96\sigma }$ includes 95% of the population (1.96 is the z-score for 95% coverage of a normally distributed population).

However, if we have only a sample from the population, we know only the sample mean ${\displaystyle {\hat {\mu }}}$ and sample standard deviation ${\displaystyle {\hat {\sigma }}}$, which are only estimates of the true parameters. In that case, ${\displaystyle {\hat {\mu }}\pm 1.96{\hat {\sigma }}}$ will not necessarily include 95% of the population, due to variance in these estimates. A tolerance interval bounds this variance by introducing a confidence level ${\displaystyle \gamma }$, which is the confidence with which this interval actually includes the specified proportion of the population. For a normally distributed population, a z-score can be transformed into a "k factor" or tolerance factor [5] for a given ${\displaystyle \gamma }$ via lookup tables or several approximation formulas. [6] "As the degrees of freedom approach infinity, the prediction and tolerance intervals become equal." [7]
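One widely used approximation formula for the two-sided k factor is Howe's, as given in the NIST/SEMATECH handbook: k ≈ z_{(1+p)/2} · √(ν(1+1/n)/χ²_{1−γ,ν}), where ν = n−1 and χ²_{1−γ,ν} is the lower (1−γ) quantile of the chi-squared distribution. A minimal sketch:

```python
# Sketch: Howe's approximation for the two-sided normal tolerance ("k")
# factor, per the NIST/SEMATECH Engineering Statistics Handbook.
# n = sample size, p = coverage proportion, conf = confidence level gamma.
from math import sqrt

from scipy import stats


def k_two_sided(n, p=0.95, conf=0.95):
    nu = n - 1
    z = stats.norm.ppf((1 + p) / 2)         # z-score for central coverage p
    chi2 = stats.chi2.ppf(1 - conf, nu)     # lower (1 - conf) chi-squared quantile
    return z * sqrt(nu * (1 + 1 / n) / chi2)


# For n = 10, p = 0.95, conf = 0.95, the tabulated exact factor is about
# 3.379; Howe's approximation gives about 3.38.
print(round(k_two_sided(10), 2))
```

Note that the factor exceeds the population z-score of 1.96 and decays toward it as n grows, reflecting the extra allowance for parameter-estimation error.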

The tolerance interval is less widely known than the confidence interval and prediction interval, a situation some educators have lamented, as it can lead to misuse of the other intervals where a tolerance interval is more appropriate. [8] [9]

The tolerance interval differs from a confidence interval in that the confidence interval bounds a single-valued population parameter (the mean or the variance, for example) with some confidence, while the tolerance interval bounds the range of data values that includes a specific proportion of the population. Whereas a confidence interval's size is entirely due to sampling error, and will approach a zero-width interval at the true population parameter as sample size increases, a tolerance interval's size is due partly to sampling error and partly to actual variance in the population, and will approach the population's probability interval as sample size increases. [8] [9]

The tolerance interval is related to a prediction interval in that both put bounds on variation in future samples. However, the prediction interval only bounds a single future sample, whereas a tolerance interval bounds the entire population (equivalently, an arbitrary sequence of future samples). In other words, a prediction interval covers a specified proportion of a population on average, whereas a tolerance interval covers it with a certain confidence level, making the tolerance interval more appropriate if a single interval is intended to bound multiple future samples. [9] [10]
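The distinction can be made numerically. The one-sided prediction-limit multiplier is t_{p,n−1}·√(1+1/n), while the one-sided tolerance factor comes from the noncentral t quantile as above. Both tend to z_p as the sample grows, but the tolerance factor is always the larger, since it must hold with confidence rather than on average:

```python
# Sketch: one-sided 95% prediction-limit multiplier vs. the exact
# (p = 0.95, 95%-confidence) one-sided tolerance factor. Both approach
# z_0.95 ~ 1.645 as n grows; the tolerance factor is always larger.
from math import sqrt

from scipy import stats


def k_pred(n, p=0.95):
    return stats.t.ppf(p, n - 1) * sqrt(1 + 1 / n)


def k_tol(n, p=0.95, conf=0.95):
    delta = stats.norm.ppf(p) * sqrt(n)
    return stats.nct.ppf(conf, df=n - 1, nc=delta) / sqrt(n)


for n in (10, 100, 10000):
    print(n, round(k_pred(n), 3), round(k_tol(n), 3))
```

The gap between the two factors shrinks with n, which is the convergence noted in the quotation above.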

## Examples

Vardeman [8] gives the following example:

So consider once again a proverbial EPA mileage test scenario, in which several nominally identical autos of a particular model are tested to produce mileage figures ${\displaystyle y_{1},y_{2},...,y_{n}}$. If such data are processed to produce a 95% confidence interval for the mean mileage of the model, it is, for example, possible to use it to project the mean or total gasoline consumption for the manufactured fleet of such autos over their first 5,000 miles of use. Such an interval would, however, not be of much help to a person renting one of these cars and wondering whether the (full) 10-gallon tank of gas will suffice to carry him the 350 miles to his destination. For that job, a prediction interval would be much more useful. (Consider the differing implications of being "95% sure" that ${\displaystyle \mu \geq 35}$ as opposed to being "95% sure" that ${\displaystyle y_{n+1}\geq 35}$.) But neither a confidence interval for ${\displaystyle \mu }$ nor a prediction interval for a single additional mileage is exactly what is needed by a design engineer charged with determining how large a gas tank the model really needs to guarantee that 99% of the autos produced will have a 400-mile cruising range. What the engineer really needs is a tolerance interval for a fraction ${\displaystyle p=.99}$ of mileages of such autos.

Krishnamoorthy [10] gives another example:

The air lead levels were collected from ${\displaystyle n=15}$ different areas within the facility. It was noted that the log-transformed lead levels fitted a normal distribution well (that is, the data are from a lognormal distribution). Let ${\displaystyle \mu }$ and ${\displaystyle \sigma ^{2}}$, respectively, denote the population mean and variance for the log-transformed data. If ${\displaystyle X}$ denotes the corresponding random variable, we thus have ${\displaystyle X\sim {\mathcal {N}}(\mu ,\sigma ^{2})}$. We note that ${\displaystyle \exp(\mu )}$ is the median air lead level. A confidence interval for ${\displaystyle \mu }$ can be constructed the usual way, based on the t-distribution; this in turn will provide a confidence interval for the median air lead level. If ${\displaystyle {\bar {X}}}$ and ${\displaystyle S}$ denote the sample mean and standard deviation of the log-transformed data for a sample of size n, a 95% confidence interval for ${\displaystyle \mu }$ is given by ${\displaystyle {\bar {X}}\pm t_{n-1,0.975}S/{\sqrt {n}}}$, where ${\displaystyle t_{m,1-\alpha }}$ denotes the ${\displaystyle 1-\alpha }$ quantile of a t-distribution with ${\displaystyle m}$ degrees of freedom. It may also be of interest to derive a 95% upper confidence bound for the median air lead level. Such a bound for ${\displaystyle \mu }$ is given by ${\displaystyle {\bar {X}}+t_{n-1,0.95}S/{\sqrt {n}}}$. Consequently, a 95% upper confidence bound for the median air lead is given by ${\displaystyle \exp {\left({\bar {X}}+t_{n-1,0.95}S/{\sqrt {n}}\right)}}$. Now suppose we want to predict the air lead level at a particular area within the laboratory. A 95% upper prediction limit for the log-transformed lead level is given by ${\displaystyle {\bar {X}}+t_{n-1,0.95}S{\sqrt {\left(1+1/n\right)}}}$. A two-sided prediction interval can be similarly computed. The meaning and interpretation of these intervals are well known.
For example, if the confidence interval ${\displaystyle {\bar {X}}\pm t_{n-1,0.975}S/{\sqrt {n}}}$ is computed repeatedly from independent samples, 95% of the intervals so computed will include the true value of ${\displaystyle \mu }$, in the long run. In other words, the interval is meant to provide information concerning the parameter ${\displaystyle \mu }$ only. A prediction interval has a similar interpretation, and is meant to provide information concerning a single lead level only. Now suppose we want to use the sample to conclude whether or not at least 95% of the population lead levels are below a threshold. The confidence interval and prediction interval cannot answer this question, since the confidence interval is only for the median lead level, and the prediction interval is only for a single lead level. What is required is a tolerance interval; more specifically, an upper tolerance limit. The upper tolerance limit is to be computed subject to the condition that at least 95% of the population lead levels is below the limit, with a certain confidence level, say 99%.
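The three limits in this example can be computed side by side. The sketch below uses hypothetical simulated log-lead values (the original data are not reproduced in the source), with the exact noncentral-t formula for the one-sided tolerance factor:

```python
# Numerical sketch of the air-lead example with hypothetical data:
# a 95% upper confidence bound for the median, a 95% upper prediction
# limit, and a (p = 0.95, 99%-confidence) upper tolerance limit.
from math import exp, sqrt

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
logs = rng.normal(loc=2.3, scale=0.6, size=15)   # hypothetical n = 15 log-levels
n, xbar, s = len(logs), logs.mean(), logs.std(ddof=1)

t95 = stats.t.ppf(0.95, n - 1)
ucb = exp(xbar + t95 * s / sqrt(n))              # 95% UCB for the median level
upl = exp(xbar + t95 * s * sqrt(1 + 1 / n))      # 95% upper prediction limit
delta = stats.norm.ppf(0.95) * sqrt(n)           # content p = 0.95
k = stats.nct.ppf(0.99, df=n - 1, nc=delta) / sqrt(n)  # confidence 0.99
utl = exp(xbar + k * s)                          # upper tolerance limit
print(ucb, upl, utl)
```

Because the tolerance factor exceeds the prediction multiplier, which in turn exceeds the confidence-bound multiplier, the three limits are strictly ordered for any sample: the tolerance limit is the largest.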


## References

1. Young, D. S. (2010). Book review: "Statistical Tolerance Regions: Theory, Applications, and Computation". Technometrics. 52 (1): 143–144.
2. Krishnamoorthy, K.; Lian, Xiaodong (2011). "Closed-form approximate tolerance intervals for some general linear models and comparison studies". Journal of Statistical Computation and Simulation. First published 13 June 2011. doi:10.1080/00949655.2010.545061.
3. Young, Derek S. (August 2010). "tolerance: An R Package for Estimating Tolerance Intervals". Journal of Statistical Software. 36 (5): 1–39, p. 23. ISSN 1548-7660. Retrieved 19 February 2013.
4. Ryan, Thomas P. (22 June 2007). Modern Engineering Statistics. John Wiley & Sons. pp. 222–. ISBN 978-0-470-12843-5. Retrieved 22 February 2013.
5. "Statistical interpretation of data — Part 6: Determination of statistical tolerance intervals". ISO 16269-6. 2014. p. 2.
6. "Tolerance intervals for a normal distribution". Engineering Statistics Handbook. NIST/SEMATECH. 2010. Retrieved 2011-08-26.
7. De Gryze, S.; Langhans, I.; Vandebroek, M. (2007). "Using the correct intervals for prediction: A tutorial on tolerance intervals for ordinary least-squares regression". Chemometrics and Intelligent Laboratory Systems. 87 (2): 147. doi:10.1016/j.chemolab.2007.03.002.
8. Vardeman, Stephen B. (1992). "What about the Other Intervals?". The American Statistician. 46 (3): 193–197. doi:10.2307/2685212. JSTOR 2685212.
9. Nelson, Mark J. (2011-08-14). "You might want a tolerance interval". Retrieved 2011-08-26.
10. Krishnamoorthy, K. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons. pp. 1–6. ISBN 978-0-470-38026-0.
• Hahn, Gerald J.; Meeker, William Q.; Escobar, Luis A. (2017). Statistical Intervals: A Guide for Practitioners and Researchers (2nd ed.). John Wiley & Sons. ISBN 978-0-471-68717-7.