
In statistical inference, specifically predictive inference, a **prediction interval** is an estimate of an interval in which a future observation will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis.


Prediction intervals are used in both frequentist statistics and Bayesian statistics. A prediction interval bears the same relationship to a future observation that a frequentist confidence interval or a Bayesian credible interval bears to an unobservable population parameter: prediction intervals predict the distribution of individual future points, whereas confidence intervals and credible intervals predict the distribution of estimates of the true population mean or of another unobservable quantity of interest.

For example, if one makes the parametric assumption that the underlying distribution is a normal distribution, and has a sample set {*X*_{1}, ..., *X*_{n}}, then confidence intervals and credible intervals may be used to estimate the population mean *μ* and population standard deviation *σ* of the underlying population, while prediction intervals may be used to estimate the value of the next sample variable, *X*_{n+1}.

Alternatively, in Bayesian terms, a prediction interval can be described as a credible interval for the variable itself, rather than for a parameter of the distribution thereof.

The concept of prediction intervals need not be restricted to inference about a single future sample value but can be extended to more complicated cases. For example, in the context of river flooding where analyses are often based on annual values of the largest flow within the year, there may be interest in making inferences about the largest flood likely to be experienced within the next 50 years.

Since prediction intervals are concerned only with past and future observations, rather than unobservable population parameters, some statisticians, such as Seymour Geisser, advocate them as a better method than confidence intervals, following the focus on observables by Bruno de Finetti.

Given a sample from a normal distribution, whose parameters are unknown, it is possible to give prediction intervals in the frequentist sense, i.e., an interval [*a*, *b*] based on statistics of the sample such that on repeated experiments, *X*_{n+1} falls in the interval the desired percentage of the time; one may call these "predictive confidence intervals".^{[1]}

A general technique of frequentist prediction intervals is to find and compute a pivotal quantity of the observables *X*_{1}, ..., *X*_{n}, *X*_{n+1} – meaning a function of observables and parameters whose probability distribution does not depend on the parameters – that can be inverted to give a probability of the future observation *X*_{n+1} falling in some interval computed in terms of the observed values so far. Such a pivotal quantity, depending only on observables, is called an ancillary statistic.^{[2]} The usual method of constructing pivotal quantities is to take the difference of two variables that depend on location, so that location cancels out, and then take the ratio of two variables that depend on scale, so that scale cancels out. The most familiar pivotal quantity is the Student's t-statistic, which can be derived by this method and is used in the sequel.

A prediction interval [*ℓ*, *u*] for a future observation *X* in a normal distribution *N*(*µ*, *σ*^{2}) with known mean and variance may be calculated from

    *Z* = (*X* − *µ*)/*σ*,

where *Z*, the standard score of *X*, is distributed as standard normal.

Hence

    *P*(*ℓ* ≤ *X* ≤ *u*) = *P*((*ℓ* − *µ*)/*σ* ≤ *Z* ≤ (*u* − *µ*)/*σ*),

or, centering the interval on the mean,

    *P*(*µ* − *zσ* ≤ *X* ≤ *µ* + *zσ*) = *P*(−*z* ≤ *Z* ≤ *z*),

with *z* the quantile in the standard normal distribution for which:

    *P*(−*z* ≤ *Z* ≤ *z*) = *p*,

or equivalently:

    Φ(*z*) = (*p* + 1)/2.
| Prediction interval | *z* |
|---|---|
| 75% | 1.15^{[3]} |
| 90% | 1.64^{[3]} |
| 95% | 1.96^{[3]} |
| 99% | 2.58^{[3]} |

The prediction interval is conventionally written as:

    [*µ* − *zσ*, *µ* + *zσ*].

For example, to calculate the 95% prediction interval for a normal distribution with a mean (*µ*) of 5 and a standard deviation (*σ*) of 1, then *z* is approximately 2. Therefore, the lower limit of the prediction interval is approximately 5 ‒ (2·1) = 3, and the upper limit is approximately 5 + (2·1) = 7, thus giving a prediction interval of approximately 3 to 7.
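The calculation above can be sketched in Python using only the standard library; the function name is illustrative, not part of any particular API:

```python
from statistics import NormalDist

def normal_prediction_interval(mu, sigma, p):
    """Prediction interval for one future draw from N(mu, sigma^2)
    with *known* parameters: mu +/- z*sigma, where z is the standard
    normal quantile satisfying Phi(z) = (p + 1)/2."""
    z = NormalDist().inv_cdf((1 + p) / 2)
    return mu - z * sigma, mu + z * sigma

lower, upper = normal_prediction_interval(mu=5, sigma=1, p=0.95)
print(round(lower, 2), round(upper, 2))  # 3.04 6.96 (z is 1.96, not exactly 2)
```

Using the exact quantile *z* ≈ 1.96 rather than the rounded value 2 gives an interval slightly narrower than the approximate 3 to 7 quoted above.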

For a distribution with unknown parameters, a direct approach to prediction is to estimate the parameters and then use the associated quantile function – for example, one could use the sample mean *X̄* as an estimate for *μ* and the sample variance *s*^{2} as an estimate for *σ*^{2}. Note that there are two natural choices for *s*^{2} here – dividing by *n* − 1 yields an unbiased estimate, while dividing by *n* yields the maximum likelihood estimator, and either might be used. One then uses the quantile function with these estimated parameters to give a prediction interval.

This approach is usable, but the resulting interval will not have the repeated sampling interpretation^{[4]} – it is not a predictive confidence interval.
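The two natural variance estimates mentioned above differ only in the divisor, and Python's standard library exposes both (the data here are made up for illustration):

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

s2_unbiased = statistics.variance(data)   # divides by n - 1 (unbiased)
s2_mle = statistics.pvariance(data)       # divides by n (MLE under normality)

print(s2_mle, s2_unbiased)  # 4.0 and 32/7, i.e. about 4.571
```

The unbiased estimate is always the larger of the two by the factor *n*/(*n* − 1).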

For the sequel, use the sample mean:

    *X̄* = (*X*_{1} + ⋯ + *X*_{n})/*n*

and the (unbiased) sample variance:

    *s*^{2} = (1/(*n* − 1)) Σ_{i=1}^{n} (*X*_{i} − *X̄*)^{2}.
Given^{[5]} a normal distribution with unknown mean *μ* but known variance 1, the sample mean *X̄* of the observations *X*_{1}, ..., *X*_{n} has distribution *N*(*μ*, 1/*n*), while the future observation *X*_{n+1} has distribution *N*(*μ*, 1). Taking the difference of these cancels the *μ* and yields a normal distribution of variance 1 + 1/*n*; thus

    (*X*_{n+1} − *X̄*)/√(1 + 1/*n*) ~ *N*(0, 1).

Solving for *X*_{n+1} gives the prediction distribution *N*(*X̄*, 1 + 1/*n*), from which one can compute intervals as before. This is a predictive confidence interval in the sense that if one uses a quantile range of 100*p*%, then on repeated applications of this computation, the future observation *X*_{n+1} will fall in the predicted interval 100*p*% of the time.

Notice that this prediction distribution is more conservative than using the estimated mean *X̄* and known variance 1, as it uses variance 1 + 1/*n*, hence yields wider intervals. This is necessary for the desired confidence interval property to hold.
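The repeated-sampling property can be checked by simulation. The sketch below (standard library only; the sample size and number of trials are arbitrary) draws a fresh sample and a fresh future observation each trial, and counts how often the interval *X̄* ± *z*√(1 + 1/*n*) captures the future point:

```python
import random
from statistics import NormalDist, mean

random.seed(0)
z = NormalDist().inv_cdf(0.975)        # two-sided 95% standard normal quantile
n, trials, hits = 5, 20000, 0
half_width = z * (1 + 1 / n) ** 0.5    # sqrt(1 + 1/n) widens the interval

for _ in range(trials):
    mu = random.uniform(-10, 10)       # true mean, unknown to the procedure
    sample = [random.gauss(mu, 1) for _ in range(n)]
    xbar = mean(sample)
    future = random.gauss(mu, 1)
    if xbar - half_width <= future <= xbar + half_width:
        hits += 1

print(hits / trials)  # close to 0.95, whatever the true mean was
```

Dropping the 1/*n* term (i.e., pretending *X̄* is the true mean) would make the observed coverage fall detectably below 95%.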

Conversely, given a normal distribution with known mean 0 but unknown variance *σ*^{2}, the sample variance *s*^{2} of the observations has, up to scale, a *χ*^{2}_{n−1} distribution; more precisely:

    (*n* − 1)*s*^{2}/*σ*^{2} ~ *χ*^{2}_{n−1},

while the future observation *X*_{n+1} has distribution *N*(0, *σ*^{2}). Taking the ratio of the future observation and the sample standard deviation *s* cancels the *σ*, yielding a Student's t-distribution with *n* − 1 degrees of freedom:

    *X*_{n+1}/*s* ~ *T*_{n−1}.

Solving for *X*_{n+1} gives the prediction distribution *s* · *T*_{n−1}, from which one can compute intervals as before.

Notice that this prediction distribution is more conservative than using a normal distribution with the estimated standard deviation *s* and known mean 0, as it uses the t-distribution instead of the normal distribution, hence yields wider intervals. This is necessary for the desired confidence interval property to hold.

Combining the above for a normal distribution with both *μ* and *σ*^{2} unknown yields the following ancillary statistic:^{[6]}

    (*X*_{n+1} − *X̄*)/(*s*√(1 + 1/*n*)) ~ *T*_{n−1}.

This simple combination is possible because the sample mean and sample variance of the normal distribution are independent statistics; this is only true for the normal distribution, and in fact characterizes the normal distribution.

Solving for *X*_{n+1} yields the prediction distribution

    *X̄* + *s*√(1 + 1/*n*) · *T*_{n−1}.

The probability of *X*_{n+1} falling in a given interval is then:

    *P*(*X̄* − *T*_{a} *s*√(1 + 1/*n*) ≤ *X*_{n+1} ≤ *X̄* + *T*_{a} *s*√(1 + 1/*n*)) = 1 − *p*,

where *T*_{a} is the 100(1 − *p*/2)th percentile of Student's t-distribution with *n* − 1 degrees of freedom. Therefore, the numbers

    *X̄* ± *T*_{a} *s*√(1 + 1/*n*)

are the endpoints of a 100(1 − *p*)% prediction interval for *X*_{n+1}.

One can compute prediction intervals without any assumptions on the population; formally, this is a non-parametric method.^{[7]}

Suppose one randomly draws a sample of two observations *X*_{1} and *X*_{2} from a population in which values are assumed to have a continuous probability distribution. What is the probability that *X*_{2} > *X*_{1}?

The answer is exactly 50%, *regardless* of the underlying population – the probability of picking 3 and then 7 is the same as picking 7 and then 3, regardless of the particular probability of picking 3 or 7. Thus, if one picks a single sample point *X*_{1}, then 50% of the time the next sample point will be greater, which yields (*X*_{1}, +∞) as a 50% prediction interval for *X*_{2}. Similarly, 50% of the time it will be smaller, which yields another 50% prediction interval for *X*_{2}, namely (−∞, *X*_{1}). Note that the assumption of a continuous distribution avoids the possibility that values might be exactly equal; this would complicate matters.

Similarly, if one has a sample {*X*_{1}, ..., *X*_{n}} then the probability that the next observation *X*_{n+1} will be the largest is 1/(*n* + 1), since all observations have equal probability of being the maximum. In the same way, the probability that *X*_{n+1} will be the smallest is 1/(*n* + 1). The other (*n* − 1)/(*n* + 1) of the time, *X*_{n+1} falls between the sample maximum and sample minimum of the sample {*X*_{1}, ..., *X*_{n}}. Thus, denoting the sample maximum and minimum by *M* and *m,* this yields an (*n* − 1)/(*n* + 1) prediction interval of [*m*, *M*].

For example, if *n* = 19, then [*m*, *M*] gives an 18/20 = 90% prediction interval – 90% of the time, the 20th observation falls between the smallest and largest observation seen heretofore. Likewise, *n* = 39 gives a 95% prediction interval, and *n* = 199 gives a 99% prediction interval.

More generally, if *X*_{(j)} and *X*_{(k)} are order statistics of the sample with *j* < *k* and *j* + *k* = *n* + 1, then [*X*_{(j)}, *X*_{(k)}] is a prediction interval for *X*_{n+1} with coverage probability equal to (*n* + 1 − 2*j*) / (*n* + 1).

One can visualize this by drawing the *n* sample points on a line, which divides the line into *n* + 1 sections (*n* − 1 segments between samples, and 2 intervals going to infinity at both ends), and noting that *X*_{n+1} has an equal chance of landing in any of these *n* + 1 sections. Thus one can also pick any *k* of these sections and give a *k*/(*n* + 1) prediction interval (or set, if the sections are not consecutive). For instance, if *n* = 2, then the probability that *X*_{3} will land between the existing two observations is 1/3.
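The distribution-free coverage can be illustrated by simulation. The exponential population below is an arbitrary, deliberately non-normal choice, since the argument makes no distributional assumption beyond continuity:

```python
import random

random.seed(2)
n, trials, hits = 19, 20000, 0

for _ in range(trials):
    sample = [random.expovariate(1.0) for _ in range(n)]
    m, M = min(sample), max(sample)          # sample minimum and maximum
    future = random.expovariate(1.0)         # the 20th observation
    if m <= future <= M:
        hits += 1

print(hits / trials)  # close to (n - 1)/(n + 1) = 18/20 = 0.9
```

Repeating the experiment with any other continuous distribution gives the same 90% coverage, which is the point of the non-parametric construction.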

Notice that while this gives the probability that a future observation will fall in a range, it does not give any estimate as to where in a segment it will fall – notably, if it falls outside the range of observed values, it may be far outside the range. See extreme value theory for further discussion. Formally, this applies not just to sampling from a population, but to any exchangeable sequence of random variables, not necessarily independent or identically distributed.

Note that in the formula for the predictive confidence interval, *no mention* is made of the unobservable parameters *μ* and *σ* (the population mean and standard deviation); instead, the observed *sample* statistics *X̄* and *s* (the sample mean and standard deviation) are used, and what is estimated is the outcome of *future* samples.

Rather than using sample statistics as estimators of population parameters and applying confidence intervals to these estimates, one considers "the next sample" as *itself* a statistic, and computes its sampling distribution.

In parameter confidence intervals, one estimates population parameters; if one wishes to interpret this as prediction of the next sample, one models "the next sample" as a draw from this estimated population, using the (estimated) *population* distribution. By contrast, in predictive confidence intervals, one uses the *sampling* distribution of (a statistic of) a sample of *n* or *n* + 1 observations from such a population, and the population distribution is not directly used, though the assumption about its form (though not the values of its parameters) is used in computing the sampling distribution.


Prediction intervals are commonly used as definitions of reference ranges, such as reference ranges for blood tests to give an idea of whether a blood test is normal or not. For this purpose, the most commonly used prediction interval is the 95% prediction interval, and a reference range based on it can be called a *standard reference range*.

A common application of prediction intervals is to regression analysis.

Suppose the data is being modeled by a straight line regression:

    *y*_{i} = *α* + *β* *x*_{i} + *ε*_{i},

where *y*_{i} is the response variable, *x*_{i} is the explanatory variable, *ε*_{i} is a random error term, and *α* and *β* are parameters.

Given estimates *α̂* and *β̂* for the parameters, such as from a simple linear regression, the predicted response value *ŷ*_{d} for a given explanatory value *x*_{d} is

    *ŷ*_{d} = *α̂* + *β̂* *x*_{d}

(the point on the regression line), while the actual response would be

    *y*_{d} = *α* + *β* *x*_{d} + *ε*_{d}.

The point estimate *ŷ*_{d} is called the mean response, and is an estimate of the expected value of *y*_{d}, E(*y*_{d} | *x*_{d}).

A prediction interval instead gives an interval in which one expects *y*_{d} to fall; this is not necessary if the actual parameters *α* and *β* are known (together with the distribution of the error term *ε*_{i}), but if one is estimating from a sample, then one may use the standard errors of the estimated intercept and slope (*α̂* and *β̂*), as well as their correlation, to compute a prediction interval.

In regression, Faraway (2002, p. 39) makes a distinction between intervals for predictions of the mean response and for predictions of an observed response, affecting essentially the inclusion or not of the unity term within the square root in the expansion factors above; for details, see Faraway (2002).
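Faraway's "unity term" distinction can be made concrete with the standard simple-linear-regression standard errors; the dataset below is made up purely for illustration:

```python
from statistics import mean

# illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
n = len(x)

# least-squares fit
xbar, ybar = mean(x), mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
alpha = ybar - beta * xbar

# residual standard error with n - 2 degrees of freedom
rss = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
s = (rss / (n - 2)) ** 0.5

xd = 4.5
# standard error of the *mean response* at xd ...
se_mean = s * (1 / n + (xd - xbar) ** 2 / sxx) ** 0.5
# ... versus the *prediction* standard error: same formula plus the unity term
se_pred = s * (1 + 1 / n + (xd - xbar) ** 2 / sxx) ** 0.5

print(se_mean < se_pred)  # True
```

Either interval is then *ŷ*_{d} ± *t*_{n−2} × se with the appropriate standard error; the prediction interval is always the wider of the two, since predicting a single noisy observation is harder than estimating the line's height.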

Seymour Geisser, a proponent of predictive inference, gives predictive applications of Bayesian statistics.^{[8]}

In Bayesian statistics, one can compute (Bayesian) prediction intervals from the posterior probability of the random variable, as a credible interval. In theoretical work, credible intervals are not often calculated for the prediction of future events, but for inference of parameters – i.e., credible intervals of a parameter, not for the outcomes of the variable itself. However, particularly where applications are concerned with possible extreme values of yet to be observed cases, credible intervals for such values can be of practical importance.

- [1] Geisser (1993, p. 6): Chapter 2: Non-Bayesian predictive approaches
- [2] Geisser (1993, p. 7)
- [3] Table A2 in Sterne & Kirkwood (2003, p. 472)
- [4] Geisser (1993, pp. 8–9)
- [5] Geisser (1993, p. 7–)
- [6] Geisser (1993, Example 2.2, pp. 9–10)
- [7] "Prediction Intervals", Statistics @ SUNY Oswego
- [8] Geisser (1993)


- Faraway, Julian J. (2002), *Practical Regression and Anova using R* (PDF)
- Geisser, Seymour (1993), *Predictive Inference*, CRC Press
- Sterne, Jonathan; Kirkwood, Betty R. (2003), *Essential Medical Statistics*, Blackwell Science, ISBN 0-86542-871-9

- Chatfield, C. (1993). "Calculating Interval Forecasts". *Journal of Business & Economic Statistics*. **11** (2): 121–135. doi:10.2307/1391361.
- Lawless, J. F.; Fredette, M. (2005). "Frequentist prediction intervals and predictive distributions". *Biometrika*. **92** (3): 529–542. doi:10.1093/biomet/92.3.529.
- Meade, N.; Islam, T. (1995). "Prediction Intervals for Growth Curve Forecasts". *Journal of Forecasting*. **14** (5): 413–430. doi:10.1002/for.3980140502.
- ISO 16269-8, *Statistical Interpretation of Data*, Part 8: Determination of Prediction Intervals

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
