Jackknife resampling

Last updated December 06, 2024

In statistics, the jackknife (jackknife cross-validation) is a cross-validation technique and, therefore, a form of resampling. It is especially useful for bias and variance estimation. The jackknife pre-dates other common resampling methods such as the bootstrap. Given a sample of size $n$ , a jackknife estimator can be built by aggregating the parameter estimates from each subsample of size $(n-1)$ obtained by omitting one observation.^[1]

The jackknife technique was developed by Maurice Quenouille (1924–1973) from 1949 and refined in 1956. John Tukey expanded on the technique in 1958 and proposed the name "jackknife" because, like a physical jack-knife (a compact folding knife), it is a rough-and-ready tool that can improvise a solution for a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool.^[2]

The jackknife is a linear approximation of the bootstrap.^[2]

A simple example: mean estimation

The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset and calculating the parameter estimate over the remaining observations and then aggregating these calculations.

For example, if the parameter to be estimated is the population mean of random variable $x$ , then for a given set of i.i.d. observations $x_{1},...,x_{n}$ the natural estimator is the sample mean:

{\bar {x}}={\frac {1}{n}}\sum _{i=1}^{n}x_{i}={\frac {1}{n}}\sum _{i\in [n]}x_{i},

where the last sum used another way to indicate that the index $i$ runs over the set $[n]=\{1,\ldots ,n\}$ .

Then we proceed as follows: For each $i\in [n]$ we compute the mean ${\bar {x}}_{(i)}$ of the jackknife subsample consisting of all but the $i$ -th data point, and this is called the $i$ -th jackknife replicate:

{\bar {x}}_{(i)}={\frac {1}{n-1}}\sum _{j\in [n],j\neq i}x_{j},\quad \quad i=1,\dots ,n.

It could help to think that these $n$ jackknife replicates ${\bar {x}}_{(1)},\ldots ,{\bar {x}}_{(n)}$ give us an approximation of the distribution of the sample mean ${\bar {x}}$ and the larger the $n$ the better this approximation will be. Then finally to get the jackknife estimator we take the average of these $n$ jackknife replicates:

{\bar {x}}_{\mathrm {jack} }={\frac {1}{n}}\sum _{i=1}^{n}{\bar {x}}_{(i)}.

One may ask about the bias and the variance of ${\bar {x}}_{\mathrm {jack} }$ . From the definition of ${\bar {x}}_{\mathrm {jack} }$ as the average of the jackknife replicates one could try to calculate explicitly, and the bias is a trivial calculation but the variance of ${\bar {x}}_{\mathrm {jack} }$ is more involved since the jackknife replicates are not independent.

For the special case of the mean, one can show explicitly that the jackknife estimate equals the usual estimate:

{\frac {1}{n}}\sum _{i=1}^{n}{\bar {x}}_{(i)}={\bar {x}}.

This establishes the identity ${\bar {x}}_{\mathrm {jack} }={\bar {x}}$ . Then taking expectations we get $E[{\bar {x}}_{\mathrm {jack} }]=E[{\bar {x}}]=E[x]$ , so ${\bar {x}}_{\mathrm {jack} }$ is unbiased, while taking variance we get $V[{\bar {x}}_{\mathrm {jack} }]=V[{\bar {x}}]=V[x]/n$ . However, these properties do not generally hold for parameters other than the mean.

This simple example for the case of mean estimation is just to illustrate the construction of a jackknife estimator, while the real subtleties (and the usefulness) emerge for the case of estimating other parameters, such as higher moments than the mean or other functionals of the distribution.

${\bar {x}}_{\mathrm {jack} }$ could be used to construct an empirical estimate of the bias of ${\bar {x}}$ , namely ${\widehat {\operatorname {bias} }}({\bar {x}})_{\mathrm {jack} }=c({\bar {x}}_{\mathrm {jack} }-{\bar {x}})$ with some suitable factor $c>0$ , although in this case we know that ${\bar {x}}_{\mathrm {jack} }={\bar {x}}$ so this construction does not add any meaningful knowledge, but it gives the correct estimation of the bias (which is zero).

A jackknife estimate of the variance of ${\bar {x}}$ can be calculated from the variance of the jackknife replicates ${\bar {x}}_{(i)}$ :^[3]^[4]

{\widehat {\operatorname {var} }}({\bar {x}})_{\mathrm {jack} }={\frac {n-1}{n}}\sum _{i=1}^{n}({\bar {x}}_{(i)}-{\bar {x}}_{\mathrm {jack} })^{2}={\frac {1}{n(n-1)}}\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}.

The left equality defines the estimator ${\widehat {\operatorname {var} }}({\bar {x}})_{\mathrm {jack} }$ and the right equality is an identity that can be verified directly. Then taking expectations we get $E[{\widehat {\operatorname {var} }}({\bar {x}})_{\mathrm {jack} }]=V[x]/n=V[{\bar {x}}]$ , so this is an unbiased estimator of the variance of ${\bar {x}}$ .

Estimating the bias of an estimator

The jackknife technique can be used to estimate (and correct) the bias of an estimator calculated over the entire sample.

Suppose $\theta$ is the target parameter of interest, which is assumed to be some functional of the distribution of $x$ . Based on a finite set of observations $x_{1},...,x_{n}$ , which is assumed to consist of i.i.d. copies of $x$ , the estimator ${\hat {\theta }}$ is constructed:

{\hat {\theta }}=f_{n}(x_{1},\ldots ,x_{n}).

The value of ${\hat {\theta }}$ is sample-dependent, so this value will change from one random sample to another.

By definition, the bias of ${\hat {\theta }}$ is as follows:

{\text{bias}}({\hat {\theta }})=E[{\hat {\theta }}]-\theta .

One may wish to compute several values of ${\hat {\theta }}$ from several samples, and average them, to calculate an empirical approximation of $E[{\hat {\theta }}]$ , but this is impossible when there are no "other samples" when the entire set of available observations $x_{1},...,x_{n}$ was used to calculate ${\hat {\theta }}$ . In this kind of situation the jackknife resampling technique may be of help.

We construct the jackknife replicates:

{\hat {\theta }}_{(1)}=f_{n-1}(x_{2},x_{3}\ldots ,x_{n})

{\hat {\theta }}_{(2)}=f_{n-1}(x_{1},x_{3},\ldots ,x_{n})

\vdots

{\hat {\theta }}_{(n)}=f_{n-1}(x_{1},x_{2},\ldots ,x_{n-1})

where each replicate is a "leave-one-out" estimate based on the jackknife subsample consisting of all but one of the data points:

{\hat {\theta }}_{(i)}=f_{n-1}(x_{1},\ldots ,x_{i-1},x_{i+1},\ldots ,x_{n})\quad \quad i=1,\dots ,n.

Then we define their average:

{\hat {\theta }}_{\mathrm {jack} }={\frac {1}{n}}\sum _{i=1}^{n}{\hat {\theta }}_{(i)}

The jackknife estimate of the bias of ${\hat {\theta }}$ is given by:

{\widehat {\text{bias}}}({\hat {\theta }})_{\mathrm {jack} }=(n-1)({\hat {\theta }}_{\mathrm {jack} }-{\hat {\theta }})

and the resulting bias-corrected jackknife estimate of $\theta$ is given by:

{\hat {\theta }}_{\text{jack}}^{*}={\hat {\theta }}-{\widehat {\text{bias}}}({\hat {\theta }})_{\mathrm {jack} }=n{\hat {\theta }}-(n-1){\hat {\theta }}_{\mathrm {jack} }.

This removes the bias in the special case that the bias is $O(n^{-1})$ and reduces it to $O(n^{-2})$ in other cases.^[2]

Estimating the variance of an estimator

The jackknife technique can be also used to estimate the variance of an estimator calculated over the entire sample.

Literature

Berger, Y.G. (2007). "A jackknife variance estimator for unistage stratified samples with unequal probabilities". Biometrika . 94 (4): 953–964. doi:10.1093/biomet/asm072.
Berger, Y.G.; Rao, J.N.K. (2006). "Adjusted jackknife for imputation under unequal probability sampling without replacement". Journal of the Royal Statistical Society, Series B . 68 (3): 531–547. doi: 10.1111/j.1467-9868.2006.00555.x .
Berger, Y.G.; Skinner, C.J. (2005). "A jackknife variance estimator for unequal probability sampling". Journal of the Royal Statistical Society, Series B . 67 (1): 79–89. doi:10.1111/j.1467-9868.2005.00489.x.
Jiang, J.; Lahiri, P.; Wan, S-M. (2002). "A unified jackknife theory for empirical best prediction with M-estimation". The Annals of Statistics . 30 (6): 1782–810. doi: 10.1214/aos/1043351257 .
Jones, H.L. (1974). "Jackknife estimation of functions of stratum means". Biometrika . 61 (2): 343–348. doi:10.2307/2334363. JSTOR 2334363.
Kish, L.; Frankel, M.R. (1974). "Inference from complex samples". Journal of the Royal Statistical Society, Series B . 36 (1): 1–37.
Krewski, D.; Rao, J.N.K. (1981). "Inference from stratified samples: properties of the linearization, jackknife and balanced repeated replication methods". The Annals of Statistics . 9 (5): 1010–1019. doi: 10.1214/aos/1176345580 .
Quenouille, M.H. (1956). "Notes on bias in estimation". Biometrika . 43 (3–4): 353–360. doi:10.1093/biomet/43.3-4.353.
Rao, J.N.K.; Shao, J. (1992). "Jackknife variance estimation with survey data under hot deck imputation". Biometrika . 79 (4): 811–822. doi:10.1093/biomet/79.4.811.
Rao, J.N.K.; Wu, C.F.J.; Yue, K. (1992). "Some recent work on resampling methods for complex surveys". Survey Methodology . 18 (2): 209–217.
Shao, J. and Tu, D. (1995). The Jackknife and Bootstrap. Springer-Verlag, Inc.
Tukey, J.W. (1958). "Bias and confidence in not-quite large samples (abstract)". The Annals of Mathematical Statistics . 29 (2): 614.
Wu, C.F.J. (1986). "Jackknife, Bootstrap and other resampling methods in regression analysis". The Annals of Statistics . 14 (4): 1261–1295. doi: 10.1214/aos/1176350142 .

Notes

↑ Efron 1982, p. 2.
1 2 3 Cameron & Trivedi 2005, p. 375.
↑ Efron 1982, p. 14.
↑ McIntosh, Avery I. "The Jackknife Estimation Method" (PDF). Boston University. Avery I. McIntosh. Archived from the original (PDF) on 2016-05-14. Retrieved 2016-04-30.: p. 3.

Related Research Articles

In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule, the quantity of interest and its result are distinguished. For example, the sample mean is a commonly used estimator of the population mean.

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.

In estimation theory and statistics, the Cramér–Rao bound (CRB) relates to estimation of a deterministic parameter. The result is named in honor of Harald Cramér and Calyampudi Radhakrishna Rao, but has also been derived independently by Maurice Fréchet, Georges Darmois, and by Alexander Aitken and Harold Silverstone. It is also known as Fréchet-Cramér–Rao or Fréchet-Darmois-Cramér-Rao lower bound. It states that the precision of any unbiased estimator is at most the Fisher information; or (equivalently) the reciprocal of the Fisher information is a lower bound on its variance.

Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements. In estimation theory, two approaches are generally considered:

In statistics a minimum-variance unbiased estimator (MVUE) or uniformly minimum-variance unbiased estimator (UMVUE) is an unbiased estimator that has lower variance than any other unbiased estimator for all possible values of the parameter.

In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the input dataset and the output of the (linear) function of the independent variable. Some sources consider OLS to be linear regression.

In statistics, the method of moments is a method of estimation of population parameters. The same principle is used to derive higher moments like skewness and kurtosis.

In statistics, the Bayesian information criterion (BIC) or Schwarz information criterion is a criterion for model selection among a finite set of models; models with lower BIC are generally preferred. It is based, in part, on the likelihood function and it is closely related to the Akaike information criterion (AIC).

In statistics, M-estimators are a broad class of extremum estimators for which the objective function is a sample average. Both non-linear least squares and maximum likelihood estimation are special cases of M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators. However, M-estimators are not inherently robust, as is clear from the fact that they include maximum likelihood estimators, which are in general not robust. The statistical procedure of evaluating an M-estimator on a data set is called M-estimation. The "M" initial stands for "maximum likelihood-type".

In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model. It is used when there is a non-zero amount of correlation between the residuals in the regression model. GLS is employed to improve statistical efficiency and reduce the risk of drawing erroneous inferences, as compared to conventional least squares and weighted least squares methods. It was first described by Alexander Aitken in 1935.

Bootstrapping is a procedure for estimating the distribution of an estimator by resampling one's data or a model estimated from the data. Bootstrapping assigns measures of accuracy to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.

In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function. Equivalently, it maximizes the posterior expectation of a utility function. An alternative way of formulating an estimator within Bayesian statistics is maximum a posteriori estimation.

In statistics, the bias of an estimator is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. In statistics, "bias" is an objective property of an estimator. Bias is a distinct concept from consistency: consistent estimators converge in probability to the true value of the parameter, but may be biased or unbiased.

In statistics and in particular statistical theory, unbiased estimation of a standard deviation is the calculation from a statistical sample of an estimated value of the standard deviation of a population of values, in such a way that the expected value of the calculation equals the true value. Except in some important situations, outlined later, the task has little relevance to applications of statistics since its need is avoided by standard procedures, such as the use of significance tests and confidence intervals, or by using Bayesian analysis.

The topic of heteroskedasticity-consistent (HC) standard errors arises in statistics and econometrics in the context of linear regression and time series analysis. These are also known as heteroskedasticity-robust standard errors, Eicker–Huber–White standard errors, to recognize the contributions of Friedhelm Eicker, Peter J. Huber, and Halbert White.

Minimum-distance estimation (MDE) is a conceptual method for fitting a statistical model to data, usually the empirical distribution. Often-used estimators such as ordinary least squares can be thought of as special cases of minimum-distance estimation.

<span class="mw-page-title-main">Maximum spacing estimation</span> Method of estimating a statistical models parameters

In statistics, maximum spacing estimation (MSE or MSP), or maximum product of spacing estimation (MPS), is a method for estimating the parameters of a univariate statistical model. The method requires maximization of the geometric mean of spacings in the data, which are the differences between the values of the cumulative distribution function at neighbouring data points.

The ratio estimator is a statistical estimator for the ratio of means of two random variables. Ratio estimates are biased and corrections must be made when they are used in experimental or survey work. The ratio estimates are asymmetrical and symmetrical tests such as the t test should not be used to generate confidence intervals.

Two-step M-estimators deals with M-estimation problems that require preliminary estimation to obtain the parameter of interest. Two-step M-estimation is different from usual M-estimation problem because asymptotic distribution of the second-step estimator generally depends on the first-step estimator. Accounting for this change in asymptotic distribution is important for valid inference.

In statistics, the Innovation Method provides an estimator for the parameters of stochastic differential equations given a time series of observations of the state variables. In the framework of continuous-discrete state space models, the innovation estimator is obtained by maximizing the log-likelihood of the corresponding discrete-time innovation process with respect to the parameters. The innovation estimator can be classified as a M-estimator, a quasi-maximum likelihood estimator or a prediction error estimator depending on the inferential considerations that want to be emphasized. The innovation method is a system identification technique for developing mathematical models of dynamical systems from measured data and for the optimal design of experiments.

References

Cameron, Adrian; Trivedi, Pravin K. (2005). Microeconometrics : methods and applications. Cambridge New York: Cambridge University Press. ISBN 9780521848053.
Efron, Bradley; Stein, Charles (May 1981). "The Jackknife Estimate of Variance". The Annals of Statistics. 9 (3): 586–596. doi: 10.1214/aos/1176345462 . JSTOR 2240822.
Efron, Bradley (1982). The jackknife, the bootstrap, and other resampling plans. Philadelphia, PA: Society for Industrial and Applied Mathematics. ISBN 9781611970319.
Quenouille, Maurice H. (September 1949). "Problems in Plane Sampling". The Annals of Mathematical Statistics. 20 (3): 355–375. doi: 10.1214/aoms/1177729989 . JSTOR 2236533.
Quenouille, Maurice H. (1956). "Notes on Bias in Estimation". Biometrika. 43 (3–4): 353–360. doi:10.1093/biomet/43.3-4.353. JSTOR 2332914.
Tukey, John W. (1958). "Bias and confidence in not quite large samples (abstract)". The Annals of Mathematical Statistics. 29 (2): 614. doi: 10.1214/aoms/1177706647 .

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[FOOTNOTEEfron19822-1] Efron 1982, p. 2.

[FOOTNOTECameronTrivedi2005375-2] 1 2 3 Cameron & Trivedi 2005, p. 375.

[FOOTNOTEEfron198214-3] Efron 1982, p. 14.

[4] McIntosh, Avery I. "The Jackknife Estimation Method" (PDF). Boston University. Avery I. McIntosh. Archived from the original (PDF) on 2016-05-14. Retrieved 2016-04-30.: p. 3.

[1]

[2]

[3]

[4]