Minimum-variance unbiased estimator

Last updated May 22, 2023

In statistics, a minimum-variance unbiased estimator (MVUE) or uniformly minimum-variance unbiased estimator (UMVUE) is an unbiased estimator whose variance is no greater than that of any other unbiased estimator, for all possible values of the parameter.

Combining the constraint of unbiasedness with the criterion of least variance gives good results in most practical settings, which makes the MVUE a natural starting point for a broad range of analyses. A targeted specification may nevertheless perform better for a given problem, so the MVUE is not always the best stopping point.

Definition

Consider estimation of $g(\theta )$ based on data $X_{1},X_{2},\ldots ,X_{n}$ i.i.d. from some member of a family of densities $p_{\theta },\theta \in \Omega$, where $\Omega$ is the parameter space. An unbiased estimator $\delta (X_{1},X_{2},\ldots ,X_{n})$ of $g(\theta )$ is UMVUE if, for all $\theta \in \Omega$,

\operatorname {var} (\delta (X_{1},X_{2},\ldots ,X_{n}))\leq \operatorname {var} ({\tilde {\delta }}(X_{1},X_{2},\ldots ,X_{n}))

for any other unbiased estimator ${\tilde {\delta }}.$

If an unbiased estimator of $g(\theta )$ exists, then one can prove there is an essentially unique MVUE.[1] Using the Rao–Blackwell theorem, one can also prove that determining the MVUE is simply a matter of finding a complete sufficient statistic for the family $p_{\theta },\theta \in \Omega$ and conditioning any unbiased estimator on it.

Further, by the Lehmann–Scheffé theorem, an unbiased estimator that is a function of a complete sufficient statistic is the UMVUE.

Put formally, suppose $\delta (X_{1},X_{2},\ldots ,X_{n})$ is unbiased for $g(\theta )$, and that $T$ is a complete sufficient statistic for the family of densities. Then

\eta (X_{1},X_{2},\ldots ,X_{n})=\operatorname {E} (\delta (X_{1},X_{2},\ldots ,X_{n})\mid T)\,

is the MVUE for $g(\theta ).$
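The conditioning step can be checked exactly in a small case. The sketch below assumes a Bernoulli model, which is not part of the text above and is chosen only for illustration: $X_{1},\ldots ,X_{n}$ are i.i.d. Bernoulli($p$), the crude unbiased estimator is $\delta =X_{1}$, and $T=\sum X_{i}$ is the complete sufficient statistic. The function names are hypothetical. It enumerates every outcome to compute $\operatorname {E} (\delta \mid T=t)$ and compares the variances of the two estimators.

```python
from fractions import Fraction
from itertools import product


def rao_blackwellize(n):
    """Return {t: E(X_1 | T = t)} for i.i.d. Bernoulli data, T = sum of the sample.

    Given T = t, every 0/1 sequence with t ones is equally likely
    (the conditional law does not depend on p because T is sufficient),
    so the conditional expectation is a plain average over those sequences.
    """
    totals = {}  # t -> sum of X_1 over sequences with T = t
    counts = {}  # t -> number of sequences with T = t
    for xs in product((0, 1), repeat=n):
        t = sum(xs)
        totals[t] = totals.get(t, 0) + xs[0]
        counts[t] = counts.get(t, 0) + 1
    return {t: Fraction(totals[t], counts[t]) for t in counts}


def variance(estimator, n, p):
    """Exact variance of estimator(xs) under i.i.d. Bernoulli(p) sampling."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for xs in product((0, 1), repeat=n):
        ones = sum(xs)
        prob = p**ones * (1 - p) ** (n - ones)
        value = Fraction(estimator(xs))
        mean += prob * value
        second_moment += prob * value**2
    return second_moment - mean**2


# Illustrative values only (assumed, not taken from the article).
n, p = 4, Fraction(3, 10)
eta = rao_blackwellize(n)
crude_var = variance(lambda xs: xs[0], n, p)
improved_var = variance(lambda xs: eta[sum(xs)], n, p)
print(eta)
print(crude_var, improved_var)
```

In this assumed setting the conditioned estimator is the sample mean $t/n$, and for $n=4$, $p=3/10$ the variance falls from $21/100$ to $21/400$.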

A Bayesian analog is a Bayes estimator, particularly with minimum mean square error (MMSE).

Estimator selection

An efficient estimator need not exist, but if it does and if it is unbiased, it is the MVUE. Since the mean squared error (MSE) of an estimator δ is

\operatorname {MSE} (\delta )=\operatorname {var} (\delta )+[\operatorname {bias} (\delta )]^{2},

the MVUE minimizes MSE among unbiased estimators. In some cases biased estimators have lower MSE because they have a smaller variance than does any unbiased estimator; see estimator bias.
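A standard case where a biased estimator beats the MVUE in MSE is the variance of a normal sample. The sketch below is an illustration under that assumption, with a hypothetical function name. It uses the known fact that the sum of squared deviations $SS=\sum (x_{i}-{\bar {x}})^{2}$ is distributed as $\sigma ^{2}\chi _{n-1}^{2}$, and applies the decomposition above to the estimators $SS/(n-1)$ (unbiased) and $SS/(n+1)$ (biased).

```python
from fractions import Fraction


def mse_of_scaled_sum_of_squares(divisor, n, sigma2=1):
    """MSE of SS / divisor as an estimator of sigma^2 for a normal sample.

    SS = sum((x_i - xbar)^2) is sigma^2 times a chi-square variable with
    n - 1 degrees of freedom, so E(SS) = (n - 1) sigma^2 and
    var(SS) = 2 (n - 1) sigma^4.
    """
    c = Fraction(1, divisor)
    sigma2 = Fraction(sigma2)
    variance = c**2 * 2 * (n - 1) * sigma2**2
    bias = c * (n - 1) * sigma2 - sigma2
    return variance + bias**2  # MSE = var + bias^2


n = 10  # illustrative sample size (assumed)
unbiased_mse = mse_of_scaled_sum_of_squares(n - 1, n)  # divisor n - 1: the MVUE
biased_mse = mse_of_scaled_sum_of_squares(n + 1, n)    # divisor n + 1: biased
print(unbiased_mse, biased_mse)
```

With $\sigma ^{2}=1$ the two values simplify to $2/(n-1)$ and $2/(n+1)$, so the biased estimator has the smaller MSE for every $n\geq 2$.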

Example

Consider the data to be a single observation from an absolutely continuous distribution on $\mathbb {R}$ with density

p_{\theta }(x)={\frac {\theta e^{-x}}{(1+e^{-x})^{\theta +1}}}

and we wish to find the UMVU estimator of

g(\theta )={\frac {1}{\theta ^{2}}}

First we recognize that the density can be written as

{\frac {e^{-x}}{1+e^{-x}}}\exp(-\theta \log(1+e^{-x})+\log(\theta ))

which is an exponential family with sufficient statistic $T=\log(1+e^{-x})$. In fact, this is a full-rank exponential family, and therefore $T$ is complete sufficient. See exponential family for a derivation which shows

\operatorname {E} (T)={\frac {1}{\theta }},\quad \operatorname {var} (T)={\frac {1}{\theta ^{2}}}

Therefore, since $\operatorname {E} (T^{2})=\operatorname {var} (T)+[\operatorname {E} (T)]^{2}$,

\operatorname {E} (T^{2})={\frac {2}{\theta ^{2}}}

Here we use the Lehmann–Scheffé theorem to get the MVUE.

Clearly $\delta (X)={\frac {T^{2}}{2}}$ is unbiased and $T=\log(1+e^{-x})$ is complete sufficient, thus the UMVU estimator is

\eta (X)=\operatorname {E} (\delta (X)\mid T)=\operatorname {E} \left(\left.{\frac {T^{2}}{2}}\,\right|\,T\right)={\frac {T^{2}}{2}}={\frac {\left(\log(1+e^{-X})\right)^{2}}{2}}

This example illustrates that an unbiased function of the complete sufficient statistic will be UMVU, as the Lehmann–Scheffé theorem states.
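The unbiasedness of $\eta (X)$ in this example can be checked numerically. The following is a minimal sketch, not a proof: it integrates $\eta (x)\,p_{\theta }(x)$ with a midpoint rule and compares the result with $1/\theta ^{2}$. The function names, the integration range $[-60,60]$, the step count, and the chosen values of $\theta$ are all assumptions made for illustration.

```python
import math


def density(x, theta):
    """p_theta(x) = theta e^{-x} / (1 + e^{-x})^(theta + 1)."""
    return theta * math.exp(-x) / (1.0 + math.exp(-x)) ** (theta + 1.0)


def umvue(x):
    """eta(x) = (log(1 + e^{-x}))^2 / 2."""
    return math.log1p(math.exp(-x)) ** 2 / 2.0


def expectation(f, theta, lo=-60.0, hi=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of f(x) p_theta(x) on [lo, hi].

    The range is an assumed truncation of the real line; the tails outside
    it are negligible for the moderate theta values used below.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += f(x) * density(x, theta)
    return total * h


# Compare E(eta(X)) with the target g(theta) = 1 / theta^2.
results = {theta: expectation(umvue, theta) for theta in (0.5, 1.0, 2.0)}
for theta, value in results.items():
    print(theta, value, 1.0 / theta**2)
```

For each $\theta$ tried, the numerical expectation should agree with $1/\theta ^{2}$ up to quadrature error.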

Other examples

For a normal distribution with unknown mean and variance, the sample mean and (unbiased) sample variance are the MVUEs for the population mean and population variance.
However, the sample standard deviation is not unbiased for the population standard deviation – see unbiased estimation of standard deviation.
Further, for other distributions the sample mean and sample variance are not in general MVUEs – for a uniform distribution with unknown upper and lower bounds, the mid-range is the MVUE for the population mean.
If k exemplars are chosen (without replacement) from a discrete uniform distribution over the set {1, 2, ..., N} with unknown upper bound N, the MVUE for N is

{\frac {k+1}{k}}m-1,

where m is the sample maximum. This is a scaled and shifted (so unbiased) transform of the sample maximum, which is a sufficient and complete statistic. See German tank problem for details.
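The unbiasedness of the estimator for $N$ can be verified exactly in small cases. The sketch below, with hypothetical function names and illustrative values $N=10$, $k=3$, averages the estimator over every size-$k$ subset of $\{1,\ldots ,N\}$, each of which is equally likely under sampling without replacement.

```python
from fractions import Fraction
from itertools import combinations


def mvue_upper_bound(sample):
    """(k + 1) / k * m - 1, with m the sample maximum and k the sample size."""
    k = len(sample)
    m = max(sample)
    return Fraction(k + 1, k) * m - 1


def expected_estimate(N, k):
    """Exact mean of the estimator over all k-subsets of {1, ..., N}."""
    subsets = list(combinations(range(1, N + 1), k))
    return sum(mvue_upper_bound(s) for s in subsets) / len(subsets)


N, k = 10, 3  # illustrative values (assumed)
mean_estimate = expected_estimate(N, k)
print(mean_estimate)
```

The computed mean equals $N$ exactly, as unbiasedness requires. This check confirms only that the estimator is unbiased, not that its variance is minimal.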

References

[1] Lee, A. J. (1990). U-statistics: Theory and Practice. New York: M. Dekker. ISBN 0824782534. OCLC 21523971.

Keener, Robert W. (2006). Statistical Theory: Notes for a Course in Theoretical Statistics. Springer. pp. 47–48, 57–58.
Keener, Robert W. (2010). Theoretical statistics: Topics for a core course. New York: Springer. DOI 10.1007/978-0-387-93839-4
Voinov, V. G.; Nikulin, M. S. (1993). Unbiased Estimators and Their Applications, Vol. 1: Univariate Case. Kluwer Academic Publishers. 521 pp.

