# Minimum-variance unbiased estimator

In statistics, a minimum-variance unbiased estimator (MVUE) or uniformly minimum-variance unbiased estimator (UMVUE) is an unbiased estimator that has lower variance than any other unbiased estimator for all possible values of the parameter.

For practical statistics problems, it is important to determine the MVUE if one exists, since less-than-optimal procedures would naturally be avoided, other things being equal. This has led to substantial development of statistical theory related to the problem of optimal estimation.

While combining the constraint of unbiasedness with the desirability metric of least variance leads to good results in most practical settings (making the MVUE a natural starting point for a broad range of analyses), a targeted specification may perform better for a given problem; thus, the MVUE is not always the best stopping point.

## Definition

Consider estimation of $g(\theta )$ based on data $X_{1},X_{2},\ldots ,X_{n}$ i.i.d. from some member of a family of densities $p_{\theta },\theta \in \Omega$ , where $\Omega$ is the parameter space. An unbiased estimator $\delta (X_{1},X_{2},\ldots ,X_{n})$ of $g(\theta )$ is UMVUE if $\forall \theta \in \Omega$ ,

$\operatorname {var} (\delta (X_{1},X_{2},\ldots ,X_{n}))\leq \operatorname {var} ({\tilde {\delta }}(X_{1},X_{2},\ldots ,X_{n}))$

for any other unbiased estimator ${\tilde {\delta }}$.

If an unbiased estimator of $g(\theta )$ exists, then one can prove there is an essentially unique MVUE. Using the Rao–Blackwell theorem one can also prove that determining the MVUE is simply a matter of finding a complete sufficient statistic for the family $p_{\theta },\theta \in \Omega$ and conditioning any unbiased estimator on it.

Further, by the Lehmann–Scheffé theorem, an unbiased estimator that is a function of a complete, sufficient statistic is the UMVUE.

Put formally, suppose $\delta (X_{1},X_{2},\ldots ,X_{n})$ is unbiased for $g(\theta )$ , and that $T$ is a complete sufficient statistic for the family of densities. Then

$\eta (X_{1},X_{2},\ldots ,X_{n})=\operatorname {E} (\delta (X_{1},X_{2},\ldots ,X_{n})\mid T)\,$

is the MVUE for $g(\theta )$.

A Bayesian analog is a Bayes estimator, particularly with minimum mean square error (MMSE).
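To make the conditioning step concrete, here is a standard textbook illustration (not taken from this article): for i.i.d. Poisson($\lambda$) data, the indicator $1\{X_{1}=0\}$ is a crude unbiased estimator of $g(\lambda )=e^{-\lambda }=P(X_{1}=0)$, and conditioning it on the complete sufficient statistic $T=\sum X_{i}$ yields $((n-1)/n)^{T}$, which by Lehmann–Scheffé is the UMVUE. A simulation sketch in Python, assuming nothing beyond the standard library (all names are illustrative):

```python
import math
import random
from statistics import mean, pvariance

def simulate(lam=2.0, n=10, reps=20000, seed=0):
    """Rao-Blackwellization for g(lambda) = exp(-lambda), Poisson data.

    Crude unbiased estimator: 1{X_1 = 0}.  Conditioning on the complete
    sufficient statistic T = sum(X_i) gives E(1{X_1=0} | T) = ((n-1)/n)**T,
    which by Lehmann-Scheffe is the UMVUE."""
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's multiplicative method (adequate for small lam)
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    crude, rb = [], []
    for _ in range(reps):
        xs = [poisson(lam) for _ in range(n)]
        crude.append(1.0 if xs[0] == 0 else 0.0)       # 1{X_1 = 0}
        rb.append(((n - 1) / n) ** sum(xs))            # E(crude | T)
    return mean(crude), mean(rb), pvariance(crude), pvariance(rb)

m_crude, m_rb, v_crude, v_rb = simulate()
# Both means should be near exp(-2) ~= 0.135; the conditioned
# estimator should show a much smaller variance.
```

Both estimators are unbiased, but the Rao–Blackwellized one inherits a far smaller variance, as the theorem guarantees.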

## Estimator selection

An efficient estimator need not exist, but if it does and if it is unbiased, it is the MVUE. Since the mean squared error (MSE) of an estimator δ is

$\operatorname {MSE} (\delta )=\operatorname {var} (\delta )+[\operatorname {bias} (\delta )]^{2},$

the MVUE minimizes MSE among unbiased estimators. In some cases biased estimators have lower MSE because they have a smaller variance than does any unbiased estimator; see estimator bias.
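The decomposition, and the fact that a biased estimator can beat the unbiased one on MSE, can be checked numerically. A sketch in standard-library Python (the divisor-$n$ variance estimator plays the role of the biased competitor; this is a standard example rather than anything specific to this article):

```python
import random
from statistics import mean, pvariance

def mse_decomposition(n=5, sigma2=1.0, reps=50000, seed=1):
    """Check MSE(delta) = var(delta) + bias(delta)^2 empirically for two
    variance estimators on N(0, sigma2) samples of size n:
    unbiased (divisor n-1) and biased (divisor n)."""
    rng = random.Random(seed)
    unbiased, biased = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        xbar = mean(xs)
        ss = sum((x - xbar) ** 2 for x in xs)
        unbiased.append(ss / (n - 1))
        biased.append(ss / n)

    out = {}
    for name, est in [("unbiased", unbiased), ("biased", biased)]:
        bias = mean(est) - sigma2
        var = pvariance(est)
        mse = mean([(e - sigma2) ** 2 for e in est])
        out[name] = (mse, var, bias)   # mse == var + bias**2 (up to rounding)
    return out

res = mse_decomposition()
# The divisor-n estimator is biased yet has the lower MSE here.
```

For normal data with small $n$, the biased divisor-$n$ estimator trades a small squared bias for a larger drop in variance, so its total MSE is lower.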

## Example

Consider the data to be a single observation from an absolutely continuous distribution on $\mathbb {R}$ with density

$p_{\theta }(x)={\frac {\theta e^{-x}}{(1+e^{-x})^{\theta +1}}}$

and we wish to find the UMVU estimator of

$g(\theta )={\frac {1}{\theta ^{2}}}.$

First we recognize that the density can be written as

${\frac {e^{-x}}{1+e^{-x}}}\exp(-\theta \log(1+e^{-x})+\log(\theta )),$

which is an exponential family with sufficient statistic $T=\log(1+e^{-X})$ . In fact this is a full-rank exponential family, and therefore $T$ is complete sufficient. See exponential family for a derivation which shows

$\operatorname {E} (T)={\frac {1}{\theta }},\quad \operatorname {var} (T)={\frac {1}{\theta ^{2}}}.$

Therefore,

$\operatorname {E} (T^{2})=\operatorname {var} (T)+[\operatorname {E} (T)]^{2}={\frac {2}{\theta ^{2}}}.$

Here we use the Lehmann–Scheffé theorem to get the MVUE.

Clearly $\delta (X)={\frac {T^{2}}{2}}$ is unbiased for $g(\theta )$ and $T=\log(1+e^{-X})$ is complete sufficient, thus the UMVU estimator is

$\eta (X)=\operatorname {E} (\delta (X)\mid T)=\operatorname {E} \left(\left.{\frac {T^{2}}{2}}\,\right|\,T\right)={\frac {T^{2}}{2}}={\frac {[\log(1+e^{-X})]^{2}}{2}}.$

This example illustrates that an unbiased function of the complete sufficient statistic will be UMVU, as the Lehmann–Scheffé theorem states.
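The moments above can be verified by simulation. A change of variables on the density shows that $T=\log(1+e^{-X})$ is exponentially distributed with rate $\theta$ (consistent with $\operatorname {E} (T)=1/\theta$ and $\operatorname {var} (T)=1/\theta ^{2}$), which also gives a way to sample $X$ by inverting the transform. A sketch, assuming only the Python standard library:

```python
import math
import random
from statistics import mean

def check_umvue(theta=2.0, reps=200000, seed=42):
    """Sample X from p_theta via T = log(1 + e^{-X}) ~ Exp(rate=theta),
    then check E[T] = 1/theta and E[T^2/2] = 1/theta^2 = g(theta)."""
    rng = random.Random(seed)
    ts = []
    for _ in range(reps):
        t = rng.expovariate(theta)           # T ~ Exp(theta)
        x = -math.log(math.expm1(t))         # invert t = log(1 + e^{-x})
        ts.append(math.log1p(math.exp(-x)))  # recompute T from X as a check
    return mean(ts), mean(t * t / 2 for t in ts)

m_t, m_est = check_umvue()
# Expect m_t ~= 1/theta = 0.5 and m_est ~= 1/theta^2 = 0.25
```

The round trip through $X$ confirms the parameterization, and the sample mean of $T^{2}/2$ sits near $1/\theta ^{2}$, as unbiasedness requires.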

## Other examples

For a discrete uniform distribution on $\{1,2,\ldots ,N\}$ with unknown maximum $N$, the UMVUE for $N$ based on a sample of size $k$ is

${\frac {k+1}{k}}m-1,$

where $m$ is the sample maximum. This is a scaled and shifted (so unbiased) transform of the sample maximum, which is a sufficient and complete statistic. See the German tank problem for details.
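A quick simulation sketch (assuming, as in the German tank problem, sampling without replacement from $\{1,\ldots ,N\}$; names are illustrative) suggests the estimator's unbiasedness:

```python
import random
from statistics import mean

def tank_estimate(N=100, k=5, reps=40000, seed=7):
    """Draw k serial numbers without replacement from {1, ..., N} and
    apply the estimator (k+1)/k * m - 1, where m is the sample maximum."""
    rng = random.Random(seed)
    ests = []
    for _ in range(reps):
        m = max(rng.sample(range(1, N + 1), k))
        ests.append((k + 1) / k * m - 1)
    return mean(ests)

est = tank_estimate()
# Expect est close to N = 100: the estimator is unbiased.
```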

