# Efficiency (statistics)

In the comparison of various statistical procedures, efficiency is a measure of the quality of an estimator, of an experimental design, [1] or of a hypothesis testing procedure. [2] Essentially, a more efficient estimator, experiment, or test needs fewer observations than a less efficient one to achieve a given error performance. An efficient estimator is characterized by a small variance or mean square error, indicating that there is a small deviation between the estimated value and the "true" value. [1]

The relative efficiency of two procedures is the ratio of their efficiencies, although often this concept is used where the comparison is made between a given procedure and a notional "best possible" procedure. The efficiencies and the relative efficiency of two procedures theoretically depend on the sample size available for the given procedure, but it is often possible to use the asymptotic relative efficiency (defined as the limit of the relative efficiencies as the sample size grows) as the principal comparison measure.

## Estimators

The efficiency of an unbiased estimator, T, of a parameter θ is defined as [3]

${\displaystyle e(T)={\frac {1/{\mathcal {I}}(\theta )}{\operatorname {var} (T)}}}$

where ${\displaystyle {\mathcal {I}}(\theta )}$ is the Fisher information of the sample. Thus e(T) is the minimum possible variance for an unbiased estimator divided by its actual variance. The Cramér–Rao bound can be used to prove that e(T) ≤ 1.

### Efficient estimators

An efficient estimator is an estimator that estimates the quantity of interest in some “best possible” manner. The notion of “best possible” relies upon the choice of a particular loss function — the function which quantifies the relative degree of undesirability of estimation errors of different magnitudes. The most common choice of the loss function is quadratic, resulting in the mean squared error criterion of optimality. [4]

In general, the spread of an estimator around the parameter θ is a measure of estimator efficiency and performance. This performance can be calculated by finding the mean squared error. More formally, let T be an estimator for the parameter θ. The mean squared error of T is the value ${\displaystyle \operatorname {MSE} (T)=\operatorname {E} [(T-\theta )^{2}]}$, which can be decomposed as the sum of its variance and its squared bias:

${\displaystyle {\begin{aligned}\operatorname {MSE} (T)&=\operatorname {E} [(T-\theta )^{2}]=\operatorname {E} [(T-\operatorname {E} [T]+\operatorname {E} [T]-\theta )^{2}]\\[5pt]&=\operatorname {E} [(T-\operatorname {E} [T])^{2}]+2\operatorname {E} [T-\operatorname {E} [T]](\operatorname {E} [T]-\theta )+(\operatorname {E} [T]-\theta )^{2}\\[5pt]&=\operatorname {var} (T)+(\operatorname {E} [T]-\theta )^{2}\end{aligned}}}$

An estimator T1 performs better than an estimator T2 if ${\displaystyle \operatorname {MSE} (T_{1})<\operatorname {MSE} (T_{2})}$. [5] In the more specific case where T1 and T2 are two unbiased estimators for the same parameter θ, their variances can be compared directly to determine performance: T2 is more efficient than T1 if the variance of T2 is smaller than the variance of T1, i.e. ${\displaystyle \operatorname {var} (T_{1})>\operatorname {var} (T_{2})}$ for all values of θ. This follows from the more general mean squared error criterion above. Since the expected value of an unbiased estimator is equal to the parameter value, ${\displaystyle \operatorname {E} [T]=\theta }$, the ${\displaystyle (\operatorname {E} [T]-\theta )^{2}}$ term is equal to 0 and drops out. Therefore, for an unbiased estimator, ${\displaystyle \operatorname {MSE} (T)=\operatorname {var} (T)}$. [5]
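
The decomposition of the mean squared error into variance plus squared bias can be checked numerically. The following is a minimal sketch rather than a reference implementation: the helper name `mse_decomposition`, the normal model, and the deliberately biased estimator (a sample mean shrunk by a factor of 0.9) are all assumptions chosen for illustration.

```python
import random
import statistics


def mse_decomposition(estimates, theta):
    """Return (mse, variance, squared_bias) computed from simulated estimates."""
    mse = statistics.fmean([(t - theta) ** 2 for t in estimates])
    variance = statistics.pvariance(estimates)    # empirical E[(T - E[T])^2]
    bias = statistics.fmean(estimates) - theta    # empirical E[T] - theta
    return mse, variance, bias ** 2


rng = random.Random(0)
theta, n, trials = 2.0, 10, 5000

# Hypothetical biased estimator: 0.9 times the sample mean of n normal draws.
estimates = [
    0.9 * statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n)])
    for _ in range(trials)
]

mse, variance, bias_sq = mse_decomposition(estimates, theta)
print("MSE:                  ", mse)
print("variance + bias^2:    ", variance + bias_sq)
```

The two printed values agree up to floating-point rounding, because the identity holds exactly for empirical moments as well as for population moments.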

If an unbiased estimator of a parameter θ attains ${\displaystyle e(T)=1}$ for all values of the parameter, then the estimator is called efficient. [3]

Equivalently, the estimator achieves equality in the Cramér–Rao inequality for all θ. The Cramér–Rao lower bound is a lower bound of the variance of an unbiased estimator, representing the "best" an unbiased estimator can be.

An efficient estimator is also the minimum variance unbiased estimator (MVUE). This is because an efficient estimator maintains equality on the Cramér–Rao inequality for all parameter values, which means it attains the minimum variance for all parameters (the definition of the MVUE). The MVUE estimator, even if it exists, is not necessarily efficient, because "minimum" does not mean equality holds on the Cramér–Rao inequality.

Thus an efficient estimator need not exist, but if it does, it is the MVUE.

#### Finite-sample efficiency

Suppose { Pθ | θ ∈ Θ } is a parametric model and X = (X1, …, Xn) are the data sampled from this model. Let T = T(X) be an estimator for the parameter θ. If this estimator is unbiased (that is, E[T] = θ), then the Cramér–Rao inequality states the variance of this estimator is bounded from below:

${\displaystyle \operatorname {var} [\,T\,]\ \geq \ {\mathcal {I}}_{\theta }^{-1},}$

where ${\displaystyle \scriptstyle {\mathcal {I}}_{\theta }}$ is the Fisher information matrix of the model at point θ. Generally, the variance measures the degree of dispersion of a random variable around its mean. Thus estimators with small variances are more concentrated and estimate the parameters more precisely. We say that the estimator is a finite-sample efficient estimator (in the class of unbiased estimators) if it reaches the lower bound in the Cramér–Rao inequality above, for all θ ∈ Θ. Efficient estimators are always minimum variance unbiased estimators. However, the converse is false: there exist point-estimation problems for which the minimum-variance mean-unbiased estimator is inefficient. [6]

Historically, finite-sample efficiency was an early optimality criterion. However this criterion has some limitations:

• Finite-sample efficient estimators are extremely rare. In fact, it was proved that efficient estimation is possible only in an exponential family, and only for the natural parameters of that family.[ citation needed ]
• This notion of efficiency is sometimes restricted to the class of unbiased estimators. (Often it isn't. [7] ) Since there are no good theoretical reasons to require that estimators are unbiased, this restriction is inconvenient. In fact, if we use mean squared error as a selection criterion, many biased estimators will slightly outperform the “best” unbiased ones. For example, in multivariate statistics for dimension three or more, the mean-unbiased estimator, sample mean, is inadmissible: Regardless of the outcome, its performance is worse than for example the James–Stein estimator.[ citation needed ]
• Finite-sample efficiency is based on the variance, as a criterion according to which the estimators are judged. A more general approach is to use loss functions other than quadratic ones, in which case the finite-sample efficiency can no longer be formulated.[ citation needed ][ dubious ]

As an example, among the models encountered in practice, efficient estimators exist for: the mean μ of the normal distribution (but not the variance σ²), the parameter λ of the Poisson distribution, and the probability p in the binomial or multinomial distribution.

Consider the model of a normal distribution with unknown mean but known variance: { Pθ = N(θ, σ²) | θ ∈ R }. The data consists of n independent and identically distributed observations from this model: X = (x1, …, xn). We estimate the parameter θ using the sample mean of all observations:

${\displaystyle T(X)={\frac {1}{n}}\sum _{i=1}^{n}x_{i}\ .}$

This estimator has mean θ and variance σ²/n, which is equal to the reciprocal of the Fisher information from the sample. Thus, the sample mean is a finite-sample efficient estimator for the mean of the normal distribution.
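
A small simulation can illustrate this result. The sketch below is only illustrative: the function name `simulated_efficiency_of_mean` and the parameter values are assumptions, and the variance of the sample mean is estimated by Monte Carlo, so the returned efficiency is close to 1 rather than exactly 1.

```python
import random
import statistics


def simulated_efficiency_of_mean(theta, sigma, n, trials, seed=0):
    """Estimate e(T) = (1 / I(theta)) / var(T) for the sample mean of N(theta, sigma^2) data."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(theta, sigma) for _ in range(n)])
        for _ in range(trials)
    ]
    # Fisher information of the sample is n / sigma^2, so the bound is sigma^2 / n.
    cramer_rao_bound = sigma ** 2 / n
    return cramer_rao_bound / statistics.variance(means)


print(simulated_efficiency_of_mean(theta=1.0, sigma=2.0, n=20, trials=4000))
```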

### Asymptotic efficiency

Some estimators can attain efficiency asymptotically and are thus called asymptotically efficient estimators. This can be the case for some maximum likelihood estimators or for any estimators that attain equality of the Cramér–Rao bound asymptotically.

#### Example: Median

Consider a sample of size ${\displaystyle N}$ drawn from a normal distribution of mean ${\displaystyle \mu }$ and unit variance, i.e., ${\displaystyle X_{n}\sim {\mathcal {N}}(\mu ,1).}$

The sample mean, ${\displaystyle {\overline {X}}}$, of the sample ${\displaystyle X_{1},X_{2},\ldots ,X_{N}}$ is defined as

${\displaystyle {\overline {X}}={\frac {1}{N}}\sum _{n=1}^{N}X_{n}\sim {\mathcal {N}}\left(\mu ,{\frac {1}{N}}\right).}$

The variance of the mean, 1/N (the square of the standard error) is equal to the reciprocal of the Fisher information from the sample and thus, by the Cramér–Rao inequality, the sample mean is efficient in the sense that its efficiency is unity (100%).

Now consider the sample median, ${\displaystyle {\widetilde {X}}}$. This is an unbiased and consistent estimator for ${\displaystyle \mu }$. For large ${\displaystyle N}$ the sample median is approximately normally distributed with mean ${\displaystyle \mu }$ and variance ${\displaystyle {\pi }/{2N},}$ [8]

${\displaystyle {\widetilde {X}}\sim {\mathcal {N}}\left(\mu ,{\frac {\pi }{2N}}\right).}$

The efficiency of the median for large ${\displaystyle N}$ is thus

${\displaystyle e\left({\widetilde {X}}\right)=\left({\frac {1}{N}}\right)\left({\frac {\pi }{2N}}\right)^{-1}=2/\pi \approx 0.64.}$

In other words, the relative variance of the median will be ${\displaystyle \pi /2\approx 1.57}$, or 57% greater than the variance of the mean – the standard error of the median will be 25% greater than that of the mean. [9]

Note that this is the asymptotic efficiency, that is, the efficiency in the limit as the sample size ${\displaystyle N}$ tends to infinity. For finite values of ${\displaystyle N,}$ the efficiency is higher than this (for example, a sample size of 3 gives an efficiency of about 74%).[ citation needed ]

The sample mean is thus more efficient than the sample median in this example. However, there may be measures by which the median performs better. For example, the median is far more robust to outliers, so that if the Gaussian model is questionable or approximate, there may be advantages to using the median (see Robust statistics).
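
The comparison between the mean and the median can also be approximated by simulation. This is a rough sketch under the normal model of this example: the function name `simulated_median_efficiency`, the sample size and the number of trials are assumptions, and the result carries Monte Carlo noise, so it should land near 2/π rather than equal it.

```python
import math
import random
import statistics


def simulated_median_efficiency(n, trials, seed=0):
    """Estimate var(sample mean) / var(sample median) for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means) / statistics.variance(medians)


print("simulated efficiency:", simulated_median_efficiency(n=101, trials=3000))
print("asymptotic value 2/pi:", 2 / math.pi)
```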

### Dominant estimators

If ${\displaystyle T_{1}}$ and ${\displaystyle T_{2}}$ are estimators for the parameter ${\displaystyle \theta }$, then ${\displaystyle T_{1}}$ is said to dominate ${\displaystyle T_{2}}$ if:

1. its mean squared error (MSE) is smaller for at least some value of ${\displaystyle \theta }$
2. the MSE does not exceed that of ${\displaystyle T_{2}}$ for any value of θ.

Formally, ${\displaystyle T_{1}}$ dominates ${\displaystyle T_{2}}$ if

${\displaystyle \operatorname {E} [(T_{1}-\theta )^{2}]\leq \operatorname {E} [(T_{2}-\theta )^{2}]}$

holds for all ${\displaystyle \theta }$, with strict inequality holding somewhere.
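
The definition can be sketched in code. The example below is an illustration under assumptions not taken from the text above: it compares two estimators of the variance σ² of normal data, S/(n − 1) and S/(n + 1), where S is the sum of squared deviations from the sample mean, and uses the standard normal-theory facts E[S] = (n − 1)σ² and var(S) = 2(n − 1)σ⁴. The helper names are hypothetical, and the check runs over a finite grid of parameter values, so it only suggests dominance rather than proving it.

```python
def mse_variance_estimator(divisor, n, sigma2):
    """Exact MSE of S / divisor as an estimator of sigma2 (normal data assumed)."""
    variance = 2 * (n - 1) * sigma2 ** 2 / divisor ** 2
    bias = (n - 1) * sigma2 / divisor - sigma2
    return variance + bias ** 2


def dominates(mse_1, mse_2, thetas):
    """True if mse_1 <= mse_2 at every grid point, with strict inequality somewhere."""
    pairs = [(mse_1(t), mse_2(t)) for t in thetas]
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)


n = 10
grid = [0.5, 1.0, 2.0, 5.0]  # candidate values of sigma^2


def biased(sigma2):
    return mse_variance_estimator(n + 1, n, sigma2)


def unbiased(sigma2):
    return mse_variance_estimator(n - 1, n, sigma2)


print("S/(n+1) dominates S/(n-1):", dominates(biased, unbiased, grid))
print("S/(n-1) dominates S/(n+1):", dominates(unbiased, biased, grid))
```

Under these assumptions the biased estimator has the smaller mean squared error at every grid point, which is an instance of a biased estimator outperforming an unbiased one.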

### Relative efficiency

The relative efficiency of two unbiased estimators is defined as [10]

${\displaystyle e(T_{1},T_{2})={\frac {\operatorname {E} [(T_{2}-\theta )^{2}]}{\operatorname {E} [(T_{1}-\theta )^{2}]}}={\frac {\operatorname {var} (T_{2})}{\operatorname {var} (T_{1})}}}$

Although ${\displaystyle e}$ is in general a function of ${\displaystyle \theta }$, in many cases the dependence drops out; if this is so, ${\displaystyle e}$ being greater than one would indicate that ${\displaystyle T_{1}}$ is preferable, regardless of the true value of ${\displaystyle \theta }$.
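
As a worked illustration, take ${\displaystyle T_{1}={\overline {X}}}$ and ${\displaystyle T_{2}={\widetilde {X}}}$ from the median example, using the large-sample variances given there (so the result is an approximation that assumes the normal model and large ${\displaystyle N}$):

${\displaystyle e({\overline {X}},{\widetilde {X}})={\frac {\operatorname {var} ({\widetilde {X}})}{\operatorname {var} ({\overline {X}})}}\approx {\frac {\pi /(2N)}{1/N}}={\frac {\pi }{2}}\approx 1.57}$

The ratio does not depend on ${\displaystyle \mu }$ and is greater than one, so the sample mean is preferable in this setting.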

An alternative to relative efficiency for comparing estimators, is the Pitman closeness criterion. This replaces the comparison of mean-squared-errors with comparing how often one estimator produces estimates closer to the true value than another estimator.

#### Estimators of the mean of u.i.d. variables

In estimating the mean of uncorrelated, identically distributed variables we can take advantage of the fact that the variance of the sum is the sum of the variances. In this case efficiency can be defined as the square of the coefficient of variation, i.e., [11]

${\displaystyle e\equiv \left({\frac {\sigma }{\mu }}\right)^{2}}$

Relative efficiency of two such estimators can thus be interpreted as the relative sample size of one required to achieve the certainty of the other. Proof:

${\displaystyle {\frac {e_{1}}{e_{2}}}={\frac {s_{1}^{2}}{s_{2}^{2}}}.}$

Now because ${\displaystyle s_{1}^{2}=n_{1}\sigma ^{2},\,s_{2}^{2}=n_{2}\sigma ^{2}}$ we have ${\displaystyle {\frac {e_{1}}{e_{2}}}={\frac {n_{1}}{n_{2}}}}$, so the relative efficiency expresses the relative sample size of the first estimator needed to match the variance of the second.

### Robustness

Efficiency of an estimator may change significantly if the distribution changes, often dropping. This is one of the motivations of robust statistics: an estimator such as the sample mean is an efficient estimator of the population mean of a normal distribution, for example, but can be an inefficient estimator for a mixture distribution of two normal distributions with the same mean and different variances. For example, if a distribution is a combination of 98% N(μ, σ) and 2% N(μ, 10σ), the presence of extreme values from the latter distribution (often "contaminating outliers") significantly reduces the efficiency of the sample mean as an estimator of μ. By contrast, the trimmed mean is less efficient for a normal distribution, but is more robust to (i.e., less affected by) changes in the distribution, and thus may be more efficient for a mixture distribution. Similarly, the shape of a distribution, such as skewness or heavy tails, can significantly reduce the efficiency of estimators that assume a symmetric distribution or thin tails.
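
The contamination example can be explored with a short simulation. The sketch below makes several assumptions for illustration: a 10% trimmed mean (10% removed from each tail), samples of size 50, and the hypothetical helper names `trimmed_mean` and `variance_ratio`. It reports the ratio of the variance of the sample mean to the variance of the trimmed mean, so values above one favour the trimmed mean.

```python
import random
import statistics


def trimmed_mean(sample, proportion=0.1):
    """Mean after discarding the given proportion of points from each tail."""
    k = int(len(sample) * proportion)
    ordered = sorted(sample)
    return statistics.fmean(ordered[k:len(ordered) - k])


def variance_ratio(contamination, n=50, trials=3000, mu=0.0, sigma=1.0, seed=0):
    """Estimate var(sample mean) / var(trimmed mean) under the mixture model."""
    rng = random.Random(seed)
    means, trimmed = [], []
    for _ in range(trials):
        # Each point comes from N(mu, 10*sigma) with probability `contamination`,
        # otherwise from N(mu, sigma).
        sample = [
            rng.gauss(mu, 10 * sigma if rng.random() < contamination else sigma)
            for _ in range(n)
        ]
        means.append(statistics.fmean(sample))
        trimmed.append(trimmed_mean(sample))
    return statistics.variance(means) / statistics.variance(trimmed)


print("clean normal:   ", variance_ratio(contamination=0.0))
print("2% contaminated:", variance_ratio(contamination=0.02))
```

With no contamination the ratio is expected to sit slightly below one, reflecting the small efficiency loss of trimming under normality, while with 2% contamination it is expected to rise well above one.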

### Uses of inefficient estimators

While efficiency is a desirable quality of an estimator, it must be weighed against other considerations, and an estimator that is efficient for certain distributions may well be inefficient for other distributions. Most significantly, estimators that are efficient for clean data from a simple distribution, such as the normal distribution (which is symmetric, unimodal, and has thin tails) may not be robust to contamination by outliers, and may be inefficient for more complicated distributions. In robust statistics, more importance is placed on robustness and applicability to a wide variety of distributions, rather than efficiency on a single distribution. M-estimators are a general class of solutions motivated by these concerns, yielding both robustness and high relative efficiency, though possibly lower efficiency than traditional estimators for some cases. These are potentially very computationally complicated, however.

A more traditional alternative is the class of L-estimators, which are very simple statistics that are easy to compute and interpret, in many cases robust, and often sufficiently efficient for initial estimates. See applications of L-estimators for further discussion.

### Efficiency in statistics

Efficiency in statistics is important because it allows one to compare the performance of various estimators. Although an unbiased estimator is usually favored over a biased one, a more efficient biased estimator can sometimes be more valuable than a less efficient unbiased estimator. For example, this can occur when the values of the biased estimator gather around a number closer to the true value. Thus, the performance of estimators can be compared easily through their mean squared errors or variances.

## Hypothesis tests

For comparing significance tests, a meaningful measure of efficiency can be defined based on the sample size required for the test to achieve a given power. [12]

Pitman efficiency, [13] Bahadur efficiency [14] and Hodges–Lehmann efficiency [15] relate to the comparison of the performance of statistical hypothesis testing procedures. The Encyclopedia of Mathematics provides a brief exposition of these three criteria.

## Experimental design

For experimental designs, efficiency relates to the ability of a design to achieve the objective of the study with minimal expenditure of resources such as time and money. In simple cases, the relative efficiency of designs can be expressed as the ratio of the sample sizes required to achieve a given objective. [16]

## Notes

1. Everitt 2002, p. 128.
2. Nikulin, M.S. (2001) [1994], "Efficiency of a statistical procedure", Encyclopedia of Mathematics, EMS Press
3. Fisher, R (1921). "On the Mathematical Foundations of Theoretical Statistics". Philosophical Transactions of the Royal Society of London A. 222: 309–368. JSTOR 91208.
4. Everitt 2002, p. 128.
5. Dekking, F.M. (2007). . Springer. pp. 303–305. ISBN 978-1852338961.
6. Romano, Joseph P.; Siegel, Andrew F. (1986). Counterexamples in Probability and Statistics. Chapman and Hall. p. 194.
7. DeGroot; Schervish (2002). Probability and Statistics (3rd ed.). pp. 440–441.
8. Williams, D. (2001). . Cambridge University Press. p. 165. ISBN 052100618X.
9. Maindonald, John; Braun, W. John (2010-05-06). Data Analysis and Graphics Using R: An Example-Based Approach. Cambridge University Press. p. 104. ISBN 978-1-139-48667-5.
10. Wackerly, Dennis D.; Mendenhall, William; Scheaffer, Richard L. (2008). (Seventh ed.). Belmont, CA: Thomson Brooks/Cole. p. 445. ISBN 9780495110811. OCLC 183886598.
11. Grubbs, Frank (1965). Statistical Measures of Accuracy for Riflemen and Missile Engineers. pp. 26–27.
12. Everitt 2002, p. 321.
13. Nikitin, Ya.Yu. (2001) [1994], "Efficiency, asymptotic", Encyclopedia of Mathematics, EMS Press
14. Arcones M. A. "Bahadur efficiency of the likelihood ratio test" preprint
15. Canay I. A. & Otsu, T. "Hodges–Lehmann Optimality for Testing Moment Condition Models"
16. Dodge, Y. (2006). . Oxford University Press. ISBN 0-19-920613-9.
