Fisher consistency

In statistics, Fisher consistency, named after Ronald Fisher, is a desirable property of an estimator asserting that if the estimator were calculated using the entire population rather than a sample, the true value of the estimated parameter would be obtained. [1]

Definition

Suppose we have a statistical sample X1, ..., Xn where each Xi follows a cumulative distribution Fθ which depends on an unknown parameter θ. If an estimator of θ based on the sample can be represented as a functional T of the empirical distribution function F̂n,

\hat{\theta} = T(\hat{F}_n),

the estimator is said to be Fisher consistent if

T(F_\theta) = \theta. [2]

As long as the Xi are exchangeable, an estimator T defined in terms of the Xi can be converted into an estimator T′ defined in terms of F̂n by averaging T over all permutations of the data. The resulting estimator T′ will have the same expected value as T, and its variance will be no larger than that of T.

If the strong law of large numbers can be applied, the empirical distribution functions F̂n converge pointwise to Fθ, allowing us to express Fisher consistency as a limit: the estimator is Fisher consistent if

\lim_{n \to \infty} T(\hat{F}_n) = T(F_\theta) = \theta.
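
To make the functional view concrete, the following short Python sketch (illustrative only; the function name plugin_mean and the exponential model are assumptions, not taken from the article) evaluates the mean functional T on the empirical distribution F̂n, which places mass 1/n on each observation, and notes that the same functional applied to Fθ returns θ exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

def plugin_mean(values, weights):
    """Mean functional T(F) for a discrete distribution with the given support and weights."""
    return float(np.sum(weights * values))

theta = 2.5  # true mean of the assumed exponential model, so T(F_theta) = theta
for n in (10, 1_000, 100_000):
    x = rng.exponential(scale=theta, size=n)
    # The empirical distribution F_hat_n places mass 1/n on each observation,
    # so the plug-in estimate is T(F_hat_n).
    print(n, plugin_mean(x, np.full(n, 1.0 / n)))

# Fisher consistency of the mean functional: evaluating T at F_theta itself
# (rather than at F_hat_n) returns E[X] = theta exactly.
```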

Finite population example

Suppose our sample is obtained from a finite population Z1, ..., Zm. We can represent our sample of size n in terms of the proportions ni / n of observations taking each value Zi in the population. Writing our estimator of θ as T(n1 / n, ..., nm / n), the population analogue of the estimator is T(p1, ..., pm), where pi = P(X = Zi). Thus we have Fisher consistency if T(p1, ..., pm) = θ.

Suppose the parameter of interest is the expected value μ and the estimator is the sample mean, which can be written

\bar{X} = \sum_{i=1}^{m} \frac{1}{n} \left( \sum_{j=1}^{n} I(X_j = Z_i) \right) Z_i = \sum_{i=1}^{m} \frac{n_i}{n} Z_i,

where I is the indicator function. The population analogue of this expression is

\sum_{i=1}^{m} p_i Z_i = \operatorname{E}[X] = \mu,

so we have Fisher consistency.
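
A small numerical sketch of the finite population case (the population values Z and probabilities p below are invented for the example): the same function T is applied once to the sample proportions and once to the population probabilities, and the latter returns μ exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

Z = np.array([1.0, 2.0, 5.0])   # finite population values Z_1, ..., Z_m (invented)
p = np.array([0.5, 0.3, 0.2])   # p_i = P(X = Z_i)

def T(proportions):
    """Sample mean written as a function of the proportions attached to each Z_i."""
    return float(np.sum(proportions * Z))

n = 1_000
sample = rng.choice(Z, size=n, p=p)
proportions = np.array([(sample == z).mean() for z in Z])

print(T(proportions))  # T(n_1/n, ..., n_m/n): close to mu for large n
print(T(p))            # T(p_1, ..., p_m) = sum(p_i * Z_i) = mu exactly: Fisher consistency
```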

Role in maximum likelihood estimation

Maximising the likelihood function L gives an estimate that is Fisher consistent for a parameter b if the expected log-likelihood under the true distribution,

\operatorname{E}_{b_0}\!\left[ \log L(b; X) \right],

is maximised at b = b0, where b0 represents the true value of b. [3] [4]
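
As a hedged illustration of this condition (the Bernoulli model and grid search below are chosen for the example, not taken from the article), the expected log-likelihood of a Bernoulli(b) model under the true parameter b0 is b0·log b + (1 − b0)·log(1 − b), and a grid search recovers b0 as its maximiser.

```python
import numpy as np

b0 = 0.3                                   # true parameter (assumed for the example)
b_grid = np.linspace(0.01, 0.99, 981)

# Expected log-likelihood of a single Bernoulli(b) observation, taken under the
# true parameter b0: E_{b0}[log L(b; X)] = b0*log(b) + (1 - b0)*log(1 - b).
expected_loglik = b0 * np.log(b_grid) + (1 - b0) * np.log(1 - b_grid)

print(b_grid[np.argmax(expected_loglik)])  # approximately 0.3, i.e. b = b0
```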

Relationship to asymptotic consistency and unbiasedness

The term consistency in statistics usually refers to an estimator that is asymptotically consistent. Fisher consistency and asymptotic consistency are distinct concepts, although both aim to define a desirable property of an estimator. While many estimators are consistent in both senses, neither definition encompasses the other. For example, suppose we take an estimator Tn that is both Fisher consistent and asymptotically consistent, and then form Tn + En, where En is a deterministic sequence of nonzero numbers converging to zero. This estimator is asymptotically consistent, but not Fisher consistent for any n.
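
A minimal sketch of the Tn + En construction (the mean functional, normal model, and En = 1/n are assumptions made for this illustration): the perturbed estimate approaches θ as n grows, yet applying the same recipe to the true distribution gives θ + 1/n ≠ θ for every finite n.

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 2.0

def T(values, weights):
    """Mean functional T(F) for a discrete distribution."""
    return float(np.sum(weights * values))

for n in (10, 100, 10_000):
    x = rng.normal(loc=theta, size=n)
    t_n = T(x, np.full(n, 1.0 / n)) + 1.0 / n   # T_n + E_n with E_n = 1/n
    # Asymptotically consistent: t_n converges to theta as n grows.
    # Not Fisher consistent: applying the same recipe to the true distribution
    # gives T(F_theta) + 1/n = theta + 1/n, which differs from theta for every n.
    print(n, t_n, theta + 1.0 / n)
```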

The sample mean is a Fisher consistent and unbiased estimate of the population mean, but not all Fisher consistent estimates are unbiased. Suppose we observe a sample from a uniform distribution on (0,θ) and we wish to estimate θ. The sample maximum is Fisher consistent, but downwardly biased. Conversely, the sample variance is an unbiased estimate of the population variance, but is not Fisher consistent.
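
A brief Monte Carlo sketch of the two contrasts above, under assumed simulation settings (θ = 1, samples of size 20 from the uniform distribution): the sample maximum, the plug-in variance with divisor n, and the unbiased variance with divisor n − 1 are compared against their targets.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 1.0, 20, 200_000

samples = rng.uniform(0.0, theta, size=(reps, n))

# Sample maximum: the "upper end of the support" functional returns theta when
# applied to F_theta (Fisher consistent), yet its sampling mean falls below theta.
print(samples.max(axis=1).mean())            # about n/(n+1) * theta = 0.952

sigma2 = theta**2 / 12                        # true variance of Uniform(0, theta)
plugin_var = samples.var(axis=1, ddof=0)      # divisor n: the Fisher-consistent plug-in form
unbiased_var = samples.var(axis=1, ddof=1)    # divisor n-1: unbiased, not Fisher consistent
print(plugin_var.mean(), unbiased_var.mean(), sigma2)
```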

Role in decision theory

A loss function is Fisher consistent if the population minimizer of the risk leads to the Bayes optimal decision rule. [5]
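
For instance (a standard illustration, not spelled out in the source above): in binary classification with labels y ∈ {−1, +1} and η(x) = P(y = 1 | x), the population minimiser of the conditional hinge-loss risk,

f^{*}(x) = \arg\min_{\alpha \in \mathbb{R}} \Big\{ \eta(x)\,\max(0, 1 - \alpha) + \big(1 - \eta(x)\big)\,\max(0, 1 + \alpha) \Big\} = \operatorname{sign}\!\big(2\eta(x) - 1\big),

agrees with the Bayes optimal decision rule wherever η(x) ≠ 1/2, so the hinge loss is Fisher consistent in this sense.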

References

  1. Fisher, R.A. (1922). "On the mathematical foundations of theoretical statistics". Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character. 222 (594–604): 309–368. Bibcode:1922RSPTA.222..309F. doi:10.1098/rsta.1922.0009. hdl:2440/15172. JFM 48.1280.02. JSTOR 91208.
  2. Cox, D.R.; Hinkley, D.V. (1974). Theoretical Statistics. Chapman and Hall. ISBN 0-412-12420-3. (Defined on p. 287.)
  3. Jurečková, Jana; Picek, Jan (2006). Robust Statistical Methods with R. CRC Press. ISBN 1-58488-454-1.
  4. "Natural Increase Refers to Net Population Growth Rates".
  5. http://www.stat.osu.edu/~yklee/881/consistency.pdf (PDF).