Minimum-distance estimation

Minimum-distance estimation (MDE) is a conceptual method for fitting a statistical model to data, usually the empirical distribution of the data. Often-used estimators such as ordinary least squares can be thought of as special cases of minimum-distance estimation.

While consistent and asymptotically normal, minimum-distance estimators are generally not statistically efficient when compared to maximum likelihood estimators, because they omit the Jacobian usually present in the likelihood function. This, however, substantially reduces the computational complexity of the optimization problem.

Definition

Let $X_1, \ldots, X_n$ be an independent and identically distributed (iid) random sample from a population with distribution $F(x;\theta)$, $\theta \in \Theta$, where $\Theta \subseteq \mathbb{R}^k$, $k \geq 1$.

Let $F_n(x)$ be the empirical distribution function based on the sample.

Let $\hat{\theta}$ be an estimator for $\theta$. Then $F(x;\hat{\theta})$ is an estimator for $F(x;\theta)$.

Let $d[\cdot,\cdot]$ be a functional returning some measure of "distance" between its two arguments. The functional $d$ is also called the criterion function.

If there exists a $\hat{\theta} \in \Theta$ such that $d[F(x;\hat{\theta}), F_n(x)] = \inf\{ d[F(x;\theta), F_n(x)] : \theta \in \Theta \}$, then $\hat{\theta}$ is called the minimum-distance estimate of $\theta$ (Drossos & Philippou 1980, p. 121).
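As a minimal illustrative sketch of this definition (not drawn from the cited sources), the following Python code computes a minimum-distance estimate for a hypothetical one-parameter exponential model by numerically minimising a supremum-type distance between the model distribution function and the empirical distribution function (the Kolmogorov–Smirnov criterion described below). The choice of model, the choice of distance, and the use of scipy.optimize are arbitrary choices made for the example.

```python
import numpy as np
from scipy import optimize, stats

def ks_distance(theta, sample):
    """Supremum distance between the empirical CDF of `sample` and the CDF
    of a hypothetical exponential model with rate parameter theta[0]."""
    rate = theta[0]
    if rate <= 0:
        return np.inf
    x = np.sort(sample)
    n = len(x)
    ecdf_upper = np.arange(1, n + 1) / n   # F_n just after each order statistic
    ecdf_lower = np.arange(0, n) / n       # F_n just before each order statistic
    model_cdf = stats.expon.cdf(x, scale=1.0 / rate)
    return max(np.abs(ecdf_upper - model_cdf).max(),
               np.abs(ecdf_lower - model_cdf).max())

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=200)   # simulated data with true rate 0.5

# The minimum-distance estimate is the parameter value minimising the criterion.
result = optimize.minimize(ks_distance, x0=[1.0], args=(sample,),
                           method="Nelder-Mead")
print("estimated rate:", result.x[0])
```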

Statistics used in estimation

Most theoretical studies of minimum-distance estimation, and most applications, make use of "distance" measures which underlie already-established goodness of fit tests: the test statistic used in one of these tests is used as the distance measure to be minimised. Below are some examples of statistical tests that have been used for minimum-distance estimation.

Chi-square criterion

The chi-square test uses as its criterion the sum, over predefined groups, of the squared difference between the increases of the empirical distribution and the estimated distribution, weighted by the increase in the estimate for that group.
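For illustration, with the notation of the definition above and group boundaries $a_0 < a_1 < \cdots < a_m$ introduced here as an assumption, the quantity minimised over $\theta$ can be written as

$$\chi^2(\theta) = \sum_{j=1}^{m} \frac{\left[ \big( F_n(a_j) - F_n(a_{j-1}) \big) - \big( F(a_j;\theta) - F(a_{j-1};\theta) \big) \right]^2}{F(a_j;\theta) - F(a_{j-1};\theta)},$$

that is, the squared difference between the increases of the empirical and the estimated distribution over each group, weighted by the increase of the estimate for that group.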

Cramér–von Mises criterion

The Cramér–von Mises criterion uses the integral of the squared difference between the empirical and the estimated distribution functions (Parr & Schucany 1980, p. 616).
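In the notation of the definition above, the criterion is commonly written as

$$\omega^2(\theta) = n \int_{-\infty}^{\infty} \big[ F_n(x) - F(x;\theta) \big]^2 \, \mathrm{d}F(x;\theta),$$

where the scaling factor $n$ and the integrating measure follow one common convention; these details vary between authors and are shown here only for illustration.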

Kolmogorov–Smirnov criterion

The Kolmogorov–Smirnov test uses the supremum of the absolute difference between the empirical and the estimated distribution functions (Parr & Schucany 1980, p. 616).
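In the same notation, the criterion minimised over $\theta$ is the supremum statistic

$$D_n(\theta) = \sup_{x} \big| F_n(x) - F(x;\theta) \big|.$$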

Anderson–Darling criterion

The Anderson–Darling test is similar to the Cramér–von Mises criterion except that the integral is of a weighted version of the squared difference, where the weighting relates to the variance of the empirical distribution function (Parr & Schucany 1980, p. 616).
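One standard form of the weighting (shown here for illustration; conventions differ) divides by the factor $F(x;\theta)\,[1 - F(x;\theta)]$, which governs the variance of $F_n(x)$:

$$A^2(\theta) = n \int_{-\infty}^{\infty} \frac{\big[ F_n(x) - F(x;\theta) \big]^2}{F(x;\theta)\,\big[ 1 - F(x;\theta) \big]} \, \mathrm{d}F(x;\theta).$$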

Theoretical results

The theory of minimum-distance estimation is related to that for the asymptotic distribution of the corresponding statistical goodness of fit tests. Often the cases of the Cramér–von Mises criterion, the Kolmogorov–Smirnov test and the Anderson–Darling test are treated simultaneously by regarding them as special cases of a more general formulation of a distance measure. Examples of the theoretical results that are available include the consistency of the parameter estimates and their asymptotic covariance matrices.

Related Research Articles

In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule, the quantity of interest and its result are distinguished. For example, the sample mean is a commonly used estimator of the population mean.

The likelihood function is the joint probability mass (or probability density) of the observed data, viewed as a function of the parameters of a statistical model. Intuitively, the likelihood function $\mathcal{L}(\theta \mid x)$ is the probability of observing the data $x$ assuming that $\theta$ is the actual parameter.

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.
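In symbols (a standard formulation, not specific to this article's sources), for observed data $x$, likelihood $\mathcal{L}(\theta; x)$ and parameter space $\Theta$:

$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta \in \Theta} \mathcal{L}(\theta; x).$$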

In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator measures the average of the squares of the errors—that is, the average squared difference between the estimated values and the actual value. MSE is a risk function, corresponding to the expected value of the squared error loss. The fact that MSE is almost always strictly positive is because of randomness or because the estimator does not account for information that could produce a more accurate estimate. In machine learning, specifically empirical risk minimization, MSE may refer to the empirical risk, as an estimate of the true MSE.
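In symbols, for an estimator $\hat{\theta}$ of a parameter $\theta$ (a standard formulation, shown for illustration):

$$\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\big[ (\hat{\theta} - \theta)^2 \big].$$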

In mathematical statistics, the Fisher information is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ of a distribution that models X. Formally, it is the variance of the score, or the expected value of the observed information.
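For a model with density $f(x;\theta)$ satisfying the usual regularity conditions (a standard formulation, shown for illustration):

$$\mathcal{I}(\theta) = \operatorname{Var}\!\left( \frac{\partial}{\partial \theta} \log f(X;\theta) \right) = \mathbb{E}\!\left[ \left( \frac{\partial}{\partial \theta} \log f(X;\theta) \right)^{\!2} \right],$$

where the second equality holds because the score has mean zero.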

In statistics, the Wald test assesses constraints on statistical parameters based on the weighted distance between the unrestricted estimate and its hypothesized value under the null hypothesis, where the weight is the precision of the estimate. Intuitively, the larger this weighted distance, the less likely it is that the constraint is true. While the finite sample distributions of Wald tests are generally unknown, it has an asymptotic χ2-distribution under the null hypothesis, a fact that can be used to determine statistical significance.
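For a single scalar constraint $\theta = \theta_0$, the statistic takes the familiar form (a standard formulation, shown for illustration):

$$W = \frac{(\hat{\theta} - \theta_0)^2}{\widehat{\operatorname{Var}}(\hat{\theta})},$$

which is asymptotically $\chi^2_1$ under the null hypothesis.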

In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is an estimate of an unknown quantity that equals the mode of the posterior distribution. The MAP can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to the method of maximum likelihood (ML) estimation, but employs an augmented optimization objective which incorporates a prior distribution over the quantity one wants to estimate. MAP estimation can therefore be seen as a regularization of maximum likelihood estimation.
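In symbols, for likelihood $f(x \mid \theta)$ and prior density $g(\theta)$ (a standard formulation, shown for illustration):

$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} f(x \mid \theta)\, g(\theta).$$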

In econometrics and statistics, the generalized method of moments (GMM) is a generic method for estimating parameters in statistical models. Usually it is applied in the context of semiparametric models, where the parameter of interest is finite-dimensional, whereas the full shape of the data's distribution function may not be known, and therefore maximum likelihood estimation is not applicable.

Robust statistics are statistics that maintain their properties even if the underlying distributional assumptions are incorrect. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly.

In statistics, M-estimators are a broad class of extremum estimators for which the objective function is a sample average. Both non-linear least squares and maximum likelihood estimation are special cases of M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators. However, M-estimators are not inherently robust, as is clear from the fact that they include maximum likelihood estimators, which are in general not robust. The statistical procedure of evaluating an M-estimator on a data set is called M-estimation.
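In symbols, an M-estimator minimises (or maximises) a sample average of a function $\rho$ (a standard formulation, shown for illustration):

$$\hat{\theta} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} \rho(x_i; \theta),$$

with $\rho(x;\theta) = -\log f(x;\theta)$ recovering maximum likelihood estimation as a special case.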

Bootstrapping is any test or metric that uses random sampling with replacement, and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.

In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function. Equivalently, it maximizes the posterior expectation of a utility function. An alternative way of formulating an estimator within Bayesian statistics is maximum a posteriori estimation.

In statistics, the concept of being an invariant estimator is a criterion that can be used to compare the properties of different estimators for the same quantity. It is a way of formalising the idea that an estimator should have certain intuitively appealing qualities. Strictly speaking, "invariant" would mean that the estimates themselves are unchanged when both the measurements and the parameters are transformed in a compatible way, but the meaning has been extended to allow the estimates to change in appropriate ways with such transformations. The term equivariant estimator is used in formal mathematical contexts that include a precise description of the relation of the way the estimator changes in response to changes to the dataset and parameterisation: this corresponds to the use of "equivariance" in more general mathematics.

In statistics, the Khmaladze transformation is a mathematical tool used in constructing convenient goodness of fit tests for hypothetical distribution functions. More precisely, suppose $X_1, \ldots, X_n$ are i.i.d., possibly multi-dimensional, random observations generated from an unknown probability distribution. A classical problem in statistics is to decide how well a given hypothetical distribution function $F$, or a given hypothetical parametric family of distribution functions $\{F_\theta : \theta \in \Theta\}$, fits the set of observations. The Khmaladze transformation allows us to construct goodness of fit tests with desirable properties. It is named after Estate V. Khmaladze.

In statistics, Fisher consistency, named after Ronald Fisher, is a desirable property of an estimator asserting that if the estimator were calculated using the entire population rather than a sample, the true value of the estimated parameter would be obtained.

In statistics, maximum spacing estimation (MSE or MSP), or maximum product of spacing estimation (MPS), is a method for estimating the parameters of a univariate statistical model. The method requires maximization of the geometric mean of spacings in the data, which are the differences between the values of the cumulative distribution function at neighbouring data points.
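In symbols, with ordered observations $x_{(1)} \le \cdots \le x_{(n)}$ and the conventions $F(x_{(0)};\theta) = 0$ and $F(x_{(n+1)};\theta) = 1$ (a standard formulation, shown for illustration), the method maximises the geometric mean of the spacings:

$$S_n(\theta) = \left( \prod_{i=1}^{n+1} \big[ F(x_{(i)};\theta) - F(x_{(i-1)};\theta) \big] \right)^{\!1/(n+1)}.$$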

In statistics and econometrics, extremum estimators are a wide class of estimators for parametric models that are calculated through maximization of a certain objective function, which depends on the data. The general theory of extremum estimators was developed by Amemiya (1985).

In statistics, efficiency is a measure of quality of an estimator, of an experimental design, or of a hypothesis testing procedure. Essentially, a more efficient estimator needs fewer observations than a less efficient one to achieve a given level of performance. An efficient estimator is characterized by having the smallest possible variance, attaining the Cramér–Rao bound, which indicates that there is little deviation between the estimated value and the "true" value in the L2 norm sense.

In statistics, Hodges' estimator, named for Joseph Hodges, is a famous counterexample of an estimator which is "superefficient", i.e. it attains smaller asymptotic variance than regular efficient estimators. The existence of such a counterexample is the reason for the introduction of the notion of regular estimators.

Two-step M-estimation deals with M-estimation problems that require preliminary estimation to obtain the parameter of interest. It differs from the usual M-estimation problem because the asymptotic distribution of the second-step estimator generally depends on the first-step estimator. Accounting for this change in the asymptotic distribution is important for valid inference.

References