Extremum estimator


In statistics and econometrics, extremum estimators are a wide class of estimators for parametric models that are calculated through maximization (or minimization) of a certain objective function, which depends on the data. The general theory of extremum estimators was developed by Amemiya (1985).


Definition

An estimator θ̂ is called an extremum estimator if there is an objective function Q̂n(θ) such that

\[ \hat{\theta} = \underset{\theta \in \Theta}{\operatorname{arg\,max}}\ \widehat{Q}_n(\theta), \]

where Θ is the parameter space. Sometimes a slightly weaker definition is given:

\[ \widehat{Q}_n(\hat{\theta}) \ \geq\ \max_{\theta \in \Theta} \widehat{Q}_n(\theta) - o_p(1), \]

where op(1) denotes a term converging in probability to zero. With this modification θ̂ does not have to be the exact maximizer of the objective function, only sufficiently close to it.

The theory of extremum estimators does not specify what the objective function should be. There are various types of objective functions suitable for different models, and this framework allows us to analyse the theoretical properties of such estimators from a unified perspective. The theory only specifies the properties that the objective function has to possess, and so selecting a particular objective function only requires verifying that those properties are satisfied.
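For instance, the maximum likelihood estimator is an extremum estimator whose objective function is the average log-likelihood. The following Python sketch obtains such an estimator by numerically maximizing the sample objective with SciPy; the normal model, sample size, seed, and starting values are arbitrary choices for illustration only.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    data = rng.normal(loc=2.0, scale=1.5, size=500)  # sample from N(2, 1.5^2)

    def Q_hat(theta, x):
        """Sample objective: average log-likelihood of N(mu, sigma^2)."""
        mu, log_sigma = theta  # sigma parameterized on the log scale so it stays positive
        sigma = np.exp(log_sigma)
        return np.mean(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2))

    # The extremum estimator maximizes Q_hat over the parameter space;
    # equivalently, minimize its negative with a numerical optimizer.
    result = minimize(lambda th: -Q_hat(th, data), x0=np.array([0.0, 0.0]), method="BFGS")
    mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
    print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")

Reparameterizing the standard deviation on the log scale keeps the numerical search unconstrained, which matches the requirement that the maximizer be sought over the whole parameter space.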

Consistency

[Figure: Ee noncompactness.svg — When the parameter space Θ is not compact (Θ = ℝ in this example), then even if the objective function is uniquely maximized at θ0, this maximum may not be well-separated, in which case the estimator θ̂ will fail to be consistent.]

If the parameter space Θ is compact and there is a limiting function Q0(θ) such that Q̂n(θ) converges to Q0(θ) in probability uniformly over Θ, and the function Q0(θ) is continuous and has a unique maximum at θ = θ0, then θ̂ is consistent for θ0.[1]

The uniform convergence in probability of Q̂n(θ) means that

\[ \sup_{\theta \in \Theta} \big| \widehat{Q}_n(\theta) - Q_0(\theta) \big| \ \xrightarrow{p}\ 0. \]

The requirement for Θ to be compact can be replaced with the weaker assumption that the maximum of Q0 is well-separated: there should not exist points θ distant from θ0 at which Q0(θ) is close to Q0(θ0). Formally, this means that for any sequence {θi} such that Q0(θi) → Q0(θ0), it must be true that θi → θ0.
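To see the consistency result numerically, one can re-compute the estimator from the sketch above on progressively larger samples and watch it concentrate around θ0; the sample sizes and seed below are again arbitrary illustration choices.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    mu0, sigma0 = 2.0, 1.5  # "true" parameter values playing the role of theta_0

    def neg_Q_hat(theta, x):
        """Negative average log-likelihood of N(mu, sigma^2), sigma on the log scale."""
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)
        return -np.mean(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2))

    # As n grows, the maximizer of the sample objective concentrates around (mu0, sigma0).
    for n in (100, 1_000, 10_000, 100_000):
        x = rng.normal(mu0, sigma0, size=n)
        res = minimize(neg_Q_hat, x0=np.array([0.0, 0.0]), args=(x,), method="BFGS")
        print(f"n={n:>6}  mu_hat={res.x[0]:.4f}  sigma_hat={np.exp(res.x[1]):.4f}")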

Asymptotic normality

Assuming that consistency has been established and that the derivatives of the sample objective Q̂n satisfy certain other conditions,[2] the extremum estimator θ̂ is asymptotically normally distributed.
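One common statement of the result (a sketch; the exact regularity conditions are as in the sources cited in the notes) is that the estimator is √n-consistent with a "sandwich" limiting covariance matrix:

\[ \sqrt{n}\,\bigl(\hat{\theta} - \theta_0\bigr) \ \xrightarrow{d}\ \mathcal{N}\bigl(0,\ H^{-1} \Sigma H^{-1}\bigr), \qquad H = -\operatorname*{plim}_{n\to\infty} \frac{\partial^2 \widehat{Q}_n(\theta_0)}{\partial\theta\,\partial\theta'}, \]

where Σ is the asymptotic variance of √n ∂Q̂n(θ0)/∂θ. In the special case of a correctly specified average log-likelihood objective, the information matrix equality gives H = Σ, so the sandwich collapses to the inverse Fisher information.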

Examples

See also

Notes

  1. Newey & McFadden (1994), Theorem 2.1
  2. Shi, Xiaoxia. "Lecture Notes: Asymptotic Normality of Extremum Estimators" (PDF).
  3. Hayashi, Fumio (2000). Econometrics. Princeton: Princeton University Press. p. 448. ISBN 0-691-01018-8.
  4. Hayashi, Fumio (2000). Econometrics. Princeton: Princeton University Press. p. 447. ISBN 0-691-01018-8.


References

  Amemiya, Takeshi (1985). Advanced Econometrics. Cambridge, MA: Harvard University Press.
  Newey, Whitney K.; McFadden, Daniel (1994). "Large sample estimation and hypothesis testing". Handbook of Econometrics, Vol. 4. Amsterdam: Elsevier Science.