Extremum estimator


In statistics and econometrics, extremum estimators are a wide class of estimators for parametric models that are calculated through maximization (or minimization) of a certain objective function, which depends on the data. The general theory of extremum estimators was developed by Amemiya (1985).


Definition

An estimator θ̂ is called an extremum estimator if there is an objective function Q̂n(θ) such that

\[ \hat{\theta} = \underset{\theta \in \Theta}{\operatorname{arg\,max}}\ \widehat{Q}_n(\theta), \]

where Θ is the parameter space. Sometimes a slightly weaker definition is given:

\[ \widehat{Q}_n(\hat{\theta}) \ \geq\ \max_{\theta \in \Theta} \widehat{Q}_n(\theta) - o_p(1), \]

where op(1) denotes a term converging in probability to zero. With this modification θ̂ does not have to be the exact maximizer of the objective function, only sufficiently close to it.

The theory of extremum estimators does not specify what the objective function should be. There are various types of objective functions suitable for different models, and this framework allows us to analyse the theoretical properties of such estimators from a unified perspective. The theory only specifies the properties that the objective function has to possess, and so selecting a particular objective function only requires verifying that those properties are satisfied.
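For instance, the maximum likelihood estimator is an extremum estimator whose objective function is the average log-likelihood. The following Python sketch obtains such an estimator by numerically maximizing the sample objective with SciPy; the normal model, sample size, seed, and starting values are arbitrary choices for illustration only.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    data = rng.normal(loc=2.0, scale=1.5, size=500)  # sample from N(2, 1.5^2)

    def Q_hat(theta, x):
        """Sample objective: average log-likelihood of N(mu, sigma^2)."""
        mu, log_sigma = theta  # sigma parameterized on the log scale so it stays positive
        sigma = np.exp(log_sigma)
        return np.mean(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2))

    # The extremum estimator maximizes Q_hat over the parameter space;
    # equivalently, minimize its negative with a numerical optimizer.
    result = minimize(lambda th: -Q_hat(th, data), x0=np.array([0.0, 0.0]), method="BFGS")
    mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
    print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")

Reparameterizing the standard deviation on the log scale keeps the numerical search unconstrained, which matches the requirement that the maximizer be sought over the whole parameter space.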

Consistency

[Figure: Ee noncompactness.svg — When the parameter space Θ is not compact (Θ = ℝ in this example), then even if the objective function is uniquely maximized at θ0, this maximum may not be well-separated, in which case the estimator θ̂ will fail to be consistent.]

If the parameter space Θ is compact and there is a limiting function Q0(θ) such that Q̂n(θ) converges to Q0(θ) in probability uniformly over Θ, and the function Q0(θ) is continuous and has a unique maximum at θ = θ0, then θ̂ is consistent for θ0.[1]

The uniform convergence in probability of Q̂n(θ) means that

\[ \sup_{\theta \in \Theta} \big| \widehat{Q}_n(\theta) - Q_0(\theta) \big| \ \xrightarrow{p}\ 0. \]

The requirement for Θ to be compact can be replaced with the weaker assumption that the maximum of Q0 is well-separated: there should not exist points θ distant from θ0 at which Q0(θ) is close to Q0(θ0). Formally, this means that for any sequence {θi} such that Q0(θi) → Q0(θ0), it must be true that θi → θ0.
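To see the consistency result numerically, one can re-compute the estimator from the sketch above on progressively larger samples and watch it concentrate around θ0; the sample sizes and seed below are again arbitrary illustration choices.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    mu0, sigma0 = 2.0, 1.5  # "true" parameter values playing the role of theta_0

    def neg_Q_hat(theta, x):
        """Negative average log-likelihood of N(mu, sigma^2), sigma on the log scale."""
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)
        return -np.mean(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2))

    # As n grows, the maximizer of the sample objective concentrates around (mu0, sigma0).
    for n in (100, 1_000, 10_000, 100_000):
        x = rng.normal(mu0, sigma0, size=n)
        res = minimize(neg_Q_hat, x0=np.array([0.0, 0.0]), args=(x,), method="BFGS")
        print(f"n={n:>6}  mu_hat={res.x[0]:.4f}  sigma_hat={np.exp(res.x[1]):.4f}")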

Asymptotic normality

Assuming that consistency has been established and that the derivatives of the sample objective Q̂n satisfy certain other conditions,[2] the extremum estimator θ̂ is asymptotically normally distributed.
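One common statement of the result (a sketch; the exact regularity conditions are as in the sources cited in the notes) is that the estimator is √n-consistent with a "sandwich" limiting covariance matrix:

\[ \sqrt{n}\,\bigl(\hat{\theta} - \theta_0\bigr) \ \xrightarrow{d}\ \mathcal{N}\bigl(0,\ H^{-1} \Sigma H^{-1}\bigr), \qquad H = -\operatorname*{plim}_{n\to\infty} \frac{\partial^2 \widehat{Q}_n(\theta_0)}{\partial\theta\,\partial\theta'}, \]

where Σ is the asymptotic variance of √n ∂Q̂n(θ0)/∂θ. In the special case of a correctly specified average log-likelihood objective, the information matrix equality gives H = Σ, so the sandwich collapses to the inverse Fisher information.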

Examples

See also

Notes

  1. Newey & McFadden (1994), Theorem 2.1
  2. Shi, Xiaoxia. "Lecture Notes: Asymptotic Normality of Extremum Estimators" (PDF).
  3. Hayashi, Fumio (2000). Econometrics. Princeton: Princeton University Press. p. 448. ISBN 0-691-01018-8.
  4. Hayashi, Fumio (2000). Econometrics. Princeton: Princeton University Press. p. 447. ISBN 0-691-01018-8.


References

  Amemiya, Takeshi (1985). Advanced Econometrics. Cambridge, MA: Harvard University Press.
  Newey, Whitney K.; McFadden, Daniel (1994). "Large sample estimation and hypothesis testing". Handbook of Econometrics, Vol. 4. Amsterdam: Elsevier Science.