Generalized method of moments

In econometrics and statistics, the generalized method of moments (GMM) is a generic method for estimating parameters in statistical models. Usually it is applied in the context of semiparametric models, where the parameter of interest is finite-dimensional, whereas the full shape of the data's distribution function may not be known, and therefore maximum likelihood estimation is not applicable.

The method requires that a certain number of moment conditions be specified for the model. These moment conditions are functions of the model parameters and the data, such that their expectation is zero at the parameters' true values. The GMM method then minimizes a certain norm of the sample averages of the moment conditions, and can therefore be thought of as a special case of minimum-distance estimation. [1]

The GMM estimators are known to be consistent, asymptotically normal, and most efficient in the class of all estimators that do not use any extra information aside from that contained in the moment conditions. GMM was advocated by Lars Peter Hansen in 1982 as a generalization of the method of moments, [2] introduced by Karl Pearson in 1894. However, these estimators are mathematically equivalent to those based on "orthogonality conditions" (Sargan, 1958, 1959) or "unbiased estimating equations" (Huber, 1967; Wang et al., 1997).

Description

Suppose the available data consists of T observations $\{Y_t\}_{t=1,\dots,T}$, where each observation $Y_t$ is an n-dimensional multivariate random variable. We assume that the data come from a certain statistical model, defined up to an unknown parameter $\theta \in \Theta$. The goal of the estimation problem is to find the “true” value of this parameter, $\theta_0$, or at least a reasonably close estimate.

A general assumption of GMM is that the data Yt be generated by a weakly stationary ergodic stochastic process. (The case of independent and identically distributed (iid) variables Yt is a special case of this condition.)

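In order to apply GMM, we need to have "moment conditions", that is, we need to know a vector-valued function $g(Y,\theta)$ such that

$$ m(\theta_0) \equiv \operatorname{E}[\,g(Y_t,\theta_0)\,] = 0, $$

where E denotes expectation, and $Y_t$ is a generic observation. Moreover, the function $m(\theta)$ must differ from zero for $\theta \neq \theta_0$, otherwise the parameter θ will not be point-identified.

The basic idea behind GMM is to replace the theoretical expected value E[⋅] with its empirical analog, the sample average:

$$ \hat m(\theta) \equiv \frac{1}{T}\sum_{t=1}^{T} g(Y_t,\theta), $$

and then to minimize the norm of this expression with respect to θ. The minimizing value of θ is our estimate for $\theta_0$.

By the law of large numbers, $\hat m(\theta) \approx \operatorname{E}[g(Y_t,\theta)] = m(\theta)$ for large values of T, and thus we expect that $\hat m(\theta_0) \approx m(\theta_0) = 0$. The generalized method of moments looks for a number $\hat\theta$ which would make $\hat m(\hat\theta)$ as close to zero as possible. Mathematically, this is equivalent to minimizing a certain norm of $\hat m(\theta)$ (the norm of m, denoted as ||m||, measures the distance between m and zero). The properties of the resulting estimator will depend on the particular choice of the norm function, and therefore the theory of GMM considers an entire family of norms, defined as

$$ \lVert \hat m(\theta) \rVert_W^2 = \hat m(\theta)^{\mathsf T}\, W\, \hat m(\theta), $$

where W is a positive-definite weighting matrix, and $m^{\mathsf T}$ denotes transposition. In practice, the weighting matrix W is computed based on the available data set, which will be denoted as $\hat W$. Thus, the GMM estimator can be written as

$$ \hat\theta = \arg\min_{\theta\in\Theta} \left(\frac{1}{T}\sum_{t=1}^{T} g(Y_t,\theta)\right)^{\mathsf T} \hat W \left(\frac{1}{T}\sum_{t=1}^{T} g(Y_t,\theta)\right). $$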

Under suitable conditions this estimator is consistent, asymptotically normal, and, with the right choice of weighting matrix, also asymptotically efficient.
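The definition above can be made concrete with a small numerical sketch. The example below is an illustration only, not part of the original text: it assumes an exponential model with rate `lam` (so that E[Y] = 1/lam and E[Y^2] = 2/lam^2 give two moment conditions for one parameter), an identity weighting matrix, and a hand-written golden-section search as the minimizer. The function names `g`, `m_hat`, `objective` and `golden_section` are hypothetical helpers.

```python
import math
import random

def g(y, lam):
    """Moment function g(Y, theta) for the rate `lam` of an exponential variable.

    E[Y] = 1/lam and E[Y^2] = 2/lam^2, so both entries have mean zero at the
    true rate: two moment conditions for one parameter (an assumed example).
    """
    return (y - 1.0 / lam, y * y - 2.0 / lam ** 2)

def m_hat(data, lam):
    """Sample average of g over the data (empirical analog of E[g])."""
    totals = [0.0, 0.0]
    for y in data:
        g1, g2 = g(y, lam)
        totals[0] += g1
        totals[1] += g2
    return [s / len(data) for s in totals]

def objective(data, lam, W):
    """Weighted squared norm m_hat' W m_hat for a 2x2 weighting matrix W."""
    m = m_hat(data, lam)
    return sum(m[i] * W[i][j] * m[j] for i in range(2) for j in range(2))

def golden_section(f, lo, hi, tol=1e-7):
    """Minimize a scalar function f on [lo, hi], assuming it is unimodal there."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:            # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:                  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return (a + b) / 2.0

rng = random.Random(0)
data = [rng.expovariate(2.0) for _ in range(5000)]  # simulated sample, true rate 2.0
W = [[1.0, 0.0], [0.0, 1.0]]                        # identity weighting matrix
lam_hat = golden_section(lambda lam: objective(data, lam, W), 0.05, 50.0)
print("GMM estimate of the rate:", round(lam_hat, 3))
```

With any positive-definite `W` the same code applies; only the asymptotic variance of the estimate changes with the choice of weights.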

Properties

Consistency

Consistency is a statistical property of an estimator stating that, having a sufficient number of observations, the estimator will converge in probability to the true value of the parameter:

$$ \hat\theta \ \xrightarrow{p}\ \theta_0 \quad \text{as } T \to \infty. $$

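Sufficient conditions for a GMM estimator to be consistent are as follows:

  1. $\hat W_T \xrightarrow{p} W$, where W is a positive semi-definite matrix,
  2. $W \operatorname{E}[\,g(Y_t,\theta)\,] = 0$ only for $\theta = \theta_0$,
  3. The space of possible parameters $\Theta$ is compact,
  4. $g(Y,\theta)$ is continuous at each θ with probability one,
  5. $\operatorname{E}[\,\sup_{\theta\in\Theta} \lVert g(Y,\theta) \rVert\,] < \infty$.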

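The second condition here (the so-called global identification condition) is often particularly hard to verify. There exist simpler necessary but not sufficient conditions, which may be used to detect a non-identification problem. The order condition requires that the dimension of the moment function $m(\theta)$ be at least as large as the dimension of the parameter vector θ. The local identification condition requires that, if $g(Y,\theta)$ is continuously differentiable in a neighborhood of $\theta_0$, the matrix $W \operatorname{E}[\,\nabla_\theta g(Y_t,\theta_0)\,]$ have full column rank.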

In practice, applied econometricians often simply assume that global identification holds, without actually proving it. [3] (p. 2127)

Asymptotic normality

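Asymptotic normality is a useful property, as it allows us to construct confidence bands for the estimator and to conduct different tests. Before we can make a statement about the asymptotic distribution of the GMM estimator, we need to define two auxiliary matrices:

$$ G = \operatorname{E}[\,\nabla_\theta\, g(Y_t,\theta_0)\,], \qquad \Omega = \operatorname{E}[\,g(Y_t,\theta_0)\, g(Y_t,\theta_0)^{\mathsf T}\,]. $$

Then under conditions 1–6 listed below, the GMM estimator will be asymptotically normal with limiting distribution:

$$ \sqrt{T}\,\big(\hat\theta - \theta_0\big) \ \xrightarrow{d}\ \mathcal{N}\Big[\,0,\ (G^{\mathsf T} W G)^{-1}\, G^{\mathsf T} W \Omega W^{\mathsf T} G\, (G^{\mathsf T} W^{\mathsf T} G)^{-1}\Big]. $$

Conditions:

  1. $\hat\theta$ is consistent (see previous section),
  2. The set of possible parameters $\Theta$ is compact,
  3. $g(Y,\theta)$ is continuously differentiable in some neighborhood N of $\theta_0$ with probability one,
  4. $\operatorname{E}[\,\lVert g(Y_t,\theta) \rVert^2\,] < \infty$,
  5. $\operatorname{E}[\,\sup_{\theta\in N} \lVert \nabla_\theta g(Y_t,\theta) \rVert\,] < \infty$,
  6. the matrix $G^{\mathsf T} W G$ is nonsingular.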

Relative Efficiency

So far we have said nothing about the choice of matrix W, except that it must be positive semi-definite. In fact, any such matrix will produce a consistent and asymptotically normal GMM estimator; the only difference will be in the asymptotic variance of that estimator. It can be shown that taking

$$ W \propto \Omega^{-1} $$

will result in the most efficient estimator in the class of all (generalized) method of moment estimators. Only with an infinite number of orthogonality conditions does the estimator attain the smallest possible variance, the Cramér–Rao bound.

In this case the formula for the asymptotic distribution of the GMM estimator simplifies to

$$ \sqrt{T}\,\big(\hat\theta - \theta_0\big) \ \xrightarrow{d}\ \mathcal{N}\Big[\,0,\ (G^{\mathsf T}\, \Omega^{-1}\, G)^{-1}\Big]. $$

The proof that such a choice of weighting matrix is indeed locally optimal is often adopted with slight modifications when establishing efficiency of other estimators. As a rule of thumb, a weighting matrix inches closer to optimality when it turns into an expression closer to the Cramér–Rao bound.

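Proof. We will consider the difference between the asymptotic variance with arbitrary W and the asymptotic variance with $W = \Omega^{-1}$. If we can factor this difference into a symmetric product of the form $CC^{\mathsf T}$ for some matrix C, then it will guarantee that this difference is nonnegative-definite, and thus $W = \Omega^{-1}$ will be optimal by definition.

$$
\begin{aligned}
V(W) - V(\Omega^{-1})
&= (G^{\mathsf T} W G)^{-1} G^{\mathsf T} W \Omega W^{\mathsf T} G\, (G^{\mathsf T} W^{\mathsf T} G)^{-1} - (G^{\mathsf T} \Omega^{-1} G)^{-1} \\
&= (G^{\mathsf T} W G)^{-1} \Big( G^{\mathsf T} W \Omega W^{\mathsf T} G - G^{\mathsf T} W G\, (G^{\mathsf T} \Omega^{-1} G)^{-1}\, G^{\mathsf T} W^{\mathsf T} G \Big) (G^{\mathsf T} W^{\mathsf T} G)^{-1} \\
&= (G^{\mathsf T} W G)^{-1} G^{\mathsf T} W \Omega^{1/2} \Big( I - \Omega^{-1/2} G\, (G^{\mathsf T} \Omega^{-1} G)^{-1}\, G^{\mathsf T} \Omega^{-1/2} \Big) \Omega^{1/2} W^{\mathsf T} G\, (G^{\mathsf T} W^{\mathsf T} G)^{-1} \\
&= A\,(I - B)\,A^{\mathsf T},
\end{aligned}
$$

where we introduced matrices $A = (G^{\mathsf T} W G)^{-1} G^{\mathsf T} W \Omega^{1/2}$ and $B = \Omega^{-1/2} G\, (G^{\mathsf T} \Omega^{-1} G)^{-1}\, G^{\mathsf T} \Omega^{-1/2}$ in order to slightly simplify notation; I is an identity matrix. We can see that matrix B here is symmetric and idempotent: $B^2 = B$. This means I−B is symmetric and idempotent as well: $I - B = (I - B)(I - B)^{\mathsf T}$. Thus we can continue to factor the previous expression as

$$ A\,(I - B)(I - B)^{\mathsf T} A^{\mathsf T} = \big(A (I - B)\big)\big(A (I - B)\big)^{\mathsf T} \geq 0. $$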
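The efficiency claim can also be checked numerically. The sketch below is an illustration under stated assumptions: a scalar parameter with two moment conditions, so that G is a 2-vector and every factor in the sandwich formula reduces to a scalar, and symmetric weighting matrices only. The values of `G` and `Omega` are illustrative numbers (they correspond to an exponential variable with rate 2 and moments on Y and Y^2), and `avar` and `inv2` are hypothetical helper names.

```python
import random

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def avar(G, Omega, W):
    """Asymptotic variance (G'WG)^-1 G'W Omega W G (G'WG)^-1 for scalar theta.

    G is a length-2 vector; Omega and W are symmetric 2x2 matrices.
    """
    x = [W[0][0] * G[0] + W[0][1] * G[1],
         W[1][0] * G[0] + W[1][1] * G[1]]          # x = W G
    bread = G[0] * x[0] + G[1] * x[1]               # G' W G
    meat = sum(x[i] * Omega[i][j] * x[j] for i in range(2) for j in range(2))
    return meat / bread ** 2

# Illustrative values (an assumption of this sketch, not taken from the text).
G = [0.25, 0.5]
Omega = [[0.25, 0.5], [0.5, 1.25]]
v_opt = avar(G, Omega, inv2(Omega))                 # variance with W = Omega^-1

rng = random.Random(1)
gaps = []
for _ in range(1000):
    a, b, c = (rng.uniform(-2.0, 2.0) for _ in range(3))
    # Random symmetric positive-definite W, built as L L' + 0.1 I.
    W = [[a * a + 0.1, a * b], [a * b, b * b + c * c + 0.1]]
    gaps.append(avar(G, Omega, W) - v_opt)

print("variance with W = Omega^-1:", v_opt)
print("smallest V(W) - V(Omega^-1) over random W:", min(gaps))
```

Every gap should be nonnegative up to rounding, which is the scalar version of the statement that $V(W) - V(\Omega^{-1})$ is nonnegative-definite.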

Implementation

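One difficulty with implementing the outlined method is that we cannot take $W = \Omega^{-1}$ because, by the definition of matrix Ω, we need to know the value of $\theta_0$ in order to compute this matrix, and $\theta_0$ is precisely the quantity we do not know and are trying to estimate in the first place. In the case of $Y_t$ being iid we can estimate W as

$$ \hat W_T(\hat\theta) = \left(\frac{1}{T}\sum_{t=1}^{T} g(Y_t,\hat\theta)\, g(Y_t,\hat\theta)^{\mathsf T}\right)^{-1}. $$

Several approaches exist to deal with this issue, the first one being the most popular:

  1. Two-step feasible GMM. Step 1: take W = I (the identity matrix) or some other positive-definite matrix, and compute a preliminary GMM estimate $\hat\theta_{(1)}$. This estimator is consistent for $\theta_0$, although not efficient. Step 2: $\hat W_T(\hat\theta_{(1)})$ converges in probability to $\Omega^{-1}$, so computing $\hat\theta$ with this weighting matrix gives an asymptotically efficient estimator.
  2. Iterated GMM. Essentially the same procedure as two-step GMM, except that the matrix $\hat W_T$ is recalculated several times: the estimate obtained in step 2 is used to compute the weighting matrix for step 3, and so on until some convergence criterion is met. Asymptotically no improvement can be achieved through such iterations.
  3. Continuously updating GMM (CUGMM, or CUE). Estimates $\hat\theta$ simultaneously with the weighting matrix, by minimizing $\hat m(\theta)^{\mathsf T}\, \hat W_T(\theta)\, \hat m(\theta)$ over θ, where $\hat m(\theta)$ is the sample average of $g(Y_t,\theta)$. In Monte-Carlo experiments this method demonstrated better performance than the traditional two-step GMM: the estimator has smaller median bias (although fatter tails), and the J-test for overidentifying restrictions was in many cases more reliable. [4]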

Another important issue in implementing the minimization procedure is that the procedure is supposed to search through the (possibly high-dimensional) parameter space Θ and find the value of θ which minimizes the objective function. No generic recommendation for such a procedure exists; it is the subject of its own field, numerical optimization.
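One standard way around the circularity in choosing W is a two-step procedure: estimate θ with W = I first, then re-estimate with the inverse of the sample second-moment matrix of g evaluated at the first-step estimate. The sketch below illustrates this under assumptions that are not in the original text: iid exponential data with rate `lam`, the two moments E[Y] = 1/lam and E[Y^2] = 2/lam^2, and a golden-section search standing in for a general numerical optimizer. All function names are hypothetical.

```python
import math
import random

def g(y, lam):
    """Assumed example moments for an exponential rate: mean zero at the truth."""
    return (y - 1.0 / lam, y * y - 2.0 / lam ** 2)

def m_hat(data, lam):
    """Sample average of g."""
    n = len(data)
    return [sum(g(y, lam)[j] for y in data) / n for j in range(2)]

def quad_form(m, W):
    """m' W m for a 2-vector m and 2x2 matrix W."""
    return sum(m[i] * W[i][j] * m[j] for i in range(2) for j in range(2))

def minimize(f, lo=0.05, hi=50.0, tol=1e-7):
    """Golden-section search; assumes f is unimodal on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return (a + b) / 2.0

def inverse_moment_covariance(data, lam):
    """W_hat = ((1/T) sum g g')^-1, the iid estimate of Omega^-1."""
    n = len(data)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for y in data:
        gy = g(y, lam)
        for i in range(2):
            for j in range(2):
                s[i][j] += gy[i] * gy[j] / n
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]

def two_step_gmm(data):
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Step 1: consistent but inefficient estimate with W = I.
    lam1 = minimize(lambda lam: quad_form(m_hat(data, lam), identity))
    # Step 2: re-estimate with the estimated efficient weighting matrix.
    W_hat = inverse_moment_covariance(data, lam1)
    lam2 = minimize(lambda lam: quad_form(m_hat(data, lam), W_hat))
    return lam1, lam2, W_hat

rng = random.Random(0)
data = [rng.expovariate(2.0) for _ in range(5000)]   # true rate 2.0
lam1, lam2, W_hat = two_step_gmm(data)
print("first step:", round(lam1, 3), "second step:", round(lam2, 3))
```

Repeating step 2 with the updated estimate until the values stop changing would give an iterated variant of the same idea.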

Sargan–Hansen J-test

When the number of moment conditions is greater than the dimension of the parameter vector θ, the model is said to be over-identified. Sargan (1958) proposed tests for over-identifying restrictions based on instrumental variables estimators, with test statistics that are distributed in large samples as chi-square variables with degrees of freedom that depend on the number of over-identifying restrictions. Subsequently, Hansen (1982) applied this test to the mathematically equivalent formulation of GMM estimators. Note, however, that such statistics can be negative in empirical applications where the models are misspecified, and likelihood ratio tests can yield insights since the models are estimated under both null and alternative hypotheses (Bhargava and Sargan, 1983).

Conceptually we can check whether the sample moment vector $\hat m(\hat\theta)$ is sufficiently close to zero to suggest that the model fits the data well. The GMM method has then replaced the problem of solving the equation $\hat m(\theta) = 0$, which chooses θ to match the restrictions exactly, by a minimization calculation. The minimization can always be conducted even when no $\theta_0$ exists such that $m(\theta_0) = 0$. This is what the J-test does. The J-test is also called a test for over-identifying restrictions.

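Formally we consider two hypotheses: the null hypothesis that the model is "valid",

$$ H_0:\ m(\theta_0) = 0, $$

and the alternative hypothesis that the model is "invalid", meaning that the data do not come close to meeting the restrictions,

$$ H_1:\ m(\theta) \neq 0 \quad \text{for all } \theta \in \Theta. $$

Under hypothesis $H_0$, the following so-called J-statistic is asymptotically chi-squared distributed with k–l degrees of freedom. Define J to be:

$$ J \equiv T \cdot \left(\frac{1}{T}\sum_{t=1}^{T} g(Y_t,\hat\theta)\right)^{\mathsf T} \hat W_T \left(\frac{1}{T}\sum_{t=1}^{T} g(Y_t,\hat\theta)\right) \ \xrightarrow{d}\ \chi^2_{k-l} \quad \text{under } H_0, $$

where $\hat\theta$ is the GMM estimator of the parameter $\theta_0$, k is the number of moment conditions (dimension of vector g), and l is the number of estimated parameters (dimension of vector θ). Matrix $\hat W_T$ must converge in probability to $\Omega^{-1}$, the efficient weighting matrix (note that previously we only required that W be proportional to $\Omega^{-1}$ for the estimator to be efficient; however, in order to conduct the J-test W must be exactly equal to $\Omega^{-1}$, not simply proportional).

Under the alternative hypothesis $H_1$, the J-statistic is asymptotically unbounded:

$$ J \ \xrightarrow{p}\ \infty \quad \text{under } H_1. $$

To conduct the test we compute the value of J from the data. It is a nonnegative number. We compare it with (for example) the 0.95 quantile $q_{0.95}$ of the $\chi^2_{k-l}$ distribution: $H_0$ is rejected at the 95% confidence level if $J > q_{0.95}$, and $H_0$ cannot be rejected at the 95% confidence level if $J < q_{0.95}$.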
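As a rough illustration of the test, the sketch below uses a deliberately simple model that is an assumption of this example, not part of the original text: each observation is a pair (y1, y2) and the model claims both components share one mean mu, giving k = 2 moment conditions g = (y1 − mu, y2 − mu) for l = 1 parameter. Because g is linear in mu, both GMM steps have closed forms. The critical value 3.841 is the tabulated 0.95 quantile of the chi-squared distribution with one degree of freedom; `j_test` and `inv2` are hypothetical helper names.

```python
import random

CHI2_1_Q95 = 3.841  # tabulated 0.95 quantile of chi-squared with k - l = 1 degree of freedom

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def j_test(data):
    """Two-step GMM and J-statistic for the assumed 'common mean' model."""
    T = len(data)
    ybar = [sum(y[i] for y in data) / T for i in range(2)]
    # Step 1 (W = I): the minimizer is the average of the two sample means.
    mu1 = (ybar[0] + ybar[1]) / 2.0
    # Omega_hat = (1/T) sum g g' evaluated at the first-step estimate.
    omega = [[sum((y[i] - mu1) * (y[j] - mu1) for y in data) / T
              for j in range(2)] for i in range(2)]
    W = inv2(omega)
    # Step 2: minimizer of (ybar - mu*1)' W (ybar - mu*1), in closed form.
    num = sum(W[i][j] * ybar[j] for i in range(2) for j in range(2))
    den = sum(W[i][j] for i in range(2) for j in range(2))
    mu2 = num / den
    m = [ybar[0] - mu2, ybar[1] - mu2]              # sample moments at mu2
    J = T * sum(m[i] * W[i][j] * m[j] for i in range(2) for j in range(2))
    return mu2, J

rng = random.Random(42)
# Restrictions hold: both components have mean 1.
valid = [(rng.gauss(1.0, 1.0), rng.gauss(1.0, 2.0)) for _ in range(2000)]
# Restrictions fail: the means are 1 and 2.
invalid = [(rng.gauss(1.0, 1.0), rng.gauss(2.0, 1.0)) for _ in range(2000)]

for name, sample in (("valid", valid), ("invalid", invalid)):
    mu, J = j_test(sample)
    verdict = "reject H0" if J > CHI2_1_Q95 else "do not reject H0"
    print(name, "mu_hat =", round(mu, 3), "J =", round(J, 2), verdict)
```

For the misspecified sample J grows roughly in proportion to T, in line with the statement that the statistic is unbounded under the alternative.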

Scope

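Many other popular estimation techniques can be cast in terms of GMM optimization:

  1. Ordinary least squares (OLS) is equivalent to GMM with moment conditions $\operatorname{E}[\,x_t (y_t - x_t^{\mathsf T}\beta)\,] = 0$,
  2. Weighted least squares (WLS): $\operatorname{E}[\,x_t (y_t - x_t^{\mathsf T}\beta) / \sigma^2(x_t)\,] = 0$,
  3. Instrumental variables regression (IV): $\operatorname{E}[\,z_t (y_t - x_t^{\mathsf T}\beta)\,] = 0$,
  4. Non-linear least squares (NLLS): $\operatorname{E}[\,\nabla_\beta\, g(x_t,\beta) \cdot (y_t - g(x_t,\beta))\,] = 0$,
  5. Maximum likelihood estimation (MLE): $\operatorname{E}[\,\nabla_\theta \ln f(x_t,\theta)\,] = 0$.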
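As one illustration of this correspondence, ordinary least squares can be read as GMM with the moment condition E[x (y − x'β)] = 0. With as many conditions as parameters the sample moments can be set exactly to zero, so the weighting matrix plays no role and the solution is the usual OLS estimate. The sketch below assumes a single regressor plus an intercept and simulated data; `sample_moments` and `solve_moments` are hypothetical helper names.

```python
import random

def sample_moments(data, b0, b1):
    """m_hat(beta) = (1/T) sum x_t (y_t - x_t' beta), with x_t = (1, x)."""
    T = len(data)
    m0 = sum((y - b0 - b1 * x) for x, y in data) / T
    m1 = sum(x * (y - b0 - b1 * x) for x, y in data) / T
    return m0, m1

def solve_moments(data):
    """Solve m_hat(beta) = 0, i.e. (sum x x') beta = sum x y, by Cramer's rule."""
    T = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    det = T * sxx - sx * sx
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (T * sxy - sx * sy) / det
    return b0, b1

rng = random.Random(7)
# Simulated data (an assumption): y = 1 + 3 x + noise.
data = []
for _ in range(1000):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 1.0 + 3.0 * x + rng.gauss(0.0, 1.0)))

b0, b1 = solve_moments(data)
m0, m1 = sample_moments(data, b0, b1)
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("sample moments at the estimate:", m0, m1)
```

The two sample moments are zero up to rounding at the estimate, so the GMM objective attains its minimum value of zero there for any positive-definite weighting matrix.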

An Alternative to the GMM

The method of moments article describes an alternative to the original (non-generalized) method of moments (MoM), and provides references to some applications together with a list of theoretical advantages and disadvantages relative to the traditional method. This Bayesian-like MoM (BL-MoM) is distinct from all the related methods described above, which are subsumed by the GMM. [5] [6] The literature does not contain a direct comparison between the GMM and the BL-MoM in specific applications.


References

  1. Hayashi, Fumio (2000). Econometrics. Princeton University Press. p. 206. ISBN 0-691-01018-8.
  2. Hansen, Lars Peter (1982). "Large Sample Properties of Generalized Method of Moments Estimators". Econometrica. 50 (4): 1029–1054. doi:10.2307/1912775. JSTOR 1912775.
  3. Newey, W.; McFadden, D. (1994). "Large sample estimation and hypothesis testing". Handbook of Econometrics. Vol. 4. Elsevier Science. pp. 2111–2245. CiteSeerX 10.1.1.724.4480. doi:10.1016/S1573-4412(05)80005-4. ISBN 9780444887665.
  4. Hansen, Lars Peter; Heaton, John; Yaron, Amir (1996). "Finite-sample properties of some alternative GMM estimators". Journal of Business & Economic Statistics. 14 (3): 262–280. doi:10.1080/07350015.1996.10524656. hdl:1721.1/47970. JSTOR 1392442.
  5. Armitage, Peter; Colton, Theodore, eds. (2005-02-18). Encyclopedia of Biostatistics (1 ed.). Wiley. doi:10.1002/0470011815. ISBN 978-0-470-84907-1.
  6. Godambe, V. P., ed. (2002). Estimating functions. Oxford statistical science series (Repr ed.). Oxford: Clarendon Press. ISBN 978-0-19-852228-7.
