Optimal instruments

In statistics and econometrics, optimal instruments are a technique for improving the efficiency of estimators in conditional moment models, a class of semiparametric models that generate conditional expectation functions. To estimate the parameters of a conditional moment model, the statistician can derive unconditional moment conditions (expectation functions that equal zero at the true parameter) and use the generalized method of moments (GMM). However, infinitely many moment conditions can be generated from a single model; optimal instruments provide the most efficient moment conditions.

As an example, consider the nonlinear regression model

$$E[y \mid x] = f(x, \theta_0),$$

where y is a scalar (one-dimensional) random variable, x is a random vector with dimension k, and θ₀ is a k-dimensional parameter. The conditional moment restriction $E[y - f(x, \theta_0) \mid x] = 0$ is consistent with infinitely many moment conditions. For example:

$$E[x\,(y - f(x, \theta_0))] = 0.$$

More generally, for any vector-valued function z of x, it will be the case that

$$E[z(x)\,(y - f(x, \theta_0))] = 0.$$

That is, z defines a finite set of orthogonality conditions.
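
Each choice of z generates a usable moment condition, which a short simulation can confirm. The following is a minimal sketch (not from the article), assuming an illustrative regression function f(x, θ) = exp(θx) with scalar x and a heteroskedastic error that still satisfies E[ε | x] = 0; every candidate z(x) yields a sample moment near zero at the true parameter.

```python
import numpy as np

# Minimal Monte Carlo sketch: under y = f(x, theta0) + eps with E[eps | x] = 0,
# any function z(x) satisfies E[z(x) (y - f(x, theta0))] = 0.
rng = np.random.default_rng(0)
n = 1_000_000
theta0 = 0.5                                  # scalar parameter (k = 1)

x = rng.normal(size=n)
eps = (1 + 0.5 * x**2) * rng.normal(size=n)   # heteroskedastic, but E[eps | x] = 0
f = np.exp(theta0 * x)                        # assumed regression function
y = f + eps

# Each z(x) below defines a valid orthogonality condition; all sample
# moments are close to zero at the true theta0.
for z in (x, x**2, np.sin(x), np.exp(x)):
    print(np.mean(z * (y - f)))
```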

A natural question to ask, then, is whether an asymptotically efficient set of conditions is available, in the sense that no other set of conditions achieves lower asymptotic variance.[1] Both econometricians[2][3] and statisticians[4] have extensively studied this subject.

The answer to this question is generally that such a set exists, and its form has been derived for a wide range of estimators. Takeshi Amemiya was one of the first to work on this problem, deriving optimal instruments for nonlinear simultaneous equation models with homoskedastic and serially uncorrelated errors.[5] The form of the optimal instruments was characterized by Lars Peter Hansen,[6] and results for nonparametric estimation of optimal instruments are provided by Newey.[7] A result for nearest neighbor estimators was provided by Robinson.[8]

In linear regression

The technique of optimal instruments can be used to show that, in a conditional moment linear regression model with iid data, the optimal GMM estimator is generalized least squares. Consider the model

$$E[y \mid x] = x^{\mathsf{T}}\theta_0,$$

where y is a scalar random variable, x is a k-dimensional random vector, and θ₀ is a k-dimensional parameter vector. As above, the moment conditions are

$$E[z(x)\,(y - x^{\mathsf{T}}\theta_0)] = 0,$$

where z = z(x) is an instrument set of dimension p (p ≥ k). The task is to choose z to minimize the asymptotic variance of the resulting GMM estimator. If the data are iid, the asymptotic variance of the GMM estimator with an efficient weight matrix is

$$V(z) = \left( E[x z^{\mathsf{T}}] \left( E[\sigma^2(x)\, z z^{\mathsf{T}}] \right)^{-1} E[z x^{\mathsf{T}}] \right)^{-1},$$

where $\sigma^2(x) = E[(y - x^{\mathsf{T}}\theta_0)^2 \mid x]$.

The optimal instruments are given by

$$z^*(x) = \sigma^{-2}(x)\, x,$$

which produces the asymptotic variance matrix

$$V^* = \left( E[\sigma^{-2}(x)\, x x^{\mathsf{T}}] \right)^{-1}.$$
These are the optimal instruments because for any other z, the matrix

$$V(z) - V^*$$

is positive semidefinite.
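
This efficiency ranking can be checked numerically. Below is a minimal sketch (not from the article) that plugs large-sample averages into the two variance formulas, under an assumed design x = (1, w)ᵀ with standard normal w, conditional variance σ²(x) = (1 + w²)², and a deliberately suboptimal instrument z = (1, w³)ᵀ; the eigenvalues of V(z) − V* come out nonnegative, up to simulation noise.

```python
import numpy as np

# Plug-in check that V(z) - V* is positive semidefinite for a suboptimal z.
rng = np.random.default_rng(1)
n = 1_000_000
w = rng.normal(size=n)
x = np.column_stack([np.ones(n), w])        # k = 2 regressors
sigma2 = (1 + w**2) ** 2                    # assumed conditional variance
z = np.column_stack([np.ones(n), w**3])     # arbitrary instrument set (p = 2)

Ezx = z.T @ x / n                           # E[z x']
S = (sigma2[:, None] * z).T @ z / n         # E[sigma^2(x) z z']
V_z = np.linalg.inv(Ezx.T @ np.linalg.inv(S) @ Ezx)

# Optimal instruments z*(x) = x / sigma^2(x) give V* = (E[x x'/sigma^2(x)])^-1.
V_star = np.linalg.inv((x / sigma2[:, None]).T @ x / n)

print(np.linalg.eigvalsh(V_z - V_star))     # eigenvalues >= 0 (up to noise)
```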

Given iid data $(y_i, x_i)$, $i = 1, \dots, n$, the GMM estimator corresponding to $z^*$ is

$$\hat{\theta} = \left( \sum_{i=1}^n \sigma^{-2}(x_i)\, x_i x_i^{\mathsf{T}} \right)^{-1} \sum_{i=1}^n \sigma^{-2}(x_i)\, x_i y_i,$$

which is the generalized least squares estimator. (It is infeasible because σ²(·) is unknown.)[1]
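
The estimator's efficiency gain over ordinary least squares is easy to see in simulation. The following minimal sketch (not from the article) treats σ²(x) as known, again assuming the illustrative design σ²(x) = (1 + w²)²; across replications, the infeasible GLS slope estimate has markedly smaller sampling variance than the OLS one.

```python
import numpy as np

rng = np.random.default_rng(2)
theta0 = np.array([1.0, -0.5])              # intercept and slope

def simulate(n):
    """Draw one sample from the assumed heteroskedastic linear model."""
    w = rng.normal(size=n)
    x = np.column_stack([np.ones(n), w])
    sigma2 = (1 + w**2) ** 2
    y = x @ theta0 + np.sqrt(sigma2) * rng.normal(size=n)
    return x, y, sigma2

def gls(x, y, sigma2):
    """Optimal-instrument GMM with z* = x / sigma^2: infeasible GLS."""
    zstar = x / sigma2[:, None]
    return np.linalg.solve(zstar.T @ x, zstar.T @ y)

def ols(x, y):
    return np.linalg.solve(x.T @ x, x.T @ y)

gls_est, ols_est = [], []
for _ in range(2000):
    x, y, sigma2 = simulate(500)
    gls_est.append(gls(x, y, sigma2))
    ols_est.append(ols(x, y))

print("GLS slope variance:", np.var([e[1] for e in gls_est]))
print("OLS slope variance:", np.var([e[1] for e in ols_est]))
```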


References

  1. Arellano, M. (2009). "Generalized Method of Moments and Optimal Instruments" (PDF). Class notes.
  2. Chamberlain, G. (1987). "Asymptotic Efficiency in Estimation with Conditional Moment Restrictions". Journal of Econometrics. 34 (3): 305–334. doi:10.1016/0304-4076(87)90015-7.
  3. Newey, W. K. (1988). "Adaptive Estimation of Regression Models via Moment Restrictions". Journal of Econometrics. 38 (3): 301–339. doi:10.1016/0304-4076(88)90048-6.
  4. Liang, K.-Y.; Zeger, S. L. (1986). "Longitudinal Data Analysis using Generalized Linear Models". Biometrika. 73 (1): 13–22. doi:10.1093/biomet/73.1.13.
  5. Amemiya, T. (1977). "The Maximum Likelihood and the Nonlinear Three-Stage Least Squares Estimator in the General Nonlinear Simultaneous Equation Model". Econometrica. 45 (4): 955–968. doi:10.2307/1912684. JSTOR 1912684.
  6. Hansen, L. P. (1985). "A Method of Calculating Bounds on the Asymptotic Covariance Matrices of Generalized Method of Moments Estimators". Journal of Econometrics. 30 (1–2): 203–238. doi:10.1016/0304-4076(85)90138-1.
  7. Newey, W. K. (1990). "Efficient Instrumental Variables Estimation of Nonlinear Models". Econometrica. 58 (4): 809–837. doi:10.2307/2938351. JSTOR 2938351.
  8. Robinson, P. (1987). "Asymptotically Efficient Estimation in the Presence of Heteroskedasticity of Unknown Form". Econometrica. 55 (4): 875–891. doi:10.2307/1911033. JSTOR 1911033.
