Dynamic unobserved effects model

A dynamic unobserved effects model is a statistical model used in econometrics for panel analysis. It is characterized by the influence of previous values of the dependent variable on its present value, and by the presence of unobservable explanatory variables.

The term “dynamic” here means the dependence of the dependent variable on its past history; this is usually used to model “state dependence” in economics. For instance, a person who cannot find a job this year will find it harder to find a job next year, because her present lack of a job will be a negative signal to potential employers. “Unobserved effects” means that one or some of the explanatory variables are unobservable: for example, the choice of one flavor of ice cream over another is a function of personal preference, but preference is unobservable.

Censored dependent variable

In a panel data tobit model, [1] [2] if the outcome partially depends on the previous outcome history, the tobit model is called "dynamic". For instance, a person who finds a high-salary job this year will find it easier to find a high-salary job next year, because the fact that she has a high-wage job this year will be a very positive signal for potential employers. The essence of this type of dynamic effect is the state dependence of the outcome. The "unobservable effects" here refers to factors that partially determine the outcome of the individual but cannot be observed in the data. For instance, a person's ability is very important in job-hunting, but it is not observable to researchers. A typical dynamic unobserved effects tobit model can be represented as

$$y_{it} = y_{it}^{*}\,1[y_{it}^{*} > 0], \qquad y_{it}^{*} = z_{it}\delta + \rho y_{i,t-1} + c_i + u_{it},$$

$$c_i \mid y_{i0}, z_i \sim N(\alpha_0 + \alpha_1 y_{i0} + z_i\alpha_2,\ \sigma_a^2), \qquad u_{it} \mid (y_{i,t-1},\dots,y_{i0}, z_i, c_i) \sim N(0,\ \sigma_u^2).$$

In this specific model, $\rho y_{i,t-1}$ is the dynamic effect part and $c_i$ is the unobserved effect part, whose distribution is determined by the initial outcome of individual i, $y_{i0}$, and some exogenous features of individual i, $z_i$.

Based on this setup, the likelihood function conditional on $\{y_{i0}\}_{i=1}^{N}$ can be given as

$$\prod_{i=1}^{N} \int \Biggl\{ \prod_{t=1}^{T} \biggl[ 1 - \Phi\Bigl( \tfrac{z_{it}\delta + \rho y_{i,t-1} + c}{\sigma_u} \Bigr) \biggr]^{1[y_{it}=0]} \biggl[ \tfrac{1}{\sigma_u}\,\phi\Bigl( \tfrac{y_{it} - z_{it}\delta - \rho y_{i,t-1} - c}{\sigma_u} \Bigr) \biggr]^{1[y_{it}>0]} \Biggr\}\, \tfrac{1}{\sigma_a}\,\phi\Bigl( \tfrac{c - \alpha_0 - \alpha_1 y_{i0} - z_i\alpha_2}{\sigma_a} \Bigr)\, dc.$$

For the initial values $y_{i0}$, there are two different ways to treat them in the construction of the likelihood function: treating them as constants, or imposing a distribution on them and calculating the unconditional likelihood function. But whichever way is chosen, we cannot get rid of the integral inside the likelihood function when estimating the model by maximum likelihood estimation (MLE). The Expectation–Maximization (EM) algorithm is usually a good solution for this computational issue. [3] Based on the consistent point estimates from MLE, the Average Partial Effect (APE) [4] can be calculated correspondingly. [5]
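To make the computation concrete, here is a minimal sketch of evaluating this likelihood by integrating $c_i$ out numerically with Gauss–Hermite quadrature. This is not the source's implementation: the function and variable names, the array shapes, the use of time-averaged covariates for $z_i$, and the choice of quadrature instead of EM are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

def dynamic_tobit_loglik(theta, y, z):
    """Log-likelihood of the dynamic tobit model sketched above.

    y : (N, T+1) array; column 0 holds the initial values y_{i0}.
    z : (N, T, K) array of exogenous covariates.
    theta packs (delta, rho, alpha_0, alpha_1, alpha_2, log sigma_a, log sigma_u).
    """
    K = z.shape[2]
    delta, rho = theta[:K], theta[K]
    a0, a1, a2 = theta[K + 1], theta[K + 2], theta[K + 3:2 * K + 3]
    sig_a, sig_u = np.exp(theta[-2]), np.exp(theta[-1])

    zbar = z.mean(axis=1)                    # z_i entering c_i as a time average (assumption)
    mu_c = a0 + a1 * y[:, 0] + zbar @ a2     # E[c_i | y_{i0}, z_i]

    # Probabilists' Gauss-Hermite rule: integrates against exp(-x^2 / 2).
    nodes, weights = np.polynomial.hermite_e.hermegauss(15)

    ll = 0.0
    for i in range(y.shape[0]):
        lik_i = 0.0
        for x, w in zip(nodes, weights):
            c = mu_c[i] + sig_a * x          # change of variables c = mu_c + sigma_a * x
            index = z[i] @ delta + rho * y[i, :-1] + c
            dens = np.where(
                y[i, 1:] > 0,
                norm.pdf((y[i, 1:] - index) / sig_u) / sig_u,  # uncensored density
                norm.cdf(-index / sig_u),                      # P(y_it = 0 | c)
            )
            lik_i += w * dens.prod()
        ll += np.log(lik_i) - 0.5 * np.log(2 * np.pi)          # quadrature normalization
    return ll
```

Passing the negative of this function to a numerical optimizer such as scipy.optimize.minimize yields the MLE; the EM algorithm mentioned above is an alternative route to the same maximizer.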

Binary dependent variable

Formulation

A typical dynamic unobserved effects model with a binary dependent variable is represented [6] as:

$$P(y_{it} = 1 \mid y_{i,t-1}, \dots, y_{i0}, z_i, c_i) = G(z_{it}\delta + \rho y_{i,t-1} + c_i),$$

where $c_i$ is an unobservable explanatory variable, $z_{it}$ are explanatory variables which are exogenous conditional on $c_i$, and $G(\cdot)$ is a cumulative distribution function.
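For concreteness, a common special case (an assumption here, not something the formulation above requires) takes $G$ to be the standard normal CDF $\Phi$, giving the dynamic probit form

$$y_{it} = 1[\,z_{it}\delta + \rho y_{i,t-1} + c_i + u_{it} > 0\,], \qquad u_{it} \mid (y_{i,t-1},\dots,y_{i0}, z_i, c_i) \sim N(0,1),$$

so that $P(y_{it} = 1 \mid y_{i,t-1}, \dots, y_{i0}, z_i, c_i) = \Phi(z_{it}\delta + \rho y_{i,t-1} + c_i)$.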

Estimates of parameters

In this type of model, economists have a special interest in ρ, which is used to characterize the state dependence. For example, $y_{i,t}$ can be a woman's choice whether to work or not, and $z_{it}$ can include the i-th individual's age, education level, number of children, and other factors. $c_i$ can be some individual-specific characteristic which cannot be observed by economists. [7] It is a reasonable conjecture that one's labor choice in period t should depend on his or her choice in period t − 1, due to habit formation or other reasons, and this dependence is characterized by the parameter ρ.

There are several MLE-based approaches to estimate δ and ρ consistently. The simplest way is to treat $y_{i0}$ as non-stochastic and assume $c_i$ is independent of $z_i$. Then by integrating $P(y_{it}, y_{i,t-1}, \dots, y_{i1} \mid y_{i0}, z_i, c_i)$ against the density of $c_i$, we can obtain the conditional density $P(y_{it}, y_{i,t-1}, \dots, y_{i1} \mid y_{i0}, z_i)$. The objective function for the conditional MLE is then $\sum_{i=1}^{N} \log P(y_{it}, y_{i,t-1}, \dots, y_{i1} \mid y_{i0}, z_i)$.
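As an illustration, here is a minimal sketch of this simplest approach for the probit case $G = \Phi$: $y_{i0}$ is treated as fixed, $c_i \sim N(0, \sigma_a^2)$ independently of $z_i$, and the integral over $c_i$ is approximated by Gauss–Hermite quadrature. The names, shapes, and the probit and normality choices are assumptions for illustration, not the source's implementation.

```python
import numpy as np
from scipy.stats import norm

def dynamic_probit_loglik(theta, y, z):
    """Conditional log-likelihood with y_{i0} fixed and c_i ~ N(0, sigma_a^2).

    y : (N, T+1) binary array; column 0 holds the initial values y_{i0}.
    z : (N, T, K) array of exogenous covariates.
    theta packs (delta, rho, log sigma_a).
    """
    K = z.shape[2]
    delta, rho, sig_a = theta[:K], theta[K], np.exp(theta[K + 1])

    # Probabilists' Gauss-Hermite rule: integrates against exp(-x^2 / 2).
    nodes, weights = np.polynomial.hermite_e.hermegauss(15)

    ll = 0.0
    for i in range(y.shape[0]):
        lik_i = 0.0
        for x, w in zip(nodes, weights):
            c = sig_a * x                                      # draw of c_i
            p = norm.cdf(z[i] @ delta + rho * y[i, :-1] + c)   # P(y_it = 1 | c)
            lik_i += w * np.where(y[i, 1:] == 1, p, 1 - p).prod()
        ll += np.log(lik_i) - 0.5 * np.log(2 * np.pi)          # quadrature normalization
    return ll
```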

Treating $y_{i0}$ as non-stochastic implicitly assumes that $y_{i0}$ is independent of $z_i$. But in most real cases, $y_{i0}$ depends on $c_i$, and $c_i$ also depends on $z_i$. An improvement on the approach above is to assume a density of $y_{i0}$ conditional on $(c_i, z_i)$, from which the conditional likelihood $P(y_{it}, y_{i,t-1}, \dots, y_{i1}, y_{i0} \mid c_i, z_i)$ can be obtained. By integrating this likelihood against the density of $c_i$ conditional on $z_i$, we can obtain the conditional density $P(y_{it}, y_{i,t-1}, \dots, y_{i1}, y_{i0} \mid z_i)$. The objective function for this MLE [8] is then $\sum_{i=1}^{N} \log P(y_{it}, y_{i,t-1}, \dots, y_{i1}, y_{i0} \mid z_i)$.
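A practical alternative, and the device already used in the tobit section above (the specific normal form below is an illustrative assumption), reverses the conditioning and specifies the distribution of $c_i$ given the initial condition and the exogenous variables directly, for example

$$c_i \mid y_{i0}, z_i \sim N(\alpha_0 + \alpha_1 y_{i0} + z_i\alpha_2,\ \sigma_a^2),$$

which again yields a likelihood conditional on $(y_{i0}, z_i)$ once $c_i$ is integrated out, e.g. by the same quadrature as in the sketch above.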

Based on the estimates for (δ, ρ) and their asymptotic variance, the values of the coefficients can be tested [9] and the average partial effect can be calculated. [10]
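As one concrete possibility for that last step (a sketch under the probit and normal-heterogeneity assumptions used above; the function and variable names are illustrative), the average partial effect of the lagged outcome can be computed by averaging the difference in response probabilities over the estimated distribution of $c_i$ and over the sample, using the convolution identity $E_c[\Phi(a + c)] = \Phi\bigl((a + \mu_c)/\sqrt{1 + \sigma_a^2}\bigr)$ for $c \sim N(\mu_c, \sigma_a^2)$:

```python
import numpy as np
from scipy.stats import norm

def ape_state_dependence(delta, rho, mu_c, sig_a, z):
    """Average partial effect of y_{i,t-1} in the dynamic probit model.

    delta : (K,) estimated coefficients;  rho : estimated state dependence.
    mu_c  : (N,) estimated means of c_i;  sig_a : estimated std. dev. of c_i.
    z     : (N, T, K) exogenous covariates.
    """
    scale = np.sqrt(1.0 + sig_a ** 2)        # E_c[Phi(a + c)] = Phi((a + mu_c) / scale)
    index = z @ delta                        # (N, T) linear indices without c or the lag
    p1 = norm.cdf((index + rho + mu_c[:, None]) / scale)   # lagged outcome = 1
    p0 = norm.cdf((index + mu_c[:, None]) / scale)         # lagged outcome = 0
    return (p1 - p0).mean()                  # average over individuals and periods
```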

References

  1. Greene, W. H. (2003). Econometric Analysis. Upper Saddle River, NJ: Prentice Hall.
  2. The model framework comes from Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press. p. 542, though the model is stated here in a slightly more general form.
  3. For more details, see: Cappé, O.; Moulines, E.; Rydén, T. (2005). "Part II: Parameter Inference". Inference in Hidden Markov Models. New York: Springer-Verlag.
  4. Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press. p. 22.
  5. For more details, see: Amemiya, Takeshi (1984). "Tobit models: A survey". Journal of Econometrics. 24 (1–2): 3–61. doi:10.1016/0304-4076(84)90074-5.
  6. Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press. p. 300.
  7. Heckman, J. J. (1981). "Heterogeneity and State Dependence". In Studies in Labor Markets. Chicago: University of Chicago Press.
  8. Greene, W. H. (2003). Econometric Analysis. Upper Saddle River, NJ: Prentice Hall.
  9. Newey, W. K.; McFadden, D. (1994). "Chapter 36: Large Sample Estimation and Hypothesis Testing". In Engle, R. F.; McFadden, D. L. (eds.). Handbook of Econometrics. Vol. 4. Elsevier. pp. 2111–2245. ISBN 9780444887665.
  10. Chamberlain, G. (1980). "Analysis of Covariance with Qualitative Data". Review of Economic Studies. 47 (1): 225–238.