Fixed-effect Poisson model

Last updated

In statistics, fixed-effect Poisson models are used for static panel data when the outcome variable is count data. Hausman, Hall, and Griliches pioneered the method in the mid 1980s. Their outcome of interest was the number of patents filed by firms, where they wanted to develop methods to control for the firm fixed effects. [1] Linear panel data models use the linear additivity of the fixed effects to difference them out and circumvent the incidental parameter problem. Even though Poisson models are inherently nonlinear, the use of the linear index and the exponential link function lead to multiplicative separability, more specifically [2]

E[yitxi1... xiT, ci ] = m(xit, ci, b0 ) = exp(ci + xitb0 ) = ai exp(xitb0 ) = μti (1)

This formula looks very similar to the standard Poisson premultiplied by the term ai. As the conditioning set includes the observables over all periods, we are in the static panel data world and are imposing strict exogeneity. [3] Hausman, Hall, and Griliches then use Andersen's conditional Maximum Likelihood methodology to estimate b0. Using ni = ∑ yit allows them to obtain the following nice distributional result of yi

yini, xi, ci ∼ Multinomial (ni, p1 (xi, b0), ..., pT (xi, b0 )) (2) where
[4]

At this point, the estimation of the fixed-effect Poisson model is transformed in a useful way and can be estimated by maximum-likelihood estimation techniques for multinomial log likelihoods. This is computationally not necessarily very restrictive, but the distributional assumptions up to this point are fairly stringent. Wooldridge provided evidence that these models have nice robustness properties as long as the conditional mean assumption (i.e. equation 1) holds. [5] Chamberlain also provided semi-parametric efficiency bounds for these estimators under slightly weaker exogeneity assumptions. However, these bounds are practically difficult to attain, as the proposed methodology needs high-dimensional nonparametric regressions for attaining these bounds.

See also

Related Research Articles

Simultaneous equations models are a type of statistical model in which the dependent variables are functions of other dependent variables, rather than just independent variables. This means some of the explanatory variables are jointly determined with the dependent variable, which in economics usually is the consequence of some underlying equilibrium mechanism. Take the typical supply and demand model: whilst typically one would determine the quantity supplied and demanded to be a function of the price set by the market, it is also possible for the reverse to be true, where producers observe the quantity that consumers demand and then set the price.

Jerry Allen Hausman is the John and Jennie S. MacDonald Professor of Economics at the Massachusetts Institute of Technology and a notable econometrician. He has published numerous influential papers in microeconometrics. Hausman is the recipient of several prestigious awards including the John Bates Clark Medal in 1985 and the Frisch Medal in 1980.

In statistics, ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model. OLS chooses the parameters of a linear function of a set of explanatory variables by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the given dataset and those predicted by the linear function of the independent variable.

In statistics and econometrics, panel data and longitudinal data are both multi-dimensional data involving measurements over time. Panel data is a subset of longitudinal data where observations are for the same subjects each time.

In statistics, a fixed effects model is a statistical model in which the model parameters are fixed or non-random quantities. This is in contrast to random effects models and mixed models in which all or some of the model parameters are random variables. In many applications including econometrics and biostatistics a fixed effects model refers to a regression model in which the group means are fixed (non-random) as opposed to a random effects model in which the group means are a random sample from a population. Generally, data can be grouped according to several observed factors. The group means could be modeled as fixed or random effects for each grouping. In a fixed effects model each group mean is a group-specific fixed quantity.

In statistics, a random effects model, also called a variance components model, is a statistical model where the model parameters are random variables. It is a kind of hierarchical linear model, which assumes that the data being analysed are drawn from a hierarchy of different populations whose differences relate to that hierarchy. In econometrics, random effects models are used in panel analysis of hierarchical or panel data when one assumes no fixed effects. A random effects model is a special case of a mixed model.

In statistics and econometrics, the first-difference (FD) estimator is an estimator used to address the problem of omitted variables with panel data. It is consistent under the assumptions of the fixed effects model. In certain situations it can be more efficient than the standard fixed effects estimator.

NLOGIT

NLOGIT is an extension of the econometric and statistical software package LIMDEP. In addition to the estimation tools in LIMDEP, NLOGIT provides programs for estimation, model simulation and analysis of multinomial choice data, such as brand choice, transportation mode and for survey and market data in which consumers choose among a set of competing alternatives.

In econometrics, the Arellano–Bond estimator is a generalized method of moments estimator used to estimate dynamic models of panel data. It was proposed in 1991 by Manuel Arellano and Stephen Bond, based on the earlier work by Alok Bhargava and John Denis Sargan in 1983, for addressing certain endogeneity problems. The GMM-SYS estimator is a system that contains both the levels and the first difference equations. It provides an alternative to the standard first difference GMM estimator.

Partial (pooled) likelihood estimation for panel data is a quasi-maximum likelihood method for panel analysis that assumes that density of yit given xit is correctly specified for each time period but it allows for misspecification in the conditional density of yi≔(yi1,…,yiT) given xi≔(xi1,…,xiT).

In linear panel analysis, it can be desirable to estimate the magnitude of the fixed effects, as they provide measures of the unobserved components. For instance, in wage equation regressions, Fixed Effects capture ability measures that are constant over time, such as motivation. Chamberlain's approach to unobserved effects models is a way of estimating the linear unobserved effects, under Fixed Effect assumptions, in the following unobserved effects model

Control functions are statistical methods to correct for endogeneity problems by modelling the endogeneity in the error term. The approach thereby differs in important ways from other models that try to account for the same econometric problem. Instrumental variables, for example, attempt to model the endogenous variable X as an often invertible model with respect to a relevant and exogenous instrument Z. Panel analysis uses special data properties to difference out unobserved heterogeneity that is assumed to be fixed over time.

In applied statistics, fractional models are, to some extent, related to binary response models. However, instead of estimating the probability of being in one bin of a dichotomous variable, the fractional model typically deals with variables that take on all possible values in the unit interval. One can easily generalize this model to take on values on any other interval by appropriate transformations. Examples range from participation rates in 401(k) plans to television ratings of NBA games.

Issues of heterogeneity in duration models can take on different forms. On the one hand, unobserved heterogeneity can play a crucial role when it comes to different sampling methods, such as stock or flow sampling. On the other hand, duration models have also been extended to allow for different subpopulations, with a strong link to mixture models. Many of these models impose the assumptions that heterogeneity is independent of the observed covariates, it has a distribution that depends on a finite number of parameters only, and it enters the hazard function multiplicatively.

A dynamic unobserved effects model is a statistical model used in econometrics for panel analysis. It is characterized by the influence of previous values of the dependent variable on its present value, and by the presence of unobservable explanatory variables.

In statistics and econometrics, the maximum score estimator is a nonparametric estimator for discrete choice models developed by Charles Manski in 1975. Unlike the multinomial probit and multinomial logit estimators, it makes no assumptions about the distribution of the unobservable part of utility. However, its statistical properties are more complicated than the multinomial probit and logit models, making statistical inference difficult. To address these issues, Joel Horowitz proposed a variant, called the smoothed maximum score estimator.

In statistics, linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

Grouped duration data are widespread in many applications. Unemployment durations are typically measured over weeks or months and these time intervals may be considered too large for continuous approximations to hold. In this case, we will typically have grouping points , where . Models allow for time-invariant and time-variant covariates, but the latter require stronger assumptions in terms of exogeneity. The discrete-time hazard function can be written as:

In statistics, in particular in the design of experiments, a multi-valued treatment is a treatment that can take on more than two values. It is related to the dose-response model in the medical literature.

References

  1. Hausman, J. A., B. H. Hall, and Z. Griliches (1984): "Econometric Models for Count Data with an Application to the Patents-R&D Relationship." Econometrica (46), pp. 909–938
  2. Cameron, C. A. and P. K. Trivedi (2015) "Count Panel Data," Oxford Handbook of Panel Data, ed. by B. Baltagi, Oxford University Press, pp. 233–256
  3. Wooldridge, J. (2002): Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass.
  4. Andersen, E. B. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators." Journal of the Royal Statistical Society, Series B, 32, pp. 283–301
  5. Wooldridge, J. M. (1999): "Distribution-Free Estimation of Some Nonlinear Panel Data Models." Journal of Econometrics (90), pp. 77–97