Econometrics

Econometrics is an application of statistical methods to economic data in order to give empirical content to economic relationships. [1] More precisely, it is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference." [2] An introductory economics textbook describes econometrics as allowing economists "to sift through mountains of data to extract simple relationships." [3] Jan Tinbergen is one of the two founding fathers of econometrics. [4] [5] [6] The other, Ragnar Frisch, also coined the term in the sense in which it is used today. [7]

A basic tool for econometrics is the multiple linear regression model. [8] Econometric theory uses statistical theory and mathematical statistics to evaluate and develop econometric methods. [9] [10] Econometricians try to find estimators that have desirable statistical properties including unbiasedness, efficiency, and consistency. Applied econometrics uses theoretical econometrics and real-world data for assessing economic theories, developing econometric models, analysing economic history, and forecasting.

Basic models: linear regression

A basic tool for econometrics is the multiple linear regression model. [8] In modern econometrics, other statistical tools are frequently used, but linear regression is still the most frequently used starting point for an analysis. [8] Estimating a linear regression on two variables can be visualized as fitting a line through data points representing paired values of the independent and dependent variables.

[Figure: Okun's law, representing the relationship between GDP growth and the unemployment rate. The fitted line is found using regression analysis.]

For example, consider Okun's law, which relates GDP growth to the unemployment rate. This relationship is represented in a linear regression where the change in the unemployment rate (Δ Unemployment) is a function of an intercept (β₀), a given value of GDP growth multiplied by a slope coefficient β₁, and an error term, ε:

Δ Unemployment = β₀ + β₁ · Growth + ε

The unknown parameters β₀ and β₁ can be estimated. Here β₀ is estimated to be 0.83 and β₁ is estimated to be −1.77. This means that if GDP growth increased by one percentage point, the unemployment rate would be predicted to drop by 1.77 percentage points, other things held constant. The model could then be tested for statistical significance as to whether an increase in GDP growth is associated with a decrease in the unemployment rate, as hypothesized. If the estimate of β₁ were not significantly different from 0, the test would fail to find evidence that changes in the growth rate and unemployment rate were related. The variance in a prediction of the dependent variable (unemployment) as a function of the independent variable (GDP growth) is given in polynomial least squares.
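
As a rough illustration (a minimal sketch with made-up numbers, not the data behind the figure above), the two parameters can be estimated by ordinary least squares with any numerical library:

```python
import numpy as np

# Hypothetical paired observations: annual real GDP growth (percentage
# points) and the corresponding change in the unemployment rate.
growth = np.array([3.1, 0.5, -1.2, 4.0, 2.2, 1.0, -0.3, 2.8])
d_unemp = np.array([-0.4, 0.6, 1.8, -0.9, -0.2, 0.3, 1.0, -0.5])

# Fit d_unemp = b0 + b1 * growth by least squares.
X = np.column_stack([np.ones_like(growth), growth])
(b0, b1), *_ = np.linalg.lstsq(X, d_unemp, rcond=None)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")  # slope should be negative
```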

Theory

Econometric theory uses statistical theory and mathematical statistics to evaluate and develop econometric methods. [9] [10] Econometricians try to find estimators that have desirable statistical properties including unbiasedness, efficiency, and consistency. An estimator is unbiased if its expected value is the true value of the parameter; it is consistent if it converges to the true value as the sample size gets larger, and it is efficient if the estimator has lower standard error than other unbiased estimators for a given sample size. Ordinary least squares (OLS) is often used for estimation since it provides the BLUE or "best linear unbiased estimator" (where "best" means most efficient, unbiased estimator) given the Gauss–Markov assumptions. When these assumptions are violated or other statistical properties are desired, other estimation techniques such as maximum likelihood estimation, generalized method of moments, or generalized least squares are used. Estimators that incorporate prior beliefs are advocated by those who favour Bayesian statistics over traditional, classical or "frequentist" approaches.
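
These properties can be checked by simulation. Below is a minimal sketch, using a made-up data-generating process, that illustrates unbiasedness and consistency of the OLS slope estimator when the Gauss–Markov assumptions hold:

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_BETA = 2.0

def ols_slope(n: int) -> float:
    """Draw a sample of size n from y = TRUE_BETA*x + e and return the OLS slope."""
    x = rng.normal(size=n)
    e = rng.normal(size=n)          # mean-zero, homoskedastic errors
    y = TRUE_BETA * x + e
    return float(x @ y / (x @ x))   # slope of a regression through the origin

# Unbiasedness: averaging the estimator over many samples recovers TRUE_BETA.
print(np.mean([ols_slope(50) for _ in range(2000)]))

# Consistency: a single estimate concentrates around TRUE_BETA as n grows.
for n in (10, 100, 10_000):
    print(n, ols_slope(n))
```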

Methods

Applied econometrics uses theoretical econometrics and real-world data for assessing economic theories, developing econometric models, analysing economic history, and forecasting. [11]

Econometrics uses standard statistical models to study economic questions, but most often these are based on observational data, rather than data from controlled experiments. [12] In this, the design of observational studies in econometrics is similar to the design of studies in other observational disciplines, such as astronomy, epidemiology, sociology and political science. Analysis of data from an observational study is guided by the study protocol, although exploratory data analysis may be useful for generating new hypotheses. [13] Economics often analyses systems of equations and inequalities, such as supply and demand hypothesized to be in equilibrium. Consequently, the field of econometrics has developed methods for identification and estimation of simultaneous equations models. These methods are analogous to methods used in other areas of science, such as the field of system identification in systems analysis and control theory. Such methods may allow researchers to estimate models and investigate their empirical consequences, without directly manipulating the system.

In the absence of evidence from controlled experiments, econometricians often seek illuminating natural experiments or apply quasi-experimental methods to draw credible causal inference. [14] The methods include regression discontinuity design, instrumental variables, and difference-in-differences.
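
For instance, a difference-in-differences comparison nets a common time trend out of the treated group's change. A minimal sketch with invented group means, assuming parallel trends:

```python
# Hypothetical mean outcomes for a treated and a control group,
# before and after a policy change.
treated_before, treated_after = 10.0, 13.0
control_before, control_after = 9.0, 10.5

# The control group's change estimates the common trend; subtracting it
# from the treated group's change isolates the treatment effect.
did = (treated_after - treated_before) - (control_after - control_before)
print(did)  # 1.5 under the parallel-trends assumption
```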

Example

A simple example of a relationship in econometrics from the field of labour economics is:

ln(wage) = β₀ + β₁ · (years of education) + ε

This example assumes that the natural logarithm of a person's wage is a linear function of the number of years of education that person has acquired. The parameter β₁ measures the increase in the natural log of the wage attributable to one more year of education. The term ε is a random variable representing all other factors that may have direct influence on wage. The econometric goal is to estimate the parameters β₀ and β₁ under specific assumptions about the random variable ε. For example, if ε is uncorrelated with years of education, then the equation can be estimated with ordinary least squares.
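
A minimal sketch of that ideal case, on a simulated data set in which ε really is uncorrelated with education (the true β₁ here is an arbitrary 0.08):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000

educ = rng.integers(8, 21, size=n).astype(float)   # years of education
eps = rng.normal(scale=0.4, size=n)                # other wage determinants
log_wage = 0.5 + 0.08 * educ + eps                 # true beta0=0.5, beta1=0.08

# OLS in closed form for a single regressor.
xc = educ - educ.mean()
beta1_hat = float(xc @ log_wage / (xc @ xc))
beta0_hat = float(log_wage.mean() - beta1_hat * educ.mean())
print(beta0_hat, beta1_hat)   # close to 0.5 and 0.08
```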

If the researcher could randomly assign people to different levels of education, the data set thus generated would allow estimation of the effect of changes in years of education on wages. In reality, those experiments cannot be conducted. Instead, the econometrician observes the years of education of and the wages paid to people who differ along many dimensions. Given this kind of data, the estimated coefficient on years of education in the equation above reflects both the effect of education on wages and the effect of other variables on wages, if those other variables were correlated with education. For example, people born in certain places may have higher wages and higher levels of education. Unless the econometrician controls for place of birth in the above equation, the effect of birthplace on wages may be falsely attributed to the effect of education on wages.

The most obvious way to control for birthplace is to include a measure of the effect of birthplace in the equation above. Exclusion of birthplace, together with the assumption that ε is uncorrelated with education, produces a misspecified model. Another technique is to include in the equation an additional set of measured covariates which are not instrumental variables, yet render β₁ identifiable. [15] An overview of econometric methods used to study this problem was provided by Card (1999). [16]
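
A minimal sketch of the omitted-variable problem and an instrumental-variables fix, using an invented data-generating process in which unobserved "ability" raises both education and wages while the instrument z shifts education only:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

ability = rng.normal(size=n)    # unobserved confounder
z = rng.normal(size=n)          # instrument: affects wages only via education
educ = 12 + z + 0.8 * ability + rng.normal(size=n)
log_wage = 1.0 + 0.10 * educ + 0.5 * ability + rng.normal(scale=0.3, size=n)

def slope(x, y):
    """Simple-regression slope of y on x."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

print("OLS:", slope(educ, log_wage))                 # biased upward by ability
print("IV :", slope(z, log_wage) / slope(z, educ))   # close to the true 0.10
```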

Journals

The main journals that publish work in econometrics are:

Econometrica [17]
The Review of Economics and Statistics [18]
The Econometrics Journal [19]
The Journal of Econometrics [20]
The Journal of Business & Economic Statistics [21]
The Journal of Applied Econometrics [22]
Econometric Reviews [23]
Econometric Theory [24]

Limitations and criticisms

As in other forms of statistical analysis, badly specified econometric models may show a spurious relationship in which two variables are correlated but causally unrelated. In a study of the use of econometrics in major economics journals, McCloskey concluded that some economists report p-values (following the Fisherian tradition of tests of significance of point null hypotheses) and neglect concerns of type II errors; some economists fail to report estimates of the size of effects (apart from statistical significance) and to discuss their economic importance. She also argues that some economists fail to use economic reasoning for model selection, especially for deciding which variables to include in a regression. [25] [26]

In some cases, economic variables cannot be experimentally manipulated as treatments randomly assigned to subjects. [27] In such cases, economists rely on observational studies, often using data sets with many strongly associated covariates, resulting in enormous numbers of models with similar explanatory ability but different covariates and regression estimates. Regarding the plurality of models compatible with observational data sets, Edward Leamer urged that "professionals ... properly withhold belief until an inference can be shown to be adequately insensitive to the choice of assumptions". [27]
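
Leamer's point can be made concrete with a small simulation: when two covariates are strongly associated, rival specifications fit almost equally well yet imply very different coefficients. A minimal sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500

x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # x2 nearly collinear with x1
y = x1 + x2 + rng.normal(size=n)          # outcome depends on both

def fit(regressors, y):
    """OLS fit; returns coefficients (intercept first) and R-squared."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    r2 = 1 - rss[0] / np.sum((y - y.mean()) ** 2)
    return beta.round(2), round(float(r2), 3)

print(fit([x1], y))       # slope on x1 near 2.0
print(fit([x1, x2], y))   # similar R^2, but the effect splits across x1, x2
```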


Related Research Articles

<span class="mw-page-title-main">Least squares</span> Approximation method in statistics

The method of least squares is a parameter estimation method in regression analysis based on minimizing the sum of the squares of the residuals made in the results of each individual equation.

In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero. The errors do not need to be normal, nor do they need to be independent and identically distributed. The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator, ridge regression, or simply any degenerate estimator.

<span class="mw-page-title-main">Regression analysis</span> Set of statistical processes for estimating the relationships among variables

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable and one or more error-free independent variables. The most common form of regression analysis is linear regression, in which one finds the line that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line that minimizes the sum of squared differences between the true data and that line. For specific mathematical reasons, this allows the researcher to estimate the conditional expectation of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters or estimate the conditional expectation across a broader collection of non-linear models.

<span class="mw-page-title-main">Consistent estimator</span> Statistical estimator converging in probability to a true parameter as sample size increases

In statistics, a consistent estimator or asymptotically consistent estimator is an estimator—a rule for computing estimates of a parameter θ0—having the property that as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to θ0. This means that the distributions of the estimates become more and more concentrated near the true value of the parameter being estimated, so that the probability of the estimator being arbitrarily close to θ0 converges to one.

<span class="mw-page-title-main">Coefficient of determination</span> Indicator for how well data points fit a line or curve

In statistics, the coefficient of determination, denoted R2 or r2 and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).

In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment. Intuitively, IVs are used when an explanatory variable of interest is correlated with the error term (endogenous), in which case ordinary least squares and ANOVA give biased results. A valid instrument induces changes in the explanatory variable but has no independent effect on the dependent variable and is not correlated with the error term, allowing a researcher to uncover the causal effect of the explanatory variable on the dependent variable.

In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from probability + unit. The purpose of the model is to estimate the probability that an observation with particular characteristics will fall into a specific one of the categories; moreover, classifying observations based on their predicted probabilities is a type of binary classification model.

<span class="mw-page-title-main">Ordinary least squares</span> Method for estimating the unknown parameters in a linear regression model

In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the input dataset and the output of the (linear) function of the independent variable. Some sources consider OLS to be linear regression.

<span class="mw-page-title-main">Simple linear regression</span> Linear regression model with a single explanatory variable

In statistics, simple linear regression (SLR) is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable and finds a linear function that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor.

In econometrics, the seemingly unrelated regressions (SUR) or seemingly unrelated regression equations (SURE) model, proposed by Arnold Zellner (1962), is a generalization of a linear regression model that consists of several regression equations, each having its own dependent variable and potentially different sets of exogenous explanatory variables. Each equation is a valid linear regression on its own and can be estimated separately, which is why the system is called seemingly unrelated, although some authors suggest that the term seemingly related would be more appropriate, since the error terms are assumed to be correlated across the equations.

In statistics, the Breusch–Pagan test, developed in 1979 by Trevor Breusch and Adrian Pagan, is used to test for heteroskedasticity in a linear regression model. It was independently suggested with some extension by R. Dennis Cook and Sanford Weisberg in 1983. Derived from the Lagrange multiplier test principle, it tests whether the variance of the errors from a regression is dependent on the values of the independent variables. In that case, heteroskedasticity is present.
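
As a concrete illustration, here is a minimal sketch of the studentized (Koenker) variant of the test, implemented directly from its definition rather than via a packaged routine: regress the squared OLS residuals on the regressors and compare n·R² with a chi-squared distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(1, 5, size=n)
y = 2 + 3 * x + rng.normal(scale=x)   # error spread grows with x

# Step 1: OLS residuals from the original regression.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 2: regress squared residuals on the regressors; LM = n * R^2.
u2 = resid ** 2
g, *_ = np.linalg.lstsq(X, u2, rcond=None)
r2 = 1 - np.sum((u2 - X @ g) ** 2) / np.sum((u2 - u2.mean()) ** 2)
lm = n * r2
print(lm, stats.chi2.sf(lm, df=1))    # small p-value: heteroskedasticity
```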

Cochrane–Orcutt estimation is a procedure in econometrics, which adjusts a linear model for serial correlation in the error term. Developed in the 1940s, it is named after statisticians Donald Cochrane and Guy Orcutt.

In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model. It is used when there is a non-zero amount of correlation between the residuals in the regression model. GLS is employed to improve statistical efficiency and reduce the risk of drawing erroneous inferences, as compared to conventional least squares and weighted least squares methods. It was first described by Alexander Aitken in 1935.

The topic of heteroskedasticity-consistent (HC) standard errors arises in statistics and econometrics in the context of linear regression and time series analysis. These are also known as heteroskedasticity-robust standard errors or Eicker–Huber–White standard errors, the latter name recognizing the contributions of Friedhelm Eicker, Peter J. Huber, and Halbert White.

The Heckman correction is a statistical technique to correct bias from non-randomly selected samples or otherwise incidentally truncated dependent variables, a pervasive issue in quantitative social sciences when using observational data. Conceptually, this is achieved by explicitly modelling the individual sampling probability of each observation together with the conditional expectation of the dependent variable. The resulting likelihood function is mathematically similar to the tobit model for censored dependent variables, a connection first drawn by James Heckman in 1974. Heckman also developed a two-step control function approach to estimate this model, which avoids the computational burden of having to estimate both equations jointly, albeit at the cost of inefficiency. Heckman received the Nobel Memorial Prize in Economic Sciences in 2000 for his work in this field.

<span class="mw-page-title-main">Polynomial regression</span> Statistics concept

In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an nth degree polynomial in x. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y |x). Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y | x) is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered to be a special case of multiple linear regression.
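
A minimal sketch: fitting a cubic with NumPy, where the model is nonlinear in x but linear in the coefficients (the design matrix simply has columns 1, x, x², x³):

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(-3, 3, 60)
y = 1 - 2 * x + 0.5 * x**3 + rng.normal(scale=1.0, size=x.size)

coeffs = np.polyfit(x, y, deg=3)   # highest-degree coefficient first
print(coeffs)                      # roughly [0.5, 0.0, -2.0, 1.0]
```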

An error correction model (ECM) belongs to a category of multiple time series models most commonly used for data where the underlying variables have a long-run common stochastic trend, also known as cointegration. ECMs are a theoretically-driven approach useful for estimating both short-term and long-term effects of one time series on another. The term error-correction relates to the fact that last-period's deviation from a long-run equilibrium, the error, influences its short-run dynamics. Thus ECMs directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables.

<span class="mw-page-title-main">Errors-in-variables models</span> Regression models accounting for possible errors in independent variables

In statistics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measured exactly, or observed without error; as such, those models account only for errors in the dependent variables, or responses.

In statistics, linear regression is a model that estimates the linear relationship between a scalar response and one or more explanatory variables. A model with exactly one explanatory variable is a simple linear regression; a model with two or more explanatory variables is a multiple linear regression. This term is distinct from multivariate linear regression, which predicts multiple correlated dependent variables rather than a single dependent variable.

<span class="mw-page-title-main">Homoscedasticity and heteroscedasticity</span> Statistical property

In statistics, a sequence of random variables is homoscedastic if all its random variables have the same finite variance; this is also known as homogeneity of variance. The complementary notion is called heteroscedasticity, also known as heterogeneity of variance. The spellings homoskedasticity and heteroskedasticity are also frequently used. “Skedasticity” comes from the Ancient Greek word “skedánnymi”, meaning “to scatter”. Assuming a variable is homoscedastic when in reality it is heteroscedastic results in unbiased but inefficient point estimates and in biased estimates of standard errors, and may result in overestimating the goodness of fit as measured by the Pearson coefficient.

References

  1. M. Hashem Pesaran (1987). "Econometrics", The New Palgrave: A Dictionary of Economics , v. 2, p. 8 [pp. 8–22]. Reprinted in J. Eatwell et al., eds. (1990). Econometrics: The New Palgrave, p. 1 Archived 15 March 2023 at the Wayback Machine [pp. 1–34]. Abstract Archived 18 May 2012 at the Wayback Machine (2008 revision by J. Geweke, J. Horowitz, and H. P. Pesaran).
  2. P. A. Samuelson, T. C. Koopmans, and J. R. N. Stone (1954). "Report of the Evaluative Committee for Econometrica", Econometrica 22(2), p. 142 [pp. 141–146], as described and cited in Pesaran (1987) above.
  3. Paul A. Samuelson and William D. Nordhaus, 2004. Economics. 18th ed., McGraw-Hill, p. 5.
  4. "1969 - Jan Tinbergen: Nobelprijs economie - Elsevierweekblad.nl". elsevierweekblad.nl. 12 October 2015. Archived from the original on 1 May 2018. Retrieved 1 May 2018.
  5. Magnus, Jan & Mary S. Morgan (1987). "The ET Interview: Professor J. Tinbergen". Econometric Theory 3, 117–142.
  6. Willekens, Frans (2008). International Migration in Europe: Data, Models and Estimates. New Jersey: John Wiley & Sons: 117.
  7. H. P. Pesaran (1990), "Econometrics", Econometrics: The New Palgrave, p. 2. Archived 15 March 2023 at the Wayback Machine. Citing Ragnar Frisch (1936), "A Note on the Term 'Econometrics'", Econometrica, 4(1), p. 95.
        Aris Spanos (2008), "statistics and economics", The New Palgrave Dictionary of Economics , 2nd Edition. Abstract. Archived 18 May 2012 at the Wayback Machine
  8. Greene, William (2012). "Chapter 1: Econometrics". Econometric Analysis (7th ed.). Pearson Education. pp. 47–48. ISBN 9780273753568. Ultimately, all of these will require a common set of tools, including, for example, the multiple regression model, the use of moment conditions for estimation, instrumental variables (IV) and maximum likelihood estimation. With that in mind, the organization of this book is as follows: The first half of the text develops fundamental results that are common to all the applications. The concept of multiple regression and the linear regression model in particular constitutes the underlying platform of most modeling, even if the linear model itself is not ultimately used as the empirical specification.
  9. Greene, William (2012). Econometric Analysis (7th ed.). Pearson Education. pp. 34, 41–42. ISBN 9780273753568.
  10. Wooldridge, Jeffrey (2012). "Chapter 1: The Nature of Econometrics and Economic Data". Introductory Econometrics: A Modern Approach (5th ed.). South-Western Cengage Learning. p. 2. ISBN 9781111531041.
  11. Clive Granger (2008). "forecasting", The New Palgrave Dictionary of Economics, 2nd Edition. Abstract. Archived 18 May 2012 at the Wayback Machine
  12. Wooldridge, Jeffrey (2013). Introductory Econometrics: A Modern Approach. South-Western, Cengage Learning. ISBN 978-1-111-53104-1.
  13. Herman O. Wold (1969). "Econometrics as Pioneering in Nonexperimental Model Building", Econometrica, 37(3), pp. 369–381. Archived 24 August 2017 at the Wayback Machine.
  14. Angrist, Joshua D.; Pischke, Jörn-Steffen (May 2010). "The Credibility Revolution in Empirical Economics: How Better Research Design is Taking the Con out of Econometrics". Journal of Economic Perspectives. 24 (2): 3–30. doi:10.1257/jep.24.2.3. hdl:1721.1/54195. ISSN 0895-3309.
  15. Pearl, Judea (2000). Causality: Model, Reasoning, and Inference . Cambridge University Press. ISBN   978-0521773621.
  16. Card, David (1999). "The Causal Effect of Education on Earning". In Ashenfelter, O.; Card, D. (eds.). Handbook of Labor Economics. Amsterdam: Elsevier. pp. 1801–1863. ISBN   978-0444822895.
  17. "Home". www.econometricsociety.org. Retrieved 14 February 2024.
  18. "The Review of Economics and Statistics". direct.mit.edu. Retrieved 14 February 2024.
  19. "The Econometrics Journal". Wiley.com. Archived from the original on 6 October 2011. Retrieved 8 October 2013.
  20. "Journal of Econometrics". www.scimagojr.com. Retrieved 14 February 2024.
  21. "Home" . Retrieved 14 March 2024.
  22. "Journal of Applied Econometrics". Journal of Applied Econometrics.
  23. Econometric Reviews. Print ISSN 0747-4938; Online ISSN 1532-4168. https://www.tandfonline.com/action/journalInformation?journalCode=lecr20
  24. "Journals". Default. Retrieved 14 February 2024.
  25. McCloskey (May 1985). "The Loss Function has been mislaid: the Rhetoric of Significance Tests". American Economic Review. 75 (2).
  26. Stephen T. Ziliak and Deirdre N. McCloskey (2004). "Size Matters: The Standard Error of Regressions in the American Economic Review", Journal of Socio-Economics, 33(5), pp. 527–546. Archived 25 June 2010 at the Wayback Machine.
  27. Leamer, Edward (March 1983). "Let's Take the Con out of Econometrics". American Economic Review. 73 (1): 31–43. JSTOR 1803924.