Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships.^{ [1] } More precisely, it is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference".^{ [2] } An introductory economics textbook describes econometrics as allowing economists "to sift through mountains of data to extract simple relationships".^{ [3] } The first known use of the term "econometrics" (in cognate form) was by Polish economist Paweł Ciompa in 1910.^{ [4] } Jan Tinbergen is considered by many to be one of the founding fathers of econometrics.^{ [5] }^{ [6] }^{ [7] } Ragnar Frisch is credited with coining the term in the sense in which it is used today.^{ [8] }
In linguistics, cognates are words that have a common etymological origin. Cognates are often inherited from a shared parent language, but they may also involve borrowings from some other language. For example, the English words dish and desk and the German word Tisch ("table") are cognates because they all come from Latin discus, which relates to their flat surfaces. Cognates may have evolved similar, different or even opposite meanings, but in most cases there are some similar sounds or letters in the words, in some cases appearing to be dissimilar. Some words sound similar, but do not come from the same root; these are called false cognates, while some are truly cognate but differ in meaning; these are called false friends.
Jan Tinbergen was an important Dutch economist. He was awarded the first Nobel Memorial Prize in Economic Sciences in 1969, which he shared with Ragnar Frisch for having developed and applied dynamic models for the analysis of economic processes. He is widely considered to be one of the most influential economists of the 20th century and one of the founding fathers of econometrics. It has been argued that the development of the first macroeconometric models, the solution of the identification problem, and the understanding of dynamic models are his three most important legacies to econometrics. Tinbergen was a founding trustee of Economists for Peace and Security. In 1945, he founded the Bureau for Economic Policy Analysis (CPB) and was the agency's first director.
A basic tool for econometrics is the multiple linear regression model.^{ [9] } Econometric theory uses statistical theory and mathematical statistics to evaluate and develop econometric methods.^{ [10] }^{ [11] } Econometricians try to find estimators that have desirable statistical properties including unbiasedness, efficiency, and consistency. Applied econometrics uses theoretical econometrics and real-world data for assessing economic theories, developing econometric models, analysing economic history, and forecasting.
The theory of statistics provides a basis for the whole range of techniques, in both study design and data analysis, that are used within applications of statistics. The theory covers approaches to statistical decision problems and to statistical inference, and the actions and deductions that satisfy the basic principles stated for these different approaches. Within a given approach, statistical theory gives ways of comparing statistical procedures; it can find a best possible procedure within a given context for given statistical problems, or can provide guidance on the choice between alternative procedures.
Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.
In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule, the quantity of interest and its result are distinguished.
A basic tool for econometrics is the multiple linear regression model.^{ [9] } In modern econometrics, other statistical tools are frequently used, but linear regression is still the most frequently used starting point for an analysis.^{ [9] } Estimating a linear regression on two variables can be visualised as fitting a line through data points representing paired values of the independent and dependent variables.
For example, consider Okun's law, which relates GDP growth to the unemployment rate. This relationship is represented in a linear regression where the change in unemployment rate ($\Delta\text{Unemployment}$) is a function of an intercept ($\beta_0$), a given value of GDP growth multiplied by a slope coefficient $\beta_1$, and an error term, $\varepsilon$:

$$\Delta\text{Unemployment} = \beta_0 + \beta_1\,\text{Growth} + \varepsilon$$
In economics, Okun's law is an empirically observed relationship between unemployment and losses in a country's production. The "gap version" states that for every 1% increase in the unemployment rate, a country's GDP will be roughly an additional 2% lower than its potential GDP. The "difference version" describes the relationship between quarterly changes in unemployment and quarterly changes in real GDP. The stability and usefulness of the law has been disputed.
The unknown parameters, the intercept $\beta_0$ and the slope $\beta_1$, can be estimated. Here $\beta_1$ is estimated to be −1.77 and $\beta_0$ is estimated to be 0.83. This means that if GDP growth increased by one percentage point, the unemployment rate would be predicted to drop by 1.77 points. The model could then be tested for statistical significance as to whether an increase in growth is associated with a decrease in unemployment, as hypothesized. If the estimate of $\beta_1$ were not significantly different from 0, the test would fail to find evidence that changes in the growth rate and unemployment rate were related. The variance in a prediction of the dependent variable (unemployment) as a function of the independent variable (GDP growth) is given in polynomial least squares.
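The estimation step described above can be sketched in a few lines. This is a minimal illustration, not the original analysis: the data are synthetic (generated from the quoted coefficients 0.83 and −1.77, with an assumed noise level and sample size), and the helper name `ols_simple` is hypothetical.

```python
import numpy as np

def ols_simple(x, y):
    """Return (intercept, slope, slope standard error) for y = b0 + b1*x + e."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])    # design matrix with an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]                    # degrees of freedom
    sigma2 = resid @ resid / dof                 # estimate of the error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # classical (homoskedastic) covariance
    return beta[0], beta[1], np.sqrt(cov[1, 1])

# Synthetic data (hypothetical, NOT real GDP or unemployment figures).
rng = np.random.default_rng(0)
growth = rng.normal(loc=0.8, scale=0.6, size=200)
d_unemp = 0.83 - 1.77 * growth + rng.normal(scale=0.5, size=200)

b0, b1, se_b1 = ols_simple(growth, d_unemp)
t_stat = b1 / se_b1                              # t statistic for H0: slope = 0
print(f"intercept={b0:.2f}, slope={b1:.2f}, t={t_stat:.1f}")
```

A large negative t statistic here would be read as evidence that higher growth is associated with falling unemployment; with real data the standard error would usually need to account for serial correlation as well.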
In statistical hypothesis testing, a result has statistical significance when it is very unlikely to have occurred given the null hypothesis. More precisely, a study's defined significance level, denoted α, is the probability of the study rejecting the null hypothesis, given that the null hypothesis were assumed to be true; and the p-value of a result, p, is the probability of obtaining a result at least as extreme, given that the null hypothesis were true. The result is statistically significant, by the standards of the study, when p ≤ α. The significance level for a study is chosen before data collection, and typically set to 5% or much lower, depending on the field of study.
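As a small illustration of the decision rule, the sketch below computes a two-sided p-value for a test statistic that is assumed to be standard normal under the null hypothesis, then compares it with α. The function names and the normality assumption are illustrative choices, not part of the definition above.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a statistic z that is standard normal under H0."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))

def is_significant(z, alpha=0.05):
    """Apply the rule 'significant when p <= alpha'."""
    return two_sided_p_value(z) <= alpha

print(two_sided_p_value(1.96), is_significant(2.5), is_significant(1.0))
```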
In mathematical statistics, polynomial least squares comprises a broad range of statistical methods for estimating an underlying polynomial that describes observations. These methods include polynomial regression, curve fitting, linear regression, least squares, ordinary least squares, simple linear regression, linear least squares, approximation theory and method of moments. Polynomial least squares has applications in radar trackers, estimation theory, signal processing, statistics, and econometrics.
Econometric theory uses statistical theory and mathematical statistics to evaluate and develop econometric methods.^{ [10] }^{ [11] } Econometricians try to find estimators that have desirable statistical properties including unbiasedness, efficiency, and consistency. An estimator is unbiased if its expected value is the true value of the parameter; it is consistent if it converges to the true value as the sample size gets larger; and it is efficient if it has a lower standard error than other unbiased estimators for a given sample size. Ordinary least squares (OLS) is often used for estimation since it provides the BLUE or "best linear unbiased estimator" (where "best" means the most efficient among linear unbiased estimators) given the Gauss–Markov assumptions. When these assumptions are violated or other statistical properties are desired, other estimation techniques such as maximum likelihood estimation, generalized method of moments, or generalized least squares are used. Estimators that incorporate prior beliefs are advocated by those who favour Bayesian statistics over traditional, classical or "frequentist" approaches.
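Unbiasedness and consistency can be illustrated with a small Monte Carlo experiment. The sketch below is a hypothetical setup (true slope 2.0, standard normal regressor and errors, function name `slope_estimates`): it repeatedly draws samples, computes the OLS slope, and compares the spread of the estimates at two sample sizes.

```python
import numpy as np

def slope_estimates(n, reps, true_slope=2.0, seed=0):
    """OLS slope estimates from `reps` simulated samples of size `n`."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        y = 1.0 + true_slope * x + rng.normal(size=n)
        xc = x - x.mean()
        est[r] = xc @ (y - y.mean()) / (xc @ xc)   # simple-regression OLS slope
    return est

small = slope_estimates(n=20, reps=2000)
large = slope_estimates(n=2000, reps=2000)
print("n=20:   mean", small.mean(), "sd", small.std())
print("n=2000: mean", large.mean(), "sd", large.std())
```

In both cases the average estimate should sit close to the true slope (unbiasedness), while the spread around it should shrink as the sample grows (consistency).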
In statistics, the bias of an estimator is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. In statistics, "bias" is an objective property of an estimator. Unlike the ordinary English use of the term "bias", it is not pejorative even though it's not a desired property.
In the comparison of various statistical procedures, efficiency is a measure of quality of an estimator, of an experimental design, or of a hypothesis testing procedure. Essentially, a more efficient estimator, experiment, or test needs fewer observations than a less efficient one to achieve a given performance. This article primarily deals with efficiency of estimators.
In statistics, a consistent estimator or asymptotically consistent estimator is an estimator—a rule for computing estimates of a parameter θ_{0}—having the property that as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to θ_{0}. This means that the distributions of the estimates become more and more concentrated near the true value of the parameter being estimated, so that the probability of the estimator being arbitrarily close to θ_{0} converges to one.
Applied econometrics uses theoretical econometrics and real-world data for assessing economic theories, developing econometric models, analysing economic history, and forecasting.^{ [12] }
Econometrics may use standard statistical models to study economic questions, but most often these are applied to observational data rather than to data from controlled experiments.^{ [13] } In this, the design of observational studies in econometrics is similar to the design of studies in other observational disciplines, such as astronomy, epidemiology, sociology and political science. Analysis of data from an observational study is guided by the study protocol, although exploratory data analysis may be useful for generating new hypotheses.^{ [14] } Economics often analyses systems of equations and inequalities, such as supply and demand hypothesized to be in equilibrium. Consequently, the field of econometrics has developed methods for identification and estimation of simultaneous-equation models. These methods are analogous to methods used in other areas of science, such as the field of system identification in systems analysis and control theory. Such methods may allow researchers to estimate models and investigate their empirical consequences, without directly manipulating the system.
One of the fundamental statistical methods used by econometricians is regression analysis.^{ [15] } Regression methods are important in econometrics because economists typically cannot use controlled experiments. Econometricians often seek illuminating natural experiments in the absence of evidence from controlled experiments. Observational data may be subject to omitted-variable bias and a list of other problems that must be addressed using causal analysis of simultaneous-equation models.^{ [16] }
In addition to natural experiments, quasi-experimental methods have been used increasingly by econometricians since the 1980s, in order to credibly identify causal effects.^{ [17] }
A simple example of a relationship in econometrics from the field of labour economics is:

$$\ln(\text{wage}) = \beta_0 + \beta_1\,(\text{years of education}) + \varepsilon$$

This example assumes that the natural logarithm of a person's wage is a linear function of the number of years of education that person has acquired. The parameter $\beta_1$ measures the increase in the natural log of the wage attributable to one more year of education. The term $\varepsilon$ is a random variable representing all other factors that may have direct influence on wage. The econometric goal is to estimate the parameters, $\beta_0$ and $\beta_1$, under specific assumptions about the random variable $\varepsilon$. For example, if $\varepsilon$ is uncorrelated with years of education, then the equation can be estimated with ordinary least squares.
If the researcher could randomly assign people to different levels of education, the data set thus generated would allow estimation of the effect of changes in years of education on wages. In reality, those experiments cannot be conducted. Instead, the econometrician observes the years of education of and the wages paid to people who differ along many dimensions. Given this kind of data, the estimated coefficient on Years of Education in the equation above reflects both the effect of education on wages and the effect of other variables on wages, if those other variables were correlated with education. For example, people born in certain places may have higher wages and higher levels of education. Unless the econometrician controls for place of birth in the above equation, the effect of birthplace on wages may be falsely attributed to the effect of education on wages.
The most obvious way to control for birthplace is to include a measure of the effect of birthplace in the equation above. Exclusion of birthplace, together with the assumption that $\varepsilon$ is uncorrelated with education, produces a misspecified model. Another technique is to include in the equation an additional set of measured covariates which are not instrumental variables, yet render $\beta_1$ identifiable.^{ [18] } An overview of econometric methods used to study this problem was provided by Card (1999).^{ [19] }
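The birthplace example can be made concrete with simulated data. The sketch below is purely illustrative: all numbers (a true return to education of 0.08, a birthplace wage premium of 0.30, a binary birthplace indicator) are assumptions chosen for the demonstration, and `ols` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
birthplace = rng.integers(0, 2, size=n)          # hypothetical binary region indicator
# Birthplace raises both education and wages, so it confounds the two.
educ = 12 + 2 * birthplace + rng.normal(scale=2, size=n)
log_wage = 1.0 + 0.08 * educ + 0.30 * birthplace + rng.normal(scale=0.4, size=n)

def ols(y, *regressors):
    """OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

short = ols(log_wage, educ)                      # omits birthplace
full = ols(log_wage, educ, birthplace)           # controls for birthplace
print("education coefficient, birthplace omitted: ", short[1])
print("education coefficient, birthplace included:", full[1])
```

In this setup the regression that omits birthplace attributes part of the birthplace premium to education, so its education coefficient comes out larger than the value used to generate the data.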
The main journals that publish work in econometrics are Econometrica, the Journal of Econometrics, the Review of Economics and Statistics, Econometric Theory, the Journal of Applied Econometrics, Econometric Reviews, the Econometrics Journal,^{ [20] } Applied Econometrics and International Development, and the Journal of Business & Economic Statistics.
As with other forms of statistical analysis, badly specified econometric models may show a spurious relationship where two variables are correlated but causally unrelated. In a study of the use of econometrics in major economics journals, McCloskey concluded that some economists report p-values (following the Fisherian tradition of tests of significance of point null hypotheses) and neglect concerns of type II errors; some economists fail to report estimates of the size of effects (apart from statistical significance) and to discuss their economic importance. She also argues that some economists fail to use economic reasoning for model selection, especially for deciding which variables to include in a regression.^{ [21] }^{ [22] }
In some cases, economic variables cannot be experimentally manipulated as treatments randomly assigned to subjects.^{ [23] } In such cases, economists rely on observational studies, often using data sets with many strongly associated covariates, resulting in enormous numbers of models with similar explanatory ability but different covariates and regression estimates. Regarding the plurality of models compatible with observational datasets, Edward Leamer urged that "professionals ... properly withhold belief until an inference can be shown to be adequately insensitive to the choice of assumptions".^{ [23] }
Ultimately, all of these will require a common set of tools, including, for example, the multiple regression model, the use of moment conditions for estimation, instrumental variables (IV) and maximum likelihood estimation. With that in mind, the organization of this book is as follows: The first half of the text develops fundamental results that are common to all the applications. The concept of multiple regression and the linear regression model in particular constitutes the underlying platform of most modeling, even if the linear model itself is not ultimately used as the empirical specification.
The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns. "Least squares" means that the overall solution minimizes the sum of the squares of the residuals made in the results of every single equation.
In statistics, the Gauss–Markov theorem states that in a linear regression model in which the errors are uncorrelated, have equal variances and expectation value of zero, the best linear unbiased estimator (BLUE) of the coefficients is given by the ordinary least squares (OLS) estimator, provided it exists. Here "best" means giving the lowest variance of the estimate, as compared to other unbiased, linear estimators. The errors do not need to be normal, nor do they need to be independent and identically distributed. The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator or ridge regression.
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed.
In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment. Intuitively, IVs are used when an explanatory variable of interest is correlated with the error term, in which case ordinary least squares and ANOVA give biased results. A valid instrument induces changes in the explanatory variable but has no independent effect on the dependent variable, allowing a researcher to uncover the causal effect of the explanatory variable on the dependent variable.
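The intuition can be sketched with two-stage least squares on simulated data. Everything in this example is an assumption made for illustration: the data-generating process, the true effect of 0.5, and the helper `add_const`. The instrument `z` moves `x` but affects `y` only through `x`, while the unobserved `u` makes `x` correlated with the error.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
z = rng.normal(size=n)                           # instrument
u = rng.normal(size=n)                           # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)             # endogenous explanatory variable
y = 1.0 + 0.5 * x + 2.0 * u + rng.normal(size=n)

def add_const(v):
    """Prepend a column of ones to a single regressor."""
    return np.column_stack([np.ones(len(v)), v])

# Naive OLS of y on x is biased because x is correlated with u.
b_ols = np.linalg.lstsq(add_const(x), y, rcond=None)[0]

# Stage 1: project x on the instrument. Stage 2: regress y on the fitted values.
x_hat = add_const(z) @ np.linalg.lstsq(add_const(z), x, rcond=None)[0]
b_iv = np.linalg.lstsq(add_const(x_hat), y, rcond=None)[0]

print("OLS slope: ", b_ols[1])
print("2SLS slope:", b_iv[1])
```

The second-stage point estimate is the 2SLS estimate, but standard errors taken directly from that second regression would not be valid; dedicated IV routines correct for this.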
In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from probability + unit. The purpose of the model is to estimate the probability that an observation with particular characteristics will fall into a specific one of the categories; moreover, classifying observations based on their predicted probabilities is a type of binary classification model.
In statistics, ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model. OLS chooses the parameters of a linear function of a set of explanatory variables by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the given dataset and those predicted by the linear function.
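In matrix form the least-squares principle has a well-known closed-form solution. The notation below is an assumption introduced for illustration: $y$ is the $n \times 1$ vector of observations, $X$ the $n \times k$ matrix of explanatory variables with rows $x_i^{\top}$, and $X$ is taken to have full column rank so that the inverse exists.

```latex
\hat{\beta}_{\text{OLS}}
  = \arg\min_{b} \sum_{i=1}^{n} \left( y_i - x_i^{\top} b \right)^2
  = \left( X^{\top} X \right)^{-1} X^{\top} y
```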
In econometrics, the seemingly unrelated regressions (SUR) or seemingly unrelated regression equations (SURE) model, proposed by Arnold Zellner in 1962, is a generalization of a linear regression model that consists of several regression equations, each having its own dependent variable and potentially different sets of exogenous explanatory variables. Each equation is a valid linear regression on its own and can be estimated separately, which is why the system is called seemingly unrelated, although some authors suggest that the term seemingly related would be more appropriate, since the error terms are assumed to be correlated across the equations.
Cochrane–Orcutt estimation is a procedure in econometrics, which adjusts a linear model for serial correlation in the error term. Developed in the 1940s, it is named after statisticians Donald Cochrane and Guy Orcutt.
In statistics, generalized least squares (GLS) is a technique for estimating the unknown parameters in a linear regression model when there is a certain degree of correlation between the residuals in a regression model. In these cases, ordinary least squares and weighted least squares can be statistically inefficient, or even give misleading inferences. GLS was first described by Alexander Aitken in 1934.
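For reference, the GLS estimator can be written compactly. The symbols are assumptions introduced here: $X$ is the matrix of regressors, $y$ the response vector, and $\Omega$ the (known, invertible) covariance structure of the errors up to scale.

```latex
\hat{\beta}_{\text{GLS}}
  = \left( X^{\top} \Omega^{-1} X \right)^{-1} X^{\top} \Omega^{-1} y,
\qquad
\operatorname{Var}(\varepsilon \mid X) = \sigma^{2} \Omega
```

When $\Omega$ is the identity matrix this reduces to the ordinary least squares estimator, and when $\Omega$ is diagonal it corresponds to weighted least squares.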
In statistics, model specification is part of the process of building a statistical model: specification consists of selecting an appropriate functional form for the model and choosing which variables to include. For example, given personal income together with years of schooling and on-the-job experience, we might specify a functional relationship as follows:
The Heckman correction is a statistical technique to correct bias from non-randomly selected samples or otherwise incidentally truncated dependent variables, a pervasive issue in quantitative social sciences when using observational data. Conceptually, this is achieved by explicitly modelling the individual sampling probability of each observation together with the conditional expectation of the dependent variable. The resulting likelihood function is mathematically similar to the Tobit model for censored dependent variables, a connection first drawn by James Heckman in 1976. Heckman also developed a two-step control function approach to estimate this model, which reduced the computational burden of having to estimate both equations jointly, albeit at the cost of inefficiency. Heckman received the Nobel Memorial Prize in Economic Sciences in 2000 for his work in this field.
In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial in x. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x), and has been used to describe nonlinear phenomena such as the growth rate of tissues, the distribution of carbon isotopes in lake sediments, and the progression of disease epidemics. Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y | x) is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered to be a special case of multiple linear regression.
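The point that polynomial regression is linear in its parameters can be checked directly: treating 1, x and x² as three ordinary regressors and running linear least squares gives the same coefficients as a dedicated polynomial fit. The data below are synthetic and the quadratic coefficients are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-2, 2, 50)
y = 1.0 - 0.5 * x + 0.25 * x**2 + rng.normal(scale=0.1, size=x.size)

# Multiple linear regression on the constructed regressors 1, x, x^2.
X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# np.polyfit returns the highest-degree coefficient first, so reverse it.
beta_polyfit = np.polyfit(x, y, deg=2)[::-1]

print(beta)
print(beta_polyfit)
```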
An error correction model (ECM) belongs to a category of multiple time series models most commonly used for data where the underlying variables have a long-run stochastic trend, also known as cointegration. ECMs are a theoretically driven approach useful for estimating both short-term and long-term effects of one time series on another. The term error-correction relates to the fact that the last period's deviation from a long-run equilibrium, the error, influences its short-run dynamics. Thus ECMs directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables.
In statistics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measured exactly, or observed without error; as such, those models account only for errors in the dependent variables, or responses.
Linear least squares (LLS) is the least squares approximation of linear functions to data. It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals. Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.
Meta-regression is a tool used in meta-analysis to examine the impact of moderator variables on study effect size using regression-based techniques. Meta-regression is more effective at this task than are standard meta-analytic techniques.
In statistics, linear regression is a linear approach to modeling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.