# Granger causality

Figure: When time series X Granger-causes time series Y, the patterns in X are approximately repeated in Y after some time lag (two examples are indicated with arrows in the original figure). Thus, past values of X can be used to predict future values of Y.

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969.  Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of "true causality" is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only "predictive causality". 

A time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y.

Granger also stressed that some studies using "Granger causality" testing in areas outside economics reached "ridiculous" conclusions. "Of course, many ridiculous papers appeared", he said in his Nobel lecture. However, it remains a popular method for causality analysis in time series because of its computational simplicity. The original definition of Granger causality does not account for latent confounding effects and does not capture instantaneous or non-linear causal relationships, though several extensions have been proposed to address these issues.

## Intuition

We say that a variable X that evolves over time Granger-causes another evolving variable Y if predictions of the value of Y based on its own past values and on the past values of X are better than predictions of Y based only on its own past values.

## Underlying principles

Granger defined the causality relationship based on two principles:  

1. The cause happens prior to its effect.
2. The cause has unique information about the future values of its effect.

Given these two assumptions about causality, Granger proposed to test the following hypothesis for identification of a causal effect of $X$ on $Y$ :

$\mathbb {P} [Y(t+1)\in A\mid {\mathcal {I}}(t)]\neq \mathbb {P} [Y(t+1)\in A\mid {\mathcal {I}}_{-X}(t)],$

where $\mathbb {P}$ refers to probability, $A$ is an arbitrary non-empty set, and ${\mathcal {I}}(t)$ and ${\mathcal {I}}_{-X}(t)$ respectively denote the information available as of time $t$ in the entire universe, and that in the modified universe in which $X$ is excluded. If the above hypothesis is accepted, we say that $X$ Granger-causes $Y$.

## Method

If a time series is a stationary process, the test is performed using the level values of two (or more) variables. If the variables are non-stationary, then the test is done using first (or higher) differences. The number of lags to be included is usually chosen using an information criterion, such as the Akaike information criterion or the Schwarz information criterion. Any particular lagged value of one of the variables is retained in the regression if (1) it is significant according to a t-test, and (2) it and the other lagged values of the variable jointly add explanatory power to the model according to an F-test. Then the null hypothesis of no Granger causality is not rejected if and only if no lagged values of an explanatory variable have been retained in the regression.
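The first two steps of this procedure (differencing a non-stationary series, then choosing the number of lags with an information criterion) can be sketched as follows. This is a minimal illustration using NumPy only, not a reference implementation: the function names `lagged_design`, `ar_aic` and `select_lag` are hypothetical, the data are simulated, and the AIC is the Gaussian form $n\log {\hat {\sigma }}^{2}+2k$, which omits constants that do not affect the ranking of models.

```python
import numpy as np

def lagged_design(y, m):
    """Return (target, design) for an AR(m) regression of y with an intercept."""
    n = len(y)
    target = y[m:]
    columns = [np.ones(n - m)] + [y[m - k : n - k] for k in range(1, m + 1)]
    return target, np.column_stack(columns)

def ar_aic(y, m, max_lag):
    """Gaussian AIC of an AR(m) fit (up to an additive constant)."""
    # Trim the start so that every candidate order is fitted to the same targets.
    y_common = y[max_lag - m :]
    target, design = lagged_design(y_common, m)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    n_obs = len(target)
    sigma2 = resid @ resid / n_obs
    return n_obs * np.log(sigma2) + 2 * (m + 1)

def select_lag(y, max_lag):
    """Return the lag order with the smallest AIC, plus all AIC values."""
    aics = {m: ar_aic(y, m, max_lag) for m in range(1, max_lag + 1)}
    return min(aics, key=aics.get), aics

# Simulated example: the increments follow a stationary AR(2) process,
# so the cumulated level series is non-stationary.
rng = np.random.default_rng(0)
n = 2000
increments = np.zeros(n)
for t in range(2, n):
    increments[t] = 0.5 * increments[t - 1] - 0.4 * increments[t - 2] + rng.standard_normal()
level = np.cumsum(increments)

differenced = np.diff(level)  # first differences recover the increments
best_lag, aics = select_lag(differenced, max_lag=6)
print("selected lag:", best_lag)
```

In practice, whether differencing is needed would be decided with a unit-root test rather than assumed, and the Schwarz criterion could be substituted by changing the penalty term.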

In practice it may be found that neither variable Granger-causes the other, or that each of the two variables Granger-causes the other.

### Mathematical statement

Let y and x be stationary time series. To test the null hypothesis that x does not Granger-cause y, one first finds the proper lagged values of y to include in a univariate autoregression of y:

$y_{t}=a_{0}+a_{1}y_{t-1}+a_{2}y_{t-2}+\cdots +a_{m}y_{t-m}+{\text{error}}_{t}.$

Next, the autoregression is augmented by including lagged values of x:

$y_{t}=a_{0}+a_{1}y_{t-1}+a_{2}y_{t-2}+\cdots +a_{m}y_{t-m}+b_{p}x_{t-p}+\cdots +b_{q}x_{t-q}+{\text{error}}_{t}.$

One retains in this regression all lagged values of x that are individually significant according to their t-statistics, provided that collectively they add explanatory power to the regression according to an F-test (whose null hypothesis is that the x's jointly add no explanatory power). In the notation of the above augmented regression, p is the shortest, and q is the longest, lag length for which the lagged value of x is significant.

The null hypothesis that x does not Granger-cause y is not rejected if and only if no lagged values of x are retained in the regression.
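As a rough illustration of the joint F-test, the sketch below compares the residual sum of squares of the restricted regression (lags of y only) with that of the augmented regression (lags of y and x). It is a simplification of the procedure described above: all lags 1 to m of x are tested jointly, without the preliminary t-test screening of individual lags. The function names and the simulated data are assumptions made for the example.

```python
import numpy as np

def residual_sum_of_squares(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f_statistic(y, x, m):
    """F statistic for H0: lags 1..m of x add nothing to an AR(m) model of y."""
    n = len(y)
    target = y[m:]
    y_lags = [y[m - k : n - k] for k in range(1, m + 1)]
    x_lags = [x[m - k : n - k] for k in range(1, m + 1)]
    restricted = np.column_stack([np.ones(n - m)] + y_lags)
    augmented = np.column_stack([np.ones(n - m)] + y_lags + x_lags)
    rss_restricted = residual_sum_of_squares(restricted, target)
    rss_augmented = residual_sum_of_squares(augmented, target)
    dof = (n - m) - augmented.shape[1]  # residual degrees of freedom
    return ((rss_restricted - rss_augmented) / m) / (rss_augmented / dof)

# Simulated data in which x drives y with a one-step lag, but not vice versa.
rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

f_x_to_y = granger_f_statistic(y, x, m=2)  # expected to be large
f_y_to_x = granger_f_statistic(x, y, m=2)  # expected to be small
print(f"F (x -> y): {f_x_to_y:.1f}   F (y -> x): {f_y_to_x:.2f}")
```

Under the null hypothesis the statistic follows an F-distribution with m and `dof` degrees of freedom, so a p-value could be obtained from, for example, `scipy.stats.f.sf(f_value, m, dof)`.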

### Multivariate analysis

Multivariate Granger causality analysis is usually performed by fitting a vector autoregressive model (VAR) to the time series. In particular, let $X(t)\in \mathbb {R} ^{d\times 1}$ for $t=1,\ldots ,T$ be a $d$-dimensional multivariate time series. The analysis fits a VAR model with $L$ time lags as follows:

$X(t)=\sum _{\tau =1}^{L}A_{\tau }X(t-\tau )+\varepsilon (t),$

where $\varepsilon (t)$ is a white Gaussian random vector, and $A_{\tau }$ is a $d\times d$ coefficient matrix for every $\tau$. A time series $X_{i}$ is called a Granger cause of another time series $X_{j}$ if at least one of the elements $A_{\tau }(j,i)$ for $\tau =1,\ldots ,L$ is significantly different from zero.
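The coefficient matrices $A_{\tau }$ can be estimated by ordinary least squares. The sketch below is one way to do this with NumPy, under the assumption of a zero-mean series with no intercept, as in the model above. The function `fit_var` and the simulated three-variable system are hypothetical, and the example only inspects the size of the estimated coefficients; a formal test would also need their standard errors.

```python
import numpy as np

def fit_var(X, L):
    """Least-squares estimate of A_1..A_L in X(t) = sum_tau A_tau X(t - tau) + eps(t).

    X has shape (T, d); the result has shape (L, d, d), with result[tau - 1] = A_tau.
    """
    T, d = X.shape
    target = X[L:]  # rows are X(t) for t = L, ..., T - 1
    # Stack the lagged values side by side: [X(t-1), X(t-2), ..., X(t-L)].
    design = np.hstack([X[L - tau : T - tau] for tau in range(1, L + 1)])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)  # shape (L * d, d)
    # coef[(tau - 1) * d + i, j] multiplies X_i(t - tau) in the equation for X_j(t),
    # so it equals A_tau[j, i]; the transpose puts it in that position.
    return coef.reshape(L, d, d).transpose(0, 2, 1)

# Simulated system in which X_0 drives X_1 at lag 1 and nothing else interacts.
rng = np.random.default_rng(2)
A1_true = np.array([[0.4, 0.0, 0.0],
                    [0.6, 0.4, 0.0],
                    [0.0, 0.0, 0.4]])
T, d = 5000, 3
X = np.zeros((T, d))
for t in range(1, T):
    X[t] = A1_true @ X[t - 1] + rng.standard_normal(d)

A_hat = fit_var(X, L=2)
print("estimated A_1(1, 0):", round(A_hat[0, 1, 0], 3))  # effect of X_0 on X_1
print("estimated A_1(0, 1):", round(A_hat[0, 0, 1], 3))  # effect of X_1 on X_0
```

With this indexing, a clearly non-zero `A_hat[tau - 1, j, i]` is the sample counterpart of $A_{\tau }(j,i)$, which is the quantity tested when asking whether $X_{i}$ Granger-causes $X_{j}$.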

### Non-parametric test

The above linear methods are appropriate for testing Granger causality in the mean. However, they are not able to detect Granger causality in higher moments, e.g., in the variance. Non-parametric tests for Granger causality are designed to address this problem. The definition of Granger causality in these tests is general and does not involve any modelling assumptions, such as a linear autoregressive model. The non-parametric tests can be used as diagnostic tools to build better parametric models that include higher-order moments and/or non-linearity.

## Limitations

As its name implies, Granger causality is not necessarily true causality. In fact, Granger-causality tests fulfill only the Humean definition of causality, which identifies cause-effect relations with constant conjunctions. If both X and Y are driven by a common third process with different lags, one might still reject the null hypothesis of no Granger causality, even though manipulating one of the variables would not change the other. Indeed, Granger-causality tests are designed to handle pairs of variables and may produce misleading results when the true relationship involves three or more variables; a similar test involving more variables can be applied with vector autoregression. That said, it has been argued that, given a probabilistic view of causation, Granger causality can be considered true causality in that sense, especially when Reichenbach's "screening off" notion of probabilistic causation is taken into account. Other possible sources of misleading test results are: (1) sampling that is too infrequent or too frequent, (2) a nonlinear causal relationship, (3) nonstationarity and nonlinearity of the time series, and (4) the existence of rational expectations.

## Extensions

A method for Granger causality has been developed that is not sensitive to deviations from the assumption that the error term is normally distributed.  This method is especially useful in financial economics, since many financial variables are non-normally distributed.  Recently, asymmetric causality testing has been suggested in the literature in order to separate the causal impact of positive changes from the negative ones.  An extension of Granger (non-)causality testing to panel data is also available. 

## In neuroscience

A long-held belief about neural function maintained that different areas of the brain were task specific, and that the structural connectivity local to a certain area somehow dictated the function of that area. Drawing on work performed over many years, the field has moved to a different, network-centric approach to describing information flow in the brain. Explanations of function are beginning to include the concept of networks existing at different levels and throughout different locations in the brain. The behavior of these networks can be described by non-deterministic processes that evolve through time: given the same input stimulus, the network will not necessarily produce the same output. Because the dynamics of these networks are governed by probabilities, they are treated as stochastic (random) processes in order to capture these kinds of dynamics between different areas of the brain.

Different methods of obtaining some measure of information flow from the firing activities of a neuron and its surrounding ensemble have been explored in the past, but they are limited in the kinds of conclusions that can be drawn and provide little insight into the directional flow of information, its effect size, and how it can change with time.  Recently Granger causality has been applied to address some of these issues with great success.  Put plainly, one examines how to best predict the future of a neuron: using either the entire ensemble or the entire ensemble except a certain target neuron. If the prediction is made worse by excluding the target neuron, then we say it has a “g-causal” relationship with the current neuron.

### Extensions to point process models

Previous Granger-causality methods could only operate on continuous-valued data so the analysis of neural spike train recordings involved transformations that ultimately altered the stochastic properties of the data, indirectly altering the validity of the conclusions that could be drawn from it. In 2011, however, a new general-purpose Granger-causality framework was proposed that could directly operate on any modality, including neural-spike trains. 

Neural spike train data can be modeled as a point process. A temporal point process is a stochastic time series of binary events that occur in continuous time. It can take on only two values at each point in time, indicating whether or not an event has occurred. This type of binary-valued representation suits the activity of neural populations because a single neuron's action potential has a typical waveform. What carries the information output by a neuron is therefore the occurrence of a “spike”, as well as the time between successive spikes. Using this approach, one can abstract the flow of information in a neural network to the spiking times of each neuron over an observation period. A point process can be represented by the timing of the spikes themselves, by the waiting times between spikes, by a counting process, or, if time is discretized finely enough that each time bin can contain at most one event, by a sequence of 1s and 0s.
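The four representations mentioned above carry the same information and can be converted into one another. The following sketch shows this for a short, made-up spike train; the spike times are hypothetical and are given in whole milliseconds, with 1 ms bins, so that each bin holds at most one event.

```python
import numpy as np

# Hypothetical spike times in milliseconds within a 150 ms observation window.
spike_times_ms = np.array([12, 30, 31, 75, 120])
duration_ms = 150

# 1. Waiting times between successive spikes (inter-spike intervals).
waiting_times = np.diff(spike_times_ms)

# 2. Binary sequence: one entry per 1 ms bin, 1 if a spike occurred in that bin.
binary = np.zeros(duration_ms, dtype=int)
binary[spike_times_ms] = 1

# 3. Counting process: number of spikes observed up to and including each bin.
counting = np.cumsum(binary)

# The representations are interchangeable: spike times can be recovered
# from the binary sequence or from the first spike plus the waiting times.
times_from_binary = np.flatnonzero(binary)
times_from_waits = spike_times_ms[0] + np.concatenate(([0], np.cumsum(waiting_times)))

print("waiting times (ms):", waiting_times.tolist())
print("total spike count:", int(counting[-1]))
```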

One of the simplest types of neural-spiking models is the Poisson process. This model, however, is limited in that it is memoryless: it does not account for any spiking history when calculating the current probability of firing. Neurons, by contrast, exhibit a fundamental (biophysical) history dependence by way of their relative and absolute refractory periods. To address this, a conditional intensity function is used to represent the probability of a neuron spiking, conditioned on its own history. The conditional intensity function expresses the instantaneous firing probability and implicitly defines a complete probability model for the point process. It defines a probability per unit time, so if the time window is taken small enough to ensure that at most one spike can occur in it, the conditional intensity function completely specifies the probability that a given neuron will fire in that window.
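To make the idea concrete, the sketch below simulates a spike train in discrete time, where the probability of a spike in a bin of width `dt` is approximated by the conditional intensity multiplied by `dt`. The particular intensity used here (zero during an absolute refractory period, then a linear recovery to a baseline rate) is an illustrative assumption and is not taken from the cited work; all names and parameter values are hypothetical.

```python
import numpy as np

def conditional_intensity(time_since_spike, base_rate, abs_refractory, rel_refractory):
    """Illustrative history-dependent firing rate in spikes per second."""
    if time_since_spike < abs_refractory:
        return 0.0  # absolute refractory period: the neuron cannot fire
    # Relative refractory period: the rate recovers linearly to the baseline.
    recovery = min(1.0, (time_since_spike - abs_refractory) / rel_refractory)
    return base_rate * recovery

def simulate_spikes(n_bins, dt, base_rate, abs_refractory, rel_refractory, rng):
    """Draw a binary spike train with P(spike in bin) = intensity * dt."""
    spikes = np.zeros(n_bins, dtype=int)
    bins_since_spike = None  # None means no spike has been observed yet
    for i in range(n_bins):
        elapsed = np.inf if bins_since_spike is None else bins_since_spike * dt
        rate = conditional_intensity(elapsed, base_rate, abs_refractory, rel_refractory)
        if rng.random() < rate * dt:  # approximation valid when rate * dt is small
            spikes[i] = 1
            bins_since_spike = 0
        if bins_since_spike is not None:
            bins_since_spike += 1
    return spikes

rng = np.random.default_rng(3)
spikes = simulate_spikes(n_bins=20000, dt=0.001, base_rate=50.0,
                         abs_refractory=0.002, rel_refractory=0.005, rng=rng)
intervals = np.diff(np.flatnonzero(spikes))  # inter-spike intervals in bins
print("spike count:", int(spikes.sum()), " shortest interval (bins):", int(intervals.min()))
```

A memoryless Poisson model corresponds to returning `base_rate` regardless of `time_since_spike`; in that case arbitrarily short intervals between spikes become possible, which the refractory terms rule out.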


## References

1. Granger, C. W. J. (1969). "Investigating Causal Relations by Econometric Models and Cross-spectral Methods". Econometrica. 37 (3): 424–438. doi:10.2307/1912791. JSTOR   1912791.
2. Diebold, Francis X. (2001). Elements of Forecasting (2nd ed.). Cincinnati: South Western. p. 254. ISBN   978-0-324-02393-0.
3. Granger, Clive W. J. (2004). "Time Series Analysis, Cointegration, and Applications" (PDF). American Economic Review. 94 (3): 421–425. doi:10.1257/0002828041464669. Retrieved 12 June 2019.
4. Eichler, Michael (2012). "Causal Inference in Time Series Analysis" (PDF). In Berzuini, Carlo (ed.). Causality : statistical perspectives and applications (3rd ed.). Hoboken, N.J.: Wiley. pp. 327–352. ISBN   978-0470665565.
5. Seth, Anil (2007). "Granger causality". Scholarpedia. 2 (7): 1667. Bibcode:2007SchpJ...2.1667S. doi:10.4249/scholarpedia.1667.
6. Granger, C.W.J. (1980). "Testing for causality: A personal viewpoint". Journal of Economic Dynamics and Control. 2: 329–352. doi:10.1016/0165-1889(80)90069-X.
7. Lütkepohl, Helmut (2005). New introduction to multiple time series analysis (3 ed.). Berlin: Springer. pp. 41–51. ISBN   978-3540262398.
8. Diks, Cees; Panchenko, Valentyn (2006). "A new statistic and practical guidelines for nonparametric Granger causality testing" (PDF). Journal of Economic Dynamics and Control. 30 (9): 1647–1669. doi:10.1016/j.jedc.2005.08.008.
9. Francis, Bill B.; Mougoue, Mbodja; Panchenko, Valentyn (2010). "Is there a Symmetric Nonlinear Causal Relationship between Large and Small Firms?" (PDF). Journal of Empirical Finance. 17 (1): 23–28. doi:10.1016/j.jempfin.2009.08.003.
10. Maziarz, Mariusz (2015-05-20). "A review of the Granger-causality fallacy". The Journal of Philosophical Economics: Reflections on Economic and Social Issues. VIII (2). ISSN 1843-2298.
11. Mannino, Michael; Bressler, Steven L (2015). "Foundational perspectives on causality in large-scale brain networks". Physics of Life Reviews. 15: 107–23. Bibcode:2015PhLRv..15..107M. doi:10.1016/j.plrev.2015.09.002. PMID   26429630.
12. Hacker, R. Scott; Hatemi-j, A. (2006). "Tests for causality between integrated variables using asymptotic and bootstrap distributions: Theory and application". Applied Economics. 38 (13): 1489–1500. doi:10.1080/00036840500405763.
13. Mandelbrot, Benoit (1963). "The Variation of Certain Speculative Prices". The Journal of Business. 36 (4): 394–419. doi:10.1086/294632.
14. Hatemi-j, A. (2012). "Asymmetric causality tests with an application". Empirical Economics. 43: 447–456. doi:10.1007/s00181-011-0484-x.
15. Dumitrescu, E.-I.; Hurlin, C. (2012). "Testing for Granger non-causality in heterogeneous panels". Economic Modelling. 29 (4): 1450–1460. doi:10.1016/j.econmod.2012.02.014.
16. Knight, R. T (2007). "NEUROSCIENCE: Neural Networks Debunk Phrenology". Science. 316 (5831): 1578–9. doi:10.1126/science.1144677. PMID   17569852.
17. Kim, Sanggyun; Putrino, David; Ghosh, Soumya; Brown, Emery N (2011). "A Granger Causality Measure for Point Process Models of Ensemble Neural Spiking Activity". PLoS Computational Biology. 7 (3): e1001110. Bibcode:2011PLSCB...7E1110K. doi:10.1371/journal.pcbi.1001110. PMID 21455283.
18. Bressler, Steven L; Seth, Anil K (2011). "Wiener–Granger Causality: A well established methodology". NeuroImage. 58 (2): 323–9. doi:10.1016/j.neuroimage.2010.02.059. PMID   20202481.