The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969. [1] Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of "true causality" is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only "predictive causality". [2] Using the term "causality" alone is a misnomer, as Granger-causality is better described as "precedence", [3] or, as Granger himself later claimed in 1977, "temporally related". [4] Rather than testing whether Xcauses Y, the Granger causality tests whether X forecastsY. [5]
A time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y.
Granger also stressed that some studies using "Granger causality" testing in areas outside economics reached "ridiculous" conclusions. [6] "Of course, many ridiculous papers appeared", he said in his Nobel lecture. [7] However, it remains a popular method for causality analysis in time series due to its computational simplicity. [8] [9] The original definition of Granger causality does not account for latent confounding effects and does not capture instantaneous and non-linear causal relationships, though several extensions have been proposed to address these issues. [8]
We say that a variable X that evolves over time Granger-causes another evolving variable Y if predictions of the value of Y based on its own past values and on the past values of X are better than predictions of Y based only on Y's own past values.
Granger defined the causality relationship based on two principles: [8] [10]
Given these two assumptions about causality, Granger proposed to test the following hypothesis for identification of a causal effect of on :
where refers to probability, is an arbitrary non-empty set, and and respectively denote the information available as of time in the entire universe, and that in the modified universe in which is excluded. If the above hypothesis is accepted, we say that Granger-causes . [8] [10]
If a time series is a stationary process, the test is performed using the level values of two (or more) variables. If the variables are non-stationary, then the test is done using first (or higher) differences. The number of lags to be included is usually chosen using an information criterion, such as the Akaike information criterion or the Schwarz information criterion. Any particular lagged value of one of the variables is retained in the regression if (1) it is significant according to a t-test, and (2) it and the other lagged values of the variable jointly add explanatory power to the model according to an F-test. Then the null hypothesis of no Granger causality is not rejected if and only if no lagged values of an explanatory variable have been retained in the regression.
In practice it may be found that neither variable Granger-causes the other, or that each of the two variables Granger-causes the other.
Let y and x be stationary time series. To test the null hypothesis that x does not Granger-cause y, one first finds the proper lagged values of y to include in a univariate autoregression of y:
Next, the autoregression is augmented by including lagged values of x:
One retains in this regression all lagged values of x that are individually significant according to their t-statistics, provided that collectively they add explanatory power to the regression according to an F-test (whose null hypothesis is no explanatory power jointly added by the x's). In the notation of the above augmented regression, p is the shortest, and q is the longest, lag length for which the lagged value of x is significant.
The null hypothesis that x does not Granger-cause y is not rejected if and only if no lagged values of x are retained in the regression.
Multivariate Granger causality analysis is usually performed by fitting a vector autoregressive model (VAR) to the time series. In particular, let for be a -dimensional multivariate time series. Granger causality is performed by fitting a VAR model with time lags as follows:
where is a white Gaussian random vector, and is a matrix for every . A time series is called a Granger cause of another time series , if at least one of the elements for is significantly larger than zero (in absolute value). [11]
The above linear methods are appropriate for testing Granger causality in the mean. However they are not able to detect Granger causality in higher moments, e.g., in the variance. Non-parametric tests for Granger causality are designed to address this problem. [12] The definition of Granger causality in these tests is general and does not involve any modelling assumptions, such as a linear autoregressive model. The non-parametric tests for Granger causality can be used as diagnostic tools to build better parametric models including higher order moments and/or non-linearity. [13]
As its name implies, Granger causality is not necessarily true causality. In fact, the Granger-causality tests fulfill only the Humean definition of causality that identifies the cause-effect relations with constant conjunctions. [14] If both X and Y are driven by a common third process with different lags, one might still fail to reject the alternative hypothesis of Granger causality. Yet, manipulation of one of the variables would not change the other. Indeed, the Granger-causality tests are designed to handle pairs of variables, and may produce misleading results when the true relationship involves three or more variables. Having said this, it has been argued that given a probabilistic view of causation, Granger causality can be considered true causality in that sense, especially when Reichenbach's "screening off" notion of probabilistic causation is taken into account. [15] Other possible sources of misguiding test results are: (1) not frequent enough or too frequent sampling, (2) nonlinear causal relationship, (3) time series nonstationarity and nonlinearity and (4) existence of rational expectations. [14] A similar test involving more variables can be applied with vector autoregression.
The validity of the Granger causality test has been challenged in the academic literature, [16] in a paper claiming that "not even the most fundamental requirement underlying any possible definition of causality is met by the Granger causality test... any definition of causality should refer to the prediction of the future from the past... we find that Granger also allows one to 'predict' the past from the future."
A method for Granger causality has been developed that is not sensitive to deviations from the assumption that the error term is normally distributed. [17] This method is especially useful in financial economics, since many financial variables are non-normally distributed. [18] Recently, asymmetric causality testing has been suggested in the literature in order to separate the causal impact of positive changes from the negative ones. [19] An extension of Granger (non-)causality testing to panel data is also available. [20] A modified Granger causality test based on the GARCH (generalized auto-regressive conditional heteroscedasticity) type of integer-valued time series models is available in many areas. [21] [22]
The extension of Granger causality to incorporate its dynamic, time-varying nature allows for a more nuanced understanding of how causal relationships in time-series data evolve over time. [23] The methodology uses recursive techniques such as the Forward Expanding (FE), Rolling (RO), and Recursive Evolving (RE) windows to overcome the limitations of traditional Granger causality tests and understand changes in causal relationships across different periods. [24] A central aspect of this methodology is the 'tvgc' command in Stata. [23] Empirical applications, such as data involving transaction fees and economic sub-systems on Ethereum, highlight the dynamic nature of economic relationships over time. [25]
A long-held belief about neural function maintained that different areas of the brain were task specific; that the structural connectivity local to a certain area somehow dictated the function of that piece. Collecting work that has been performed over many years, there has been a move to a different, network-centric approach to describing information flow in the brain. Explanation of function is beginning to include the concept of networks existing at different levels and throughout different locations in the brain. [26] The behavior of these networks can be described by non-deterministic processes that are evolving through time. That is to say that given the same input stimulus, you will not get the same output from the network. The dynamics of these networks are governed by probabilities so we treat them as stochastic (random) processes so that we can capture these kinds of dynamics between different areas of the brain.
Different methods of obtaining some measure of information flow from the firing activities of a neuron and its surrounding ensemble have been explored in the past, but they are limited in the kinds of conclusions that can be drawn and provide little insight into the directional flow of information, its effect size, and how it can change with time. [27] Recently Granger causality has been applied to address some of these issues. [28] Put plainly, one examines how to best predict the future of a neuron: using either the entire ensemble or the entire ensemble except a certain target neuron. If the prediction is made worse by excluding the target neuron, then we say it has a "g-causal" relationship with the current neuron.
Previous Granger-causality methods could only operate on continuous-valued data so the analysis of neural spike train recordings involved transformations that ultimately altered the stochastic properties of the data, indirectly altering the validity of the conclusions that could be drawn from it. In 2011, however, a new general-purpose Granger-causality framework was proposed that could directly operate on any modality, including neural-spike trains. [27]
Neural spike train data can be modeled as a point-process. A temporal point process is a stochastic time-series of binary events that occurs in continuous time. It can only take on two values at each point in time, indicating whether or not an event has actually occurred. This type of binary-valued representation of information suits the activity of neural populations because a single neuron's action potential has a typical waveform. In this way, what carries the actual information being output from a neuron is the occurrence of a "spike", as well as the time between successive spikes. Using this approach one could abstract the flow of information in a neural-network to be simply the spiking times for each neuron through an observation period. A point-process can be represented either by the timing of the spikes themselves, the waiting times between spikes, using a counting process, or, if time is discretized enough to ensure that in each window only one event has the possibility of occurring, that is to say one time bin can only contain one event, as a set of 1s and 0s, very similar to binary.[ citation needed ]
One of the simplest types of neural-spiking models is the Poisson process. This however, is limited in that it is memory-less. It does not account for any spiking history when calculating the current probability of firing. Neurons, however, exhibit a fundamental (biophysical) history dependence by way of its relative and absolute refractory periods. To address this, a conditional intensity function is used to represent the probability of a neuron spiking, conditioned on its own history. The conditional intensity function expresses the instantaneous firing probability and implicitly defines a complete probability model for the point process. It defines a probability per unit time. So if this unit time is taken small enough to ensure that only one spike could occur in that time window, then our conditional intensity function completely specifies the probability that a given neuron will fire in a certain time.[ citation needed ]
Software packages have been developed for measuring "Granger causality" in Python and R:
Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations of a random variable as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.
Causality is an influence by which one event, process, state, or object (acause) contributes to the production of another event, process, state, or object (an effect) where the cause is at least partly responsible for the effect, and the effect is at least partly dependent on the cause. The cause of something may also be described as the reason for the event or process.
In statistics, a spurious relationship or spurious correlation is a mathematical relationship in which two or more events or variables are associated but not causally related, due to either coincidence or the presence of a certain third, unseen factor.
In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable and one or more error-free independent variables. The most common form of regression analysis is linear regression, in which one finds the line that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line that minimizes the sum of squared differences between the true data and that line. For specific mathematical reasons, this allows the researcher to estimate the conditional expectation of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters or estimate the conditional expectation across a broader collection of non-linear models.
Cointegration is a statistical property of a collection (X1, X2, ..., Xk) of time series variables. First, all of the series must be integrated of order d. Next, if a linear combination of this collection is integrated of order less than d, then the collection is said to be co-integrated. Formally, if (X,Y,Z) are each integrated of order d, and there exist coefficients a,b,c such that aX + bY + cZ is integrated of order less than d, then X, Y, and Z are cointegrated. Cointegration has become an important property in contemporary time series analysis. Time series often have trends—either deterministic or stochastic. In an influential paper, Charles Nelson and Charles Plosser (1982) provided statistical evidence that many US macroeconomic time series (like GNP, wages, employment, etc.) have stochastic trends.
Vector autoregression (VAR) is a statistical model used to capture the relationship between multiple quantities as they change over time. VAR is a type of stochastic process model. VAR models generalize the single-variable (univariate) autoregressive model by allowing for multivariate time series. VAR models are often used in economics and the natural sciences.
In statistics, the Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals from a regression analysis. It is named after James Durbin and Geoffrey Watson. The small sample distribution of this ratio was derived by John von Neumann. Durbin and Watson applied this statistic to the residuals from least squares regressions, and developed bounds tests for the null hypothesis that the errors are serially uncorrelated against the alternative that they follow a first order autoregressive process. Note that the distribution of this test statistic does not depend on the estimated regression coefficients and the variance of the errors.
In metaphysics, a causal model is a conceptual model that describes the causal mechanisms of a system. Several types of causal notation may be used in the development of a causal model. Causal models can improve study designs by providing clear rules for deciding which independent variables need to be included/controlled for.
Biological neuron models, also known as spiking neuron models, are mathematical descriptions of the conduction of electrical signals in neurons. Neurons are electrically excitable cells within the nervous system, able to fire electric signals, called action potentials, across a neural network. These mathematical models describe the role of the biophysical and geometrical characteristics of neurons on the conduction of electrical activity.
Quantile regression is a type of regression analysis used in statistics and econometrics. Whereas the method of least squares estimates the conditional mean of the response variable across values of the predictor variables, quantile regression estimates the conditional median of the response variable. [There is also a method for predicting the conditional geometric mean of the response variable, .] Quantile regression is an extension of linear regression used when the conditions of linear regression are not met.
An error correction model (ECM) belongs to a category of multiple time series models most commonly used for data where the underlying variables have a long-run common stochastic trend, also known as cointegration. ECMs are a theoretically-driven approach useful for estimating both short-term and long-term effects of one time series on another. The term error-correction relates to the fact that last-period's deviation from a long-run equilibrium, the error, influences its short-run dynamics. Thus ECMs directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables.
Neural decoding is a neuroscience field concerned with the hypothetical reconstruction of sensory and other stimuli from information that has already been encoded and represented in the brain by networks of neurons. Reconstruction refers to the ability of the researcher to predict what sensory stimuli the subject is receiving based purely on neuron action potentials. Therefore, the main goal of neural decoding is to characterize how the electrical activity of neurons elicit activity and responses in the brain.
Causal inference is the process of determining the independent, actual effect of a particular phenomenon that is a component of a larger system. The main difference between causal inference and inference of association is that causal inference analyzes the response of an effect variable when a cause of the effect variable is changed. The study of why things occur is called etiology, and can be described using the language of scientific causal notation. Causal inference is said to provide the evidence of causality theorized by causal reasoning.
Convergent cross mapping (CCM) is a statistical test for a cause-and-effect relationship between two variables that, like the Granger causality test, seeks to resolve the problem that correlation does not imply causation. While Granger causality is best suited for purely stochastic systems where the influences of the causal variables are separable, CCM is based on the theory of dynamical systems and can be applied to systems where causal variables have synergistic effects. As such, CCM is specifically aimed to identify linkage between variables that can appear uncorrelated with each other.
Transfer entropy is a non-parametric statistic measuring the amount of directed (time-asymmetric) transfer of information between two random processes. Transfer entropy from a process X to another process Y is the amount of uncertainty reduced in future values of Y by knowing the past values of X given past values of Y. More specifically, if and for denote two random processes and the amount of information is measured using Shannon's entropy, the transfer entropy can be written as:
In statistics, econometrics, epidemiology, genetics and related disciplines, causal graphs are probabilistic graphical models used to encode assumptions about the data-generating process.
Spike-and-slab regression is a type of Bayesian linear regression in which a particular hierarchical prior distribution for the regression coefficients is chosen such that only a subset of the possible regressors is retained. The technique is particularly useful when the number of possible predictors is larger than the number of observations. The idea of the spike-and-slab model was originally proposed by Mitchell & Beauchamp (1988). The approach was further significantly developed by Madigan & Raftery (1994) and George & McCulloch (1997). A recent and important contribution to this literature is Ishwaran & Rao (2005).
The Galves–Löcherbach model is a mathematical model for a network of neurons with intrinsic stochasticity.
The spike response model (SRM) is a spiking neuron model in which spikes are generated by either a deterministic or a stochastic threshold process. In the SRM, the membrane voltage V is described as a linear sum of the postsynaptic potentials (PSPs) caused by spike arrivals to which the effects of refractoriness and adaptation are added. The threshold is either fixed or dynamic. In the latter case it increases after each spike. The SRM is flexible enough to account for a variety of neuronal firing pattern in response to step current input. The SRM has also been used in the theory of computation to quantify the capacity of spiking neural networks; and in the neurosciences to predict the subthreshold voltage and the firing times of cortical neurons during stimulation with a time-dependent current stimulation. The name Spike Response Model points to the property that the two important filters and of the model can be interpreted as the response of the membrane potential to an incoming spike (response kernel , the PSP) and to an outgoing spike (response kernel , also called refractory kernel). The SRM has been formulated in continuous time and in discrete time. The SRM can be viewed as a generalized linear model (GLM) or as an (integrated version of) a generalized integrate-and-fire model with adaptation.