Transfer entropy

Last updated July 08, 2024

Transfer entropy is a non-parametric statistic measuring the amount of directed (time-asymmetric) transfer of information between two random processes.^[1]^[2]^[3] Transfer entropy from a process X to another process Y is the reduction in uncertainty about future values of Y obtained by knowing the past values of X, given the past values of Y. More specifically, if $X_{t}$ and $Y_{t}$ for $t\in \mathbb {N}$ denote two random processes and the amount of information is measured using Shannon's entropy, the transfer entropy can be written as:

$$T_{X\rightarrow Y}=H\left(Y_{t}\mid Y_{t-1:t-L}\right)-H\left(Y_{t}\mid Y_{t-1:t-L},X_{t-1:t-L}\right),$$

where $H(X)$ is the Shannon entropy of $X$ and $L$ is the length of the history taken into account.
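The definition above can be sketched as a plug-in (frequency-count) estimator for discrete-valued sequences. This is only an illustrative sketch, not the method of any cited toolbox: the function name `transfer_entropy` and the `history` parameter are assumptions made here, both processes use the same history length, and probabilities are estimated from raw counts, which is biased upward for short series.

```python
from collections import Counter
from math import log2
import random


def transfer_entropy(source, target, history=1):
    """Plug-in estimate, in bits, of the transfer entropy source -> target.

    Hypothetical helper for illustration: both sequences must be discrete
    (hashable symbols) and of equal length.
    """
    if len(source) != len(target):
        raise ValueError("sequences must have equal length")

    # Count joint occurrences of (next target value, target past, source past).
    joint = Counter()
    for t in range(history, len(target)):
        y_past = tuple(target[t - history:t])
        x_past = tuple(source[t - history:t])
        joint[(target[t], y_past, x_past)] += 1
    total = sum(joint.values())

    # Marginal counts needed for the two conditional distributions.
    n_pasts = Counter()    # (y_past, x_past)
    n_y_ypast = Counter()  # (y, y_past)
    n_ypast = Counter()    # y_past
    for (y, y_past, x_past), c in joint.items():
        n_pasts[(y_past, x_past)] += c
        n_y_ypast[(y, y_past)] += c
        n_ypast[y_past] += c

    # Sum of p(y, y_past, x_past) * log2[ p(y | y_past, x_past) / p(y | y_past) ],
    # which equals H(Y_t | Y_past) - H(Y_t | Y_past, X_past).
    te = 0.0
    for (y, y_past, x_past), c in joint.items():
        p_full = c / n_pasts[(y_past, x_past)]
        p_restricted = n_y_ypast[(y, y_past)] / n_ypast[y_past]
        te += (c / total) * log2(p_full / p_restricted)
    return te


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(5000)]
    y = [0] + x[:-1]  # y copies x with a one-step delay
    print("X -> Y:", transfer_entropy(x, y))
    print("Y -> X:", transfer_entropy(y, x))
```

In this toy setting Y is a delayed copy of a fair coin sequence X, so the estimate from X to Y should be close to one bit, while the estimate in the reverse direction should be close to zero, which illustrates the time asymmetry of the measure.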

Transfer entropy reduces to Granger causality for vector auto-regressive processes.^[7] Hence, it is advantageous when the model assumption of Granger causality does not hold, for example in the analysis of non-linear signals.^[8]^[9] However, it usually requires more samples for accurate estimation.^[10] The probabilities in the entropy formula can be estimated using different approaches (binning, nearest neighbors) or, in order to reduce complexity, using a non-uniform embedding.^[11] While it was originally defined for bivariate analysis, transfer entropy has been extended to multivariate forms, either conditioning on other potential source variables^[12] or considering transfer from a collection of sources,^[13] although these forms require still more samples.
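The link to Granger causality can be made concrete for jointly Gaussian data, where transfer entropy equals half the log-ratio of the residual variances of a restricted regression (past of Y only) and a full regression (past of Y and X).^[7] The sketch below is a minimal illustration under stated assumptions: the names `gaussian_transfer_entropy` and `_residual_variance` are hypothetical, the history length is fixed at one step, and the result is in nats.

```python
import math
import random


def _cov(a, b):
    """Covariance of two equal-length lists (normalised by the sample size)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


def _residual_variance(y, regressors):
    """Variance of y left unexplained by a linear fit on one or two regressors."""
    if len(regressors) == 1:
        (z,) = regressors
        return _cov(y, y) - _cov(y, z) ** 2 / _cov(z, z)
    z, w = regressors
    szz, sww, szw = _cov(z, z), _cov(w, w), _cov(z, w)
    syz, syw = _cov(y, z), _cov(y, w)
    det = szz * sww - szw ** 2
    # Explained variance s^T S^{-1} s for the 2x2 regressor covariance S.
    explained = (syz ** 2 * sww - 2 * syz * syw * szw + syw ** 2 * szz) / det
    return _cov(y, y) - explained


def gaussian_transfer_entropy(source, target):
    """Lag-1 transfer entropy source -> target in nats, assuming Gaussian data."""
    y_now, y_past, x_past = target[1:], target[:-1], source[:-1]
    restricted = _residual_variance(y_now, [y_past])
    full = _residual_variance(y_now, [y_past, x_past])
    # The Granger causality statistic is log(restricted / full); TE is half of it.
    return 0.5 * math.log(restricted / full)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0]
    for t in range(1, n):
        # y is driven by its own past and by the past of x.
        y.append(0.5 * y[-1] + 0.8 * x[t - 1] + rng.gauss(0.0, 1.0))
    print("X -> Y:", gaussian_transfer_entropy(x, y))
    print("Y -> X:", gaussian_transfer_entropy(y, x))
```

For this simulated process the restricted residual variance is about 1.64 and the full one about 1, so the estimate from X to Y should land near 0.5 ln 1.64, roughly 0.25 nats, and the reverse direction near zero. For non-Gaussian or non-linear signals this shortcut no longer applies, and a model-free estimator is needed.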

Transfer entropy has been used for estimation of functional connectivity of neurons,^[13]^[14]^[15] social influence in social networks,^[8] and statistical causality between armed conflict events.^[16] Transfer entropy is a finite version of the directed information, which was defined in 1990 by James Massey^[17] as $I(X^{n}\to Y^{n})=\sum _{i=1}^{n}I(X^{i};Y_{i}|Y^{i-1})$, where $X^{n}$ denotes the vector $X_{1},X_{2},\dots ,X_{n}$ and $Y^{n}$ denotes $Y_{1},Y_{2},\dots ,Y_{n}$. The directed information plays an important role in characterizing the fundamental limits (channel capacity) of communication channels with or without feedback^[18]^[19] and gambling with causal side information.^[20]
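As a small worked instance of Massey's sum (an illustration added here, using only the definition above and the standard chain rule for mutual information), setting $n=2$ shows how directed information differs from ordinary mutual information:

```latex
\begin{align}
I(X^{2}\to Y^{2}) &= I(X_{1};Y_{1}) + I(X_{1},X_{2};Y_{2}\mid Y_{1}),\\
I(X^{2};Y^{2})    &= I(X_{1},X_{2};Y_{1}) + I(X_{1},X_{2};Y_{2}\mid Y_{1}).
\end{align}
```

The two expressions differ only in the first term: the directed information leaves the future input $X_{2}$ out of the term for $Y_{1}$, so only inputs up to time $i$ are allowed to inform $Y_{i}$.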

References

[1] Schreiber, Thomas (1 July 2000). "Measuring information transfer". Physical Review Letters. 85 (2): 461–464. arXiv:nlin/0001042. Bibcode:2000PhRvL..85..461S. doi:10.1103/PhysRevLett.85.461. PMID 10991308. S2CID 7411376.
[2] Seth, Anil (2007). "Granger causality". Scholarpedia. 2 (7): 1667. Bibcode:2007SchpJ...2.1667S. doi:10.4249/scholarpedia.1667.
[3] Hlaváčková-Schindler, Katerina; Palus, M; Vejmelka, M; Bhattacharya, J (1 March 2007). "Causality detection based on information-theoretic approaches in time series analysis". Physics Reports. 441 (1): 1–46. Bibcode:2007PhR...441....1H. CiteSeerX 10.1.1.183.1617. doi:10.1016/j.physrep.2006.12.004.
[4] Jizba, Petr; Kleinert, Hagen; Shefaat, Mohammad (2012-05-15). "Rényi's information transfer between financial time series". Physica A: Statistical Mechanics and Its Applications. 391 (10): 2971–2989. arXiv:1106.5913. Bibcode:2012PhyA..391.2971J. doi:10.1016/j.physa.2011.12.064. ISSN 0378-4371. S2CID 51789622.
[5] Wyner, A. D. (1978). "A definition of conditional mutual information for arbitrary ensembles". Information and Control. 38 (1): 51–59. doi:10.1016/s0019-9958(78)90026-8.
[6] Dobrushin, R. L. (1959). "General formulation of Shannon's main theorem in information theory". Uspekhi Mat. Nauk. 14: 3–104.
[7] Barnett, Lionel (1 December 2009). "Granger Causality and Transfer Entropy Are Equivalent for Gaussian Variables". Physical Review Letters. 103 (23): 238701. arXiv:0910.4514. Bibcode:2009PhRvL.103w8701B. doi:10.1103/PhysRevLett.103.238701. PMID 20366183. S2CID 1266025.
[8] Ver Steeg, Greg; Galstyan, Aram (2012). "Information transfer in social media". Proceedings of the 21st international conference on World Wide Web (WWW '12). ACM. pp. 509–518. arXiv:1110.2724. Bibcode:2011arXiv1110.2724V.
[9] Lungarella, M.; Ishiguro, K.; Kuniyoshi, Y.; Otsu, N. (1 March 2007). "Methods for quantifying the causal structure of bivariate time series". International Journal of Bifurcation and Chaos. 17 (3): 903–921. Bibcode:2007IJBC...17..903L. CiteSeerX 10.1.1.67.3585. doi:10.1142/S0218127407017628.
[10] Pereda, E; Quiroga, RQ; Bhattacharya, J (Sep–Oct 2005). "Nonlinear multivariate analysis of neurophysiological signals". Progress in Neurobiology. 77 (1–2): 1–37. arXiv:nlin/0510077. Bibcode:2005nlin.....10077P. doi:10.1016/j.pneurobio.2005.10.003. PMID 16289760. S2CID 9529656.
[11] Montalto, A; Faes, L; Marinazzo, D (Oct 2014). "MuTE: A MATLAB Toolbox to Compare Established and Novel Estimators of the Multivariate Transfer Entropy". PLOS ONE. 9 (10): e109462. Bibcode:2014PLoSO...9j9462M. doi:10.1371/journal.pone.0109462. PMC 4196918. PMID 25314003.
[12] Lizier, Joseph; Prokopenko, Mikhail; Zomaya, Albert (2008). "Local information transfer as a spatiotemporal filter for complex systems". Physical Review E. 77 (2): 026110. arXiv:0809.3275. Bibcode:2008PhRvE..77b6110L. doi:10.1103/PhysRevE.77.026110. PMID 18352093. S2CID 15634881.
[13] Lizier, Joseph; Heinzle, Jakob; Horstmann, Annette; Haynes, John-Dylan; Prokopenko, Mikhail (2011). "Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity". Journal of Computational Neuroscience. 30 (1): 85–107. doi:10.1007/s10827-010-0271-2. PMID 20799057. S2CID 3012713.
[14] Vicente, Raul; Wibral, Michael; Lindner, Michael; Pipa, Gordon (February 2011). "Transfer entropy—a model-free measure of effective connectivity for the neurosciences". Journal of Computational Neuroscience. 30 (1): 45–67. doi:10.1007/s10827-010-0262-3. PMC 3040354. PMID 20706781.
[15] Shimono, Masanori; Beggs, John (October 2014). "Functional clusters, hubs, and communities in the cortical microconnectome". Cerebral Cortex. 25 (10): 3743–57. doi:10.1093/cercor/bhu252. PMC 4585513. PMID 25336598.
[16] Kushwaha, Niraj; Lee, Edward D (July 2023). "Discovering the mesoscale for chains of conflict". PNAS Nexus. 2 (7): pgad228. doi:10.1093/pnasnexus/pgad228. ISSN 2752-6542. PMC 10392960. PMID 37533894.
[17] Massey, James (1990). "Causality, Feedback And Directed Information" (ISITA). CiteSeerX 10.1.1.36.5688.
[18] Permuter, Haim Henry; Weissman, Tsachy; Goldsmith, Andrea J. (February 2009). "Finite State Channels With Time-Invariant Deterministic Feedback". IEEE Transactions on Information Theory. 55 (2): 644–662. arXiv:cs/0608070. doi:10.1109/TIT.2008.2009849. S2CID 13178.
[19] Kramer, G. (January 2003). "Capacity results for the discrete memoryless network". IEEE Transactions on Information Theory. 49 (1): 4–21. doi:10.1109/TIT.2002.806135.
[20] Permuter, Haim H.; Kim, Young-Han; Weissman, Tsachy (June 2011). "Interpretations of Directed Information in Portfolio Theory, Data Compression, and Hypothesis Testing". IEEE Transactions on Information Theory. 57 (6): 3248–3259. arXiv:0912.4872. doi:10.1109/TIT.2011.2136270. S2CID 11722596.

External links

"Transfer Entropy Toolbox". Google Code. A toolbox, developed in C++ and MATLAB, for computation of transfer entropy between spike trains.
"Java Information Dynamics Toolkit (JIDT)". GitHub. 2019-01-16. A toolbox, developed in Java and usable in MATLAB, GNU Octave and Python, for computation of transfer entropy and related information-theoretic measures in both discrete and continuous-valued data.
"Multivariate Transfer Entropy (MuTE) toolbox". GitHub. 2019-01-09. A toolbox, developed in MATLAB, for computation of transfer entropy with different estimators.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

