Transfer entropy


Transfer entropy is a non-parametric statistic measuring the amount of directed (time-asymmetric) transfer of information between two random processes. [1] [2] [3] Transfer entropy from a process X to another process Y is the amount of uncertainty reduced in future values of Y by knowing the past values of X given past values of Y. More specifically, if $X_t$ and $Y_t$ for $t \in \mathbb{N}$ denote two random processes and the amount of information is measured using Shannon's entropy, the transfer entropy can be written as:

$$T_{X \rightarrow Y} = H\left(Y_t \mid Y_{t-1:t-L}\right) - H\left(Y_t \mid Y_{t-1:t-L},\, X_{t-1:t-L}\right),$$


where H(X) is Shannon's entropy of X. The above definition of transfer entropy has been extended to other types of entropy measures, such as Rényi entropy. [3] [4]
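
A minimal plug-in (histogram) estimator can be written directly from the entropy-difference form above. The sketch below is illustrative only: it assumes discrete-valued (or pre-binned) series, a short history length L, and hypothetical helper names; it is not a reference implementation.

```python
import numpy as np
from collections import Counter

def _entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable symbols."""
    counts = Counter(samples)
    n = sum(counts.values())
    p = np.array([c / n for c in counts.values()])
    return float(-(p * np.log2(p)).sum())

def transfer_entropy(x, y, L=1):
    """Plug-in estimate of T_{X->Y} with history length L for discrete series x, y.

    Uses T_{X->Y} = H(Y_t | Y_past) - H(Y_t | Y_past, X_past), with each
    conditional entropy expanded as a difference of joint entropies.
    """
    x, y = np.asarray(x), np.asarray(y)
    yt     = [tuple(y[t:t + 1]) for t in range(L, len(y))]
    y_past = [tuple(y[t - L:t]) for t in range(L, len(y))]
    x_past = [tuple(x[t - L:t]) for t in range(L, len(y))]
    h_y_given_ypast = _entropy(list(zip(yt, y_past))) - _entropy(y_past)
    h_y_given_both  = _entropy(list(zip(yt, y_past, x_past))) - _entropy(list(zip(y_past, x_past)))
    return h_y_given_ypast - h_y_given_both

# Toy check: y copies x with a one-step delay, so information flows X -> Y only.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=10_000)
y = np.roll(x, 1)                      # y_t = x_{t-1}
print(transfer_entropy(x, y))          # close to 1 bit
print(transfer_entropy(y, x))          # close to 0 bits
```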

Transfer entropy is conditional mutual information, [5] [6] with the history of the influenced variable $Y_{t-1:t-L}$ in the condition:

$$T_{X \rightarrow Y} = I\left(Y_t \,;\, X_{t-1:t-L} \mid Y_{t-1:t-L}\right).$$
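
Expanding this conditional mutual information recovers the entropy-difference definition above (a standard identity, written out here for completeness):

```latex
\begin{align}
T_{X \rightarrow Y}
  &= I\left(Y_t \,;\, X_{t-1:t-L} \mid Y_{t-1:t-L}\right) \\
  &= H\left(Y_t \mid Y_{t-1:t-L}\right) - H\left(Y_t \mid Y_{t-1:t-L},\, X_{t-1:t-L}\right).
\end{align}
```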

Transfer entropy reduces to Granger causality for vector auto-regressive processes. [7] Hence, it is advantageous when the model assumption of Granger causality doesn't hold, for example, analysis of non-linear signals. [8] [9] However, it usually requires more samples for accurate estimation. [10] The probabilities in the entropy formula can be estimated using different approaches (binning, nearest neighbors) or, in order to reduce complexity, using a non-uniform embedding. [11] While it was originally defined for bivariate analysis, transfer entropy has been extended to multivariate forms, either conditioning on other potential source variables [12] or considering transfer from a collection of sources, [13] although these forms require more samples again.
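
For Gaussian (linear vector auto-regressive) variables, Granger causality in its log-likelihood-ratio form equals twice the transfer entropy in nats. [7] The sketch below illustrates this on a simulated coupled AR(1) pair; the simulation parameters and the least-squares fitting are illustrative assumptions, not part of the cited result.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):                       # coupled AR(1): X drives Y
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.6 * y[t - 1] + 0.4 * x[t - 1] + rng.standard_normal()

def residual_variance(target, *regressors):
    """Variance of OLS residuals of target on the given regressors (plus intercept)."""
    A = np.column_stack([np.ones(len(target))] + list(regressors))
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.var(target - A @ coef)

# Restricted model: Y_t on its own past; full model: Y_t on the past of Y and X.
var_restricted = residual_variance(y[1:], y[:-1])
var_full       = residual_variance(y[1:], y[:-1], x[:-1])

granger = np.log(var_restricted / var_full)   # Granger causality, log-ratio form
te      = 0.5 * granger                       # transfer entropy in nats (Gaussian case)
print(f"Granger X->Y: {granger:.3f} nats, transfer entropy: {te:.3f} nats")
```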

Transfer entropy has been used for estimation of functional connectivity of neurons, [13] [14] [15] social influence in social networks [8] and statistical causality between armed conflict events. [16] Transfer entropy is a finite version of the directed information, which was defined in 1990 by James Massey [17] as $I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1})$, where $X^n$ denotes the vector $X_1, X_2, \ldots, X_n$ and $Y^n$ denotes $Y_1, Y_2, \ldots, Y_n$. The directed information plays an important role in characterizing the fundamental limits (channel capacity) of communication channels with or without feedback [18] [19] and gambling with causal side information. [20]
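
Each term in Massey's sum is a conditional mutual information, so the directed information can be evaluated exactly for small discrete examples. The sketch below is a toy check under stated assumptions (two uses of a binary symmetric channel with independent uniform inputs and no feedback, all invented for illustration): in that setting the directed information coincides with the ordinary mutual information $I(X^n; Y^n)$.

```python
import itertools
import numpy as np

# Binary symmetric channel with crossover probability eps, used twice without feedback.
eps = 0.1
def W(y, x):                                # channel law W(y | x)
    return 1 - eps if y == x else eps

# Exact joint pmf over (x1, x2, y1, y2) for i.i.d. uniform inputs.
pmf = {}
for x1, x2, y1, y2 in itertools.product([0, 1], repeat=4):
    pmf[(x1, x2, y1, y2)] = 0.5 * 0.5 * W(y1, x1) * W(y2, x2)

def H(idx):
    """Shannon entropy (bits) of the marginal over the variable positions in idx."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * np.log2(p) for p in marg.values() if p > 0)

def I(a, b, cond=()):
    """Conditional mutual information I(A; B | C) from the joint pmf, in bits."""
    a, b, c = tuple(a), tuple(b), tuple(cond)
    return H(a + c) + H(b + c) - H(a + b + c) - H(c)

# Variable positions: x1=0, x2=1, y1=2, y2=3.
directed = I([0], [2]) + I([0, 1], [3], cond=[2])   # sum_i I(X^i; Y_i | Y^{i-1})
mutual   = I([0, 1], [2, 3])                        # I(X^2; Y^2)
print(directed, mutual)   # equal here: memoryless channel, no feedback
```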


Related Research Articles

Lyapunov exponent: The rate of separation of infinitesimally close trajectories

In mathematics, the Lyapunov exponent or Lyapunov characteristic exponent of a dynamical system is a quantity that characterizes the rate of separation of infinitesimally close trajectories. Quantitatively, two trajectories in phase space with initial separation vector $\delta_0$ diverge (provided that the divergence can be treated within the linearized approximation) at a rate given by $|\delta(t)| \approx e^{\lambda t}\, |\delta_0|$, where $\lambda$ is the Lyapunov exponent.
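
As a concrete illustration (a standard textbook computation, not taken from this article), the Lyapunov exponent of a one-dimensional map can be estimated by averaging the log of the derivative along an orbit:

```python
import numpy as np

def lyapunov_logistic(r, x0=0.3, n_transient=1_000, n_iter=100_000):
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x).

    lambda = lim (1/n) sum_t ln |f'(x_t)|, with f'(x) = r*(1 - 2x).
    """
    x = x0
    for _ in range(n_transient):           # discard transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n_iter):
        total += np.log(abs(r * (1 - 2 * x)))
        x = r * x * (1 - x)
    return total / n_iter

print(lyapunov_logistic(4.0))   # ~ ln 2 ≈ 0.693 (chaotic regime)
print(lyapunov_logistic(3.2))   # negative (stable period-2 orbit)
```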

In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space), such that every finite collection of those random variables has a multivariate normal distribution. The distribution of a Gaussian process is the joint distribution of all those random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.
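
Because any finite collection of values is multivariate normal, drawing sample functions from a Gaussian process prior reduces to sampling a multivariate normal with a kernel-defined covariance. The sketch below uses a squared-exponential kernel; the kernel choice and parameters are illustrative assumptions, not implied by the text.

```python
import numpy as np

def rbf_kernel(xs, length_scale=0.5, variance=1.0):
    """Squared-exponential covariance k(x, x') = v * exp(-(x - x')^2 / (2 l^2))."""
    d = xs[:, None] - xs[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
xs = np.linspace(0, 5, 200)                       # finite set of input locations
K = rbf_kernel(xs) + 1e-8 * np.eye(len(xs))       # jitter for numerical stability
samples = rng.multivariate_normal(np.zeros(len(xs)), K, size=3)  # 3 sample paths
print(samples.shape)   # (3, 200): each row is one function evaluated on xs
```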

Mutual information: Measure of dependence between two variables

In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the "amount of information" obtained about one random variable by observing the other random variable. The concept of mutual information is intimately linked to that of entropy of a random variable, a fundamental notion in information theory that quantifies the expected "amount of information" held in a random variable.
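
A small worked example: the mutual information of a discrete joint distribution computed directly from the identity I(X;Y) = H(X) + H(Y) - H(X,Y). The joint table below is made up for illustration.

```python
import numpy as np

# Made-up joint distribution p(x, y) over 2 x 3 outcomes (rows: x, columns: y).
p_xy = np.array([[0.30, 0.10, 0.10],
                 [0.05, 0.25, 0.20]])

def H(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)
mi = H(p_x) + H(p_y) - H(p_xy.ravel())   # I(X;Y) in bits
print(f"I(X;Y) = {mi:.4f} bits")
```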

In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that at most one subcomponent is Gaussian and that the subcomponents are statistically independent from each other. ICA was invented by Jeanny Hérault and Christian Jutten in 1985. ICA is a special case of blind source separation. A common example application of ICA is the "cocktail party problem" of listening in on one person's speech in a noisy room.
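
A brief "cocktail party" sketch using scikit-learn's FastICA, assuming scikit-learn is installed; the two sources and the mixing matrix are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA   # assumes scikit-learn is available

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                           # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))                  # source 2: square wave
S = np.c_[s1, s2] + 0.02 * rng.standard_normal((2000, 2))
A = np.array([[1.0, 0.5],                    # invented mixing matrix
              [0.4, 1.0]])
X = S @ A.T                                  # observed mixtures ("microphones")

S_hat = FastICA(n_components=2, random_state=0).fit_transform(X)
print(S_hat.shape)                           # recovered sources, up to order and scale
```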

Granger causality: Statistical hypothesis test for forecasting

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969. Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of "true causality" is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only "predictive causality". Using the term "causality" alone is a misnomer, as Granger-causality is better described as "precedence", or, as Granger himself later claimed in 1977, "temporally related". Rather than testing whether X causes Y, the Granger causality tests whether X forecasts Y.

In probability theory and mathematical physics, a random matrix is a matrix-valued random variable—that is, a matrix in which some or all elements are random variables. Many important properties of physical systems can be represented mathematically as matrix problems. For example, the thermal conductivity of a lattice can be computed from the dynamical matrix of the particle-particle interactions within the lattice.

In information theory, the Rényi entropy is a quantity that generalizes various notions of entropy, including Hartley entropy, Shannon entropy, collision entropy, and min-entropy. The Rényi entropy is named after Alfréd Rényi, who looked for the most general way to quantify information while preserving additivity for independent events. In the context of fractal dimension estimation, the Rényi entropy forms the basis of the concept of generalized dimensions.
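
The Rényi entropy of order α of a distribution p is H_α(p) = (1/(1-α)) log Σ p_i^α. A quick numerical check (illustrative only) shows how the special cases named above arise:

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (in bits) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    if np.isclose(alpha, 1.0):                 # limit alpha -> 1: Shannon entropy
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())
    return float(np.log2((p ** alpha).sum()) / (1.0 - alpha))

p = [0.5, 0.25, 0.125, 0.125]
print(renyi_entropy(p, 0))      # Hartley (max-)entropy: log2(4) = 2
print(renyi_entropy(p, 1))      # Shannon entropy: 1.75 bits
print(renyi_entropy(p, 2))      # collision entropy
print(renyi_entropy(p, 100))    # approaches min-entropy: -log2(max p) = 1
```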

In physics, the Tsallis entropy is a generalization of the standard Boltzmann–Gibbs entropy. It is proportional to the expectation of the q-logarithm of a distribution.
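
For comparison, the Tsallis entropy S_q(p) = (1 - Σ p_i^q)/(q - 1) can be computed the same way (again purely illustrative); it recovers the Boltzmann–Gibbs/Shannon form as q → 1:

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q (natural units) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    if np.isclose(q, 1.0):                 # q -> 1 limit: Boltzmann-Gibbs/Shannon entropy
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())
    return float((1.0 - (p ** q).sum()) / (q - 1.0))

p = [0.5, 0.25, 0.125, 0.125]
print(tsallis_entropy(p, 1.0))     # Shannon entropy in nats
print(tsallis_entropy(p, 1.001))   # nearly the same value
print(tsallis_entropy(p, 2.0))     # 1 - sum p^2
```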

In probability theory and statistics, the Jensen–Shannon divergence is a method of measuring the similarity between two probability distributions. It is also known as information radius (IRad) or total divergence to the average. It is based on the Kullback–Leibler divergence, with some notable differences, including that it is symmetric and it always has a finite value. The square root of the Jensen–Shannon divergence is a metric often referred to as Jensen–Shannon distance.
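
The Jensen–Shannon divergence of two distributions P and Q is JSD(P‖Q) = ½ D_KL(P‖M) + ½ D_KL(Q‖M) with M = ½(P + Q); a direct NumPy computation (illustrative, with made-up distributions) is:

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits (assumes q > 0 where p > 0)."""
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

def jsd(p, q):
    """Jensen-Shannon divergence in bits; its square root is the JS distance."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.4, 0.4, 0.2])
q = np.array([0.1, 0.6, 0.3])
print(jsd(p, q))                 # symmetric: jsd(p, q) == jsd(q, p)
print(np.sqrt(jsd(p, q)))        # Jensen-Shannon distance (a metric)
```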

Interaction information

The interaction information is a generalization of the mutual information for more than two variables.
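
One common convention writes the three-variable interaction information as I(X;Y;Z) = I(X;Y) - I(X;Y|Z). The toy example below (a made-up XOR triple) shows it can be negative, signalling synergy among the variables:

```python
import itertools
import numpy as np

# Joint pmf of (X, Y, Z) with X, Y fair and independent, Z = X XOR Y.
pmf = {(x, y, x ^ y): 0.25 for x, y in itertools.product([0, 1], repeat=2)}

def H(idx):
    """Entropy (bits) of the marginal over the variable positions in idx."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * np.log2(p) for p in marg.values() if p > 0)

i_xy         = H([0]) + H([1]) - H([0, 1])                      # I(X;Y) = 0
i_xy_given_z = H([0, 2]) + H([1, 2]) - H([0, 1, 2]) - H([2])    # I(X;Y|Z) = 1 bit
print(i_xy - i_xy_given_z)    # interaction information: -1 bit (synergy)
```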

One-way quantum computer: Method of quantum computing

The one-way or measurement-based quantum computer (MBQC) is a method of quantum computing that first prepares an entangled resource state, usually a cluster state or graph state, then performs single qubit measurements on it. It is "one-way" because the resource state is destroyed by the measurements.

In applied mathematics, topological data analysis (TDA) is an approach to the analysis of datasets using techniques from topology. Extraction of information from datasets that are high-dimensional, incomplete and noisy is generally challenging. TDA provides a general framework to analyze such data in a manner that is insensitive to the particular metric chosen and provides dimensionality reduction and robustness to noise. Beyond this, it inherits functoriality, a fundamental concept of modern mathematics, from its topological nature, which allows it to adapt to new mathematical tools.

Tracy–Widom distribution: Probability distribution

The Tracy–Widom distribution is a probability distribution from random matrix theory introduced by Craig Tracy and Harold Widom. It is the distribution of the normalized largest eigenvalue of a random Hermitian matrix. The distribution is defined as a Fredholm determinant.

In information theory, dual total correlation, information rate, excess entropy, or binding information is one of several known non-negative generalizations of mutual information. While total correlation is bounded by the sum of the entropies of the n elements, the dual total correlation is bounded by the joint entropy of the n elements. Although well behaved, dual total correlation has received much less attention than the total correlation. A measure known as "TSE-complexity" defines a continuum between the total correlation and dual total correlation.
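
Both quantities can be computed directly from a joint distribution: total correlation is C = Σ H(X_i) - H(X), while dual total correlation is D = H(X) - Σ H(X_i | X_others). A small sketch on a made-up three-variable pmf (illustrative assumptions only):

```python
import itertools
import numpy as np

# Made-up joint pmf over three binary variables: X, Y independent fair bits, Z = X XOR Y.
pmf = {(x, y, x ^ y): 0.25 for x, y in itertools.product([0, 1], repeat=2)}

def H(idx):
    """Entropy (bits) of the marginal over the variable positions in idx."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * np.log2(p) for p in marg.values() if p > 0)

n = 3
joint = H(range(n))
total_corr = sum(H([i]) for i in range(n)) - joint
# H(X_i | rest) = H(all) - H(rest)
dual_total_corr = joint - sum(joint - H([j for j in range(n) if j != i]) for i in range(n))
print(total_corr)        # C = 1 bit  (bounded by the sum of entropies, here 3)
print(dual_total_corr)   # D = 2 bits (bounded by the joint entropy, here 2)
```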

In statistics, the maximal information coefficient (MIC) is a measure of the strength of the linear or non-linear association between two variables X and Y.

Brain connectivity estimators represent patterns of links in the brain. Connectivity can be considered at different levels of the brain's organisation: from neurons, to neural assemblies and brain structures. Brain connectivity involves different concepts such as: neuroanatomical or structural connectivity, functional connectivity and effective connectivity.

Direct coupling analysis or DCA is an umbrella term comprising several methods for analyzing sequence data in computational biology. The common idea of these methods is to use statistical modeling to quantify the strength of the direct relationship between two positions of a biological sequence, excluding effects from other positions. This contrasts with usual measures of correlation, which can be large even if there is no direct relationship between the positions. Such a direct relationship can for example be the evolutionary pressure for two positions to maintain mutual compatibility in the biomolecular structure of the sequence, leading to molecular coevolution between the two positions.

Directed information is an information theory measure that quantifies the information flow from the random string $X^n = (X_1, X_2, \ldots, X_n)$ to the random string $Y^n = (Y_1, Y_2, \ldots, Y_n)$. The term directed information was coined by James Massey and is defined as

$$I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}),$$

where $I(X^i; Y_i \mid Y^{i-1})$ denotes the conditional mutual information between the input prefix $X^i$ and the output symbol $Y_i$ given the output prefix $Y^{i-1}$.

Probabilistic numerics is an active field of study at the intersection of applied mathematics, statistics, and machine learning centering on the concept of uncertainty in computation. In probabilistic numerics, tasks in numerical analysis such as finding numerical solutions for integration, linear algebra, optimization, simulation, and differential equations are seen as problems of statistical, probabilistic, or Bayesian inference.

This glossary of quantum computing is a list of definitions of terms and concepts used in quantum computing, its sub-disciplines, and related fields.

References

  1. Schreiber, Thomas (1 July 2000). "Measuring information transfer". Physical Review Letters. 85 (2): 461–464. arXiv: nlin/0001042 . Bibcode:2000PhRvL..85..461S. doi:10.1103/PhysRevLett.85.461. PMID   10991308. S2CID   7411376.
  2. Seth, Anil (2007). "Granger causality". Scholarpedia . 2 (7): 1667. Bibcode:2007SchpJ...2.1667S. doi: 10.4249/scholarpedia.1667 .
  3. Hlaváčková-Schindler, Katerina; Palus, M; Vejmelka, M; Bhattacharya, J (1 March 2007). "Causality detection based on information-theoretic approaches in time series analysis". Physics Reports. 441 (1): 1–46. Bibcode:2007PhR...441....1H. CiteSeerX 10.1.1.183.1617. doi:10.1016/j.physrep.2006.12.004.
  4. Jizba, Petr; Kleinert, Hagen; Shefaat, Mohammad (2012-05-15). "Rényi's information transfer between financial time series". Physica A: Statistical Mechanics and Its Applications. 391 (10): 2971–2989. arXiv: 1106.5913 . Bibcode:2012PhyA..391.2971J. doi:10.1016/j.physa.2011.12.064. ISSN   0378-4371. S2CID   51789622.
  5. Wyner, A. D. (1978). "A definition of conditional mutual information for arbitrary ensembles". Information and Control. 38 (1): 51–59. doi: 10.1016/s0019-9958(78)90026-8 .
  6. Dobrushin, R. L. (1959). "General formulation of Shannon's main theorem in information theory". Uspekhi Mat. Nauk. 14: 3–104.
  7. Barnett, Lionel (1 December 2009). "Granger Causality and Transfer Entropy Are Equivalent for Gaussian Variables". Physical Review Letters. 103 (23): 238701. arXiv: 0910.4514 . Bibcode:2009PhRvL.103w8701B. doi:10.1103/PhysRevLett.103.238701. PMID   20366183. S2CID   1266025.
  8. Ver Steeg, Greg; Galstyan, Aram (2012). "Information transfer in social media". Proceedings of the 21st international conference on World Wide Web (WWW '12). ACM. pp. 509–518. arXiv:1110.2724. Bibcode:2011arXiv1110.2724V.
  9. Lungarella, M.; Ishiguro, K.; Kuniyoshi, Y.; Otsu, N. (1 March 2007). "Methods for quantifying the causal structure of bivariate time series". International Journal of Bifurcation and Chaos. 17 (3): 903–921. Bibcode:2007IJBC...17..903L. CiteSeerX   10.1.1.67.3585 . doi:10.1142/S0218127407017628.
  10. Pereda, E; Quiroga, RQ; Bhattacharya, J (Sep–Oct 2005). "Nonlinear multivariate analysis of neurophysiological signals". Progress in Neurobiology. 77 (1–2): 1–37. arXiv: nlin/0510077 . Bibcode:2005nlin.....10077P. doi:10.1016/j.pneurobio.2005.10.003. PMID   16289760. S2CID   9529656.
  11. Montalto, A; Faes, L; Marinazzo, D (Oct 2014). "MuTE: A MATLAB Toolbox to Compare Established and Novel Estimators of the Multivariate Transfer Entropy". PLOS ONE. 9 (10): e109462. Bibcode:2014PLoSO...9j9462M. doi: 10.1371/journal.pone.0109462 . PMC   4196918 . PMID   25314003.
  12. Lizier, Joseph; Prokopenko, Mikhail; Zomaya, Albert (2008). "Local information transfer as a spatiotemporal filter for complex systems". Physical Review E. 77 (2): 026110. arXiv: 0809.3275 . Bibcode:2008PhRvE..77b6110L. doi:10.1103/PhysRevE.77.026110. PMID   18352093. S2CID   15634881.
  13. Lizier, Joseph; Heinzle, Jakob; Horstmann, Annette; Haynes, John-Dylan; Prokopenko, Mikhail (2011). "Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity". Journal of Computational Neuroscience. 30 (1): 85–107. doi:10.1007/s10827-010-0271-2. PMID 20799057. S2CID 3012713.
  14. Vicente, Raul; Wibral, Michael; Lindner, Michael; Pipa, Gordon (February 2011). "Transfer entropy—a model-free measure of effective connectivity for the neurosciences". Journal of Computational Neuroscience. 30 (1): 45–67. doi:10.1007/s10827-010-0262-3. PMC   3040354 . PMID   20706781.
  15. Shimono, Masanori; Beggs, John (October 2014). "Functional clusters, hubs, and communities in the cortical microconnectome". Cerebral Cortex. 25 (10): 3743–57. doi:10.1093/cercor/bhu252. PMC   4585513 . PMID   25336598.
  16. Kushwaha, Niraj; Lee, Edward D (July 2023). "Discovering the mesoscale for chains of conflict". PNAS Nexus. 2 (7). doi:10.1093/pnasnexus/pgad228. ISSN   2752-6542. PMC   10392960 . PMID   37533894.
  17. Massey, James (1990). "Causality, Feedback and Directed Information". Proceedings of the 1990 International Symposium on Information Theory and its Applications (ISITA-90). CiteSeerX 10.1.1.36.5688.
  18. Permuter, Haim Henry; Weissman, Tsachy; Goldsmith, Andrea J. (February 2009). "Finite State Channels With Time-Invariant Deterministic Feedback". IEEE Transactions on Information Theory. 55 (2): 644–662. arXiv: cs/0608070 . doi:10.1109/TIT.2008.2009849. S2CID   13178.
  19. Kramer, G. (January 2003). "Capacity results for the discrete memoryless network". IEEE Transactions on Information Theory. 49 (1): 4–21. doi:10.1109/TIT.2002.806135.
  20. Permuter, Haim H.; Kim, Young-Han; Weissman, Tsachy (June 2011). "Interpretations of Directed Information in Portfolio Theory, Data Compression, and Hypothesis Testing". IEEE Transactions on Information Theory. 57 (6): 3248–3259. arXiv: 0912.4872 . doi:10.1109/TIT.2011.2136270. S2CID   11722596.