Partial information decomposition

Last updated May 29, 2024

Partial information decomposition is an extension of information theory that aims to generalize the pairwise relations described by information theory to the interaction of multiple variables.^[1]

Motivation

Information theory can quantify the amount of information a single source variable $X_{1}$ has about a target variable $Y$ via the mutual information $I(X_{1};Y)$. If we now consider a second source variable $X_{2}$, classical information theory can only describe the mutual information of the joint variable $\{X_{1},X_{2}\}$ with $Y$, given by $I(X_{1},X_{2};Y)$. In general, however, it would be interesting to know how exactly the individual variables $X_{1}$ and $X_{2}$ and their interactions relate to $Y$.

Consider two source variables $X_{1},X_{2}\in \{0,1\}$ and a target variable $Y={\text{XOR}}(X_{1},X_{2})$. If $X_{1}$ and $X_{2}$ are independent and uniformly distributed, the total mutual information is $I(X_{1},X_{2};Y)=1$ bit, while the individual mutual informations are $I(X_{1};Y)=I(X_{2};Y)=0$. That is, all of the information about $Y$ is synergistic: it arises from the interaction of $X_{1}$ and $X_{2}$ and cannot be easily captured with classical information-theoretic quantities.
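The values in this example can be checked numerically. The following is a minimal sketch, assuming that $X_{1}$ and $X_{2}$ are independent, uniformly distributed bits; the helper name `mutual_information` and the dictionary layout are illustrative choices, not part of any particular library.

```python
from collections import defaultdict
from itertools import product
from math import log2


def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as {(a, b): probability}."""
    p_a = defaultdict(float)
    p_b = defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(
        p * log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


# Assumption: X1 and X2 are independent, uniformly distributed bits.
outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]
p = 1 / len(outcomes)

joint_both = defaultdict(float)  # distribution of ((X1, X2), Y)
joint_x1 = defaultdict(float)    # distribution of (X1, Y)
joint_x2 = defaultdict(float)    # distribution of (X2, Y)
for x1, x2, y in outcomes:
    joint_both[((x1, x2), y)] += p
    joint_x1[(x1, y)] += p
    joint_x2[(x2, y)] += p

i_both = mutual_information(joint_both)
i_x1 = mutual_information(joint_x1)
i_x2 = mutual_information(joint_x2)
print(i_both, i_x1, i_x2)  # 1.0 0.0 0.0
```

Each source alone is statistically independent of $Y$, yet the pair determines $Y$ completely, which is the gap that the decomposition is meant to describe.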

Definition

Partial information decomposition further decomposes the mutual information between the source variables $\{X_{1},X_{2}\}$ and the target variable $Y$ as

$I(X_{1},X_{2};Y)={\text{Unq}}(X_{1};Y\setminus X_{2})+{\text{Unq}}(X_{2};Y\setminus X_{1})+{\text{Syn}}(X_{1},X_{2};Y)+{\text{Red}}(X_{1},X_{2};Y)$

Here the individual information atoms are defined as follows:

${\text{Unq}}(X_{1};Y\setminus X_{2})$ is the unique information that $X_{1}$ has about $Y$ and that is not in $X_{2}$
${\text{Unq}}(X_{2};Y\setminus X_{1})$ is the unique information that $X_{2}$ has about $Y$ and that is not in $X_{1}$
${\text{Syn}}(X_{1},X_{2};Y)$ is the synergistic information about $Y$ that is in the interaction of $X_{1}$ and $X_{2}$
${\text{Red}}(X_{1},X_{2};Y)$ is the redundant information about $Y$ that is in both $X_{1}$ and $X_{2}$

There is, thus far, no universal agreement on how these terms should be defined, with different approaches that decompose information into redundant, unique, and synergistic components appearing in the literature.^[1]^[2]^[3]^[4]
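As an illustration of one such proposal, the sketch below computes a two-source decomposition using the $I_{\min}$ redundancy measure of Williams and Beer,^[1] in which redundancy is the expected minimum, over sources, of the specific information each source provides about each outcome of $Y$. It assumes the consistency relations $I(X_{i};Y)={\text{Red}}+{\text{Unq}}(X_{i})$ from that framework to recover the remaining atoms. The function names (`marginal`, `specific_information`, `pid_imin`, `gate`) are hypothetical, and other redundancy measures from the literature would give different numbers.

```python
from collections import defaultdict
from itertools import product
from math import log2


def marginal(joint, idx):
    """Marginalise {(x1, x2, y): p} onto the coordinates listed in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out


def mutual_information(joint, src_idx, tgt_idx=(2,)):
    """I(sources; target) in bits."""
    p_st = marginal(joint, src_idx + tgt_idx)
    p_s = marginal(joint, src_idx)
    p_t = marginal(joint, tgt_idx)
    n = len(src_idx)
    return sum(
        p * log2(p / (p_s[k[:n]] * p_t[k[n:]]))
        for k, p in p_st.items() if p > 0
    )


def specific_information(joint, src_idx, y):
    """I(Y=y; source) = sum_a p(a|y) * [log2 p(y|a) - log2 p(y)]."""
    p_sy = marginal(joint, src_idx + (2,))
    p_s = marginal(joint, src_idx)
    p_y = marginal(joint, (2,))[(y,)]
    total = 0.0
    for k, p in p_sy.items():
        if k[-1] == y and p > 0:
            a = k[:-1]
            total += (p / p_y) * (log2(p / p_s[a]) - log2(p_y))
    return total


def pid_imin(joint):
    """Two-source decomposition using the I_min redundancy (one proposal of several)."""
    p_y = marginal(joint, (2,))
    red = sum(
        py * min(specific_information(joint, (0,), y),
                 specific_information(joint, (1,), y))
        for (y,), py in p_y.items() if py > 0
    )
    i1 = mutual_information(joint, (0,))
    i2 = mutual_information(joint, (1,))
    i12 = mutual_information(joint, (0, 1))
    unq1 = i1 - red  # assumed relation: I(X1;Y) = Red + Unq1
    unq2 = i2 - red  # assumed relation: I(X2;Y) = Red + Unq2
    syn = i12 - unq1 - unq2 - red
    return {"Red": red, "Unq1": unq1, "Unq2": unq2, "Syn": syn}


def gate(f):
    """Joint distribution of (X1, X2, f(X1, X2)) for independent uniform bits."""
    return {(x1, x2, f(x1, x2)): 0.25 for x1, x2 in product((0, 1), repeat=2)}


xor_pid = pid_imin(gate(lambda a, b: a ^ b))
and_pid = pid_imin(gate(lambda a, b: a & b))
print(xor_pid)
print(and_pid)
```

Under this measure the XOR gate is purely synergistic (1 bit of synergy, no redundant or unique information), while the AND gate splits into roughly 0.311 bits of redundancy and 0.5 bits of synergy.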

Applications

Despite the lack of universal agreement, partial information decomposition has been applied to diverse fields, including climatology,^[5] neuroscience,^[6]^[7]^[8] sociology,^[9] and machine learning.^[10] Partial information decomposition has also been proposed as a possible foundation on which to build a mathematically robust definition of emergence in complex systems^[11] and may be relevant to formal theories of consciousness.^[12]

References

[1] Williams PL, Beer RD (2010-04-14). "Nonnegative Decomposition of Multivariate Information". arXiv:1004.2515 [cs.IT].
[2] Quax R, Har-Shemesh O, Sloot PM (February 2017). "Quantifying Synergistic Information Using Intermediate Stochastic Variables". Entropy. 19 (2): 85. arXiv:1602.01265. doi:10.3390/e19020085. ISSN 1099-4300.
[3] Rosas FE, Mediano PA, Rassouli B, Barrett AB (2020-12-04). "An operational information decomposition via synergistic disclosure". Journal of Physics A: Mathematical and Theoretical. 53 (48): 485001. arXiv:2001.10387. Bibcode:2020JPhA...53V5001R. doi:10.1088/1751-8121/abb723. ISSN 1751-8113. S2CID 210932609.
[4] Kolchinsky A (March 2022). "A Novel Approach to the Partial Information Decomposition". Entropy. 24 (3): 403. arXiv:1908.08642. Bibcode:2022Entrp..24..403K. doi:10.3390/e24030403. PMC 8947370. PMID 35327914.
[5] Goodwell AE, Jiang P, Ruddell BL, Kumar P (February 2020). "Debates—Does Information Theory Provide a New Paradigm for Earth Science? Causality, Interaction, and Feedback". Water Resources Research. 56 (2). Bibcode:2020WRR....5624940G. doi:10.1029/2019WR024940. ISSN 0043-1397. S2CID 216201598.
[6] Newman EL, Varley TF, Parakkattu VK, Sherrill SP, Beggs JM (July 2022). "Revealing the Dynamics of Neural Information Processing with Multivariate Information Decomposition". Entropy. 24 (7): 930. Bibcode:2022Entrp..24..930N. doi:10.3390/e24070930. PMC 9319160. PMID 35885153.
[7] Luppi AI, Mediano PA, Rosas FE, Holland N, Fryer TD, O'Brien JT, et al. (June 2022). "A synergistic core for human brain evolution and cognition". Nature Neuroscience. 25 (6): 771–782. doi:10.1038/s41593-022-01070-0. PMC 7614771. PMID 35618951. S2CID 249096746.
[8] Wibral M, Priesemann V, Kay JW, Lizier JT, Phillips WA (March 2017). "Partial information decomposition as a unified approach to the specification of neural goal functions". Brain and Cognition. Perspectives on Human Probabilistic Inferences and the 'Bayesian Brain'. 112: 25–38. arXiv:1510.00831. doi:10.1016/j.bandc.2015.09.004. PMID 26475739. S2CID 4394452.
[9] Varley TF, Kaminski P (October 2022). "Untangling Synergistic Effects of Intersecting Social Identities with Partial Information Decomposition". Entropy. 24 (10): 1387. Bibcode:2022Entrp..24.1387V. doi:10.3390/e24101387. ISSN 1099-4300. PMC 9611752. PMID 37420406.
[10] Tax TM, Mediano PA, Shanahan M (September 2017). "The Partial Information Decomposition of Generative Neural Network Models". Entropy. 19 (9): 474. Bibcode:2017Entrp..19..474T. doi:10.3390/e19090474. hdl:10044/1/50586. ISSN 1099-4300.
[11] Mediano PA, Rosas FE, Luppi AI, Jensen HJ, Seth AK, Barrett AB, et al. (July 2022). "Greater than the parts: a review of the information decomposition approach to causal emergence". Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences. 380 (2227): 20210246. doi:10.1098/rsta.2021.0246. PMC 9125226. PMID 35599558.
[12] Luppi AI, Mediano PA, Rosas FE, Harrison DJ, Carhart-Harris RL, Bor D, Stamatakis EA (2021). "What it is like to be a bit: an integrated information decomposition account of emergent mental phenomena". Neuroscience of Consciousness. 2021 (2): niab027. doi:10.1093/nc/niab027. PMC 8600547. PMID 34804593.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.