Interaction information

Venn diagram of information theoretic measures for three variables x, y, and z, represented by the lower left, lower right, and upper circles, respectively. The interaction information is represented by the gray region, and it is the only region that can be negative.

The interaction information is a generalization of the mutual information for more than two variables.

There are many names for interaction information, including amount of information, [1] information correlation, [2] co-information, [3] and simply mutual information. [4] Interaction information expresses the amount of information (redundancy or synergy) bound up in a set of variables, beyond that which is present in any subset of those variables. Unlike the mutual information, the interaction information can be either positive or negative. These functions, their negativity and minima have a direct interpretation in algebraic topology. [5]

Definition

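The conditional mutual information can be used to inductively define the interaction information for any finite number of variables as follows:

$$I(X_1; \ldots; X_{n+1}) = I(X_1; \ldots; X_n) - I(X_1; \ldots; X_n \mid X_{n+1}),$$

where

$$I(X_1; \ldots; X_n \mid X_{n+1}) = \mathbb{E}_{X_{n+1}}\big[ I(X_1; \ldots; X_n) \mid X_{n+1} \big].$$

Some authors [6] define the interaction information differently, by swapping the two terms being subtracted in the recursive definition. This has the effect of reversing the sign for an odd number of variables.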

For three variables $\{X, Y, Z\}$, the interaction information $I(X;Y;Z)$ is given by

$$I(X;Y;Z) = I(X;Y) - I(X;Y \mid Z),$$

where $I(X;Y)$ is the mutual information between variables $X$ and $Y$, and $I(X;Y \mid Z)$ is the conditional mutual information between variables $X$ and $Y$ given $Z$. The interaction information is symmetric, so it does not matter which variable is conditioned on. This is easy to see when the interaction information is written in terms of entropy and joint entropy, as follows:

$$I(X;Y;Z) = \big(H(X) + H(Y) + H(Z)\big) - \big(H(X,Y) + H(X,Z) + H(Y,Z)\big) + H(X,Y,Z).$$
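The symmetry can be checked with a short derivation that uses only the standard entropy identities for mutual information and conditional mutual information; the intermediate steps below are a sketch added for illustration:

```latex
\begin{aligned}
I(X;Y) &= H(X) + H(Y) - H(X,Y), \\
I(X;Y \mid Z) &= H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), \\
I(X;Y) - I(X;Y \mid Z) &= H(X) + H(Y) + H(Z) \\
&\quad - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z).
\end{aligned}
```

The final expression is unchanged under any permutation of $X$, $Y$, and $Z$, so conditioning on $X$ or $Y$ instead of $Z$ gives the same value.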

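In general, for the set of variables $\mathcal{V} = \{X_1, X_2, \ldots, X_n\}$, the interaction information can be written in the following form (compare with Kirkwood approximation):

$$I(\mathcal{V}) = \sum_{\mathcal{T} \subseteq \mathcal{V}} (-1)^{|\mathcal{T}| - 1} H(\mathcal{T}).$$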
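The alternating sum of subset entropies can be evaluated directly for small discrete systems. The following is a minimal sketch, assuming the joint distribution is stored as a dictionary from outcome tuples to probabilities and using the sign convention $(-1)^{|\mathcal{T}|-1}$; the helper names `entropy` and `interaction_information` are illustrative and do not come from any library.

```python
from itertools import combinations
from math import log2


def entropy(joint, idx):
    """Shannon entropy in bits of the marginal over the positions in idx."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def interaction_information(joint, n):
    """Sum over non-empty subsets T of n variables of (-1)^(|T|-1) * H(T)."""
    total = 0.0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            total += (-1) ** (k - 1) * entropy(joint, subset)
    return total


# Z = X XOR Y for two uniform bits: pairwise independent, jointly dependent.
xor = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
# Three identical copies of one uniform bit: fully redundant.
copy = {(b, b, b): 0.5 for b in (0, 1)}

print(interaction_information(xor, 3))   # -1.0
print(interaction_information(copy, 3))  # 1.0
```

The cost grows as $2^n$ subset entropies, so this direct approach is only practical for a handful of variables.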

For three variables, the interaction information measures the influence of a variable $Z$ on the amount of information shared between $X$ and $Y$. Because the term $I(X;Y \mid Z)$ can be larger than $I(X;Y)$, the interaction information can be negative as well as positive. This will happen, for example, when $X$ and $Y$ are independent but not conditionally independent given $Z$. Positive interaction information indicates that variable $Z$ inhibits (i.e., accounts for or explains some of) the correlation between $X$ and $Y$, whereas negative interaction information indicates that variable $Z$ facilitates or enhances the correlation.

Properties

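Interaction information is bounded. In the three-variable case, it is bounded by [4]

$$-\min\{I(X;Y \mid Z),\, I(Y;Z \mid X),\, I(X;Z \mid Y)\} \;\le\; I(X;Y;Z) \;\le\; \min\{I(X;Y),\, I(Y;Z),\, I(X;Z)\}.$$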
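The three-variable bound says that the interaction information is at most the smallest pairwise mutual information and at least minus the smallest conditional mutual information. It can be spot-checked numerically. The sketch below assumes randomly drawn joint distributions over three binary variables; the helpers `H`, `mi`, `cmi`, `bounds_hold`, and `random_joint` are hypothetical names, and a passing check is a sanity test rather than a proof.

```python
import random
from math import log2


def H(joint, idx):
    """Entropy in bits of the marginal over the positions in idx."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def mi(joint, a, b):
    """Mutual information I(a; b)."""
    return H(joint, (a,)) + H(joint, (b,)) - H(joint, (a, b))


def cmi(joint, a, b, c):
    """Conditional mutual information I(a; b | c)."""
    return H(joint, (a, c)) + H(joint, (b, c)) - H(joint, (a, b, c)) - H(joint, (c,))


def bounds_hold(joint, tol=1e-9):
    """Check lower <= I(X;Y;Z) <= upper, up to floating-point tolerance."""
    ii = mi(joint, 0, 1) - cmi(joint, 0, 1, 2)
    upper = min(mi(joint, 0, 1), mi(joint, 1, 2), mi(joint, 0, 2))
    lower = -min(cmi(joint, 0, 1, 2), cmi(joint, 1, 2, 0), cmi(joint, 0, 2, 1))
    return lower - tol <= ii <= upper + tol


def random_joint(rng):
    """A random joint distribution over three binary variables."""
    weights = [rng.random() for _ in range(8)]
    total = sum(weights)
    outcomes = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
    return {o: w / total for o, w in zip(outcomes, weights)}


rng = random.Random(0)
print(all(bounds_hold(random_joint(rng)) for _ in range(200)))
```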

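If three variables form a Markov chain $X \to Y \to Z$, then $I(X;Z \mid Y) = 0$, but $I(X;Z) \ge 0$. Therefore

$$I(X;Y;Z) = I(X;Z) - I(X;Z \mid Y) = I(X;Z) \ge 0.$$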
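The Markov-chain property can be illustrated with a small numerical sketch. It assumes a uniform bit $X$ passed through two binary symmetric channels in sequence to produce $Y$ and then $Z$; the crossover probabilities 0.1 and 0.2 are arbitrary illustrative choices, and the helper names are hypothetical.

```python
from math import log2


def H(joint, idx):
    """Entropy in bits of the marginal over the positions in idx."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def flip(a, b, eps):
    """P(b | a) for a binary symmetric channel with crossover probability eps."""
    return eps if a != b else 1.0 - eps


bits = (0, 1)
# Outcome order: (x, y, z), with p(x, y, z) = p(x) p(y | x) p(z | y).
chain = {(x, y, z): 0.5 * flip(x, y, 0.1) * flip(y, z, 0.2)
         for x in bits for y in bits for z in bits}

i_xz = H(chain, (0,)) + H(chain, (2,)) - H(chain, (0, 2))
i_xz_given_y = (H(chain, (0, 1)) + H(chain, (1, 2))
                - H(chain, (0, 1, 2)) - H(chain, (1,)))
interaction = i_xz - i_xz_given_y

print(i_xz_given_y)       # zero up to rounding error
print(interaction, i_xz)  # the two values agree and are non-negative
```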

Examples

Positive interaction information

Positive interaction information seems much more natural than negative interaction information in the sense that such explanatory effects are typical of common-cause structures. For example, clouds cause rain and also block the sun; therefore, the correlation between rain and darkness is partly accounted for by the presence of clouds, $I(\text{rain}; \text{dark} \mid \text{cloud}) < I(\text{rain}; \text{dark})$. The result is positive interaction information $I(\text{rain}; \text{dark}; \text{cloud})$.
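A toy version of the cloud example makes the sign concrete. The sketch below assumes that rain and darkness are conditionally independent given cloud cover, and every probability in it is a hypothetical value chosen only for illustration.

```python
from math import log2


def H(joint, idx):
    """Entropy in bits of the marginal over the positions in idx."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def bern(p, value):
    """Probability that a Bernoulli(p) variable equals value (0 or 1)."""
    return p if value == 1 else 1.0 - p


# Hypothetical numbers, not measurements.
p_cloud = 0.4
p_rain_given = {1: 0.5, 0: 0.05}  # P(rain | cloud state)
p_dark_given = {1: 0.8, 0: 0.1}   # P(dark | cloud state)

bits = (0, 1)
# Outcome order: (cloud, rain, dark); cloud is the common cause.
joint = {(c, r, d): bern(p_cloud, c) * bern(p_rain_given[c], r) * bern(p_dark_given[c], d)
         for c in bits for r in bits for d in bits}

i_rain_dark = H(joint, (1,)) + H(joint, (2,)) - H(joint, (1, 2))
i_rain_dark_given_cloud = (H(joint, (1, 0)) + H(joint, (2, 0))
                           - H(joint, (1, 2, 0)) - H(joint, (0,)))
interaction = i_rain_dark - i_rain_dark_given_cloud

print(i_rain_dark, i_rain_dark_given_cloud, interaction)
```

Under these assumptions the conditional term vanishes, so the interaction information equals the whole of the rain-darkness mutual information and is positive.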

Negative interaction information

A car's engine can fail to start due to either a dead battery or a blocked fuel pump. Ordinarily, we assume that battery death and fuel pump blockage are independent events, $I(\text{blocked fuel}; \text{dead battery}) = 0$. But knowing that the car fails to start, if an inspection shows the battery to be in good health, we can conclude that the fuel pump must be blocked. Therefore $I(\text{blocked fuel}; \text{dead battery} \mid \text{engine fails}) > 0$, and the result is negative interaction information.
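The car example is a common-effect structure and can be sketched numerically. The code below assumes that the engine fails exactly when the battery is dead or the fuel pump is blocked, and that each fault occurs independently with a hypothetical probability of 0.1.

```python
from math import log2


def H(joint, idx):
    """Entropy in bits of the marginal over the positions in idx."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def bern(p, value):
    """Probability that a Bernoulli(p) variable equals value (0 or 1)."""
    return p if value == 1 else 1.0 - p


p_fault = 0.1  # hypothetical fault probability for each component
bits = (0, 1)
# Outcome order: (battery_dead, fuel_blocked, engine_fails).
joint = {(b, f, b | f): bern(p_fault, b) * bern(p_fault, f)
         for b in bits for f in bits}

i_bf = H(joint, (0,)) + H(joint, (1,)) - H(joint, (0, 1))
i_bf_given_fail = (H(joint, (0, 2)) + H(joint, (1, 2))
                   - H(joint, (0, 1, 2)) - H(joint, (2,)))
interaction = i_bf - i_bf_given_fail

print(i_bf)             # zero up to rounding error: the faults are independent
print(i_bf_given_fail)  # positive: the faults become dependent given the outcome
print(interaction)      # negative
```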

Difficulty of interpretation

The possible negativity of interaction information can be the source of some confusion. [3] Many authors have taken zero interaction information as a sign that three or more random variables do not interact, but this interpretation is wrong. [7]

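To see how difficult interpretation can be, consider a set of eight independent binary variables $\{X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8\}$, each uniformly distributed so that it carries one bit. Agglomerate these variables as follows:

$$\begin{aligned}
Y_1 &= \{X_1, X_2, X_3, X_4, X_5, X_6, X_7\}, \\
Y_2 &= \{X_4, X_5, X_6, X_7\}, \\
Y_3 &= \{X_5, X_6, X_7, X_8\}.
\end{aligned}$$

Because the $Y_i$'s overlap each other (are redundant) on the three binary variables $\{X_5, X_6, X_7\}$, we would expect the interaction information $I(Y_1; Y_2; Y_3)$ to equal $3$ bits, which it does. However, consider now the agglomerated variables

$$\begin{aligned}
Y_1 &= \{X_1, X_2, X_3, X_4, X_5, X_6, X_7\}, \\
Y_2 &= \{X_4, X_5, X_6, X_7\}, \\
Y_3 &= \{X_5, X_6, X_7, X_8\}, \\
Y_4 &= \{X_7, X_8\}.
\end{aligned}$$

These are the same variables as before with the addition of $Y_4 = \{X_7, X_8\}$. However, $I(Y_1; Y_2; Y_3; Y_4)$ in this case is actually equal to $+1$ bit, indicating less redundancy. This is correct in the sense that

$$I(Y_1; Y_2; Y_3; Y_4) = I(Y_1; Y_2; Y_3) - I(Y_1; Y_2; Y_3 \mid Y_4) = 3 - 2 = 1,$$

but it remains difficult to interpret.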
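The figures in this example can be reproduced with a few lines of code. The sketch below rests on two assumptions: the eight bits are fair and independent, so the joint entropy of any collection of agglomerated variables is the number of distinct bits in their union, and the groupings are $Y_1 = \{X_1, \ldots, X_7\}$, $Y_2 = \{X_4, \ldots, X_7\}$, $Y_3 = \{X_5, \ldots, X_8\}$, $Y_4 = \{X_7, X_8\}$, which are consistent with the 3-bit and 1-bit values quoted above. The function name is hypothetical.

```python
from itertools import combinations


def interaction_information_of_bit_sets(groups):
    """Interaction information in bits of variables that are sets of
    independent fair bits, via the alternating sum of joint entropies.
    The joint entropy of a subset is the size of the union of its bit sets."""
    total = 0
    for k in range(1, len(groups) + 1):
        for subset in combinations(groups, k):
            joint_entropy = len(set().union(*subset))
            total += (-1) ** (k - 1) * joint_entropy
    return total


Y1 = {1, 2, 3, 4, 5, 6, 7}
Y2 = {4, 5, 6, 7}
Y3 = {5, 6, 7, 8}
Y4 = {7, 8}

# Conditioning on Y4 fixes bits 7 and 8, which removes them from every group.
conditioned = [Y - Y4 for Y in (Y1, Y2, Y3)]

print(interaction_information_of_bit_sets([Y1, Y2, Y3]))      # 3
print(interaction_information_of_bit_sets(conditioned))       # 2
print(interaction_information_of_bit_sets([Y1, Y2, Y3, Y4]))  # 1
```

In this special case the interaction information equals the number of bits common to all groups, which is why adding $Y_4$ lowers the value from 3 to 1.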

References

  1. Ting, Hu Kuo (January 1962). "On the Amount of Information". Theory of Probability & Its Applications. 7 (4): 439–447. doi:10.1137/1107041. ISSN 0040-585X.
  2. Wolf, David (May 1, 1996). The Generalization of Mutual Information as the Information between a Set of Variables: The Information Correlation Function Hierarchy and the Information Structure of Multi-Agent Systems (Technical report). NASA Ames Research Center.
  3. Bell, Anthony (2003). The co-information lattice. 4th Int. Symp. Independent Component Analysis and Blind Source Separation.
  4. Yeung, R.W. (May 1991). "A new outlook on Shannon's information measures". IEEE Transactions on Information Theory. 37 (3): 466–474. doi:10.1109/18.79902. ISSN 0018-9448.
  5. Baudot, Pierre; Bennequin, Daniel (2015-05-13). "The Homological Nature of Entropy". Entropy. 17 (5): 3253–3318. Bibcode:2015Entrp..17.3253B. doi:10.3390/e17053253. ISSN 1099-4300.
  6. McGill, William J. (June 1954). "Multivariate information transmission". Psychometrika. 19 (2): 97–116. doi:10.1007/bf02289159. ISSN 0033-3123. S2CID 126431489.
  7. Krippendorff, Klaus (August 2009). "Information of interactions in complex systems". International Journal of General Systems. 38 (6): 669–680. doi:10.1080/03081070902993160. ISSN 0308-1079. S2CID 13923485.
  8. Killian, Benjamin J.; Yundenfreund Kravitz, Joslyn; Gilson, Michael K. (2007-07-14). "Extraction of configurational entropy from molecular simulations via an expansion approximation". The Journal of Chemical Physics. 127 (2): 024107. Bibcode:2007JChPh.127b4107K. doi:10.1063/1.2746329. ISSN 0021-9606. PMC 2707031. PMID 17640119.
  9. LeVine, Michael V.; Perez-Aguilar, Jose Manuel; Weinstein, Harel (2014-06-18). "N-body Information Theory (NbIT) Analysis of Rigid-Body Dynamics in Intracellular Loop 2 of the 5-HT2A Receptor". arXiv:1406.4730 [q-bio.BM].
  10. "InfoTopo: Topological Information Data Analysis. Deep statistical unsupervised and supervised learning - File Exchange - Github". github.com/pierrebaudot/infotopopy/. Retrieved 26 September 2020.