Dual total correlation

In information theory, dual total correlation, [1] information rate, [2] excess entropy, [3] [4] or binding information [5] is one of several known non-negative generalizations of mutual information. While total correlation is bounded above by the sum of the entropies of the n elements, the dual total correlation is bounded above by the joint entropy of the n elements. Although well behaved, dual total correlation has received much less attention than the total correlation. A measure known as "TSE-complexity" defines a continuum between the total correlation and dual total correlation. [3]

Definition

Venn diagram of information theoretic measures for three variables x, y, and z. The dual total correlation is represented by the union of the three pairwise mutual informations and is shown in the diagram by the yellow, magenta, cyan, and gray regions.

For a set of n random variables $\{X_1,\ldots,X_n\}$, the dual total correlation is given by

$$D(X_1,\ldots,X_n) = H(X_1,\ldots,X_n) - \sum_{i=1}^{n} H(X_i \mid X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n),$$

where $H(X_1,\ldots,X_n)$ is the joint entropy of the variable set $\{X_1,\ldots,X_n\}$ and $H(X_i \mid X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n)$ is the conditional entropy of variable $X_i$, given the rest.
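As a concrete illustration, the sketch below computes the dual total correlation of a discrete joint distribution stored as an n-dimensional probability array. It is a minimal example, not code from the cited papers; the helper names (entropy, dual_total_correlation) and the two-bit example are our own.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zero entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def dual_total_correlation(joint):
    """D(X_1,...,X_n) = H(X_1,...,X_n) - sum_i H(X_i | all other variables)."""
    joint = np.asarray(joint, dtype=float)
    h_joint = entropy(joint)
    d = h_joint
    for i in range(joint.ndim):
        # Marginalizing out axis i gives the distribution of "the rest",
        # and H(X_i | rest) = H(X_1,...,X_n) - H(rest).
        h_rest = entropy(joint.sum(axis=i))
        d -= h_joint - h_rest
    return d

# Two perfectly correlated fair bits: D equals the single shared bit.
xy = np.array([[0.5, 0.0],
               [0.0, 0.5]])
print(dual_total_correlation(xy))  # ~1.0
```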

Normalized

The dual total correlation normalized between [0,1] is simply the dual total correlation divided by its maximum value, the joint entropy $H(X_1,\ldots,X_n)$:

$$ND(X_1,\ldots,X_n) = \frac{D(X_1,\ldots,X_n)}{H(X_1,\ldots,X_n)}.$$
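Continuing the sketch above (the helper names are our own), the normalized version simply divides by the joint entropy:

```python
def normalized_dual_total_correlation(joint):
    """D / H, which lies in [0, 1]; returns 0 for a zero-entropy (deterministic) joint."""
    joint = np.asarray(joint, dtype=float)
    h_joint = entropy(joint)
    return dual_total_correlation(joint) / h_joint if h_joint > 0 else 0.0

print(normalized_dual_total_correlation(xy))  # 1.0: the two bits are fully coupled
```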

Relationship with total correlation

Dual total correlation is non-negative and bounded above by the joint entropy $H(X_1,\ldots,X_n)$:

$$0 \leq D(X_1,\ldots,X_n) \leq H(X_1,\ldots,X_n).$$

Secondly, dual total correlation has a close relationship with total correlation, $C(X_1,\ldots,X_n)$, and can be written in terms of differences between the total correlation of the whole set and the total correlations of all subsets of size $n-1$: [6]

$$D(\mathbf{X}) = (n-1)\,C(\mathbf{X}) - \sum_{i=1}^{n} C(\mathbf{X}^{-i}),$$

where $\mathbf{X} = \{X_1,\ldots,X_n\}$ and $\mathbf{X}^{-i} = \{X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n\}$.
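The identity above can be checked numerically. The snippet below, again a sketch with our own helper names, adds the total correlation and compares the two sides on a random three-variable joint distribution.

```python
def total_correlation(joint):
    """C(X_1,...,X_n) = sum_i H(X_i) - H(X_1,...,X_n)."""
    joint = np.asarray(joint, dtype=float)
    n = joint.ndim
    h_marginals = sum(
        entropy(joint.sum(axis=tuple(j for j in range(n) if j != i)))
        for i in range(n)
    )
    return h_marginals - entropy(joint)

rng = np.random.default_rng(0)
p = rng.random((2, 2, 3))
p /= p.sum()                      # arbitrary joint distribution of three variables
n = p.ndim

lhs = dual_total_correlation(p)
rhs = (n - 1) * total_correlation(p) - sum(
    total_correlation(p.sum(axis=i)) for i in range(n)   # C(X^{-i}) from the marginal
)
print(np.isclose(lhs, rhs))  # True
```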

Furthermore, the total correlation and dual total correlation are related by the following bounds:

$$\frac{C(X_1,\ldots,X_n)}{n-1} \leq D(X_1,\ldots,X_n) \leq (n-1)\,C(X_1,\ldots,X_n).$$
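A quick numerical sanity check of these bounds on the random distribution p from the previous snippet (still our own sketch):

```python
C, D = total_correlation(p), dual_total_correlation(p)
print(C / (n - 1) - 1e-12 <= D <= (n - 1) * C + 1e-12)  # True
```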
Finally, the difference between the total correlation and the dual total correlation defines a novel measure of higher-order information-sharing, the O-information: [7]

$$\Omega(\mathbf{X}) = C(\mathbf{X}) - D(\mathbf{X}).$$

The O-information (first introduced as the "enigmatic information" by James and Crutchfield [8]) is a signed measure that quantifies the extent to which the information in a multivariate random variable is dominated by synergistic interactions (in which case $\Omega(\mathbf{X}) < 0$) or redundant interactions (in which case $\Omega(\mathbf{X}) > 0$).
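The sign convention is easy to see on two extreme three-bit examples. The sketch below (reusing the helpers defined earlier; the names are ours) computes the O-information for a purely synergistic XOR triple and for three copies of one fair bit.

```python
def o_information(joint):
    """Omega = C - D: positive when redundancy dominates, negative when synergy dominates."""
    return total_correlation(joint) - dual_total_correlation(joint)

# Three-bit XOR: Z = X xor Y with X, Y independent fair bits (purely synergistic).
xor = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        xor[x, y, x ^ y] = 0.25
print(o_information(xor))     # -1.0

# Three identical copies of one fair bit (purely redundant).
copies = np.zeros((2, 2, 2))
copies[0, 0, 0] = copies[1, 1, 1] = 0.5
print(o_information(copies))  # +1.0
```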

History

Han (1978) originally defined the dual total correlation as

$$D(X_1,\ldots,X_n) \equiv \left[\sum_{i=1}^{n} H(X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n)\right] - (n-1)\,H(X_1,\ldots,X_n).$$

However, Abdallah and Plumbley (2010) showed its equivalence to the easier-to-understand form of the joint entropy minus the sum of conditional entropies via the following:

$$\begin{aligned}
D(X_1,\ldots,X_n) &= \left[\sum_{i=1}^{n} H(X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n)\right] - (n-1)\,H(X_1,\ldots,X_n) \\
&= H(X_1,\ldots,X_n) + \sum_{i=1}^{n} \left[ H(X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n) - H(X_1,\ldots,X_n) \right] \\
&= H(X_1,\ldots,X_n) - \sum_{i=1}^{n} H(X_i \mid X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n).
\end{aligned}$$
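This algebra can also be confirmed numerically. The snippet below (a sketch; the function name is ours) evaluates Han's original expression directly and compares it with the conditional-entropy form from the Definition section.

```python
def dual_total_correlation_han(joint):
    """Han's form: sum_i H(X^{-i}) - (n-1) H(X_1,...,X_n)."""
    joint = np.asarray(joint, dtype=float)
    n = joint.ndim
    h_rests = sum(entropy(joint.sum(axis=i)) for i in range(n))
    return h_rests - (n - 1) * entropy(joint)

q = np.random.default_rng(1).random((3, 2, 2))
q /= q.sum()
print(np.isclose(dual_total_correlation_han(q), dual_total_correlation(q)))  # True
```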

Footnotes

  1. Han, Te Sun (1978). "Nonnegative entropy measures of multivariate symmetric correlations". Information and Control. 36 (2): 133–156. doi:10.1016/S0019-9958(78)90275-9.
  2. Dubnov, Shlomo (2006). "Spectral Anticipations". Computer Music Journal. 30 (2): 63–83. doi:10.1162/comj.2006.30.2.63. S2CID 2202704.
  3. Ay, Nihat; Olbrich, E.; Bertschinger, N. (2001). A unifying framework for complexity measures of finite systems. European Conference on Complex Systems.
  4. Olbrich, E.; Bertschinger, N.; Ay, N.; Jost, J. (2008). "How should complexity scale with system size?". The European Physical Journal B. 63 (3): 407–415. Bibcode:2008EPJB...63..407O. doi:10.1140/epjb/e2008-00134-9. S2CID 120391127.
  5. Abdallah, Samer A.; Plumbley, Mark D. (2010). "A measure of statistical complexity based on predictive information". arXiv:1012.1890v1 [math.ST].
  6. Varley, Thomas F.; Pope, Maria; Faskowitz, Joshua; Sporns, Olaf (24 April 2023). "Multivariate information theory uncovers synergistic subsystems of the human cerebral cortex". Communications Biology. 6 (1). doi:10.1038/s42003-023-04843-w. PMC 10125999.
  7. Rosas, Fernando E.; Mediano, Pedro A. M.; Gastpar, Michael; Jensen, Henrik J. (13 September 2019). "Quantifying high-order interdependencies via multivariate extensions of the mutual information". Physical Review E. 100 (3). arXiv:1902.11239. doi:10.1103/PhysRevE.100.032305.
  8. James, Ryan G.; Ellison, Christopher J.; Crutchfield, James P. (1 September 2011). "Anatomy of a bit: Information in a time series observation". Chaos: An Interdisciplinary Journal of Nonlinear Science. 21 (3). arXiv:1105.2988. doi:10.1063/1.3637494.
