Information field theory

Information field theory (IFT) is a Bayesian statistical field theory relating to signal reconstruction, cosmography, and other related areas. [1] [2] IFT summarizes the information available on a physical field using Bayesian probabilities. It uses computational techniques developed for quantum field theory and statistical field theory to handle the infinite number of degrees of freedom of a field and to derive algorithms for the calculation of field expectation values. For example, the posterior expectation value of a field generated by a known Gaussian process and measured by a linear device with known Gaussian noise statistics is given by a generalized Wiener filter applied to the measured data. IFT extends such known filter formulas to situations with nonlinear physics, nonlinear devices, non-Gaussian field or noise statistics, dependence of the noise statistics on the field values, and partly unknown parameters of measurement. For this it uses Feynman diagrams, renormalisation flow equations, and other methods from mathematical physics. [3]

Motivation

Fields play an important role in science, technology, and economics. They describe the spatial variations of a quantity, like the air temperature, as a function of position. Knowing the configuration of a field can be of great value. Measurements of fields, however, can never provide the precise field configuration with certainty. Physical fields have an infinite number of degrees of freedom, but the data generated by any measurement device is always finite, providing only a finite number of constraints on the field. Thus, an unambiguous deduction of such a field from measurement data alone is impossible, and only probabilistic inference remains as a means to make statements about the field. Fortunately, physical fields exhibit correlations and often follow known physical laws. Such information is best fused into the field inference in order to overcome the mismatch between the number of field degrees of freedom and the number of measurement points. Handling this requires an information theory for fields, which is what information field theory provides.

Concepts

Bayesian inference

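$s(x)$ is a field value at a location $x \in \Omega$ in a space $\Omega$. The prior knowledge about the unknown signal field $s$ is encoded in the probability distribution $\mathcal{P}(s)$. The data $d$ provides additional information on $s$ via the likelihood $\mathcal{P}(d|s)$ that gets incorporated into the posterior probability

$$\mathcal{P}(s|d) = \frac{\mathcal{P}(d|s)\,\mathcal{P}(s)}{\mathcal{P}(d)}$$

according to Bayes theorem.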

Information Hamiltonian

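In IFT Bayes theorem is usually rewritten in the language of a statistical field theory,

$$\mathcal{P}(s|d) = \frac{\mathcal{P}(d,s)}{\mathcal{P}(d)} \equiv \frac{e^{-\mathcal{H}(d,s)}}{\mathcal{Z}(d)},$$

with the information Hamiltonian defined as

$$\mathcal{H}(d,s) \equiv -\ln \mathcal{P}(d,s) = -\ln \mathcal{P}(d|s) - \ln \mathcal{P}(s) \equiv \mathcal{H}(d|s) + \mathcal{H}(s),$$

the negative logarithm of the joint probability of data and signal and with the partition function being

$$\mathcal{Z}(d) \equiv \mathcal{P}(d) = \int \mathcal{D}s\, \mathcal{P}(d,s).$$

This reformulation of Bayes theorem permits the usage of methods of mathematical physics developed for the treatment of statistical field theories and quantum field theories.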
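
The rewriting can be checked numerically on a toy problem. The following is a minimal sketch, assuming a "field" that consists of a single number, a Gaussian prior and likelihood with made-up variances, and a Riemann sum standing in for the path integral; all names and values are illustrative and not part of IFT itself.

```python
import numpy as np

# Illustrative assumptions: scalar signal s, prior variance S, noise variance N,
# measurement d = s + n, and a fine grid standing in for the path integral.
S, N, d = 2.0, 0.5, 1.3
s = np.linspace(-15.0, 15.0, 30001)
ds = s[1] - s[0]


def hamiltonian(d, s):
    """H(d, s) = -ln P(d|s) - ln P(s) for the Gaussian toy model."""
    h_likelihood = 0.5 * (d - s) ** 2 / N + 0.5 * np.log(2 * np.pi * N)
    h_prior = 0.5 * s ** 2 / S + 0.5 * np.log(2 * np.pi * S)
    return h_likelihood + h_prior


H = hamiltonian(d, s)
Z = np.sum(np.exp(-H)) * ds      # partition function Z(d) = P(d)
posterior = np.exp(-H) / Z       # P(s|d) = exp(-H) / Z

# For this toy model the evidence is known in closed form: d ~ G(d, S + N).
evidence = np.exp(-0.5 * d ** 2 / (S + N)) / np.sqrt(2 * np.pi * (S + N))
posterior_mean = np.sum(s * posterior) * ds
print(Z, evidence, posterior_mean)
```

The numerically integrated partition function agrees with the closed-form evidence, and the posterior built from $e^{-\mathcal{H}}/\mathcal{Z}$ is normalized by construction.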

Fields

As fields have an infinite number of degrees of freedom, the definition of probabilities over spaces of field configurations has subtleties. Identifying physical fields as elements of function spaces raises the problem that no Lebesgue measure is defined over the latter, and therefore probability densities cannot be defined there. However, physical fields have much more regularity than most elements of function spaces, as they are continuous and smooth at most of their locations. Therefore, less general, but sufficiently flexible constructions can be used to handle the infinite number of degrees of freedom of a field.

A pragmatic approach is to regard the field as discretized in terms of pixels. Each pixel carries a single field value that is assumed to be constant within the pixel volume. All statements about the continuous field then have to be cast into its pixel representation. This way, one deals with finite dimensional field spaces, over which probability densities are well definable.

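In order for this description to be a proper field theory, it is further required that the pixel resolution $N_\text{pix}$ can always be refined, while expectation values of the discretized field converge to finite values:

$$\langle f(s)\rangle_{(s|d)} \equiv \lim_{N_\text{pix}\to\infty} \int ds_1 \ldots ds_{N_\text{pix}}\, f(s_1,\ldots,s_{N_\text{pix}})\, \mathcal{P}(s_1,\ldots,s_{N_\text{pix}}|d).$$

Path integrals

If this limit exists, one can talk about the field configuration space integral or path integral

$$\langle f(s)\rangle_{(s|d)} \equiv \int \mathcal{D}s\, f(s)\, \mathcal{P}(s|d)$$

irrespective of the resolution at which it might be evaluated numerically.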
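
As an illustration of such a convergent expectation value, the sketch below refines the pixelization of a Gaussian field on the unit interval and tracks the variance of its spatial integral. The exponential correlation function, the correlation length, and the function names are assumptions chosen only because the continuum answer is known in closed form.

```python
import numpy as np


def integrated_field_variance(n_pix, corr_length=0.2):
    """Variance of the pixelized integral of s over [0, 1] for the assumed
    covariance S(x, y) = exp(-|x - y| / corr_length) with n_pix equal pixels."""
    dx = 1.0 / n_pix
    x = (np.arange(n_pix) + 0.5) * dx                 # pixel centres
    S = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_length)
    return dx * dx * S.sum()                          # <(sum_i s_i dx)^2>


def continuum_value(corr_length=0.2):
    """Closed-form double integral of the same kernel over the unit square."""
    ell = corr_length
    return 2 * ell - 2 * ell ** 2 * (1 - np.exp(-1 / ell))


resolutions = [16, 64, 256]
errors = [abs(integrated_field_variance(n) - continuum_value()) for n in resolutions]
for n, err in zip(resolutions, errors):
    print(n, err)
```

The deviation from the continuum value shrinks as the resolution grows, which is the behaviour required of a proper field theory.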

Gaussian prior

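The simplest prior for a field is that of a zero mean Gaussian probability distribution

$$\mathcal{P}(s) = \mathcal{G}(s,S) \equiv \frac{1}{\sqrt{|2\pi S|}}\, e^{-\frac{1}{2}\, s^\dagger S^{-1} s}.$$

The determinant in the denominator might be ill-defined in the continuum limit $N_\text{pix}\to\infty$; however, all that is necessary for IFT to be consistent is that this determinant can be estimated for any finite resolution field representation with $N_\text{pix}<\infty$ and that this permits the calculation of convergent expectation values. A Gaussian probability distribution requires the specification of the field two point correlation function $S \equiv \langle s\, s^\dagger\rangle_{(s)}$ with coefficients

$$S_{xy} \equiv \langle s(x)\, \overline{s(y)}\rangle_{(s)}$$

and a scalar product for continuous fields

$$a^\dagger b \equiv \int_\Omega dx\, \overline{a(x)}\, b(x),$$

with respect to which the inverse signal field covariance $S^{-1}$ is constructed, i.e.

$$(S^{-1} S)_{xy} \equiv \int_\Omega dz\, (S^{-1})_{xz}\, S_{zy} = \mathbb{1}_{xy} \equiv \delta(x-y).$$

The corresponding prior information Hamiltonian reads

$$\mathcal{H}(s) = -\ln \mathcal{G}(s,S) = \frac{1}{2}\, s^\dagger S^{-1} s + \frac{1}{2} \ln |2\pi S|.$$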
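
On a finite pixelization the prior Hamiltonian is an ordinary matrix expression. A minimal sketch, assuming a real-valued field on 64 pixels and an exponential covariance kernel picked purely for illustration:

```python
import numpy as np


def prior_hamiltonian(s, S):
    """H(s) = 1/2 s^T S^{-1} s + 1/2 ln|2 pi S| for a real, pixelized field."""
    sign, logdet = np.linalg.slogdet(2 * np.pi * S)
    if sign <= 0:
        raise ValueError("S must be positive definite")
    return 0.5 * s @ np.linalg.solve(S, s) + 0.5 * logdet


# Illustrative covariance on 64 pixels of [0, 1] (an assumption, not a
# recommendation): S_xy = exp(-|x - y| / 0.1).
n_pix = 64
x = (np.arange(n_pix) + 0.5) / n_pix
S = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1)

# Draw one field realization s ~ G(s, S) via the Cholesky factor of S.
rng = np.random.default_rng(0)
sample = np.linalg.cholesky(S) @ rng.standard_normal(n_pix)
print(prior_hamiltonian(sample, S))
```

Using `slogdet` and `solve` instead of an explicit determinant and inverse keeps the evaluation stable when the pixel number grows.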

Measurement equation

The measurement data $d$ was generated with the likelihood $\mathcal{P}(d|s)$. In case the instrument was linear, a measurement equation of the form

$$d = R\, s + n$$

can be given, in which $R$ is the instrument response, which describes how the data on average reacts to the signal, and $n$ is the noise, simply the difference between data $d$ and linear signal response $R\, s$. It is essential to note that the response translates the infinite dimensional signal vector into the finite dimensional data space. In components this reads

$$d_i = \int_\Omega dx\, R_{ix}\, s_x + n_i,$$

where a vector component notation was also introduced for signal and data vectors.

If the noise follows a signal independent zero mean Gaussian statistics with covariance $N$, $\mathcal{P}(n|s) = \mathcal{G}(n,N)$, then the likelihood is Gaussian as well,

$$\mathcal{P}(d|s) = \mathcal{G}(d - R\, s, N),$$

and the likelihood information Hamiltonian is

$$\mathcal{H}(d|s) = -\ln \mathcal{G}(d - R\, s, N) = \frac{1}{2}\, (d - R\, s)^\dagger N^{-1} (d - R\, s) + \frac{1}{2} \ln |2\pi N|.$$

A linear measurement of a Gaussian signal, subject to Gaussian and signal-independent noise leads to a free IFT.
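
The measurement equation and the likelihood Hamiltonian can be sketched for a pixelized field as follows. The block-averaging instrument, the noise level, and the sine-shaped signal are hypothetical choices made only to have concrete numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix, n_data = 120, 12

# Hypothetical instrument: each datum averages a block of 10 adjacent pixels,
# so R maps the 120-dimensional signal space to the 12-dimensional data space.
R = np.kron(np.eye(n_data), np.full((1, n_pix // n_data), n_data / n_pix))
N = 0.05 * np.eye(n_data)                           # signal-independent noise covariance

s = np.sin(2 * np.pi * np.arange(n_pix) / n_pix)    # some signal realization
n = rng.multivariate_normal(np.zeros(n_data), N)    # noise draw
d = R @ s + n                                       # measurement equation d = R s + n


def likelihood_hamiltonian(d, s, R, N):
    """H(d|s) = 1/2 (d - R s)^T N^{-1} (d - R s) + 1/2 ln|2 pi N|."""
    residual = d - R @ s
    _, logdet = np.linalg.slogdet(2 * np.pi * N)
    return 0.5 * residual @ np.linalg.solve(N, residual) + 0.5 * logdet


print(R.shape, likelihood_hamiltonian(d, s, R, N))
```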

Free theory

Free Hamiltonian

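The joint information Hamiltonian of the Gaussian scenario described above is

$$
\begin{aligned}
\mathcal{H}(d,s) &= \mathcal{H}(d|s) + \mathcal{H}(s) \\
&\;\widehat{=}\; \frac{1}{2}\, (d - R\, s)^\dagger N^{-1} (d - R\, s) + \frac{1}{2}\, s^\dagger S^{-1} s \\
&\;\widehat{=}\; \frac{1}{2} \left[ s^\dagger \underbrace{(S^{-1} + R^\dagger N^{-1} R)}_{D^{-1}}\, s - s^\dagger \underbrace{R^\dagger N^{-1} d}_{j} - \underbrace{d^\dagger N^{-1} R}_{j^\dagger}\, s \right] \\
&= \frac{1}{2} \left[ s^\dagger D^{-1} s - s^\dagger D^{-1} \underbrace{D\, j}_{m} - \underbrace{j^\dagger D}_{m^\dagger}\, D^{-1} s \right] \\
&\;\widehat{=}\; \frac{1}{2}\, (s - m)^\dagger D^{-1} (s - m),
\end{aligned}
$$

where $\widehat{=}$ denotes equality up to irrelevant constants, which, in this case, means expressions that are independent of $s$. From this it is clear that the posterior must be a Gaussian with mean $m$ and covariance $D$,

$$\mathcal{P}(s|d) \propto e^{-\mathcal{H}(d,s)} \propto e^{-\frac{1}{2} (s-m)^\dagger D^{-1} (s-m)} \propto \mathcal{G}(s - m, D),$$

where equality between the right and left hand sides holds as both distributions are normalized, $\int \mathcal{D}s\, \mathcal{P}(s|d) = 1 = \int \mathcal{D}s\, \mathcal{G}(s - m, D)$.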
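
The completion of the square can be verified numerically. The sketch below assumes small random matrices for the signal covariance $S$, noise covariance $N$, and response $R$ (none of them from a real instrument) and checks that the joint Hamiltonian and the quadratic form around the posterior mean differ only by a constant that does not depend on the field.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pix, n_data = 6, 4

# Random illustrative operators: S and N symmetric positive definite, R generic.
A = rng.standard_normal((n_pix, n_pix))
S = A @ A.T + n_pix * np.eye(n_pix)
B = rng.standard_normal((n_data, n_data))
N = B @ B.T + n_data * np.eye(n_data)
R = rng.standard_normal((n_data, n_pix))
d = rng.standard_normal(n_data)

D = np.linalg.inv(np.linalg.inv(S) + R.T @ np.linalg.solve(N, R))   # D = (S^-1 + R^T N^-1 R)^-1
j = R.T @ np.linalg.solve(N, d)                                      # j = R^T N^-1 d
m = D @ j                                                            # posterior mean


def joint_hamiltonian(s):
    """H(d, s) without the s-independent log-determinant terms."""
    residual = d - R @ s
    return 0.5 * residual @ np.linalg.solve(N, residual) + 0.5 * s @ np.linalg.solve(S, s)


def quadratic_form(s):
    """1/2 (s - m)^T D^{-1} (s - m)."""
    return 0.5 * (s - m) @ np.linalg.solve(D, s - m)


# The difference should be the same constant for every field configuration s.
offsets = [joint_hamiltonian(s) - quadratic_form(s)
           for s in rng.standard_normal((5, n_pix))]
print(offsets)
```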

Generalized Wiener filter

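The posterior mean

$$m = D\, j = (S^{-1} + R^\dagger N^{-1} R)^{-1} R^\dagger N^{-1} d$$

is also known as the generalized Wiener filter solution and the uncertainty covariance

$$D = (S^{-1} + R^\dagger N^{-1} R)^{-1}$$

as the Wiener variance. In IFT, $j = R^\dagger N^{-1} d$ is called the information source, as it acts as a source term to excite the field (knowledge), and $D$ the information propagator, as it propagates information from one location to another in

$$m_x = \int_\Omega dy\, D_{xy}\, j_y.$$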
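
A minimal sketch of the generalized Wiener filter on a one-dimensional pixel grid follows. The exponential prior covariance, the instrument with a gap of unobserved pixels, and the noise level are assumptions for illustration. The code also evaluates the standard data-space form $m = S R^\dagger (R S R^\dagger + N)^{-1} d$, which is algebraically equivalent to $m = D\, j$.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pix = 100
x = np.arange(n_pix)

# Assumed prior covariance (exponential kernel) and a signal drawn from it.
S = np.exp(-np.abs(x[:, None] - x[None, :]) / 10.0)
s_true = np.linalg.cholesky(S) @ rng.standard_normal(n_pix)

# Hypothetical instrument: observes every pixel except a gap in the middle.
observed = np.r_[0:40, 60:100]
R = np.eye(n_pix)[observed]
N = 0.1 * np.eye(len(observed))
d = R @ s_true + rng.multivariate_normal(np.zeros(len(observed)), N)

# Signal-space form: m = D j with D = (S^-1 + R^T N^-1 R)^-1 and j = R^T N^-1 d.
D = np.linalg.inv(np.linalg.inv(S) + R.T @ np.linalg.solve(N, R))
j = R.T @ np.linalg.solve(N, d)
m = D @ j

# Equivalent data-space form: m = S R^T (R S R^T + N)^-1 d.
m_data_space = S @ R.T @ np.linalg.solve(R @ S @ R.T + N, d)

sigma = np.sqrt(np.diag(D))      # pointwise posterior uncertainty
print(np.max(np.abs(m - m_data_space)), sigma[50], sigma[10])
```

The propagator carries information from the observed pixels into the gap, where the posterior uncertainty is correspondingly larger than at observed locations.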

Interacting theory

Interacting Hamiltonian

If any of the assumptions that lead to the free theory is violated, IFT becomes an interacting theory, with terms that are of higher than quadratic order in the signal field. This happens when the signal or the noise are not following Gaussian statistics, when the response is non-linear, when the noise depends on the signal, or when response or covariances are uncertain.

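In this case, the information Hamiltonian might be expandable in a Taylor-Fréchet series,

$$\mathcal{H}(d,s) = \underbrace{\frac{1}{2}\, s^\dagger D^{-1} s - j^\dagger s + \mathcal{H}_0}_{=\,\mathcal{H}_\text{free}(d,s)} + \underbrace{\sum_{n=3}^{\infty} \frac{1}{n!}\, \Lambda^{(n)}_{x_1 \ldots x_n}\, s_{x_1} \cdots s_{x_n}}_{=\,\mathcal{H}_\text{int}(d,s)},$$

where $\mathcal{H}_\text{free}(d,s)$ is the free Hamiltonian, which alone would lead to a Gaussian posterior, and $\mathcal{H}_\text{int}(d,s)$ is the interacting Hamiltonian, which encodes non-Gaussian corrections. Repeated indices $x_i$ are integrated over. The first and second order Taylor coefficients are often identified with the (negative) information source $-j$ and the inverse information propagator $D^{-1}$, respectively. The higher coefficients $\Lambda^{(n)}_{x_1 \ldots x_n}$ are associated with non-linear self-interactions.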

Classical field

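The classical field $s_\text{cl}$ minimizes the information Hamiltonian,

$$\left. \frac{\partial \mathcal{H}(d,s)}{\partial s} \right|_{s = s_\text{cl}} = 0,$$

and therefore maximizes the posterior:

$$\left. \frac{\partial \mathcal{P}(s|d)}{\partial s} \right|_{s = s_\text{cl}} = \left. \frac{\partial}{\partial s}\, \frac{e^{-\mathcal{H}(d,s)}}{\mathcal{Z}(d)} \right|_{s = s_\text{cl}} = -\mathcal{P}(s_\text{cl}|d)\, \underbrace{\left. \frac{\partial \mathcal{H}(d,s)}{\partial s} \right|_{s = s_\text{cl}}}_{=\,0} = 0.$$

The classical field is therefore the maximum a posteriori estimator of the field inference problem.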
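
For an interacting problem the classical field usually has to be found numerically. The sketch below assumes one particular non-linear model, Poisson counts with rate $e^{s}$ on top of a Gaussian prior, and minimizes the resulting Hamiltonian with a damped Newton iteration. The model, the covariance, and the function names are illustrative choices, not something prescribed by IFT.

```python
import numpy as np

rng = np.random.default_rng(4)
n_pix = 50
x = np.arange(n_pix)
S = np.exp(-np.abs(x[:, None] - x[None, :]) / 8.0)   # assumed prior covariance
S_inv = np.linalg.inv(S)

s_true = np.linalg.cholesky(S) @ rng.standard_normal(n_pix)
d = rng.poisson(np.exp(s_true))                       # counts with rate exp(s)


def hamiltonian(s):
    """H(d, s) up to s-independent constants for d_i ~ Poisson(exp(s_i))."""
    return 0.5 * s @ S_inv @ s + np.sum(np.exp(s) - d * s)


def gradient(s):
    """dH/ds for the Hamiltonian above."""
    return S_inv @ s + np.exp(s) - d


def classical_field(n_iter=50, tol=1e-10):
    """Damped Newton iteration for the minimum of H (the MAP field)."""
    s = np.zeros(n_pix)
    for _ in range(n_iter):
        g = gradient(s)
        if np.linalg.norm(g) < tol:
            break
        hessian = S_inv + np.diag(np.exp(s))
        step = np.linalg.solve(hessian, g)
        t = 1.0
        # Backtracking: halve the step until H does not increase
        # (the small slack guards against round-off near the minimum).
        while hamiltonian(s - t * step) > hamiltonian(s) + 1e-12 and t > 1e-8:
            t *= 0.5
        s = s - t * step
    return s


s_cl = classical_field()
print(np.linalg.norm(gradient(s_cl)))
```

Because this particular Hamiltonian is convex, the stationary point found by the iteration is its unique minimum.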

Critical filter

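The Wiener filter problem requires the two point correlation $S \equiv \langle s\, s^\dagger\rangle_{(s)}$ of a field to be known. If it is unknown, it has to be inferred along with the field itself. This requires the specification of a hyperprior $\mathcal{P}(S)$. Often, statistical homogeneity (translation invariance) can be assumed, implying that $S$ is diagonal in Fourier space (for $\Omega = \mathbb{R}^u$ being a $u$ dimensional Cartesian space). In this case, only the Fourier space power spectrum $P_s(\vec{k})$ needs to be inferred. Given a further assumption of statistical isotropy, this spectrum depends only on the length $k = |\vec{k}|$ of the Fourier vector $\vec{k}$ and only a one dimensional spectrum $P_s(k)$ has to be determined. The prior field covariance then reads in Fourier space coordinates $S_{\vec{k}\vec{q}} = (2\pi)^u\, \delta(\vec{k} - \vec{q})\, P_s(k)$.

If the prior on $P_s(k)$ is flat, the joint probability of data and spectrum is

$$
\begin{aligned}
\mathcal{P}(d, P_s) &= \int \mathcal{D}s\, \mathcal{P}(d, s, P_s) \\
&= \int \mathcal{D}s\, \mathcal{P}(d|s, P_s)\, \mathcal{P}(s|P_s)\, \mathcal{P}(P_s) \\
&\propto \int \mathcal{D}s\, \mathcal{G}(d - R\, s, N)\, \mathcal{G}(s, S) \\
&\propto \frac{1}{|S|^{1/2}} \int \mathcal{D}s\, \exp\left[ -\frac{1}{2} \left( s^\dagger D^{-1} s - j^\dagger s - s^\dagger j \right) \right] \\
&\propto \frac{|D|^{1/2}}{|S|^{1/2}}\, \exp\left[ \frac{1}{2}\, j^\dagger D\, j \right],
\end{aligned}
$$

where the notation of the information propagator $D = (S^{-1} + R^\dagger N^{-1} R)^{-1}$ and source $j = R^\dagger N^{-1} d$ of the Wiener filter problem was used again. The corresponding information Hamiltonian is

$$\mathcal{H}(d, P_s) \;\widehat{=}\; \frac{1}{2} \left[ \ln |S\, D^{-1}| - j^\dagger D\, j \right] = \frac{1}{2}\, \mathrm{Tr}\left[ \ln\left( S\, D^{-1} \right) - j\, j^\dagger D \right],$$

where $\widehat{=}$ denotes equality up to irrelevant constants (here: constant with respect to $P_s$). Minimizing this with respect to $P_s$, in order to get its maximum a posteriori power spectrum estimator, yields

$$
\begin{aligned}
\frac{\partial \mathcal{H}(d, P_s)}{\partial P_s(k)} &= \frac{1}{2}\, \mathrm{Tr}\left[ D\, S^{-1}\, \frac{\partial (S\, D^{-1})}{\partial P_s(k)} - j\, j^\dagger\, \frac{\partial D}{\partial P_s(k)} \right] \\
&= \frac{1}{2}\, \mathrm{Tr}\left[ \left( S^{-1} - S^{-1} D\, S^{-1} - S^{-1} m\, m^\dagger S^{-1} \right) \frac{\partial S}{\partial P_s(k)} \right] \\
&= \frac{1}{2}\, \mathrm{Tr}\left\{ S^{-1} \left[ S - \left( D + m\, m^\dagger \right) \right] S^{-1}\, \mathbb{P}_k \right\} \\
&= \frac{\mathrm{Tr}\left[ \mathbb{P}_k \right]}{2\, P_s(k)} - \frac{\mathrm{Tr}\left[ \left( D + m\, m^\dagger \right) \mathbb{P}_k \right]}{2\, [P_s(k)]^2} = 0,
\end{aligned}
$$

where the Wiener filter mean $m = D\, j$ and the spectral band projector $(\mathbb{P}_k)_{\vec{q}\vec{q}'} \equiv (2\pi)^u\, \delta(\vec{q} - \vec{q}')\, \delta(|\vec{q}| - k)$ were introduced. The latter commutes with $S^{-1}$, since $(S^{-1})_{\vec{k}\vec{q}} = (2\pi)^u\, \delta(\vec{k} - \vec{q})\, [P_s(k)]^{-1}$ is diagonal in Fourier space. The maximum a posteriori estimator for the power spectrum is therefore

$$P_s(k) = \frac{\mathrm{Tr}\left[ \left( m\, m^\dagger + D \right) \mathbb{P}_k \right]}{\mathrm{Tr}\left[ \mathbb{P}_k \right]}.$$

It has to be calculated iteratively, as $m = D\, j$ and $D = (S^{-1} + R^\dagger N^{-1} R)^{-1}$ both depend on $P_s$ themselves. In an empirical Bayes approach, the estimated $P_s$ would be taken as given. As a consequence, the posterior mean estimate for the signal field is the corresponding $m$ and its uncertainty the corresponding $D$ in the empirical Bayes approximation. The resulting non-linear filter is called the critical filter. [4] The generalization of the power spectrum estimation formula as

$$P_s(k) = \frac{\mathrm{Tr}\left[ \left( m\, m^\dagger + \delta\, D \right) \mathbb{P}_k \right]}{\mathrm{Tr}\left[ \mathbb{P}_k \right]}$$

exhibits a perception threshold for $\delta < 1$, meaning that the data variance in a Fourier band has to exceed the expected noise level by a certain threshold before the signal reconstruction becomes non-zero for this band. Whenever the data variance exceeds this threshold slightly, the signal reconstruction jumps to a finite excitation level, similar to a first order phase transition in thermodynamic systems. For a filter with $\delta \ge 1$, perception of the signal starts continuously as soon as the data variance exceeds the noise level. The disappearance of the discontinuous perception at $\delta = 1$ is similar to a thermodynamic system going through a critical point. Hence the name critical filter.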
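
The iteration can be sketched for the simplest setting in which everything is diagonal in Fourier space. The block below assumes a periodic one-dimensional grid, a unit response $R = \mathbb{1}$, white noise of known variance, and an orthonormal discrete Fourier transform; the function name, the test spectrum, and the iteration count are illustrative assumptions. Each `rfft` bin plays the role of one spectral band, since the modes $+k$ and $-k$ of a real field carry the same power.

```python
import numpy as np


def critical_filter(d, noise_var, delta=1.0, n_iter=500):
    """Alternate Wiener filter and spectrum update for R = 1 and white noise.

    Works in an orthonormal Fourier basis, where S, N and D are all diagonal.
    Returns the reconstructed field m and the power estimate per rfft bin.
    """
    d_k = np.fft.rfft(d, norm="ortho")
    p = np.full(d_k.shape, np.var(d))                 # initial spectrum guess
    for _ in range(n_iter):
        D = p * noise_var / (p + noise_var)           # (1/p + 1/noise_var)^-1
        m_k = D * d_k / noise_var                     # m = D j with j = d / noise_var
        p = np.abs(m_k) ** 2 + delta * D              # Tr[(m m^+ + delta D) P_k] / Tr[P_k]
    return np.fft.irfft(m_k, n=len(d), norm="ortho"), p


# Synthetic data: a field with an assumed falling spectrum plus white noise.
rng = np.random.default_rng(5)
n_pix, noise_var = 256, 0.25
k = np.fft.rfftfreq(n_pix, d=1.0 / n_pix)             # integer wavenumbers
true_power = 50.0 / (1.0 + k ** 2)
s_k = np.sqrt(true_power / 2) * (rng.standard_normal(k.size)
                                 + 1j * rng.standard_normal(k.size))
s_true = np.fft.irfft(s_k, n=n_pix, norm="ortho")
d = s_true + np.sqrt(noise_var) * rng.standard_normal(n_pix)

m, p_est = critical_filter(d, noise_var)
print(p_est[:5])
```

In this diagonal setting the fixed point for $\delta = 1$ can be worked out by hand: a band whose data power exceeds the noise variance converges to the data power minus the noise variance, while bands below the noise level decay towards zero.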

The critical filter, extensions thereof to non-linear measurements, and the inclusion of non-flat spectrum priors, permitted the application of IFT to real world signal inference problems, for which the signal covariance is usually unknown a priori.

IFT application examples

Radio interferometric image of radio galaxies in the galaxy cluster Abell 2219. The images were constructed by data back-projection (top), the CLEAN algorithm (middle), and the RESOLVE algorithm (bottom). Negative and therefore not physical fluxes are displayed in white.

The generalized Wiener filter that emerges in free IFT is in broad use in signal processing. Algorithms explicitly based on IFT were derived for a number of applications. Many of them are implemented using the Numerical Information Field Theory (NIFTy) library.

Advanced theory

Many techniques from quantum field theory can be used to tackle IFT problems, like Feynman diagrams, effective actions, and the field operator formalism.

Feynman diagrams

First three Feynman diagrams contributing to the posterior mean estimate of a field. A line represents an information propagator, a dot at the end of a line an information source, and a vertex an interaction term. The first diagram encodes the Wiener filter, the second a non-linear correction, and the third an uncertainty correction to the Wiener filter.

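In case the interaction coefficients $\Lambda^{(n)}$ in a Taylor-Fréchet expansion of the information Hamiltonian

$$\mathcal{H}(d,s) = \underbrace{\frac{1}{2}\, s^\dagger D^{-1} s - j^\dagger s + \mathcal{H}_0}_{=\,\mathcal{H}_\text{free}(d,s)} + \underbrace{\sum_{n=3}^{\infty} \frac{1}{n!}\, \Lambda^{(n)}_{x_1 \ldots x_n}\, s_{x_1} \cdots s_{x_n}}_{=\,\mathcal{H}_\text{int}(d,s)}$$

are small, the log partition function, which at unit temperature is the negative of the Helmholtz free energy,

$$\ln \mathcal{Z}(d) = \ln \int \mathcal{D}s\, e^{-\mathcal{H}(d,s)} = \sum_{c \in C} c,$$

can be expanded asymptotically in terms of these coefficients. The free Hamiltonian specifies the mean and variance of the Gaussian distribution over which the expansion is integrated. This leads to a sum over the set $C$ of all connected Feynman diagrams. From the log partition function, any connected moment of the field can be calculated via

$$\langle s_{x_1} \cdots s_{x_n} \rangle^\text{c}_{(s|d)} = \frac{\partial^n \ln \mathcal{Z}}{\partial j_{x_1} \cdots \partial j_{x_n}}.$$

The small expansion parameters needed for such a diagrammatic expansion to converge exist for nearly Gaussian signal fields, where the non-Gaussianity of the field statistics leads to small interaction coefficients $\Lambda^{(n)}$. For example, the statistics of the Cosmic Microwave Background is nearly Gaussian, with small amounts of non-Gaussianities believed to be seeded during the inflationary epoch in the Early Universe.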
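
The relation between the log partition function and the connected moments can be checked on a toy problem. The sketch below assumes a "field" consisting of a single real number with a purely quartic interaction, $\mathcal{H}(s) = s^2/(2D) - j\,s + \lambda s^4/24$, and illustrative parameter values. It compares finite-difference derivatives of $\ln \mathcal{Z}$ with respect to $j$ against the directly integrated mean and variance, and evaluates the first-order estimate of the mean, which for this toy model is $m - \lambda D\,(m^3/6 + m D/2)$ with $m = D\,j$: the Wiener filter term, a non-linear correction, and an uncertainty correction.

```python
import numpy as np

# Zero-dimensional toy "field": a single real number s with
# H(s) = s^2 / (2 D) - j s + lam * s^4 / 24   (all values are illustrative).
D, lam = 1.0, 0.05
s = np.linspace(-12.0, 12.0, 24001)
ds = s[1] - s[0]


def log_partition(j):
    """ln Z(j) by direct numerical integration over s."""
    H = s ** 2 / (2 * D) - j * s + lam * s ** 4 / 24
    return np.log(np.sum(np.exp(-H)) * ds)


def posterior_moments(j):
    """Mean and variance of s under exp(-H) / Z."""
    H = s ** 2 / (2 * D) - j * s + lam * s ** 4 / 24
    w = np.exp(-H)
    w /= w.sum()
    mean = np.sum(w * s)
    return mean, np.sum(w * (s - mean) ** 2)


j, h = 0.5, 1e-3
mean, var = posterior_moments(j)

# First and second derivative of ln Z with respect to j (central differences).
d1 = (log_partition(j + h) - log_partition(j - h)) / (2 * h)
d2 = (log_partition(j + h) - 2 * log_partition(j) + log_partition(j - h)) / h ** 2

# First order in lam: Wiener filter m = D j plus the two leading corrections.
m = D * j
mean_first_order = m - lam * D * (m ** 3 / 6 + m * D / 2)
print(mean, d1, mean_first_order)
print(var, d2)
```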

Effective action

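To obtain numerically stable algorithms for IFT problems, a field functional is needed whose minimum provides the posterior mean field. Such a functional is given by the effective action or Gibbs free energy of a field. The Gibbs free energy $G$ can be constructed from the Helmholtz free energy via a Legendre transformation.

In IFT, it is given by the difference of the internal information energy

$$U = \langle \mathcal{H}(d,s) \rangle_{\mathcal{P}'(s|d')}$$

and the Shannon entropy

$$\mathcal{S} = -\int \mathcal{D}s\, \mathcal{P}'(s|d')\, \ln \mathcal{P}'(s|d')$$

for temperature $T = 1$,

where a Gaussian posterior approximation $\mathcal{P}'(s|d') = \mathcal{G}(s - m, D)$ is used with the approximate data $d' = (m, D)$ containing the mean and the dispersion of the field. [5]

The Gibbs free energy is then

$$
\begin{aligned}
G(m, D) &= U(m, D) - T\, \mathcal{S}(m, D) \\
&= \langle \mathcal{H}(d,s) + \ln \mathcal{P}'(s|d') \rangle_{\mathcal{P}'(s|d')} \\
&= \int \mathcal{D}s\, \mathcal{P}'(s|d')\, \ln \frac{\mathcal{P}'(s|d')}{\mathcal{P}(d,s)} \\
&= \int \mathcal{D}s\, \mathcal{P}'(s|d')\, \ln \frac{\mathcal{P}'(s|d')}{\mathcal{P}(s|d)} - \ln \mathcal{P}(d) \\
&= \mathrm{KL}\left( \mathcal{P}'(s|d')\, \|\, \mathcal{P}(s|d) \right) - \ln \mathcal{Z}(d),
\end{aligned}
$$

the Kullback-Leibler divergence between approximate and exact posterior plus the Helmholtz free energy. As the latter does not depend on the approximate data $d' = (m, D)$, minimizing the Gibbs free energy is equivalent to minimizing the Kullback-Leibler divergence between approximate and exact posterior. Thus, the effective action approach of IFT is equivalent to the variational Bayesian methods, which also minimize the Kullback-Leibler divergence between approximate and exact posteriors. Minimizing the Gibbs free energy approximately provides the posterior mean field

$$\langle s \rangle_{(s|d)} = \int \mathcal{D}s\, s\, \mathcal{P}(s|d),$$

whereas minimizing the information Hamiltonian provides the maximum a posteriori field. As the latter is known to over-fit noise, the former is usually a better field estimator.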
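
The difference between the two estimators can be seen on a toy problem. The sketch below assumes a single-number "field" with Hamiltonian $\mathcal{H}(s) = s^2/(2D_0) - j\,s + \lambda s^4/24$ and a Gaussian approximation with mean $m$ and variance $v$, for which $U = (m^2 + v)/(2D_0) - j\,m + \lambda\,(m^4 + 6 m^2 v + 3 v^2)/24$ and $\mathcal{S} = \tfrac{1}{2}\ln(2\pi e\, v)$. The parameter values, the plain gradient descent, and its step size are illustrative choices.

```python
import numpy as np

# One-number toy model: H(s) = s^2 / (2 D0) - j s + lam * s^4 / 24.
D0, j, lam = 1.0, 1.0, 0.3


def gibbs_gradient(m, t):
    """Gradient of G(m, v) = U - S with respect to m and t = ln v."""
    v = np.exp(t)
    dG_dm = m / D0 - j + lam * (m ** 3 / 6 + m * v / 2)
    dG_dt = v * (1 / (2 * D0) + lam * (m ** 2 + v) / 4) - 0.5
    return dG_dm, dG_dt


# Plain gradient descent on G (step size chosen by hand for this toy problem).
m, t = 0.0, 0.0
for _ in range(5000):
    g_m, g_t = gibbs_gradient(m, t)
    m, t = m - 0.2 * g_m, t - 0.2 * g_t
m_gibbs, v_gibbs = m, np.exp(t)

# Classical (maximum a posteriori) value: Newton iteration on dH/ds = 0.
s_map = 0.0
for _ in range(50):
    s_map -= (s_map / D0 - j + lam * s_map ** 3 / 6) / (1 / D0 + lam * s_map ** 2 / 2)

# Exact posterior mean by brute-force integration, for comparison only.
s = np.linspace(-12.0, 12.0, 24001)
w = np.exp(-(s ** 2 / (2 * D0) - j * s + lam * s ** 4 / 24))
mean_exact = np.sum(w * s) / np.sum(w)
print(m_gibbs, s_map, mean_exact)
```

In this example the minimum of the Gibbs free energy lies closer to the exact posterior mean than the maximum a posteriori value does, because the stationarity condition for $m$ contains the variance-dependent term $\lambda\, m\, v / 2$ that the classical field equation lacks.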

Operator formalism

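The calculation of the Gibbs free energy requires the calculation of Gaussian integrals over an information Hamiltonian, since the internal information energy is

$$U(m, D) = \langle \mathcal{H}(d,s) \rangle_{\mathcal{P}'(s|d')} = \int \mathcal{D}s\, \mathcal{H}(d,s)\, \mathcal{G}(s - m, D).$$

Such integrals can be calculated via a field operator formalism, [6] in which

$$O_m = m + D\, \frac{\mathrm{d}}{\mathrm{d}m}$$

is the field operator. This generates the field expression $s$ within the integral if applied to the Gaussian distribution function,

$$
\begin{aligned}
O_m\, \mathcal{G}(s - m, D) &= \left( m + D\, \frac{\mathrm{d}}{\mathrm{d}m} \right) \frac{1}{|2\pi D|^{1/2}}\, \exp\left[ -\frac{1}{2}\, (s - m)^\dagger D^{-1} (s - m) \right] \\
&= \left( m + D\, D^{-1} (s - m) \right) \mathcal{G}(s - m, D) \\
&= s\, \mathcal{G}(s - m, D),
\end{aligned}
$$

and any higher power of the field if applied several times,

$$(O_m)^n\, \mathcal{G}(s - m, D) = s^n\, \mathcal{G}(s - m, D).$$

If the information Hamiltonian is analytical, all its terms can be generated via the field operator

$$\mathcal{H}(d, O_m)\, \mathcal{G}(s - m, D) = \mathcal{H}(d, s)\, \mathcal{G}(s - m, D).$$

As the field operator does not depend on the field $s$ itself, it can be pulled out of the path integral of the internal information energy construction,

$$U(m, D) = \int \mathcal{D}s\, \mathcal{H}(d, O_m)\, \mathcal{G}(s - m, D) = \mathcal{H}(d, O_m) \int \mathcal{D}s\, \mathcal{G}(s - m, D) = \mathcal{H}(d, O_m)\, 1_m,$$

where $1_m = 1$ should be regarded as a functional that always returns the value $1$ irrespective of the value of its input $m$. The resulting expression can be calculated by commuting the mean field annihilators $D\, \frac{\mathrm{d}}{\mathrm{d}m}$ to the right of the expression, where they vanish since $\frac{\mathrm{d}}{\mathrm{d}m}\, 1_m = 0$. The mean field annihilator $D\, \frac{\mathrm{d}}{\mathrm{d}m}$ commutes with the mean field as

$$\left[ D\, \frac{\mathrm{d}}{\mathrm{d}m},\, m \right] = D\, \frac{\mathrm{d}}{\mathrm{d}m}\, m - m\, D\, \frac{\mathrm{d}}{\mathrm{d}m} = D + m\, D\, \frac{\mathrm{d}}{\mathrm{d}m} - m\, D\, \frac{\mathrm{d}}{\mathrm{d}m} = D.$$

By the usage of the field operator formalism the Gibbs free energy can be calculated, which permits the (approximate) inference of the posterior mean field via a numerically robust functional minimization.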
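
For a field consisting of a single number the operator calculus reduces to polynomial algebra, which makes it easy to try out. The sketch below is an illustration under that scalar assumption: it applies $m + D\,\mathrm{d}/\mathrm{d}m$ repeatedly to the constant $1$ to obtain the Gaussian moments $\langle s^n \rangle$ as polynomials in $m$, and compares them with Gauss-Hermite quadrature as an independent check. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.hermite_e import hermegauss


def moment_via_field_operator(n, D):
    """<s^n> under G(s - m, D) as a polynomial in m, from (m + D d/dm)^n 1."""
    m = Polynomial([0.0, 1.0])
    result = Polynomial([1.0])                  # the constant functional 1_m
    for _ in range(n):
        result = m * result + D * result.deriv()
    return result


def moment_via_quadrature(n, m, D, order=20):
    """Same moment from Gauss-Hermite quadrature, as an independent check."""
    nodes, weights = hermegauss(order)          # weight function exp(-x^2 / 2)
    return np.sum(weights * (m + np.sqrt(D) * nodes) ** n) / np.sqrt(2 * np.pi)


D, m_value = 2.0, 1.5
for n in range(1, 5):
    print(n, moment_via_field_operator(n, D)(m_value),
          moment_via_quadrature(n, m_value, D))
```

For example, two applications give $m^2 + D$ and four give $m^4 + 6 m^2 D + 3 D^2$, the familiar moments of a Gaussian with mean $m$ and variance $D$.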

History

The book of Norbert Wiener [7] might be regarded as one of the first works on field inference. The usage of path integrals for field inference was proposed by a number of authors, e.g. Edmund Bertschinger [8] or William Bialek and A. Zee. [9] The connection of field theory and Bayesian reasoning was made explicit by Jörg Lemm. [10] The term information field theory was coined by Torsten Enßlin. [11] See the latter reference for more information on the history of IFT.

References

  1. Enßlin, Torsten (2013). "Information field theory". AIP Conference Proceedings. 1553 (1): 184–191. arXiv:1301.2556. Bibcode:2013AIPC.1553..184E. doi:10.1063/1.4819999.
  2. Enßlin, Torsten A. (2019). "Information theory for fields". Annalen der Physik. 531 (3): 1800127. arXiv:1804.03350. Bibcode:2019AnP...53100127E. doi:10.1002/andp.201800127.
  3. "Information field theory". Max Planck Society. Retrieved 13 Nov 2014.
  4. Enßlin, Torsten A.; Frommert, Mona (2011-05-19). "Reconstruction of signals with unknown spectra in information field theory with parameter uncertainty". Physical Review D. 83 (10): 105014. arXiv:1002.2928. Bibcode:2011PhRvD..83j5014E. doi:10.1103/PhysRevD.83.105014.
  5. Enßlin, Torsten A. (2010). "Inference with minimal Gibbs free energy in information field theory". Physical Review E. 82 (5): 051112. arXiv:1004.2868. Bibcode:2010PhRvE..82e1112E. doi:10.1103/physreve.82.051112. PMID 21230442.
  6. Leike, Reimar H.; Enßlin, Torsten A. (2016-11-16). "Operator calculus for information field theory". Physical Review E. 94 (5): 053306. arXiv:1605.00660. Bibcode:2016PhRvE..94e3306L. doi:10.1103/PhysRevE.94.053306. PMID 27967173.
  7. Wiener, Norbert (1964). Extrapolation, interpolation, and smoothing of stationary time series with engineering applications (Fifth printing ed.). Cambridge, Mass.: Technology Press of the Massachusetts Institute of Technology. ISBN 0262730057. OCLC 489911338.
  8. Bertschinger, Edmund (December 1987). "Path integral methods for primordial density perturbations - Sampling of constrained Gaussian random fields". The Astrophysical Journal. 323: L103–L106. Bibcode:1987ApJ...323L.103B. doi:10.1086/185066. ISSN 0004-637X.
  9. Bialek, William; Zee, A. (1988-09-26). "Understanding the Efficiency of Human Perception". Physical Review Letters. 61 (13): 1512–1515. Bibcode:1988PhRvL..61.1512B. doi:10.1103/PhysRevLett.61.1512. PMID 10038817.
  10. Lemm, Jörg C. (2003). Bayesian field theory. Baltimore, Md.: Johns Hopkins University Press. ISBN 9780801872204. OCLC 52762436.
  11. Enßlin, Torsten A.; Frommert, Mona; Kitaura, Francisco S. (2009-11-09). "Information field theory for cosmological perturbation reconstruction and nonlinear signal analysis". Physical Review D. 80 (10): 105005. arXiv:0806.3474. Bibcode:2009PhRvD..80j5005E. doi:10.1103/PhysRevD.80.105005.