Factor graph

Last updated

A factor graph is a bipartite graph representing the factorization of a function. In probability theory and its applications, factor graphs are used to represent factorization of a probability distribution function, enabling efficient computations, such as the computation of marginal distributions through the sum–product algorithm. One of the important success stories of factor graphs and the sum–product algorithm is the decoding of capacity-approaching error-correcting codes, such as LDPC and turbo codes.

Contents

Factor graphs generalize constraint graphs. A factor whose value is either 0 or 1 is called a constraint. A constraint graph is a factor graph where all factors are constraints. The max-product algorithm for factor graphs can be viewed as a generalization of the arc-consistency algorithm for constraint processing.

Definition

A factor graph is a bipartite graph representing the factorization of a function. Given a factorization of a function ,

where , the corresponding factor graph consists of variable vertices , factor vertices , and edges . The edges depend on the factorization as follows: there is an undirected edge between factor vertex and variable vertex if . The function is tacitly assumed to be real-valued: .

Factor graphs can be combined with message passing algorithms to efficiently compute certain characteristics of the function , such as the marginal distributions.

Examples

An example factor graph Factorgraph.jpg
An example factor graph

Consider a function that factorizes as follows:

,

with a corresponding factor graph shown on the right. Observe that the factor graph has a cycle. If we merge into a single factor, the resulting factor graph will be a tree. This is an important distinction, as message passing algorithms are usually exact for trees, but only approximate for graphs with cycles.

Message passing on factor graphs

A popular message passing algorithm on factor graphs is the sum–product algorithm, which efficiently computes all the marginals of the individual variables of the function. In particular, the marginal of variable is defined as

where the notation means that the summation goes over all the variables, except. The messages of the sum–product algorithm are conceptually computed in the vertices and passed along the edges. A message from or to a variable vertex is always a function of that particular variable. For instance, when a variable is binary, the messages over the edges incident to the corresponding vertex can be represented as vectors of length 2: the first entry is the message evaluated in 0, the second entry is the message evaluated in 1. When a variable belongs to the field of real numbers, messages can be arbitrary functions, and special care needs to be taken in their representation.

In practice, the sum–product algorithm is used for statistical inference, whereby is a joint distribution or a joint likelihood function, and the factorization depends on the conditional independencies among the variables.

The Hammersley–Clifford theorem shows that other probabilistic models such as Bayesian networks and Markov networks can be represented as factor graphs; the latter representation is frequently used when performing inference over such networks using belief propagation. On the other hand, Bayesian networks are more naturally suited for generative models, as they can directly represent the causalities of the model.

See also

Related Research Articles

In statistics, sufficiency is a property of a statistic computed on a sample dataset in relation to a parametric model of the dataset. A sufficient statistic contains all of the information that the dataset provides about the model parameters. It is closely related to the concepts of an ancillary statistic which contains no information about the model parameters, and of a complete statistic which only contains information about the parameters and no ancillary information.

<span class="mw-page-title-main">Tree decomposition</span> Mapping of a graph into a tree

In graph theory, a tree decomposition is a mapping of a graph into a tree that can be used to define the treewidth of the graph and speed up solving certain computational problems on the graph.

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

This is a glossary of graph theory. Graph theory is the study of graphs, systems of nodes or vertices connected in pairs by lines or edges.

<span class="mw-page-title-main">Graph coloring</span> Methodic assignment of colors to elements of a graph

In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices are of the same color; this is called a vertex coloring. Similarly, an edge coloring assigns a color to each edge so that no two adjacent edges are of the same color, and a face coloring of a planar graph assigns a color to each face or region so that no two faces that share a boundary have the same color.

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning.

In statistics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for sampling from a specified multivariate probability distribution when direct sampling from the joint distribution is difficult, but sampling from the conditional distribution is more practical. This sequence can be used to approximate the joint distribution ; to approximate the marginal distribution of one of the variables, or some subset of the variables ; or to compute an integral. Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled.

Belief propagation, also known as sum–product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes. Belief propagation is commonly used in artificial intelligence and information theory, and has demonstrated empirical success in numerous applications, including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability.

In the mathematical field of graph theory, Kirchhoff's theorem or Kirchhoff's matrix tree theorem named after Gustav Kirchhoff is a theorem about the number of spanning trees in a graph, showing that this number can be computed in polynomial time from the determinant of a submatrix of the graph's Laplacian matrix; specifically, the number is equal to any cofactor of the Laplacian matrix. Kirchhoff's theorem is a generalization of Cayley's formula which provides the number of spanning trees in a complete graph.

<span class="mw-page-title-main">Markov random field</span> Set of random variables

In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to be a Markov random field if it satisfies Markov properties. The concept originates from the Sherrington–Kirkpatrick model.

<span class="mw-page-title-main">Cartesian product of graphs</span> Operation in graph theory

In graph theory, the Cartesian productGH of graphs G and H is a graph such that:

<span class="mw-page-title-main">Estimation of distribution algorithm</span> Family of stochastic optimization methods

Estimation of distribution algorithms (EDAs), sometimes called probabilistic model-building genetic algorithms (PMBGAs), are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding an uninformative prior over admissible solutions and ending with the model that generates only the global optima.

Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. What kind of graph is used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions.

Distributed constraint optimization is the distributed analogue to constraint optimization. A DCOP is a problem in which a group of agents must distributedly choose values for a set of variables such that the cost of a set of constraints over the variables is minimized.

In mathematics and physics, a quantum graph is a linear, network-shaped structure of vertices connected on edges in which each edge is given a length and where a differential equation is posed on each edge. An example would be a power network consisting of power lines (edges) connected at transformer stations (vertices); the differential equations would then describe the voltage along each of the lines, with boundary conditions for each edge provided at the adjacent vertices ensuring that the current added over all edges adds to zero at each vertex.

The Hammersley–Clifford theorem is a result in probability theory, mathematical statistics and statistical mechanics that gives necessary and sufficient conditions under which a strictly positive probability distribution can be represented as events generated by a Markov network. It is the fundamental theorem of random fields. It states that a probability distribution that has a strictly positive mass or density satisfies one of the Markov properties with respect to an undirected graph G if and only if it is a Gibbs random field, that is, its density can be factorized over the cliques of the graph.

In mathematics, a submodular set function is a set function that, informally, describes the relationship between a set of inputs and an output, where adding more of one input has a decreasing additional benefit. The natural diminishing returns property which makes them suitable for many applications, including approximation algorithms, game theory and electrical networks. Recently, submodular functions have also found utility in several real world problems in machine learning and artificial intelligence, including automatic summarization, multi-document summarization, feature selection, active learning, sensor placement, image collection summarization and many other domains.

Variable elimination (VE) is a simple and general exact inference algorithm in probabilistic graphical models, such as Bayesian networks and Markov random fields. It can be used for inference of maximum a posteriori (MAP) state or estimation of conditional or marginal distributions over a subset of variables. The algorithm has exponential time complexity, but could be efficient in practice for low-treewidth graphs, if the proper elimination order is used.

Quantum optimization algorithms are quantum algorithms that are used to solve optimization problems. Mathematical optimization deals with finding the best solution to a problem from a set of possible solutions. Mostly, the optimization problem is formulated as a minimization problem, where one tries to minimize an error which depends on the solution: the optimal solution has the minimal error. Different optimization techniques are applied in various fields such as mechanics, economics and engineering, and as the complexity and amount of data involved rise, more efficient ways of solving optimization problems are needed. Quantum computing may allow problems which are not practically feasible on classical computers to be solved, or suggest a considerable speed up with respect to the best known classical algorithm.

The blow-up lemma, proved by János Komlós, Gábor N. Sárközy, and Endre Szemerédi in 1997, is an important result in extremal graph theory, particularly within the context of the regularity method. It states that the regular pairs in the statement of Szemerédi's regularity lemma behave like complete bipartite graphs in the context of embedding spanning graphs of bounded degree.

References