Junction tree algorithm

Example of a junction tree

The junction tree algorithm (also known as 'Clique Tree') is a method used in machine learning to perform marginalization in general graphs. In essence, it entails performing belief propagation on a modified graph called a junction tree. The basic premise is to eliminate cycles by clustering them into single nodes, so that the result is a tree whose nodes are clusters of variables. [1] Multiple extensive classes of queries can then be compiled at the same time into larger structures of data. [1] There are different variants of the algorithm depending on what needs to be calculated. Inference algorithms of this kind incorporate newly observed data and update their calculations based on the new information provided. [2]


Junction tree algorithm

Hugin algorithm

The Hugin algorithm proceeds in stages that are described in detail under Underlying theory below:

  1. Moralize the graph if the model is a Bayesian network.
  2. Introduce the evidence by clamping the observed variables.
  3. Triangulate the graph to make it chordal.
  4. Construct a junction tree from the triangulated graph.
  5. Propagate the probabilities along the junction tree via belief propagation.

Note that this last step is inefficient for graphs of large treewidth. Computing the messages to pass between supernodes involves doing exact marginalization over the variables in both supernodes. Performing this algorithm for a graph with treewidth k will thus have at least one computation which takes time exponential in k. Hugin is a message-passing algorithm [3] that takes fewer computations to find a solution compared to Shafer-Shenoy.
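To make the message computation concrete, the following is a minimal sketch (not taken from the cited sources) of a single Hugin-style absorption between two adjacent supernodes. It assumes potentials stored as Python dicts keyed by assignments of binary variables; the clique scopes and the uniform numbers are illustrative placeholders.

    from itertools import product

    def marginalize(potential, scope, keep):
        """Sum a potential down to the variables listed in keep."""
        idx = [scope.index(v) for v in keep]
        out = {}
        for assign, p in potential.items():
            key = tuple(assign[i] for i in idx)
            out[key] = out.get(key, 0.0) + p
        return out

    # Clique V = {A, B} and clique W = {B, C} share the separator S = {B}.
    phi_V = {t: 0.25 for t in product([0, 1], repeat=2)}
    phi_W = {t: 0.25 for t in product([0, 1], repeat=2)}
    phi_S = {(0,): 1.0, (1,): 1.0}   # current separator potential

    # Absorption of W from V: the new separator is the marginal of phi_V
    # onto S, and phi_W is rescaled by the ratio of new to old separator
    # (the division step is characteristic of Hugin).
    new_S = marginalize(phi_V, ['A', 'B'], ['B'])
    phi_W = {(b, c): p * new_S[(b,)] / phi_S[(b,)] for (b, c), p in phi_W.items()}
    phi_S = new_S

A clique over k+1 binary variables has 2^(k+1) table entries, so the marginalization in this update is the computation that becomes exponential in the treewidth k.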

Shafer-Shenoy algorithm

The Shafer-Shenoy algorithm is the sum-product variant of message passing on a junction tree. [6] It can run certain programs and queries more efficiently than the Hugin algorithm, and it makes the local computation of conditionals for belief functions possible. [7] Joint distributions are needed to make these local computations happen. [7]
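As an illustration, here is a minimal sketch (under the same dict-based representation as above, with invented cliques and numbers) of Shafer-Shenoy messages on a small clique chain C1 = {A, B}, C2 = {B, C}, C3 = {C, D}:

    from itertools import product

    def message(phi, scope, sep, incoming):
        """Sum over the variables of scope not in sep, after multiplying
        phi by all incoming messages."""
        out = {}
        for assign in product([0, 1], repeat=len(scope)):
            a = dict(zip(scope, assign))
            val = phi[assign]
            for msg_scope, msg in incoming:   # messages from other neighbours
                val *= msg[tuple(a[v] for v in msg_scope)]
            key = tuple(a[v] for v in sep)
            out[key] = out.get(key, 0.0) + val
        return out

    phi1 = {t: 0.25 for t in product([0, 1], repeat=2)}   # potential on {A, B}
    phi2 = {t: 0.25 for t in product([0, 1], repeat=2)}   # potential on {B, C}

    mu_1_to_2 = message(phi1, ['A', 'B'], ['B'], incoming=[])
    mu_2_to_3 = message(phi2, ['B', 'C'], ['C'], incoming=[(['B'], mu_1_to_2)])

Unlike the Hugin update, there is no division by an old separator potential: each message is recomputed as a fresh sum of products.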

Underlying theory

Example of a Dynamic Bayesian network

The first step concerns only Bayesian networks, and is a procedure to turn a directed graph into an undirected one; the result is known as the moral graph. We do this because it allows for the universal applicability of the algorithm, regardless of the direction of the edges.
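A minimal sketch of this moralization step, with an invented three-node network:

    # Moralization: marry the parents of every node, then drop directions.
    def moralize(parents):
        """parents: dict mapping node -> list of parents. Returns edge set."""
        edges = set()
        for child, ps in parents.items():
            for p in ps:                              # keep arcs, undirected
                edges.add(frozenset((p, child)))
            for i in range(len(ps)):                  # marry co-parents
                for j in range(i + 1, len(ps)):
                    edges.add(frozenset((ps[i], ps[j])))
        return edges

    # The v-structure A -> C <- B becomes the triangle A-B-C.
    print(moralize({'A': [], 'B': [], 'C': ['A', 'B']}))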

The second step is setting variables to their observed value. This is usually needed when we want to calculate conditional probabilities, so we fix the value of the random variables we condition on. Those variables are also said to be clamped to their particular value.
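A minimal sketch of clamping, again with dict-based potentials and placeholder values:

    # Clamp an observed variable: keep only the entries of a potential
    # that are consistent with the observation, and drop the variable.
    def clamp(potential, scope, var, value):
        if var not in scope:
            return potential, scope
        i = scope.index(var)
        sliced = {assign[:i] + assign[i + 1:]: p
                  for assign, p in potential.items() if assign[i] == value}
        return sliced, scope[:i] + scope[i + 1:]

    phi = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}  # over (X, E)
    phi_clamped, scope = clamp(phi, ['X', 'E'], 'E', 1)         # observe E = 1
    print(phi_clamped)   # {(0,): 0.4, (1,): 0.2}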

Example of a chordal graph

The third step is to ensure that the graph is made chordal if it isn't already. This is the first essential step of the algorithm, and it makes use of the following theorem: [8]

Theorem: For an undirected graph, G, the following properties are equivalent:

  1. Graph G is triangulated (chordal).
  2. The clique graph of G has a junction tree.
  3. There is an elimination ordering for G that does not lead to any added edges.

Thus, by triangulating a graph, we make sure that the corresponding junction tree exists. A usual way to do this is to decide an elimination order for its nodes, and then run the variable elimination algorithm. Variable elimination must be rerun once for each distinct query. [1] Running it adds edges to the initial graph in such a way that the output is a chordal graph; a minimal sketch of this triangulation step is given below. All chordal graphs have a junction tree. [4] The next step is to construct the junction tree. To do so, we use the graph from the previous step, and form its corresponding clique graph. [9]
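This sketch triangulates a four-cycle by node elimination; the graph and the elimination order are illustrative:

    from itertools import combinations

    # Eliminate nodes in order; before removing a node, connect all of its
    # not-yet-adjacent neighbours. The added "fill-in" edges are the chords
    # that make the graph chordal.
    def triangulate(adj, order):
        adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
        fill = set()
        for v in order:
            for a, b in combinations(adj[v], 2):
                if b not in adj[a]:
                    adj[a].add(b); adj[b].add(a)
                    fill.add(frozenset((a, b)))
            for n in adj[v]:                          # remove v from the graph
                adj[n].discard(v)
            del adj[v]
        return fill

    cycle = {'A': {'B', 'D'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'A', 'C'}}
    print(triangulate(cycle, ['A', 'B', 'C', 'D']))   # adds the chord B-D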

The next theorem then gives us a way to find a junction tree: [8]

Theorem: Given a triangulated graph, weight each edge of the clique graph by the cardinality |A∩B| of the intersection of the adjacent cliques A and B. Then any maximum-weight spanning tree of the clique graph is a junction tree.

So, to construct a junction tree we just have to extract a maximum-weight spanning tree out of the clique graph. This can be done efficiently by, for example, modifying Kruskal's algorithm, as in the sketch below. The last step is to apply belief propagation to the obtained junction tree. [10]
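A minimal sketch of this construction, using a Kruskal-style union-find over intersection weights on an invented set of cliques:

    from itertools import combinations

    def junction_tree(cliques):
        parent = list(range(len(cliques)))
        def find(x):                                  # union-find root lookup
            while parent[x] != x:
                parent[x] = parent[parent[x]]; x = parent[x]
            return x
        # Kruskal's algorithm, modified to scan candidate edges by
        # decreasing intersection weight |A n B| instead of increasing cost.
        candidates = sorted(combinations(range(len(cliques)), 2),
                            key=lambda e: -len(cliques[e[0]] & cliques[e[1]]))
        tree = []
        for i, j in candidates:
            if find(i) != find(j):                    # keep edge if no cycle
                parent[find(i)] = find(j)
                tree.append((cliques[i], cliques[j]))
        return tree

    cliques = [{'A', 'B', 'C'}, {'B', 'C', 'D'}, {'C', 'D', 'E'}]
    print(junction_tree(cliques))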

Usage: A junction tree graph is used to visualize the probabilities of the problem. The tree can be converted to a binary tree as part of its actual construction. [11] A specific use can be found in autoencoders, which combine a junction tree with a message-passing neural network to generate molecular graphs automatically on a large scale. [12]

Inference algorithms

Cutset conditioning

Loopy belief propagation: a different method for handling complex graphs, in which belief propagation is applied directly to the original graph even though it contains cycles. Loopy belief propagation is used when an approximate solution is acceptable instead of the exact solution. [13] It is a form of approximate inference. [3]
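A minimal sketch of loopy belief propagation on a three-node cycle with invented pairwise potentials; the messages are updated in parallel until they stabilize, yielding approximate marginals:

    def psi(x, y):                       # same pairwise potential on each edge
        return 2.0 if x == y else 1.0

    nodes = ['A', 'B', 'C']
    nbrs = {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B']}
    msg = {(i, j): [1.0, 1.0] for i in nodes for j in nbrs[i]}

    for _ in range(50):                  # fixed-point iteration over messages
        new = {}
        for (i, j) in msg:
            m = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):        # sum over the sender's states
                    inc = 1.0
                    for k in nbrs[i]:
                        if k != j:       # product of other incoming messages
                            inc *= msg[(k, i)][xi]
                    total += psi(xi, xj) * inc
                m.append(total)
            s = sum(m)
            new[(i, j)] = [v / s for v in m]   # normalize for stability
        msg = new

    belief_A = [msg[('B', 'A')][x] * msg[('C', 'A')][x] for x in (0, 1)]
    print([b / sum(belief_A) for b in belief_A])   # approximate marginal of A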

Cutset conditioning: used with smaller sets of variables. Conditioning on a cutset of variables breaks the loops of the graph, giving simpler graphs on which inference is easy; the sub-results for the different cutset assignments are then combined, at a cost that grows exponentially in the size of the cutset. [3]
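A minimal sketch of the idea, on an invented single-loop model over three binary variables:

    from itertools import product

    # Pairwise potentials on the loop A - B - C - A.
    def joint(a, b, c):
        f = lambda x, y: 2.0 if x == y else 1.0
        return f(a, b) * f(b, c) * f(c, a)

    # Condition on the cutset {A}: for each value of A the remaining model
    # over (B, C) is a chain, which a tree algorithm could solve exactly;
    # here the sub-results are simply enumerated and summed.
    marg_B = [0.0, 0.0]
    for a in (0, 1):
        for b, c in product((0, 1), repeat=2):
            marg_B[b] += joint(a, b, c)   # exact sub-result for this value of A
    total = sum(marg_B)
    print([m / total for m in marg_B])    # marginal of B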

Related Research Articles

Tree decomposition

In graph theory, a tree decomposition is a mapping of a graph into a tree that can be used to define the treewidth of the graph and speed up solving certain computational problems on the graph.

Bayesian network

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

Graphical model

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning.

Chordal graph

In the mathematical area of graph theory, a chordal graph is one in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle. Equivalently, every induced cycle in the graph should have exactly three vertices. The chordal graphs may also be characterized as the graphs that have perfect elimination orderings, as the graphs in which each minimal separator is a clique, and as the intersection graphs of subtrees of a tree. They are sometimes also called rigid circuit graphs or triangulated graphs.

Belief propagation

Belief propagation, also known as sum-product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes. Belief propagation is commonly used in artificial intelligence and information theory and has demonstrated empirical success in numerous applications including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability.

Markov random field

In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to be a Markov random field if it satisfies Markov properties.

Maximal independent set

In graph theory, a maximal independent set (MIS) or maximal stable set is an independent set that is not a subset of any other independent set. In other words, there is no vertex outside the independent set that may join it because it is maximal with respect to the independent set property.

Factor graph

A factor graph is a bipartite graph representing the factorization of a function. In probability theory and its applications, factor graphs are used to represent factorization of a probability distribution function, enabling efficient computations, such as the computation of marginal distributions through the sum-product algorithm. One of the important success stories of factor graphs and the sum-product algorithm is the decoding of capacity-approaching error-correcting codes, such as LDPC and turbo codes.

Conditional random field

Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. What kind of graph is used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions.

Moral graph

In graph theory, a moral graph is used to find the equivalent undirected form of a directed acyclic graph. It is a key step of the junction tree algorithm, used in belief propagation on graphical models.

Probabilistic logic

Probabilistic logic involves the use of probability and logic to deal with uncertain situations. The result is a richer and more expressive formalism with a broad range of possible application areas. Probabilistic logics attempt to find a natural extension of traditional logic truth tables: the results they define are derived through probabilistic expressions instead. A difficulty with probabilistic logics is that they tend to multiply the computational complexities of their probabilistic and logical components. Other difficulties include the possibility of counter-intuitive results, such as those of Dempster–Shafer theory in evidence-based subjective logic. The need to deal with a broad variety of contexts and issues has led to many different proposals.

Valuation-based system

Valuation-based system (VBS) is a framework for knowledge representation and inference. Real-world problems are modeled in this framework by a network of interrelated entities, called variables. The relationships between variables are represented by the functions called valuations. The two basic operations for performing inference in a VBS are combination and marginalization. Combination corresponds to the aggregation of knowledge, while marginalization refers to the focusing (coarsening) of it. VBSs were introduced by Prakash P. Shenoy in 1989 as general frameworks for managing uncertainty in expert systems.

Equitable coloring

In graph theory, an area of mathematics, an equitable coloring is an assignment of colors to the vertices of an undirected graph, in such a way that no two adjacent vertices receive the same color, and the numbers of vertices in any two color classes differ by at most one.

Graphical models for protein structure

Graphical models have become powerful frameworks for protein structure prediction, protein–protein interaction, and free energy calculations for protein structures. Using a graphical model to represent the protein structure allows the solution of many problems including secondary structure prediction, protein–protein interactions, protein–drug interaction, and free energy calculations.

Courcelle's theorem

In the study of graph algorithms, Courcelle's theorem is the statement that every graph property definable in the monadic second-order logic of graphs can be decided in linear time on graphs of bounded treewidth. The result was first proved by Bruno Courcelle in 1990 and independently rediscovered by Borie, Parker & Tovey (1992). It is considered the archetype of algorithmic meta-theorems.

Skew partition

In graph theory, a skew partition of a graph is a partition of its vertices into two subsets, such that the induced subgraph formed by one of the two subsets is disconnected and the induced subgraph formed by the other subset is the complement of a disconnected graph. Skew partitions play an important role in the theory of perfect graphs.

Variable elimination

Variable elimination (VE) is a simple and general exact inference algorithm in probabilistic graphical models, such as Bayesian networks and Markov random fields. It can be used for inference of maximum a posteriori (MAP) state or estimation of conditional or marginal distributions over a subset of variables. The algorithm has exponential time complexity, but could be efficient in practice for low-treewidth graphs, if the proper elimination order is used.

Chordal completion

In graph theory, a branch of mathematics, a chordal completion of a given undirected graph G is a chordal graph, on the same vertex set, that has G as a subgraph. A minimal chordal completion is a chordal completion such that any graph formed by removing an edge would no longer be a chordal completion. A minimum chordal completion is a chordal completion with as few edges as possible.

Collective classification

In network theory, collective classification is the simultaneous prediction of the labels for multiple objects, where each label is predicted using information about the object's observed features, the observed features and labels of its neighbors, and the unobserved labels of its neighbors. Collective classification problems are defined in terms of networks of random variables, where the network structure determines the relationship between the random variables. Inference is performed on multiple random variables simultaneously, typically by propagating information between nodes in the network to perform approximate inference. Approaches that use collective classification can make use of relational information when performing inference. Examples of collective classification include predicting attributes of individuals in a social network, classifying webpages in the World Wide Web, and inferring the research area of a paper in a scientific publication dataset.

References

  1. Paskin, Mark. "A Short Course on Graphical Models" (PDF). Stanford.
  2. "The Inference Algorithm". www.dfki.de. Retrieved 2018-10-25.
  3. "Recap on Graphical Models" (PDF).
  4. "Algorithms" (PDF). Massachusetts Institute of Technology. 2014.
  5. Roweis, Sam (2004). "Hugin Inference Algorithm" (PDF). NYU.
  6. "Algorithms for Inference" (PDF). Massachusetts Institute of Technology. 2014.
  7. Kłopotek, Mieczysław A. (2018-06-06). "Dempsterian-Shaferian Belief Network From Data". arXiv:1806.02373 [cs.AI].
  8. Wainwright, Martin (31 March 2008). "Graphical models, message-passing algorithms, and variational methods: Part I" (PDF). Berkeley EECS. Retrieved 16 November 2016.
  9. "Clique Graph". Retrieved 16 November 2016.
  10. Barber, David (28 January 2014). "Probabilistic Modelling and Reasoning, The Junction Tree Algorithm" (PDF). University of Helsinki. Retrieved 16 November 2016.
  11. "Fault Diagnosis in an Industrial Process Using Bayesian Networks: Application of the Junction Tree Algorithm". 2009 Electronics, Robotics and Automotive Mechanics Conference (CERMA). IEEE. 2009. doi:10.1109/CERMA.2009.28. S2CID 6068245.
  12. Jin, Wengong (Feb 2018). "Junction Tree Variational Autoencoder for Molecular Graph Generation". arXiv:1802.04364. Bibcode:2018arXiv180204364J.
  13. CERMA 2009: Proceedings: 2009 Electronics, Robotics and Automotive Mechanics Conference: 22-25 September 2009: Cuernavaca, Morelos, Mexico. Los Alamitos, Calif.: IEEE Computer Society. 2009. ISBN 9780769537993. OCLC 613519385.