Junction tree algorithm

Example of a junction tree

The junction tree algorithm (also known as 'Clique Tree') is a method used in machine learning to perform marginalization in general graphs. In essence, it entails performing belief propagation on a modified graph called a junction tree. The basic premise is to eliminate cycles by clustering them into single nodes, so that the result is a tree whose nodes are clusters of variables. [1] Multiple extensive classes of queries can then be compiled at the same time into larger structures of data. [1] There are different variants of the algorithm depending on what needs to be calculated. Inference algorithms of this kind incorporate newly observed data and update their calculations based on the new information provided. [2]


Junction tree algorithm

Hugin algorithm

The Hugin algorithm proceeds in stages that are described in detail under Underlying theory below:

  1. Moralize the graph if the model is a Bayesian network.
  2. Introduce the evidence by clamping the observed variables.
  3. Triangulate the graph to make it chordal.
  4. Construct a junction tree from the triangulated graph.
  5. Propagate the probabilities along the junction tree via belief propagation.

Note that this last step is inefficient for graphs of large treewidth. Computing the messages to pass between supernodes involves doing exact marginalization over the variables in both supernodes. Performing this algorithm for a graph with treewidth k will thus have at least one computation which takes time exponential in k. Hugin is a message-passing algorithm [3] that takes fewer computations to find a solution compared to Shafer-Shenoy.
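To make the message computation concrete, the following is a minimal sketch (not taken from the cited sources) of a single Hugin-style absorption between two adjacent supernodes. It assumes potentials stored as Python dicts keyed by assignments of binary variables; the clique scopes and the uniform numbers are illustrative placeholders.

    from itertools import product

    def marginalize(potential, scope, keep):
        """Sum a potential down to the variables listed in keep."""
        idx = [scope.index(v) for v in keep]
        out = {}
        for assign, p in potential.items():
            key = tuple(assign[i] for i in idx)
            out[key] = out.get(key, 0.0) + p
        return out

    # Clique V = {A, B} and clique W = {B, C} share the separator S = {B}.
    phi_V = {t: 0.25 for t in product([0, 1], repeat=2)}
    phi_W = {t: 0.25 for t in product([0, 1], repeat=2)}
    phi_S = {(0,): 1.0, (1,): 1.0}   # current separator potential

    # Absorption of W from V: the new separator is the marginal of phi_V
    # onto S, and phi_W is rescaled by the ratio of new to old separator
    # (the division step is characteristic of Hugin).
    new_S = marginalize(phi_V, ['A', 'B'], ['B'])
    phi_W = {(b, c): p * new_S[(b,)] / phi_S[(b,)] for (b, c), p in phi_W.items()}
    phi_S = new_S

A clique over k+1 binary variables has 2^(k+1) table entries, so the marginalization in this update is the computation that becomes exponential in the treewidth k.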

Shafer-Shenoy algorithm

The Shafer-Shenoy algorithm is the sum-product variant of message passing on a junction tree. [6] It can run certain programs and queries more efficiently than the Hugin algorithm, and it makes the local computation of conditionals for belief functions possible. [7] Joint distributions are needed to make these local computations happen. [7]
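As an illustration, here is a minimal sketch (under the same dict-based representation as above, with invented cliques and numbers) of Shafer-Shenoy messages on a small clique chain C1 = {A, B}, C2 = {B, C}, C3 = {C, D}:

    from itertools import product

    def message(phi, scope, sep, incoming):
        """Sum over the variables of scope not in sep, after multiplying
        phi by all incoming messages."""
        out = {}
        for assign in product([0, 1], repeat=len(scope)):
            a = dict(zip(scope, assign))
            val = phi[assign]
            for msg_scope, msg in incoming:   # messages from other neighbours
                val *= msg[tuple(a[v] for v in msg_scope)]
            key = tuple(a[v] for v in sep)
            out[key] = out.get(key, 0.0) + val
        return out

    phi1 = {t: 0.25 for t in product([0, 1], repeat=2)}   # potential on {A, B}
    phi2 = {t: 0.25 for t in product([0, 1], repeat=2)}   # potential on {B, C}

    mu_1_to_2 = message(phi1, ['A', 'B'], ['B'], incoming=[])
    mu_2_to_3 = message(phi2, ['B', 'C'], ['C'], incoming=[(['B'], mu_1_to_2)])

Unlike the Hugin update, there is no division by an old separator potential: each message is recomputed as a fresh sum of products.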

Underlying theory

Example of a Dynamic Bayesian network

The first step concerns only Bayesian networks, and is a procedure to turn a directed graph into an undirected one; the result is known as the moral graph. We do this because it allows for the universal applicability of the algorithm, regardless of the direction of the edges.
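A minimal sketch of this moralization step, with an invented three-node network:

    # Moralization: marry the parents of every node, then drop directions.
    def moralize(parents):
        """parents: dict mapping node -> list of parents. Returns edge set."""
        edges = set()
        for child, ps in parents.items():
            for p in ps:                              # keep arcs, undirected
                edges.add(frozenset((p, child)))
            for i in range(len(ps)):                  # marry co-parents
                for j in range(i + 1, len(ps)):
                    edges.add(frozenset((ps[i], ps[j])))
        return edges

    # The v-structure A -> C <- B becomes the triangle A-B-C.
    print(moralize({'A': [], 'B': [], 'C': ['A', 'B']}))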

The second step is setting variables to their observed value. This is usually needed when we want to calculate conditional probabilities, so we fix the value of the random variables we condition on. Those variables are also said to be clamped to their particular value.
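A minimal sketch of clamping, again with dict-based potentials and placeholder values:

    # Clamp an observed variable: keep only the entries of a potential
    # that are consistent with the observation, and drop the variable.
    def clamp(potential, scope, var, value):
        if var not in scope:
            return potential, scope
        i = scope.index(var)
        sliced = {assign[:i] + assign[i + 1:]: p
                  for assign, p in potential.items() if assign[i] == value}
        return sliced, scope[:i] + scope[i + 1:]

    phi = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}  # over (X, E)
    phi_clamped, scope = clamp(phi, ['X', 'E'], 'E', 1)         # observe E = 1
    print(phi_clamped)   # {(0,): 0.4, (1,): 0.2}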

Example of a chordal graph

The third step is to ensure that the graph is made chordal if it isn't already. This is the first essential step of the algorithm, and it makes use of the following theorem: [8]

Theorem: For an undirected graph, G, the following properties are equivalent:

  1. Graph G is triangulated (chordal).
  2. The clique graph of G has a junction tree.
  3. There is an elimination ordering for G that does not lead to any added edges.

Thus, by triangulating a graph, we make sure that the corresponding junction tree exists. A usual way to do this is to decide an elimination order for its nodes, and then run the variable elimination algorithm. Variable elimination must be rerun once for each distinct query. [1] Running it adds edges to the initial graph in such a way that the output is a chordal graph; a minimal sketch of this triangulation step is given below. All chordal graphs have a junction tree. [4] The next step is to construct the junction tree. To do so, we use the graph from the previous step, and form its corresponding clique graph. [9]
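This sketch triangulates a four-cycle by node elimination; the graph and the elimination order are illustrative:

    from itertools import combinations

    # Eliminate nodes in order; before removing a node, connect all of its
    # not-yet-adjacent neighbours. The added "fill-in" edges are the chords
    # that make the graph chordal.
    def triangulate(adj, order):
        adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
        fill = set()
        for v in order:
            for a, b in combinations(adj[v], 2):
                if b not in adj[a]:
                    adj[a].add(b); adj[b].add(a)
                    fill.add(frozenset((a, b)))
            for n in adj[v]:                          # remove v from the graph
                adj[n].discard(v)
            del adj[v]
        return fill

    cycle = {'A': {'B', 'D'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'A', 'C'}}
    print(triangulate(cycle, ['A', 'B', 'C', 'D']))   # adds the chord B-D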

The next theorem then gives us a way to find a junction tree: [8]

Theorem: Given a triangulated graph, weight each edge of the clique graph by the cardinality |A∩B| of the intersection of the adjacent cliques A and B. Then any maximum-weight spanning tree of the clique graph is a junction tree.

So, to construct a junction tree we just have to extract a maximum-weight spanning tree out of the clique graph. This can be done efficiently by, for example, modifying Kruskal's algorithm, as in the sketch below. The last step is to apply belief propagation to the obtained junction tree. [10]
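A minimal sketch of this construction, using a Kruskal-style union-find over intersection weights on an invented set of cliques:

    from itertools import combinations

    def junction_tree(cliques):
        parent = list(range(len(cliques)))
        def find(x):                                  # union-find root lookup
            while parent[x] != x:
                parent[x] = parent[parent[x]]; x = parent[x]
            return x
        # Kruskal's algorithm, modified to scan candidate edges by
        # decreasing intersection weight |A n B| instead of increasing cost.
        candidates = sorted(combinations(range(len(cliques)), 2),
                            key=lambda e: -len(cliques[e[0]] & cliques[e[1]]))
        tree = []
        for i, j in candidates:
            if find(i) != find(j):                    # keep edge if no cycle
                parent[find(i)] = find(j)
                tree.append((cliques[i], cliques[j]))
        return tree

    cliques = [{'A', 'B', 'C'}, {'B', 'C', 'D'}, {'C', 'D', 'E'}]
    print(junction_tree(cliques))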

Usage: A junction tree graph is used to visualize the probabilities of the problem. The tree can be converted to a binary tree as part of its actual construction. [11] A specific use can be found in autoencoders, which combine a junction tree with a message-passing neural network to generate molecular graphs automatically on a large scale. [12]

Inference algorithms

Cutset conditioning

Loopy belief propagation: a different method for handling complex graphs, in which belief propagation is applied directly to the original graph even though it contains cycles. Loopy belief propagation is used when an approximate solution is acceptable instead of the exact solution. [13] It is a form of approximate inference. [3]
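A minimal sketch of loopy belief propagation on a three-node cycle with invented pairwise potentials; the messages are updated in parallel until they stabilize, yielding approximate marginals:

    def psi(x, y):                       # same pairwise potential on each edge
        return 2.0 if x == y else 1.0

    nodes = ['A', 'B', 'C']
    nbrs = {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B']}
    msg = {(i, j): [1.0, 1.0] for i in nodes for j in nbrs[i]}

    for _ in range(50):                  # fixed-point iteration over messages
        new = {}
        for (i, j) in msg:
            m = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):        # sum over the sender's states
                    inc = 1.0
                    for k in nbrs[i]:
                        if k != j:       # product of other incoming messages
                            inc *= msg[(k, i)][xi]
                    total += psi(xi, xj) * inc
                m.append(total)
            s = sum(m)
            new[(i, j)] = [v / s for v in m]   # normalize for stability
        msg = new

    belief_A = [msg[('B', 'A')][x] * msg[('C', 'A')][x] for x in (0, 1)]
    print([b / sum(belief_A) for b in belief_A])   # approximate marginal of A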

Cutset conditioning: used with smaller sets of variables. Conditioning on a cutset of variables breaks the loops of the graph, giving simpler graphs on which inference is easy; the sub-results for the different cutset assignments are then combined, at a cost that grows exponentially in the size of the cutset. [3]
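A minimal sketch of the idea, on an invented single-loop model over three binary variables:

    from itertools import product

    # Pairwise potentials on the loop A - B - C - A.
    def joint(a, b, c):
        f = lambda x, y: 2.0 if x == y else 1.0
        return f(a, b) * f(b, c) * f(c, a)

    # Condition on the cutset {A}: for each value of A the remaining model
    # over (B, C) is a chain, which a tree algorithm could solve exactly;
    # here the sub-results are simply enumerated and summed.
    marg_B = [0.0, 0.0]
    for a in (0, 1):
        for b, c in product((0, 1), repeat=2):
            marg_B[b] += joint(a, b, c)   # exact sub-result for this value of A
    total = sum(marg_B)
    print([m / total for m in marg_B])    # marginal of B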

Related Research Articles

Tree decomposition

In graph theory, a tree decomposition is a mapping of a graph into a tree that can be used to define the treewidth of the graph and speed up solving certain computational problems on the graph.

Bayesian network

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

Graphical model

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning.

Chordal graph

In the mathematical area of graph theory, a chordal graph is one in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle. Equivalently, every induced cycle in the graph should have exactly three vertices. The chordal graphs may also be characterized as the graphs that have perfect elimination orderings, as the graphs in which each minimal separator is a clique, and as the intersection graphs of subtrees of a tree. They are sometimes also called rigid circuit graphs or triangulated graphs.

Belief propagation

Belief propagation, also known as sum-product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes. Belief propagation is commonly used in artificial intelligence and information theory and has demonstrated empirical success in numerous applications including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability.

Markov random field

In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to be a Markov random field if it satisfies Markov properties.

Maximal independent set

In graph theory, a maximal independent set (MIS) or maximal stable set is an independent set that is not a subset of any other independent set. In other words, there is no vertex outside the independent set that may join it because it is maximal with respect to the independent set property.

Factor graph

A factor graph is a bipartite graph representing the factorization of a function. In probability theory and its applications, factor graphs are used to represent factorization of a probability distribution function, enabling efficient computations, such as the computation of marginal distributions through the sum-product algorithm. One of the important success stories of factor graphs and the sum-product algorithm is the decoding of capacity-approaching error-correcting codes, such as LDPC and turbo codes.

Conditional random field

Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. What kind of graph is used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions.

Moral graph

In graph theory, a moral graph is used to find the equivalent undirected form of a directed acyclic graph. It is a key step of the junction tree algorithm, used in belief propagation on graphical models.

Probabilistic logic

Probabilistic logic involves the use of probability and logic to deal with uncertain situations. The result is a richer and more expressive formalism with a broad range of possible application areas. Probabilistic logics attempt to find a natural extension of traditional logic truth tables: the results they define are derived through probabilistic expressions instead. A difficulty with probabilistic logics is that they tend to multiply the computational complexities of their probabilistic and logical components. Other difficulties include the possibility of counter-intuitive results, such as those of Dempster–Shafer theory in evidence-based subjective logic. The need to deal with a broad variety of contexts and issues has led to many different proposals.

Valuation-based system

Valuation-based system (VBS) is a framework for knowledge representation and inference. Real-world problems are modeled in this framework by a network of interrelated entities, called variables. The relationships between variables are represented by the functions called valuations. The two basic operations for performing inference in a VBS are combination and marginalization. Combination corresponds to the aggregation of knowledge, while marginalization refers to the focusing (coarsening) of it. VBSs were introduced by Prakash P. Shenoy in 1989 as general frameworks for managing uncertainty in expert systems.

Equitable coloring

In graph theory, an area of mathematics, an equitable coloring is an assignment of colors to the vertices of an undirected graph, in such a way that no two adjacent vertices receive the same color, and the numbers of vertices in any two color classes differ by at most one.

Graphical models for protein structure

Graphical models have become powerful frameworks for protein structure prediction, protein–protein interaction, and free energy calculations for protein structures. Using a graphical model to represent the protein structure allows the solution of many problems including secondary structure prediction, protein–protein interactions, protein–drug interaction, and free energy calculations.

Courcelle's theorem

In the study of graph algorithms, Courcelle's theorem is the statement that every graph property definable in the monadic second-order logic of graphs can be decided in linear time on graphs of bounded treewidth. The result was first proved by Bruno Courcelle in 1990 and independently rediscovered by Borie, Parker & Tovey (1992). It is considered the archetype of algorithmic meta-theorems.

Skew partition

In graph theory, a skew partition of a graph is a partition of its vertices into two subsets, such that the induced subgraph formed by one of the two subsets is disconnected and the induced subgraph formed by the other subset is the complement of a disconnected graph. Skew partitions play an important role in the theory of perfect graphs.

Variable elimination

Variable elimination (VE) is a simple and general exact inference algorithm in probabilistic graphical models, such as Bayesian networks and Markov random fields. It can be used for inference of maximum a posteriori (MAP) state or estimation of conditional or marginal distributions over a subset of variables. The algorithm has exponential time complexity, but could be efficient in practice for low-treewidth graphs, if the proper elimination order is used.

Chordal completion

In graph theory, a branch of mathematics, a chordal completion of a given undirected graph G is a chordal graph, on the same vertex set, that has G as a subgraph. A minimal chordal completion is a chordal completion such that any graph formed by removing an edge would no longer be a chordal completion. A minimum chordal completion is a chordal completion with as few edges as possible.

Collective classification

In network theory, collective classification is the simultaneous prediction of the labels for multiple objects, where each label is predicted using information about the object's observed features, the observed features and labels of its neighbors, and the unobserved labels of its neighbors. Collective classification problems are defined in terms of networks of random variables, where the network structure determines the relationship between the random variables. Inference is performed on multiple random variables simultaneously, typically by propagating information between nodes in the network to perform approximate inference. Approaches that use collective classification can make use of relational information when performing inference. Examples of collective classification include predicting attributes of individuals in a social network, classifying webpages in the World Wide Web, and inferring the research area of a paper in a scientific publication dataset.

References

  1. Paskin, Mark. "A Short Course on Graphical Models" (PDF). Stanford.
  2. "The Inference Algorithm". www.dfki.de. Retrieved 2018-10-25.
  3. "Recap on Graphical Models" (PDF).
  4. "Algorithms" (PDF). Massachusetts Institute of Technology. 2014.
  5. Roweis, Sam (2004). "Hugin Inference Algorithm" (PDF). NYU.
  6. "Algorithms for Inference" (PDF). Massachusetts Institute of Technology. 2014.
  7. Kłopotek, Mieczysław A. (2018-06-06). "Dempsterian-Shaferian Belief Network From Data". arXiv:1806.02373 [cs.AI].
  8. Wainwright, Martin (31 March 2008). "Graphical models, message-passing algorithms, and variational methods: Part I" (PDF). Berkeley EECS. Retrieved 16 November 2016.
  9. "Clique Graph". Retrieved 16 November 2016.
  10. Barber, David (28 January 2014). "Probabilistic Modelling and Reasoning, The Junction Tree Algorithm" (PDF). University of Helsinki. Retrieved 16 November 2016.
  11. "Fault Diagnosis in an Industrial Process Using Bayesian Networks: Application of the Junction Tree Algorithm". 2009 Electronics, Robotics and Automotive Mechanics Conference (CERMA). IEEE. 2009. doi:10.1109/CERMA.2009.28. S2CID 6068245.
  12. Jin, Wengong (Feb 2018). "Junction Tree Variational Autoencoder for Molecular Graph Generation". arXiv:1802.04364. Bibcode:2018arXiv180204364J.
  13. CERMA 2009: Proceedings: 2009 Electronics, Robotics and Automotive Mechanics Conference: 22-25 September 2009: Cuernavaca, Morelos, Mexico. Los Alamitos, Calif.: IEEE Computer Society. 2009. ISBN 9780769537993. OCLC 613519385.