
A **graphical model** or **probabilistic graphical model** (**PGM**) or **structured probabilistic model** is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning.

Generally, probabilistic graphical models use a graph-based representation as the foundation for encoding a distribution over a multi-dimensional space and a graph that is a compact or factorized representation of a set of independences that hold in the specific distribution. Two branches of graphical representations of distributions are commonly used, namely, Bayesian networks and Markov random fields. Both families encompass the properties of factorization and independences, but they differ in the set of independences they can encode and the factorization of the distribution that they induce.^{ [1] }

The undirected graph shown may have one of several interpretations; the common feature is that the presence of an edge implies some sort of dependence between the corresponding random variables. From this graph we might deduce that B, C, D are all mutually independent, once A is known, or (equivalently in this case) that

P[A, B, C, D] = f_{AB}[A, B] · f_{AC}[A, C] · f_{AD}[A, D]

for some non-negative functions f_{AB}, f_{AC}, f_{AD}.
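The role of the non-negative functions can be sketched numerically. In the following Python example, all potential tables are invented for illustration: it builds the unnormalized product of pairwise factors over binary variables A, B, C, D and normalizes it into a joint distribution.

```python
import itertools

# Hypothetical pairwise potentials over binary variables; any non-negative
# functions would do -- these particular values are made up for illustration.
f_AB = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
f_AC = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
f_AD = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 0.2, (1, 1): 3.0}

def unnormalized(a, b, c, d):
    # The product of edge potentials: the factorization implied by the graph
    return f_AB[(a, b)] * f_AC[(a, c)] * f_AD[(a, d)]

# Normalizing constant Z: sum over all joint assignments
Z = sum(unnormalized(*x) for x in itertools.product([0, 1], repeat=4))

def joint(a, b, c, d):
    return unnormalized(a, b, c, d) / Z
```

Note that Z must be computed by summing over all assignments, which is part of what makes inference in undirected models expensive in general.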

If the network structure of the model is a directed acyclic graph, the model represents a factorization of the joint probability of all random variables. More precisely, if the events are X_1, …, X_n, then the joint probability satisfies

P[X_1, …, X_n] = ∏_{i=1}^{n} P[X_i | pa(X_i)],

where pa(X_i) is the set of parents of node X_i (nodes with edges directed towards X_i). In other words, the joint distribution factors into a product of conditional distributions. For example, for the directed acyclic graph shown in the Figure this factorization would be

P[A, B, C, D] = P[A] · P[B | A] · P[C | A] · P[D | A, C].

Any two nodes are conditionally independent given the values of their parents. In general, any two sets of nodes are conditionally independent given a third set if a criterion called *d*-separation holds in the graph. Local independences and global independences are equivalent in Bayesian networks.
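As a concrete sketch, the following Python snippet encodes a hypothetical DAG over binary variables A, B, C, D with edges A→B, A→C, A→D and C→D; every conditional probability table is made up for illustration. Because each factor is a properly normalized conditional, the product is automatically a valid joint distribution, with no extra normalizing constant needed:

```python
import itertools

# Invented CPTs for the hypothetical DAG, matching the factorization
# P[A,B,C,D] = P[A] * P[B|A] * P[C|A] * P[D|A,C].
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # P_B_given_A[a][b]
P_C_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # P_C_given_A[a][c]
P_D_given_AC = {(0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.4, 1: 0.6},
                (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {0: 0.1, 1: 0.9}}

def joint(a, b, c, d):
    # Product of each node's conditional given its parents
    return P_A[a] * P_B_given_A[a][b] * P_C_given_A[a][c] * P_D_given_AC[(a, c)][d]

# Sanity check: the product of conditionals sums to 1 over all assignments
total = sum(joint(*x) for x in itertools.product([0, 1], repeat=4))
```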

This type of graphical model is known as a directed graphical model, Bayesian network, or belief network. Classic machine learning models like hidden Markov models, neural networks and newer models such as variable-order Markov models can be considered special cases of Bayesian networks.

The next figure depicts a graphical model with a cycle. This may be interpreted in terms of each variable 'depending' on the values of its parents in some manner. The particular graph shown suggests a joint probability density that factors as a product of terms, one for each variable conditioned on its parents in the graph, but other interpretations are possible.^{ [2] }

- Naive Bayes classifier, where we use a tree with a single root
- Dependency network, where cycles are allowed
- Tree-augmented classifier, or **TAN model**
- A factor graph is an undirected bipartite graph connecting variables and factors. Each factor represents a function over the variables it is connected to. This is a helpful representation for understanding and implementing belief propagation.
- A clique tree or junction tree is a tree of cliques, used in the junction tree algorithm.
- A chain graph is a graph which may have both directed and undirected edges, but without any directed cycles (i.e. if we start at any vertex and move along the graph respecting the directions of any arrows, we cannot return to the vertex we started from if we have passed an arrow). Both directed acyclic graphs and undirected graphs are special cases of chain graphs, which can therefore provide a way of unifying and generalizing Bayesian and Markov networks.^{ [3] }
- An ancestral graph is a further extension, having directed, bidirected and undirected edges.^{ [4] }
- Random field techniques:
- A Markov random field, also known as a Markov network, is a model over an undirected graph. A graphical model with many repeated subunits can be represented with plate notation.
- A conditional random field is a discriminative model specified over an undirected graph.

- A restricted Boltzmann machine is a bipartite generative model specified over an undirected graph.

The framework of the models, which provides algorithms for discovering and analyzing structure in complex distributions, for describing them succinctly, and for extracting their unstructured information, allows them to be constructed and utilized effectively.^{ [1] } Applications of graphical models include causal inference, information extraction, speech recognition, computer vision, decoding of low-density parity-check codes, modeling of gene regulatory networks, gene finding and diagnosis of diseases, and graphical models for protein structure.

1. Koller, D.; Friedman, N. (2009). *Probabilistic Graphical Models*. Massachusetts: MIT Press. p. 1208. ISBN 978-0-262-01319-2.
2. Richardson, Thomas (1996). "A discovery algorithm for directed cyclic graphs". *Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence*. ISBN 978-1-55860-412-4.
3. Frydenberg, Morten (1990). "The Chain Graph Markov Property". *Scandinavian Journal of Statistics*. **17** (4): 333–353. JSTOR 4616181. MR 1096723.
4. Richardson, Thomas; Spirtes, Peter (2002). "Ancestral graph Markov models". *Annals of Statistics*. **30** (4): 962–1030. doi:10.1214/aos/1031689015. MR 1926166. Zbl 1033.60008.


**Hidden Markov Model** (**HMM**) is a statistical Markov model in which the system being modeled is assumed to be a Markov process – call it X – with unobservable ("*hidden*") states. HMM assumes that there is another process Y whose behavior "depends" on X. The goal is to learn about X by observing Y. HMM stipulates that, for each time instance n, the conditional probability distribution of Y_n given the history {X_1 = x_1, …, X_n = x_n} must **not** depend on {x_1, …, x_{n−1}}.
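This dependence structure supports efficient inference: for instance, the probability of an observed sequence can be computed with the forward algorithm, which sums over all hidden state paths in time linear in the sequence length. A sketch with invented parameters for a hypothetical two-state, two-symbol model:

```python
# Forward algorithm for a toy 2-state, 2-symbol HMM; all numbers illustrative.
states = [0, 1]
start = [0.5, 0.5]                    # start[i]     = P(X_1 = i)
trans = [[0.9, 0.1], [0.2, 0.8]]      # trans[i][j]  = P(X_{t+1} = j | X_t = i)
emit = [[0.7, 0.3], [0.1, 0.9]]       # emit[i][o]   = P(Y_t = o | X_t = i)

def likelihood(observations):
    """P(Y_1..Y_T) via forward recursion: O(T * |S|^2) instead of O(|S|^T)."""
    # alpha[j] = P(Y_1..Y_t, X_t = j) after processing t observations
    alpha = [start[s] * emit[s][observations[0]] for s in states]
    for o in observations[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in states) * emit[j][o]
                 for j in states]
    return sum(alpha)

p = likelihood([0, 0, 1])
```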

A **Bayesian network** is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

In statistics, **Gibbs sampling** or a **Gibbs sampler** is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations which are approximated from a specified multivariate probability distribution, when direct sampling is difficult. This sequence can be used to approximate the joint distribution; to approximate the marginal distribution of one of the variables, or some subset of the variables; or to compute an integral. Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled.
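A classic illustration is Gibbs sampling from a standard bivariate normal with correlation ρ, whose full conditionals are themselves univariate normals: X | Y = y ~ N(ρy, 1 − ρ²), and symmetrically for Y | X. The following self-contained Python sketch (parameter values chosen for illustration) simply alternates the two conditional draws:

```python
import random

rho = 0.8        # correlation of the target bivariate normal (illustrative)
random.seed(0)   # fixed seed for reproducibility

def gibbs(n_samples, burn_in=500):
    x, y = 0.0, 0.0          # arbitrary starting point
    samples = []
    sd = (1 - rho ** 2) ** 0.5
    for i in range(n_samples + burn_in):
        x = random.gauss(rho * y, sd)   # draw X from its full conditional
        y = random.gauss(rho * x, sd)   # draw Y from its full conditional
        if i >= burn_in:                # discard early, unconverged draws
            samples.append((x, y))
    return samples

samples = gibbs(5000)
mean_x = sum(s[0] for s in samples) / len(samples)  # should be near 0
```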

**Belief propagation**, also known as **sum-product message passing**, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes. Belief propagation is commonly used in artificial intelligence and information theory and has demonstrated empirical success in numerous applications including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability.
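On a tree-structured graph, belief propagation computes exact marginals. A minimal sketch of sum-product message passing on a three-node chain X1 – X2 – X3 with invented binary pairwise potentials, checked against brute-force enumeration:

```python
# Illustrative non-negative pairwise potentials on the two chain edges.
psi12 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
psi23 = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 1.0}

# Message from X1 into X2: sum out x1 from psi12(x1, x2)
m1_to_2 = {x2: sum(psi12[(x1, x2)] for x1 in (0, 1)) for x2 in (0, 1)}
# Message from X3 into X2: sum out x3 from psi23(x2, x3)
m3_to_2 = {x2: sum(psi23[(x2, x3)] for x3 in (0, 1)) for x2 in (0, 1)}

# Belief at X2 is the product of incoming messages, normalized.
belief = {x2: m1_to_2[x2] * m3_to_2[x2] for x2 in (0, 1)}
Z = sum(belief.values())
marginal_x2 = {x2: belief[x2] / Z for x2 in (0, 1)}

# Brute-force check: on a tree, belief propagation is exact.
brute = {x2: sum(psi12[(x1, x2)] * psi23[(x2, x3)]
                 for x1 in (0, 1) for x3 in (0, 1)) for x2 in (0, 1)}
Zb = sum(brute.values())
assert all(abs(marginal_x2[x2] - brute[x2] / Zb) < 1e-9 for x2 in (0, 1))
```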

The **Markov condition**, sometimes called the **Markov assumption**, is an assumption made in Bayesian probability theory, that every node in a Bayesian network is conditionally independent of its nondescendents, given its parents. Stated loosely, it is assumed that a node has no bearing on nodes which do not descend from it. In a DAG, this local Markov condition is equivalent to the global Markov condition, which states that d-separations in the graph also correspond to conditional independence relations. This also means that a node is conditionally independent of the entire network, given its Markov blanket.

In the domain of physics and probability, a **Markov random field** (**MRF**), **Markov network** or **undirected graphical model** is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to be a Markov random field if it satisfies Markov properties.

A **factor graph** is a bipartite graph representing the factorization of a function. In probability theory and its applications, factor graphs are used to represent factorization of a probability distribution function, enabling efficient computations, such as the computation of marginal distributions through the sum-product algorithm. One of the important success stories of factor graphs and the sum-product algorithm is the decoding of capacity-approaching error-correcting codes, such as LDPC and turbo codes.

**Estimation of distribution algorithms** (**EDAs**) are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions.

**Conditional random fields** (**CRFs**) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighboring" samples, a CRF can take context into account. To do so, the prediction is modeled as a graphical model, which implements dependencies between the predictions. What kind of graph is used depends on the application. For example, in natural language processing, linear chain CRFs are popular, which implement sequential dependencies in the predictions. In image processing the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions.

In graph theory, a **moral graph** is used to find the equivalent undirected form of a directed acyclic graph. It is a key step of the junction tree algorithm, used in belief propagation on graphical models.

In the mathematical theory of stochastic processes, **variable-order Markov (VOM) models** are an important class of models that extend the well known Markov chain models. In contrast to the Markov chain models, where each random variable in a sequence with a Markov property depends on a fixed number of random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization.


The **Hammersley–Clifford theorem** is a result in probability theory, mathematical statistics and statistical mechanics that gives necessary and sufficient conditions under which a strictly positive probability distribution can be represented as events generated by a Markov network. It is the **fundamental theorem of random fields**. It states that a probability distribution that has a strictly positive mass or density satisfies one of the Markov properties with respect to an undirected graph *G* if and only if it is a Gibbs random field, that is, its density can be factorized over the cliques of the graph.
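In symbols, the Gibbs factorization guaranteed by the theorem can be written as:

```latex
P(X = x) \;=\; \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \phi_C(x_C),
\qquad
Z \;=\; \sum_{x} \prod_{C \in \mathcal{C}(G)} \phi_C(x_C),
```

where \mathcal{C}(G) denotes the set of cliques of *G*, each \phi_C is a strictly positive factor depending only on the variables x_C in clique *C*, and *Z* is the normalizing constant (partition function).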

Graphical models have become powerful frameworks for protein structure prediction, protein–protein interaction, and free energy calculations for protein structures. Using a graphical model to represent the protein structure allows the solution of many problems including secondary structure prediction, protein-protein interactions, protein-drug interaction, and free energy calculations.

**Variable elimination** (VE) is a simple and general exact inference algorithm in probabilistic graphical models, such as Bayesian networks and Markov random fields. It can be used for inference of maximum a posteriori (MAP) state or estimation of conditional or marginal distributions over a subset of variables. The algorithm has exponential time complexity, but could be efficient in practice for the low-treewidth graphs, if the proper elimination order is used.
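A minimal sketch of variable elimination in Python, on a hypothetical chain A → B → C with invented conditional probability tables, eliminating A and then B to obtain the marginal of C:

```python
# Illustrative CPTs for a chain A -> B -> C over binary variables.
P_A = {0: 0.3, 1: 0.7}
P_B_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
P_C_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}

# Eliminate A: tau_B(b) = sum_a P(a) * P(b|a)
tau_B = {b: sum(P_A[a] * P_B_given_A[a][b] for a in (0, 1)) for b in (0, 1)}
# Eliminate B: P(C=c) = sum_b tau_B(b) * P(c|b)
P_C = {c: sum(tau_B[b] * P_C_given_B[b][c] for b in (0, 1)) for c in (0, 1)}
```

Each elimination step produces an intermediate factor (here `tau_B`) over the remaining variables; the cost of the algorithm is governed by the size of the largest such factor, which depends on the elimination order.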

**Bayesian programming** is a formalism and a methodology for specifying probabilistic models and solving problems when less than the necessary information is available.

A **graphoid** is a set of statements of the form, "*X* is irrelevant to *Y* given that we know *Z*" where *X*, *Y* and *Z* are sets of variables. The notion of "irrelevance" and "given that we know" may obtain different interpretations, including probabilistic, relational and correlational, depending on the application. These interpretations share common properties that can be captured by paths in graphs. The theory of graphoids characterizes these properties in a finite set of axioms that are common to informational irrelevance and its graphical representations.

**Dependency networks (DNs)** are graphical models, similar to Markov networks, wherein each vertex (node) corresponds to a random variable and each edge captures dependencies among variables. Unlike Bayesian networks, DNs may contain cycles. Each node is associated to a conditional probability table, which determines the realization of the random variable given its parents.

In network theory, **link prediction** is the problem of predicting the existence of a link between two entities in a network. Examples of link prediction include predicting friendship links among users in a social network, predicting co-authorship links in a citation network, and predicting interactions between genes and proteins in a biological network. Link prediction can also have a temporal aspect, where, given a snapshot of the set of links at time *t*, the goal is to predict the links at time *t* + 1. Link prediction is widely applicable. In e-commerce, link prediction is often a subtask for recommending items to users. In the curation of citation databases, it can be used for record deduplication. In bioinformatics, it has been used to predict protein-protein interactions (PPI). It is also used to identify hidden groups of terrorists and criminals in security related applications.
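A common baseline for link prediction is the common-neighbours heuristic: score each candidate pair of nodes by the number of neighbours they share, and rank non-edges by that score. A sketch over a small invented friendship network:

```python
# A small hypothetical friendship network, stored as adjacency sets.
graph = {
    "ann": {"bob", "cam"},
    "bob": {"ann", "cam", "dee"},
    "cam": {"ann", "bob", "dee"},
    "dee": {"bob", "cam"},
}

def common_neighbours(u, v):
    # Number of neighbours shared by u and v (higher = more likely link)
    return len(graph[u] & graph[v])

score = common_neighbours("ann", "dee")  # ann and dee share bob and cam
```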

In network theory, **collective classification** is the simultaneous prediction of the labels for multiple objects, where each label is predicted using information about the object's observed features, the observed features and labels of its neighbors, and the unobserved labels of its neighbors. Collective classification problems are defined in terms of networks of random variables, where the network structure determines the relationship between the random variables. Inference is performed on multiple random variables simultaneously, typically by propagating information between nodes in the network to perform approximate inference. Approaches that use collective classification can make use of relational information when performing inference. Examples of collective classification include predicting attributes of individuals in a social network, classifying webpages in the World Wide Web, and inferring the research area of a paper in a scientific publication dataset.

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
