Relational dependency network

Last updated

Relational dependency networks (RDNs) are graphical models which extend dependency networks to account for relational data. Relational data is data organized into one or more tables, which are cross-related through standard fields. A relational database is a canonical example of a system that serves to maintain relational data. A relational dependency network can be used to characterize the knowledge contained in a database.

Contents

Introduction

Relational Dependency Networks (or RDNs) aims to get the joint probability distribution over the variables of a dataset represented in the relational domain. They are based on Dependency Networks (or DNs) and extend them to the relational setting. RDNs have efficient learning methods where an RDN can learn the parameters independently, with the conditional probability distributions estimated separately. Since there may be some inconsistencies due to the independent learning method, RDNs use Gibbs sampling to recover joint distribution, like DNs.

Unlike Dependency Networks, RDNs need three graphs to fully represent them.

In other words, the data graph guides how the model graph will be rolled out to generate the inference graph.

RDN Learning

The learning methods of an RDN are similar to that employed by a DNs. i.e., all conditional probability distributions can be learned for each of the variables independently. However, only conditional relational learners can be used during the parameter estimation process for RDNs. Therefore, the learners used by DNs, like decision trees or logistic regression, do not work for RDNs.

Neville, J., & Jensen, D. (2007) [1] conducted some experiments comparing RDNs when learning with Relational Bayesian Classifiers and RDNs when learning with Relational Probability Trees. Natarajan et al. (2012) [2] used a series of regression models to represent conditional distributions.

This learning method makes the RDN a model with an efficient learning time. However, this method also makes RDNs susceptible to some structural or numerical inconsistencies. If the conditional probability distribution estimation method uses feature selection, it is possible that a given variable finds a dependency between itself and another variable while the latter doesn't find this dependency. In this case, the RDN is structurally inconsistent. In addition, if the joint distribution doesn't sum to one owing to the approximations caused by the independent learning, then it is called a numerical inconsistency. Such inconsistencies can, however, be bypassed during the inference step.

RDN Inference

RDN inference begins with the creation of an inference graph through a process called roll out. In this process, the model graph is rolled out over the data graph to form the inference graph. Next, Gibbs sampling technique can be used to recover a conditional probability distribution.

Applications

RDNs have been applied in many real-world domains. The main advantages of RDNs are their ability to use relationship information to improve the model's performance. Diagnosis, forecasting, automated vision, sensor fusion and manufacturing control are some examples of problems where RDNs were applied.

Implementations

Some suggestions of RDN implementations:

Related Research Articles

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

<span class="mw-page-title-main">Unsupervised learning</span> Machine learning task

Unsupervised learning is a type of algorithm that learns patterns from untagged data. The goal is that through mimicry, which is an important mode of learning in people, the machine is forced to build a concise representation of its world and then generate imaginative content from it.

<span class="mw-page-title-main">Graphical model</span> Probabilistic model

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning.

In statistics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations which are approximated from a specified multivariate probability distribution, when direct sampling is difficult. This sequence can be used to approximate the joint distribution ; to approximate the marginal distribution of one of the variables, or some subset of the variables ; or to compute an integral. Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled.

<span class="mw-page-title-main">Decision tree learning</span> Machine learning algorithm

Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of observations.

<span class="mw-page-title-main">Markov random field</span>

In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to be a Markov random field if it satisfies Markov properties. The concept originates from the Sherrington–Kirkpatrick model.

A Markov logic network (MLN) is a probabilistic logic which applies the ideas of a Markov network to first-order logic, enabling uncertain inference. Markov logic networks generalize first-order logic, in the sense that, in a certain limit, all unsatisfiable statements have a probability of zero, and all tautologies have probability one.

<span class="mw-page-title-main">Conditional random field</span> Class of statistical modeling methods

Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. What kind of graph is used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions.

<span class="mw-page-title-main">Junction tree algorithm</span>

The junction tree algorithm is a method used in machine learning to extract marginalization in general graphs. In essence, it entails performing belief propagation on a modified graph called a junction tree. The graph is called a tree because it branches into different sections of data; nodes of variables are the branches. The basic premise is to eliminate cycles by clustering them into single nodes. Multiple extensive classes of queries can be compiled at the same time into larger structures of data. There are different algorithms to meet specific needs and for what needs to be calculated. Inference algorithms gather new developments in the data and calculate it based on the new information provided.

Discriminative models, also referred to as conditional models, are a class of logistical models used for classification or regression. They distinguish decision boundaries through observed data, such as pass/fail, win/lose, alive/dead or healthy/sick.

Statistical relational learning (SRL) is a subdiscipline of artificial intelligence and machine learning that is concerned with domain models that exhibit both uncertainty and complex, relational structure. Note that SRL is sometimes called Relational Machine Learning (RML) in the literature. Typically, the knowledge representation formalisms developed in SRL use first-order logic to describe relational properties of a domain in a general manner and draw upon probabilistic graphical models to model the uncertainty; some also build upon the methods of inductive logic programming. Significant contributions to the field have been made since the late 1990s.

Information fuzzy networks (IFN) is a greedy machine learning algorithm for supervised learning. The data structure produced by the learning algorithm is also called Info Fuzzy Network. IFN construction is quite similar to decision trees' construction. However, IFN constructs a directed graph and not a tree. IFN also uses the conditional mutual information metric in order to choose features during the construction stage while decision trees usually use other metrics like entropy or gini.

Graphical models have become powerful frameworks for protein structure prediction, protein–protein interaction, and free energy calculations for protein structures. Using a graphical model to represent the protein structure allows the solution of many problems including secondary structure prediction, protein-protein interactions, protein-drug interaction, and free energy calculations.

The following is provided as an overview of and topical guide to databases:

A graphoid is a set of statements of the form, "X is irrelevant to Y given that we know Z" where X, Y and Z are sets of variables. The notion of "irrelevance" and "given that we know" may obtain different interpretations, including probabilistic, relational and correlational, depending on the application. These interpretations share common properties that can be captured by paths in graphs. The theory of graphoids characterizes these properties in a finite set of axioms that are common to informational irrelevance and its graphical representations.

Dependency networks (DNs) are graphical models, similar to Markov networks, wherein each vertex (node) corresponds to a random variable and each edge captures dependencies among variables. Unlike Bayesian networks, DNs may contain cycles. Each node is associated to a conditional probability table, which determines the realization of the random variable given its parents.

An energy-based model (EBM) is a form of generative model (GM) imported directly from statistical physics to learning. GMs learn an underlying data distribution by analyzing a sample dataset. Once trained, a GM can produce other datasets that also match the data distribution. EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to such learning, particularly for training graphical and other structured models.

<span class="mw-page-title-main">Autologistic actor attribute models</span>

Autologistic actor attribute models (ALAAMs) are a family of statistical models used to model the occurrence of node attributes in network data. They are frequently used with social network data to model social influence, the process by which connections in a social network influence the outcomes experienced by nodes. The dependent variable can strictly be binary. However, they may be applied to any type of network data that incorporates binary, ordinal or continuous node attributes as dependent variables.

In network theory, link prediction is the problem of predicting the existence of a link between two entities in a network. Examples of link prediction include predicting friendship links among users in a social network, predicting co-authorship links in a citation network, and predicting interactions between genes and proteins in a biological network. Link prediction can also have a temporal aspect, where, given a snapshot of the set of links at time , the goal is to predict the links at time . Link prediction is widely applicable. In e-commerce, link prediction is often a subtask for recommending items to users. In the curation of citation databases, it can be used for record deduplication. In bioinformatics, it has been used to predict protein-protein interactions (PPI). It is also used to identify hidden groups of terrorists and criminals in security related applications.

In network theory, collective classification is the simultaneous prediction of the labels for multiple objects, where each label is predicted using information about the object's observed features, the observed features and labels of its neighbors, and the unobserved labels of its neighbors. Collective classification problems are defined in terms of networks of random variables, where the network structure determines the relationship between the random variables. Inference is performed on multiple random variables simultaneously, typically by propagating information between nodes in the network to perform approximate inference. Approaches that use collective classification can make use of relational information when performing inference. Examples of collective classification include predicting attributes of individuals in a social network, classifying webpages in the World Wide Web, and inferring the research area of a paper in a scientific publication dataset.

References

  1. Neville, Jennifer; Jensen, David (2007). "Relational Dependency Networks" (PDF). Journal of Machine Learning Research. 8: 653–692. Retrieved 9 February 2020.
  2. 1 2 Natarajan, Sriraam; Khot, Tushar; Kersting, Kristian; Gutmann, Bernd; Shavlik, Jude (10 May 2011). "Gradient-based boosting for statistical relational learning: The relational dependency network case" (PDF). Machine Learning. 86 (1): 25–56. doi: 10.1007/s10994-011-5244-9 . Retrieved 9 February 2020.
  3. Lab, StARLinG. "BoostSRL Wiki". StARLinG. Retrieved 9 February 2020.