Influence diagram

An influence diagram (ID) (also called a relevance diagram, decision diagram or a decision network) is a compact graphical and mathematical representation of a decision situation. It is a generalization of a Bayesian network, in which not only probabilistic inference problems but also decision making problems (following the maximum expected utility criterion) can be modeled and solved.

The ID was first developed in the mid-1970s by decision analysts, with an intuitive semantics that is easy to understand. It is now widely adopted and has become an alternative to the decision tree, which typically suffers from exponential growth in the number of branches with each variable modeled. The ID is directly applicable in team decision analysis, since it allows incomplete sharing of information among team members to be modeled and solved explicitly. Extensions of the ID also find use in game theory as an alternative representation of the game tree.

Semantics

An ID is a directed acyclic graph with three types (plus one subtype) of node and three types of arc (or arrow) between nodes.

Nodes:

  • Decision node (corresponding to each decision to be made) is drawn as a rectangle.
  • Uncertainty node (corresponding to each uncertainty to be modeled) is drawn as an oval.
  • Deterministic node (corresponding to a special kind of uncertainty whose outcome is deterministically known whenever the outcomes of some other uncertainties are known) is drawn as a double oval.
  • Value node (corresponding to each component of the additively separable utility function) is drawn as an octagon (or diamond).

Arcs:

  • Functional arcs (ending in a value node) indicate that one of the components of the additively separable utility function is a function of all the nodes at their tails.
  • Conditional arcs (ending in an uncertainty node) indicate that the uncertainty at their heads is probabilistically conditioned on all the nodes at their tails.
  • Conditional arcs (ending in a deterministic node) indicate that the uncertainty at their heads is deterministically conditioned on all the nodes at their tails.
  • Informational arcs (ending in a decision node) indicate that the decision at their heads is made with the outcomes of all the nodes at their tails known beforehand.

Given a properly structured ID:

  • Decision nodes and incoming informational arcs collectively state the alternatives (what can be done when the outcomes of certain decisions and/or uncertainties are known beforehand).
  • Uncertainty/deterministic nodes and incoming conditional arcs collectively model the information (what is known and the probabilistic/deterministic relationships among them).
  • Value nodes and incoming functional arcs collectively quantify the preference (how outcomes are preferred over one another).

Alternatives, information, and preference are termed the decision basis in decision analysis; they represent the three required components of any valid decision situation.
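
To make the structure concrete, here is a minimal Python sketch of an ID as a typed graph. The class and method names (InfluenceDiagram, add_node, add_arc, arc_kind) are illustrative inventions, not from any standard library; the one point it encodes from the definitions above is that an arc's type is fully determined by the type of the node at its head.

    # Minimal structural sketch of an influence diagram (illustrative, not a
    # standard library API).
    from dataclasses import dataclass, field
    from enum import Enum

    class NodeType(Enum):
        DECISION = "decision"            # drawn as a rectangle
        UNCERTAINTY = "uncertainty"      # drawn as an oval
        DETERMINISTIC = "deterministic"  # drawn as a double oval (subtype)
        VALUE = "value"                  # drawn as an octagon (or diamond)

    @dataclass
    class InfluenceDiagram:
        nodes: dict = field(default_factory=dict)  # name -> NodeType
        arcs: list = field(default_factory=list)   # (tail, head) pairs

        def add_node(self, name, kind):
            self.nodes[name] = kind

        def add_arc(self, tail, head):
            self.arcs.append((tail, head))

        def arc_kind(self, tail, head):
            # The arc's meaning is determined by the type at its head:
            # value head -> functional, decision head -> informational,
            # uncertainty/deterministic head -> conditional.
            head_type = self.nodes[head]
            if head_type is NodeType.VALUE:
                return "functional"
            if head_type is NodeType.DECISION:
                return "informational"
            return "conditional"

    diagram = InfluenceDiagram()
    diagram.add_node("Forecast", NodeType.UNCERTAINTY)
    diagram.add_node("Decision", NodeType.DECISION)
    diagram.add_arc("Forecast", "Decision")
    print(diagram.arc_kind("Forecast", "Decision"))  # informational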

Formally, the semantics of an influence diagram is based on sequential construction of nodes and arcs, which implies a specification of all conditional independencies in the diagram. The specification is defined by the d-separation criterion of Bayesian networks. According to this semantics, every node is probabilistically independent of its non-successor nodes given the outcome of its immediate predecessor nodes. Likewise, a missing arc between a non-value node X and a non-value node Y implies that there exists a set of non-value nodes Z, e.g., the parents of Y, that renders X independent of Y given the outcome of the nodes in Z.
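
For the chance nodes, this is the familiar Bayesian-network factorization: the joint distribution over the uncertainty/deterministic nodes factors into one conditional distribution per node given its immediate predecessors,

    P(x_1, ..., x_n) = ∏_i P(x_i | pa(x_i))

where pa(x_i) denotes the outcomes of the parents of node i.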

Example

[Figure: Simple influence diagram for making a decision about vacation activity]

Consider the simple influence diagram representing a situation where a decision-maker is planning their vacation.

  • There is 1 decision node (Vacation Activity), 2 uncertainty nodes (Weather Condition, Weather Forecast), and 1 value node (Satisfaction).
  • There are 2 functional arcs (ending in Satisfaction), 1 conditional arc (ending in Weather Forecast), and 1 informational arc (ending in Vacation Activity).
  • Functional arcs ending in Satisfaction indicate that Satisfaction is a utility function of Weather Condition and Vacation Activity. In other words, their satisfaction can be quantified if they know what the weather is like and what their choice of activity is. (Note that they do not value Weather Forecast directly.)
  • The conditional arc ending in Weather Forecast indicates their belief that Weather Forecast and Weather Condition can be dependent.
  • The informational arc ending in Vacation Activity indicates that they will know only Weather Forecast, not Weather Condition, when making their choice. In other words, the actual weather will be known only after they make their choice, and the forecast is all they can count on at this stage.
  • It also follows semantically, for example, that Vacation Activity is independent of (irrelevant to) Weather Condition given that Weather Forecast is known. (A numerical evaluation of this diagram is sketched below.)
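
As a worked illustration, the following Python sketch evaluates this diagram under the maximum-expected-utility criterion. All numbers below (the prior on Weather Condition, the forecast accuracy, and the utilities) are invented for illustration; only the diagram's structure comes from the text.

    # Evaluating the vacation influence diagram (illustrative numbers).
    weathers   = ["sunny", "rainy"]
    forecasts  = ["sunny", "rainy"]
    activities = ["outdoor", "indoor"]

    p_weather = {"sunny": 0.7, "rainy": 0.3}   # assumed prior P(Weather Condition)
    p_forecast_given_weather = {               # assumed P(Forecast | Weather), keyed (f, w)
        ("sunny", "sunny"): 0.8, ("rainy", "sunny"): 0.2,
        ("sunny", "rainy"): 0.3, ("rainy", "rainy"): 0.7,
    }
    utility = {                                # assumed U(Weather, Activity)
        ("sunny", "outdoor"): 100, ("sunny", "indoor"): 40,
        ("rainy", "outdoor"): 0,   ("rainy", "indoor"): 50,
    }

    def posterior_weather(forecast):
        # Bayes' rule: P(Weather | Forecast) ∝ P(Forecast | Weather) P(Weather)
        joint = {w: p_forecast_given_weather[(forecast, w)] * p_weather[w]
                 for w in weathers}
        z = sum(joint.values())
        return {w: joint[w] / z for w in weathers}

    # Optimal policy: for each forecast outcome, choose the activity with the
    # highest posterior expected utility.
    for f in forecasts:
        post = posterior_weather(f)
        best = max(activities,
                   key=lambda a: sum(post[w] * utility[(w, a)] for w in weathers))
        eu = sum(post[w] * utility[(w, best)] for w in weathers)
        print(f"forecast={f}: choose {best} (expected utility {eu:.1f})")

With these numbers the optimal policy depends on the forecast, as the informational arc suggests: go outdoors on a sunny forecast, stay indoors on a rainy one.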

Applicability to value of information

The above example highlights the power of the influence diagram in representing an extremely important concept in decision analysis known as the value of information. Consider the following three scenarios:

  • Scenario 1: The decision-maker could make their Vacation Activity decision while knowing what Weather Condition will be like. This corresponds to adding an extra informational arc from Weather Condition to Vacation Activity in the above influence diagram.
  • Scenario 2: The original influence diagram as shown above.
  • Scenario 3: The decision-maker makes their decision without even knowing the Weather Forecast. This corresponds to removing the informational arc from Weather Forecast to Vacation Activity in the above influence diagram.

Scenario 1 is the best possible scenario for this decision situation, since there is no longer any uncertainty about what they care about (Weather Condition) when making their decision. Scenario 3, however, is the worst possible scenario, since they must make their decision without any hint (Weather Forecast) of how what they care about (Weather Condition) will turn out.

The decision-maker is usually better off (and, on average, never worse off) moving from scenario 3 to scenario 2 through the acquisition of new information. The most they should be willing to pay for such a move is called the value of information on Weather Forecast, which is essentially the value of imperfect information on Weather Condition.
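
The following sketch, reusing the same invented numbers as the evaluation above, computes the expected utility of each scenario. The value of information on Weather Forecast is the difference between scenarios 2 and 3; the value of perfect information on Weather Condition is the difference between scenarios 1 and 3.

    # Expected utility of the three scenarios (illustrative numbers).
    weathers   = ["sunny", "rainy"]
    activities = ["outdoor", "indoor"]
    p_w = {"sunny": 0.7, "rainy": 0.3}
    p_f_w = {("sunny", "sunny"): 0.8, ("rainy", "sunny"): 0.2,
             ("sunny", "rainy"): 0.3, ("rainy", "rainy"): 0.7}
    U = {("sunny", "outdoor"): 100, ("sunny", "indoor"): 40,
         ("rainy", "outdoor"): 0,   ("rainy", "indoor"): 50}

    # Scenario 3: decide with no information at all.
    eu3 = max(sum(p_w[w] * U[(w, a)] for w in weathers) for a in activities)

    # Scenario 2: observe the forecast, then decide. Summing the unnormalized
    # best expected utility over forecasts already weights by P(Forecast).
    eu2 = sum(max(sum(p_f_w[(f, w)] * p_w[w] * U[(w, a)] for w in weathers)
                  for a in activities)
              for f in ["sunny", "rainy"])

    # Scenario 1: observe the actual weather, then decide (perfect information).
    eu1 = sum(p_w[w] * max(U[(w, a)] for a in activities) for w in weathers)

    print(f"scenario 3 (no info):  {eu3:.1f}")         # 70.0
    print(f"scenario 2 (forecast): {eu2:.1f}")         # 72.1
    print(f"scenario 1 (perfect):  {eu1:.1f}")         # 85.0
    print(f"value of the forecast: {eu2 - eu3:.1f}")   # 2.1
    print(f"value of perfect info: {eu1 - eu3:.1f}")   # 15.0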

The applicability of this simple ID and of the value-of-information concept is tremendous, especially in medical decision making, where most decisions must be made with imperfect information about patients, diseases, etc.

Influence diagrams are hierarchical and can be defined either in terms of their structure or in greater detail in terms of the functional and numerical relation between diagram elements. An ID that is consistently defined at all levels (structure, function, and number) is a well-defined mathematical representation and is referred to as a well-formed influence diagram (WFID). WFIDs can be evaluated using reversal and removal operations to yield answers to a large class of probabilistic, inferential, and decision questions. More recent techniques have been developed by artificial intelligence researchers concerning Bayesian network inference (belief propagation).
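
As a small illustration of the reversal operation, the sketch below reverses a single arc A → B using Bayes' rule: from P(A) and P(B | A) it computes P(B) and P(A | B). The numbers are illustrative assumptions.

    # Arc reversal on a two-node fragment via Bayes' rule (illustrative).
    p_a = {"a0": 0.6, "a1": 0.4}
    p_b_given_a = {("b0", "a0"): 0.9, ("b1", "a0"): 0.1,
                   ("b0", "a1"): 0.2, ("b1", "a1"): 0.8}

    # Marginalize to get P(B) ...
    p_b = {b: sum(p_b_given_a[(b, a)] * p_a[a] for a in p_a)
           for b in ["b0", "b1"]}
    # ... then condition to get P(A | B).
    p_a_given_b = {(a, b): p_b_given_a[(b, a)] * p_a[a] / p_b[b]
                   for a in p_a for b in p_b}

    print(p_b)          # {'b0': 0.62, 'b1': 0.38}
    print(p_a_given_b)  # e.g. P(a0 | b0) = 0.54/0.62 ≈ 0.871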

An influence diagram having only uncertainty nodes (i.e., a Bayesian network) is also called a relevance diagram. An arc connecting node A to B implies not only that "A is relevant to B", but also that "B is relevant to A" (i.e., relevance is a symmetric relationship): by Bayes' rule, if learning the outcome of A changes the probability of B, then learning the outcome of B must also change the probability of A.
