Expected value of sample information

In decision theory, the expected value of sample information (EVSI) is the expected increase in utility that a decision-maker could obtain from gaining access to a sample of additional observations before making a decision. The additional information obtained from the sample may allow them to make a more informed, and thus better, decision, resulting in an increase in expected utility. EVSI attempts to estimate what this improvement would be before seeing actual sample data; hence, EVSI is a form of what is known as preposterior analysis. The use of EVSI in decision theory was popularized by Robert Schlaifer and Howard Raiffa in the 1960s.[1]

Formulation

Let

- $x \in X$ be the uncertain state of the world, with prior distribution $p(x)$,
- $d \in D$ a decision to be chosen from the space of possible decisions $D$,
- $U(d, x)$ the utility of applying decision $d$ when the state of the world is $x$,
- $z = (z_1, z_2, \ldots, z_n) \in Z$ a potential sample of $n$ observations, with likelihood $p(z \mid x)$.

It is common (but not essential) in EVSI scenarios for $z_i \in X$, $p(z \mid x) = \prod_{i=1}^{n} p(z_i \mid x)$ and $E[z_i \mid x] = x$, which is to say that each observation is an unbiased sensor reading of the underlying state $x$, with each sensor reading being independent and identically distributed.

The utility from the optimal decision based only on the prior, without making any further observations, is given by

$$E[U] = \max_{d \in D} \int_x U(d, x)\, p(x)\, dx.$$

If the decision-maker could gain access to a single sample, $z$, the optimal posterior utility would be

$$E[U \mid z] = \max_{d \in D} \int_x U(d, x)\, p(x \mid z)\, dx,$$

where $p(x \mid z)$ is obtained from Bayes' rule:

$$p(x \mid z) = \frac{p(z \mid x)\, p(x)}{p(z)}, \qquad p(z) = \int_x p(z \mid x)\, p(x)\, dx.$$

Since the decision-maker does not know which sample would actually be obtained, they must average over all possible samples to obtain the expected utility given a sample:

$$E[U \mid SI] = \int_z E[U \mid z]\, p(z)\, dz.$$

The expected value of sample information is then defined as

$$\mathrm{EVSI} = E[U \mid SI] - E[U].$$

Computation

It is seldom feasible to carry out analytically the integration over the space of possible observations in $E[U \mid SI]$, so the computation of EVSI usually requires a Monte Carlo simulation. The method involves randomly simulating a sample, $z \sim p(z)$, then using it to compute the posterior $p(x \mid z)$ and maximizing expected utility based on $p(x \mid z)$. This whole process is then repeated many times, for $i = 1, \ldots, M$, to obtain a Monte Carlo sample of optimal utilities. These are averaged to obtain the expected utility given a hypothetical sample.
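
The following Python sketch carries out this procedure for a deliberately small hypothetical problem with two states, two decisions, and binomial observations; every number in it is an illustrative stand-in, not part of the formulation above.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)

# Hypothetical two-state, two-decision problem (all values illustrative).
prior = np.array([0.5, 0.5])        # p(x) over states x in {0, 1}
utility = np.array([[1.0, -2.0],    # U[d, x]: decision d=0 ("act")
                    [0.0,  0.0]])   # U[d, x]: decision d=1 ("do nothing")
n = 10                              # size of the hypothetical sample z
p_obs = np.array([0.2, 0.8])        # P(one observation is a "success" | x)

def binom_pmf(z, p):
    """p(z | x): probability of z successes in n Bernoulli trials."""
    return comb(n, z) * p**z * (1 - p) ** (n - z)

# E[U]: utility of the optimal decision under the prior alone.
eu_prior = (utility @ prior).max()

# Preposterior analysis: average the optimal posterior utility over samples.
M = 10_000
post_utils = np.empty(M)
for i in range(M):
    x = rng.choice(2, p=prior)                 # simulate a true state from p(x)
    z = rng.binomial(n, p_obs[x])              # simulate a sample z ~ p(z | x)
    lik = np.array([binom_pmf(z, p_obs[0]),    # p(z | x) for each state
                    binom_pmf(z, p_obs[1])])
    post = lik * prior
    post /= post.sum()                         # Bayes' rule: p(x | z)
    post_utils[i] = (utility @ post).max()     # E[U | z], maximized over d

evsi = post_utils.mean() - eu_prior            # E[U | SI] - E[U]
print(f"E[U] = {eu_prior:.3f}, E[U|SI] = {post_utils.mean():.3f}, EVSI = {evsi:.3f}")
```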

Example

A regulatory agency is to decide whether to approve a new treatment. Before making the final approve/reject decision, they ask what the value would be of conducting a further trial study on $n$ subjects. This question is answered by the EVSI.

[Figure: influence diagram of the EVSI model (EVSI diagram.png)]

The diagram shows an influence diagram for computing the EVSI in this example.

The model classifies the outcome for any given subject into one of five categories:

{"Cure", "Improvement", "Ineffective", "Mild side-effect", "Serious side-effect"}

For each of these outcomes, the model assigns a utility equal to an estimated patient-equivalent monetary value of the outcome.

A state, $x$, in this example is a vector of five numbers between 0 and 1 that sum to 1, giving the proportion of future patients who will experience each of the five possible outcomes. For example, the state $x = (0.05, 0.60, 0.20, 0.10, 0.05)$ denotes the case where 5% of patients are cured, 60% improve, 20% find the treatment ineffective, 10% experience mild side-effects and 5% experience serious side-effects.
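
As a concrete Python sketch of this setup, the outcome categories, a hypothetical utility assignment (the text does not give the actual monetary values), and the example state can be written as follows.

```python
import numpy as np

outcomes = ["Cure", "Improvement", "Ineffective",
            "Mild side-effect", "Serious side-effect"]

# Hypothetical patient-equivalent monetary utilities for each outcome;
# illustrative placeholders, not the values used in the original model.
u = np.array([100_000.0, 40_000.0, 0.0, -20_000.0, -500_000.0])

# The example state: proportion of future patients per outcome (sums to 1).
x = np.array([0.05, 0.60, 0.20, 0.10, 0.05])

# Mean utility per future patient if the treatment is approved in state x.
print(u @ x)
```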

The prior, $p(x)$, is encoded using a Dirichlet distribution, requiring five non-negative numbers (which do not need to sum to 1) whose relative values capture the expected relative proportion of each outcome, and whose sum encodes the strength of this prior belief. In the diagram, the parameters of the Dirichlet distribution are contained in the variable dirichlet alpha prior, while the prior distribution itself is in the chance variable Prior. The probability density graph of the marginals is shown here:

[Figure: probability densities of the prior marginals (EVSI prior marginals.png)]
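
A minimal sketch of encoding such a prior, assuming an illustrative alpha vector (the parameter values behind the graph above are not given in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Dirichlet prior: the relative alpha values set the expected
# proportions of the five outcomes; their sum (here 20) sets the strength
# of the prior belief.
alpha_prior = np.array([1.0, 12.0, 4.0, 2.0, 1.0])

# Expected proportion of each outcome under the prior.
print(alpha_prior / alpha_prior.sum())

# Monte Carlo draws from the prior p(x); each row is one state vector x,
# whose marginals could be plotted to give a graph like the one above.
x_samples = rng.dirichlet(alpha_prior, size=1000)
```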

In the chance variable Trial data, trial data is simulated as a Monte Carlo sample from a multinomial distribution, conditional on a state $x$ drawn from the prior. For example, when Trial_size = 100, each Monte Carlo sample of Trial_data contains a vector that sums to 100, showing the number of subjects in the simulated study who experienced each of the five possible outcomes. The following result table depicts the first 8 simulated trial outcomes:

[Figure: the first 8 simulated trial outcomes (EVSI trial data.png)]
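
The trial-data simulation can be sketched as follows, reusing the hypothetical prior from the previous snippet: each simulated trial draws a state $x$ from the prior and then outcome counts from a multinomial distribution conditional on $x$.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha_prior = np.array([1.0, 12.0, 4.0, 2.0, 1.0])  # hypothetical prior
trial_size = 100                                     # Trial_size
n_sim = 8                                            # number of simulated trials

# Draw a state x from the Dirichlet prior for each simulated trial, then draw
# outcome counts for trial_size subjects from Multinomial(trial_size, x).
x = rng.dirichlet(alpha_prior, size=n_sim)
trial_data = np.array([rng.multinomial(trial_size, xi) for xi in x])
print(trial_data)  # each row sums to trial_size, one column per outcome
```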

Because the Dirichlet distribution is conjugate to the multinomial, combining this trial data with the Dirichlet prior requires only adding the outcome frequencies to the Dirichlet prior alpha values, resulting in a Dirichlet posterior distribution for each simulated trial. For each of these posteriors, the decision to approve is made when the mean utility is positive; using a utility of zero when the treatment is not approved, the pre-posterior utility is obtained by averaging over the simulated trials. Repeating the computation for a range of possible trial sizes yields the EVSI at each candidate trial size, as depicted in this graph:

[Figure: EVSI as a function of trial size (EVSI result.png)]
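
The whole pre-posterior computation can be sketched as below, again under the hypothetical prior and utility values from the earlier snippets (the original model's numbers are not given in the text).

```python
import numpy as np

rng = np.random.default_rng(0)

alpha_prior = np.array([1.0, 12.0, 4.0, 2.0, 1.0])               # hypothetical prior
u = np.array([100_000.0, 40_000.0, 0.0, -20_000.0, -500_000.0])  # hypothetical utilities

# E[U]: approve under the prior alone only if the mean utility is positive;
# a rejected treatment has utility zero.
prior_mean = alpha_prior / alpha_prior.sum()
eu_prior = max(u @ prior_mean, 0.0)

def evsi(trial_size, n_sim=10_000):
    x = rng.dirichlet(alpha_prior, size=n_sim)         # states from the prior
    counts = np.array([rng.multinomial(trial_size, xi) for xi in x])
    alpha_post = alpha_prior + counts                  # conjugate Dirichlet update
    post_mean = alpha_post / alpha_post.sum(axis=1, keepdims=True)
    post_util = np.maximum(post_mean @ u, 0.0)         # approve iff mean utility > 0
    return post_util.mean() - eu_prior                 # E[U|SI] - E[U]

for size in [10, 30, 100, 300]:                        # candidate trial sizes
    print(size, evsi(size))
```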

Comparison to related measures

Expected value of sample information (EVSI) is a relaxation of the expected value of perfect information (EVPI) metric, which encodes the increase in utility that would be obtained if one were to learn the true underlying state, $x$. Essentially, EVPI indicates the value of perfect information, while EVSI indicates the value of some limited and incomplete information.
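
In the notation of the Formulation section, EVPI replaces the average over samples with direct knowledge of the state:

$$\mathrm{EVPI} = \int_x \left[ \max_{d \in D} U(d, x) \right] p(x)\, dx \;-\; E[U],$$

and since no finite sample can be more informative than the state itself, $0 \le \mathrm{EVSI} \le \mathrm{EVPI}$.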

The expected value of including uncertainty (EVIU) compares the value of modeling uncertain information against the value of modeling a situation without taking uncertainty into account. Since the impact of uncertainty on computed results is often analysed using Monte Carlo methods, EVIU appears very similar to the value of carrying out an analysis using a Monte Carlo sample, which closely resembles in statement the notion captured by EVSI. However, EVSI and EVIU are quite distinct; a notable difference is the manner in which EVSI uses Bayesian updating to incorporate the simulated sample.

See also

- Expected value of perfect information
- Expected value of including uncertainty

References

  1. Schlaifer, R.; Raiffa, H. (1968). Applied Statistical Decision Theory. Cambridge, MA: MIT Press. OCLC 443816.
