Probability management

The discipline of probability management communicates and calculates uncertainties as data structures that obey both the laws of arithmetic and the laws of probability. The simplest approach uses arrays of simulated or historical realizations, plus metadata, called Stochastic Information Packets (SIPs). A set of SIPs that preserves the statistical relationships between its variables is said to be coherent and is referred to as a Stochastic Library Unit with Relationships Preserved (SLURP). SIPs and SLURPs allow stochastic simulations to communicate with one another; tools that support them include Analytica, Oracle Crystal Ball, Frontline Solvers, and Autobox.

The first large documented application of SIPs involved the exploration portfolio of Royal Dutch Shell in 2005, as reported by Savage, Scholtes, and Zweidler, who formalized the discipline of probability management in 2006. [1] The topic is also explored at length in The Flaw of Averages. [2]

Vectors of simulated realizations of probability distributions have been used to drive stochastic optimization since at least 1991. [3] Andrew Gelman described such arrays of realizations as Random Variable Objects in 2007. [4]

A more recent approach does not store the realizations themselves but instead delivers formulas, known as virtual SIPs, that generate identical simulation trials in the host environment regardless of platform. This is accomplished through inverse transform sampling, also known as the F-inverse method, coupled with a portable pseudorandom number generator that produces the same stream of uniform random numbers across platforms. [5] Quantile-parameterized distributions (QPDs) are convenient for inverse transform sampling in this context. In particular, the metalog distribution is a flexible continuous probability distribution that has simple closed-form equations and can be parameterized directly by data using only a handful of parameters. [6] A suitable pseudorandom number generator for driving inverse transforms is the HDR generator developed by Douglas W. Hubbard: a counter-based generator with a four-dimensional seed plus an iteration index that runs on virtually all platforms, including Microsoft Excel. [7] Simulation results derived in R, Python, or other platforms can therefore be delivered identically, trial by trial, to a wide audience as a few metalog parameters accompanied by the five inputs to the HDR generator.
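The general pattern of a virtual SIP can be sketched in Python. The HDR generator's published formula [7] is a specific modular-arithmetic recipe; the sketch below substitutes an illustrative hash-based counter generator as a stand-in (not the actual HDR arithmetic), feeding a three-term metalog quantile function:

```python
import hashlib
import math

def counter_uniform(seed: tuple, i: int) -> float:
    """Counter-based uniform generator: hashes (seed, trial index) to a
    number in (0, 1). A stand-in for the HDR generator, whose published
    formula uses modular arithmetic rather than hashing."""
    h = hashlib.sha256(repr((seed, i)).encode()).digest()
    return (int.from_bytes(h[:8], "big") + 0.5) / 2.0**64

def metalog3_quantile(y: float, a1: float, a2: float, a3: float) -> float:
    """Three-term metalog quantile function M(y) (Keelin 2016)."""
    logit = math.log(y / (1.0 - y))
    return a1 + a2 * logit + a3 * (y - 0.5) * logit

# A "virtual SIP" is just this recipe: metalog coefficients plus a seed.
seed = (4, 2, 1, 7)        # four-dimensional seed, as in the HDR scheme
trials = [metalog3_quantile(counter_uniform(seed, i), 10.0, 2.0, 0.5)
          for i in range(1, 1001)]
# Re-running the recipe on any platform reproduces the same 1,000 trials.
```

Because the generator is counter-based (a pure function of seed and trial index), trial i can be reproduced in isolation without generating trials 1 through i-1.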

In 2013, ProbabilityManagement.org was incorporated as a 501(c)(3) nonprofit that supports this approach through education, tools, and open standards. Executive director Sam Savage is the author of The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty and is an adjunct professor at Stanford University. Harry Markowitz, Nobel laureate in economics, was a co-founding board member. The nonprofit has received financial support from Chevron Corporation, General Electric, Highmark Health, Kaiser Permanente, Lockheed Martin, PG&E, and Wells Fargo Bank. The SIPmath 2.0 Standard supports XLSX, CSV, and XML formats. [8] The SIPmath 3.0 Standard uses JSON objects to convey virtual SIPs based on the metalog distribution and the HDR generator.

Related Research Articles

A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed. Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom number generators are important in practice for their speed in number generation and their reproducibility.

Geostatistics is a branch of statistics focusing on spatial or spatiotemporal datasets. Developed originally to predict probability distributions of ore grades for mining operations, it is currently applied in diverse disciplines including petroleum geology, hydrogeology, hydrology, meteorology, oceanography, geochemistry, geometallurgy, geography, forestry, environmental control, landscape ecology, soil science, and agriculture. Geostatistics is applied in varied branches of geography, particularly those involving the spread of diseases (epidemiology), the practice of commerce and military planning (logistics), and the development of efficient spatial networks. Geostatistical algorithms are incorporated in many places, including geographic information systems (GIS).

Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. Monte Carlo methods are mainly used in three problem classes: optimization, numerical integration, and generating draws from a probability distribution.
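A canonical illustration of the repeated-sampling idea is estimating π from the fraction of uniform random points that land inside the unit quarter-circle:

```python
import random

random.seed(0)
n = 100_000
# A point (x, y) uniform on the unit square lies inside the quarter-circle
# with probability pi/4, so the hit fraction estimates pi/4.
inside = sum(random.random() ** 2 + random.random() ** 2 <= 1.0
             for _ in range(n))
pi_estimate = 4.0 * inside / n
```

The error of such an estimate shrinks only as 1/√n, which is why Monte Carlo methods trade large sample counts for generality.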

In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample of the desired distribution by recording states from the chain. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Various algorithms exist for constructing chains, including the Metropolis–Hastings algorithm.
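The accept/reject mechanic can be shown with a minimal random-walk Metropolis–Hastings sampler targeting a standard normal (a purely illustrative choice of target, stated here only through its unnormalized log-density):

```python
import math
import random

random.seed(1)

def log_target(x: float) -> float:
    # Unnormalized log-density of N(0, 1); MCMC needs no normalizing constant.
    return -0.5 * x * x

x, chain = 0.0, []
for _ in range(50_000):
    proposal = x + random.uniform(-1.0, 1.0)     # symmetric random-walk proposal
    # Metropolis acceptance: always move uphill, sometimes move downhill.
    if math.log(random.random()) < log_target(proposal) - log_target(x):
        x = proposal
    chain.append(x)
sample = chain[5_000:]    # discard burn-in before the chain equilibrates
```

The recorded states are correlated, so the effective sample size is smaller than `len(sample)`; the sample mean and variance still converge to those of the target.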


Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.

Network traffic simulation is a process used in telecommunications engineering to measure the efficiency of a communications network.

In mathematical finance, a Monte Carlo option model uses Monte Carlo methods to calculate the value of an option with multiple sources of uncertainty or with complicated features. The first application to option pricing was by Phelim Boyle in 1977. In 1996, M. Broadie and P. Glasserman showed how to price Asian options by Monte Carlo. An important development was the introduction in 1996 by Carriere of Monte Carlo methods for options with early exercise features.
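For the simplest case, a European call under geometric Brownian motion, plain Monte Carlo pricing can be sketched as follows (the parameter values are illustrative):

```python
import math
import random

random.seed(42)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0   # spot, strike, rate, vol, maturity
n = 100_000

total = 0.0
for _ in range(n):
    z = random.gauss(0.0, 1.0)
    # Terminal price under risk-neutral geometric Brownian motion
    ST = S0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * z)
    total += max(ST - K, 0.0)                        # call payoff
price = math.exp(-r * T) * total / n                 # discounted average payoff
```

For this simple payoff the Black–Scholes formula gives the answer in closed form; Monte Carlo earns its keep when the payoff depends on the path or on several correlated sources of uncertainty.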

In probability and statistics, a random variate, or simply variate, is a particular outcome of a random variable; other outcomes of the same random variable are different random variates. A random deviate, or simply deviate, is the difference between a random variate and the distribution's central location, often divided by the standard deviation of the distribution.

"Stochastic" means being or having a random variable. A stochastic model is a tool for estimating probability distributions of potential outcomes by allowing for random variation in one or more inputs over time. The random variation is usually based on fluctuations observed in historical data for a selected period using standard time-series techniques. Distributions of potential outcomes are derived from a large number of simulations which reflect the random variation in the input(s).

A stochastic simulation is a simulation of a system that has variables that can change stochastically (randomly) with individual probabilities.

In probability and statistics, a compound probability distribution is the probability distribution that results from assuming that a random variable is distributed according to some parametrized distribution, with the parameters of that distribution themselves being random variables. If the parameter is a scale parameter, the resulting mixture is also called a scale mixture.
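For example, a normal random variable whose scale parameter is itself random yields a heavier-tailed scale mixture; a small illustrative sketch:

```python
import random

random.seed(7)

def draw() -> float:
    # The scale parameter is itself a random variable (lognormal here),
    # so the result is a scale mixture of normals: a compound distribution.
    sigma = random.lognormvariate(0.0, 0.5)
    return random.gauss(0.0, sigma)

sample = [draw() for _ in range(10_000)]
```

The extra randomness in the scale leaves the mean at zero but inflates the tails relative to a normal with fixed standard deviation.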

Non-uniform random variate generation or pseudo-random number sampling is the numerical practice of generating pseudo-random numbers (PRN) that follow a given probability distribution. Methods are typically based on the availability of a uniformly distributed PRN generator. Computational algorithms are then used to manipulate a single random variate, X, or often several such variates, into a new random variate Y such that these values have the required distribution. The first methods were developed for Monte Carlo simulations in the Manhattan Project, published by John von Neumann in the early 1950s.
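The most direct such method is inverse transform sampling: if U is uniform on (0, 1), then F⁻¹(U) has distribution F. For the exponential distribution the inverse CDF is available in closed form:

```python
import math
import random

random.seed(3)
lam = 2.0    # rate parameter
# F(x) = 1 - exp(-lam * x), so F^{-1}(u) = -ln(1 - u) / lam; feeding
# uniform draws through F^{-1} yields Exponential(lam) variates.
sample = [-math.log(1.0 - random.random()) / lam for _ in range(100_000)]
```

The same construction works for any distribution whose quantile function can be evaluated, which is exactly why quantile-parameterized distributions pair naturally with this technique.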

In probability theory and statistics, an inverse distribution is the distribution of the reciprocal of a random variable. Inverse distributions arise in particular in the Bayesian context of prior distributions and posterior distributions for scale parameters. In the algebra of random variables, inverse distributions are special cases of the class of ratio distributions, in which the numerator random variable has a degenerate distribution.

Quantile-parameterized distributions (QPDs) are probability distributions that are directly parameterized by data. They were motivated by the need for easy-to-use continuous probability distributions flexible enough to represent a wide range of uncertainties, such as those commonly encountered in business, economics, engineering, and science. Because QPDs are directly parameterized by data, they have the practical advantage of avoiding the intermediate step of parameter estimation, a time-consuming process that typically requires non-linear iterative methods to estimate probability-distribution parameters from data. Some QPDs have virtually unlimited shape flexibility and closed-form moments as well.


The metalog distribution is a flexible continuous probability distribution designed for ease of use in practice. Together with its transforms, the metalog family of continuous distributions is unique because it embodies all of the following properties: virtually unlimited shape flexibility; a choice among unbounded, semi-bounded, and bounded distributions; ease of fitting to data with linear least squares; simple, closed-form quantile function equations that facilitate simulation; a simple, closed-form PDF; and Bayesian updating in closed form in light of new data. Moreover, like a Taylor series, metalog distributions may have any number of terms, depending on the degree of shape flexibility desired and other application needs.
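The linear least-squares fit and closed-form simulation can be sketched for a three-term metalog in Python with NumPy (the data are synthetic, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.sort(rng.normal(10.0, 2.0, 200))    # synthetic observations, sorted
y = (np.arange(1, 201) - 0.5) / 200           # empirical CDF positions

def basis(y):
    """Three-term metalog basis: [1, logit(y), (y - 0.5) * logit(y)]."""
    logit = np.log(y / (1.0 - y))
    return np.column_stack([np.ones_like(y), logit, (y - 0.5) * logit])

# The quantile function is linear in its coefficients, so fitting is
# ordinary linear least squares -- no iterative parameter estimation.
a, *_ = np.linalg.lstsq(basis(y), data, rcond=None)

# The fitted quantile function simulates directly: plug in uniform draws.
u = rng.uniform(size=10_000)
simulated = basis(u) @ a
```

More terms extend the basis in the same pattern, and the same fit/simulate structure carries over.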

References

  1. Savage, Sam; Scholtes, Stefan; Zweidler, Daniel (February 2006). "Probability Management". OR/MS Today. doi:10.1287/orms.2006.01.10. Retrieved 2022-07-21.
  2. Savage, Sam (2009). The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty. Hoboken: John Wiley & Sons. ISBN 978-0-471-38197-6.
  3. Dembo, Ron (1991). "Scenario Optimization". Annals of Operations Research. 30: 63–80. doi:10.1007/BF02204809. S2CID 44126126.
  4. Gelman, Andrew (2007). "Manipulating and summarizing posterior simulations using random variable objects". Statistics and Computing. 17 (3): 235–244. doi:10.1007/s11222-007-9020-4. S2CID 15926131.
  5. Savage, Sam (2022). Chancification: How to Fix the Flaw of Averages. Chapter 16.
  6. Keelin, Thomas W. (2016-12-01). "The Metalog Distributions". Decision Analysis. 13 (4): 243–277. doi:10.1287/deca.2016.0338. ISSN 1545-8490.
  7. Hubbard, Douglas W. (December 2019). "A Multi-Dimensional, Counter-Based Pseudo Random Number Generator as a Standard for Monte Carlo Simulations" (PDF). WSC '19: Proceedings of the Winter Simulation Conference.
  8. "SIP Standard Specification" (PDF). June 6, 2016.