A stochastic simulation is a simulation of a system that has variables that can change stochastically (randomly) with individual probabilities. [1]
Realizations of these random variables are generated and inserted into a model of the system. Outputs of the model are recorded, and then the process is repeated with a new set of random values. These steps are repeated until a sufficient amount of data is gathered. In the end, the distribution of the outputs shows the most probable estimates as well as a frame of expectations regarding what ranges of values the variables are more or less likely to fall in. [1]
Often random variables inserted into the model are created on a computer with a random number generator (RNG). The U(0,1) uniform distribution outputs of the random number generator are then transformed into random variables with probability distributions that are used in the system model. [2]
Stochastic originally meant "pertaining to conjecture"; from Greek stokhastikos "able to guess, conjecturing": from stokhazesthai "guess"; from stokhos "a guess, aim, target, mark". The sense of "randomly determined" was first recorded in 1934, from German Stochastik. [3]
In order to determine the next event in a stochastic simulation, the rates of all possible changes to the state of the model are computed and then ordered in an array. Next, the cumulative sum of the array is taken, and the final cell contains the number R, the total event rate. This cumulative array is now a discrete cumulative distribution, and can be used to choose the next event by picking a random number z ~ U(0, R) and choosing the first event whose cumulative rate exceeds z.
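A minimal sketch of this selection step in Python (the rates used here are arbitrary placeholders):

```python
import random

def next_event(rates):
    """Choose the index of the next event by linear search on the
    cumulative sum of the event rates (the discrete CDF described above)."""
    cumulative = []
    total = 0.0
    for rate in rates:
        total += rate
        cumulative.append(total)
    z = random.uniform(0.0, total)   # z ~ U(0, R), R = total event rate
    for i, threshold in enumerate(cumulative):
        if z < threshold:
            return i
    return len(rates) - 1            # guard against floating-point round-off

event = next_event([0.5, 1.2, 0.3])  # returns 0, 1, or 2
```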
A probability distribution is used to describe the potential outcome of a random variable.
A discrete probability distribution limits the outcomes: the variable can only take on discrete values. [4]
A random variable X is Bernoulli-distributed with parameter p if it has two possible outcomes, usually encoded 1 (success or default) or 0 (failure or survival), [5] where the probabilities of success and failure are P(X = 1) = p and P(X = 0) = 1 − p, where 0 ≤ p ≤ 1.
To produce a random variable X with a Bernoulli distribution from a U(0,1) uniform variate u made by a random number generator, we define

X = 1 if u < p, and X = 0 otherwise,

such that P(X = 1) = p and P(X = 0) = 1 − p. [2]

For example, define X = 1 for heads and X = 0 for tails. For a fair coin, both realizations are equally likely. We can generate realizations of this random variable X from a uniform distribution provided by a random number generator (RNG) by having X = 1 if the RNG outputs a value between 0 and 0.5 and X = 0 if the RNG outputs a value between 0.5 and 1.
Of course, the two outcomes may not be equally likely (e.g. success of medical treatment). [6]
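A minimal sketch of this construction, assuming Python's standard random module as the U(0,1) source:

```python
import random

def bernoulli(p):
    """Return 1 (success) with probability p, else 0 (failure),
    by thresholding a single U(0,1) draw."""
    return 1 if random.random() < p else 0

fair_coin = [bernoulli(0.5) for _ in range(10)]   # fair coin: p = 0.5
treatment = [bernoulli(0.8) for _ in range(10)]   # unequal outcomes: p = 0.8
```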
A binomially distributed random variable Y with parameters n and p is obtained as the sum of n independent and identically Bernoulli-distributed random variables X1, X2, ..., Xn: Y = X1 + X2 + ... + Xn. [4]
Example: A coin is tossed three times. Find the probability of getting exactly two heads. This problem can be solved by looking at the sample space: there are three ways to get two heads (HHT, HTH, THH).
The answer is 3/8 (= 0.375). [7]
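The same answer can be obtained from the binomial formula and checked by simulating Y as a sum of Bernoulli draws; a short sketch (math.comb requires Python 3.8+):

```python
import random
from math import comb

# Exact probability from the binomial pmf: C(3, 2) * (1/2)^3 = 3/8.
exact = comb(3, 2) * 0.5**3

# The same probability estimated by simulating Y as a sum of Bernoulli draws.
def binomial(n, p):
    return sum(random.random() < p for _ in range(n))

trials = 100_000
estimate = sum(binomial(3, 0.5) == 2 for _ in range(trials)) / trials
print(exact, estimate)   # estimate should be close to 0.375
```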
A Poisson process is a process where events occur randomly in an interval of time or space. [2] [8] The probability distribution for a Poisson process with constant rate λ per time interval is given by the following equation. [4]
Defining N as the number of events that occur in the time interval,

P(N = n) = (λ^n e^(−λ)) / n!,  n = 0, 1, 2, …
It can be shown that the inter-arrival times of events are exponentially distributed, with a cumulative distribution function (CDF) of F(t) = 1 − e^(−λt). The inverse of the exponential CDF is given by

t = −ln(1 − u) / λ,

where u is a uniformly distributed random variable on (0, 1). [2]
Simulating a Poisson process with a constant rate for the number of events that occur in interval can be carried out with the following algorithm. [9]
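In outline, the simulation starts at t = 0 and repeatedly adds exponentially distributed inter-arrival times, obtained by inverting the exponential CDF, until the end of the interval is reached. A minimal Python sketch of this idea (not necessarily the exact formulation of [9]):

```python
import math
import random

def poisson_process(rate, t_end):
    """Event times of a Poisson process with constant rate on [0, t_end].
    Inter-arrival times are drawn by inverting the exponential CDF;
    1 - random.random() lies in (0, 1], which avoids log(0)."""
    t, events = 0.0, []
    while True:
        t += -math.log(1.0 - random.random()) / rate
        if t > t_end:
            return events
        events.append(t)

arrivals = poisson_process(rate=2.0, t_end=10.0)
# len(arrivals) is Poisson-distributed with mean rate * t_end = 20.
```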
Published by Dan Gillespie in 1977, this method performs a linear search on the cumulative array. See Gillespie algorithm.
Gillespie’s Stochastic Simulation Algorithm (SSA) is essentially an exact procedure for numerically simulating the time evolution of a well-stirred chemically reacting system by taking proper account of the randomness inherent in such a system. [10]
It is rigorously based on the same microphysical premise that underlies the chemical master equation and gives a more realistic representation of a system’s evolution than the deterministic reaction rate equation (RRE) represented mathematically by ODEs. [10]
As with the chemical master equation, the SSA converges, in the limit of large numbers of reactants, to the same solution as the law of mass action.
Published in 2000 by Gibson and Bruck. [11] This is an improvement over the first reaction method in which unused reaction times are reused. To make the sampling of reactions more efficient, an indexed priority queue is used to store the reaction times; to make the recomputation of propensities more efficient, a dependency graph is used. The dependency graph tells which reaction propensities to update after a particular reaction has fired.
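As a sketch of the dependency-graph idea alone (the reaction encoding below is a simplified assumption, not Gibson and Bruck's data structure, and the indexed priority queue is omitted):

```python
def build_dependency_graph(reactions):
    """For each reaction, list the reactions whose propensities must be
    recomputed after it fires (those whose reactants it changes).
    `reactions` is a hypothetical list of (reactants, products) pairs,
    each a set of species names."""
    graph = {}
    for i, (reac_i, prod_i) in enumerate(reactions):
        affected = reac_i | prod_i   # species whose counts change when i fires
        graph[i] = [j for j, (reac_j, _) in enumerate(reactions)
                    if affected & reac_j]
    return graph

# Example: A + B -> C and C -> A.
reactions = [({"A", "B"}, {"C"}), ({"C"}, {"A"})]
print(build_dependency_graph(reactions))   # {0: [0, 1], 1: [0, 1]}
```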
Published 2004 [12] and 2005. These methods sort the cumulative array to reduce the average search depth of the algorithm. The former runs a presimulation to estimate the firing frequency of reactions, whereas the latter sorts the cumulative array on-the-fly.
Published in 2006. This is a binary search on the cumulative array, reducing the worst-case time complexity of reaction sampling to O(log M).
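With the cumulative array at hand, the selection step becomes a binary search; a sketch using Python's bisect module:

```python
import bisect
import random
from itertools import accumulate

def next_event_bisect(rates):
    """O(log M) selection: binary search for the first cumulative
    rate exceeding z ~ U(0, R)."""
    cumulative = list(accumulate(rates))  # maintained incrementally in practice
    z = random.uniform(0.0, cumulative[-1])
    # min() guards against z landing exactly on the final cumulative value.
    return min(bisect.bisect_right(cumulative, z), len(rates) - 1)
```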
Published in 2009, 2010, and 2011 (Ramaswamy 2009, 2010, 2011). These methods use factored-out, partial reaction propensities to reduce the computational cost so that it scales with the number of species in the network, rather than the (larger) number of reactions. Four variants exist.
The use of partial-propensity methods is limited to elementary chemical reactions, i.e., reactions with at most two different reactants. Every non-elementary chemical reaction can be equivalently decomposed into a set of elementary ones, at the expense of a linear (in the order of the reaction) increase in network size.
A general drawback of stochastic simulations is that for big systems, so many events happen that they cannot all be taken into account individually in a simulation. The following methods can dramatically improve simulation speed through approximation.
Since the SSA method keeps track of each transition, it would be impractical to implement for certain applications due to high time complexity. Gillespie proposed an approximation procedure, the tau-leaping method which decreases computational time with minimal loss of accuracy. [13] Instead of taking incremental steps in time, keeping track of X(t) at each time step as in the SSA method, the tau-leaping method leaps from one subinterval to the next, approximating how many transitions take place during a given subinterval. It is assumed that the value of the leap, τ, is small enough that there is no significant change in the value of the transition rates along the subinterval [t, t + τ]. This condition is known as the leap condition. The tau-leaping method thus has the advantage of simulating many transitions in one leap while not losing significant accuracy, resulting in a speed up in computational time. [14]
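As an illustration, a minimal sketch for a single decay reaction X → ∅ with propensity a(x) = c·x, assuming a fixed leap size τ (practical implementations select τ adaptively to satisfy the leap condition):

```python
import math
import random

def poisson_variate(mean):
    """Poisson sample via Knuth's multiplication method (small means only)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def tau_leap_decay(x0, c, tau, t_end):
    """Tau-leaping for the decay reaction X -> 0 with rate constant c.
    Each leap fires a Poisson(a(x) * tau) number of reactions, assuming
    the propensity a(x) = c * x changes little over [t, t + tau]."""
    x, t = x0, 0.0
    while t < t_end and x > 0:
        k = poisson_variate(c * x * tau)
        x = max(x - k, 0)   # clamp: a leap cannot make the count negative
        t += tau
    return x

print(tau_leap_decay(x0=1000, c=0.1, tau=0.1, t_end=10.0))
```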
This method approximates reversible processes (which includes random walk/diffusion processes) by taking only net rates of the opposing events of a reversible process into account. The main advantage of this method is that it can be implemented with a simple if-statement replacing the previous transition rates of the model with new, effective rates. The model with the replaced transition rates can thus be solved, for instance, with the conventional SSA. [15]
While a discrete state space clearly distinguishes between particular states (values), in a continuous state space this is not possible because of continuity: as the system changes over time, the variables of the model change continuously as well. Continuous simulation thereby simulates the system over time, given differential equations determining the rates of change of the state variables. [16] Examples of continuous systems are the predator/prey model [17] and cart-pole balancing. [18]
The random variable X is said to be normally distributed with parameters μ and σ, abbreviated by X ∈ N(μ, σ²), if the density of the random variable is given by the formula [4]

f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²)).
Many things actually are normally distributed, or very close to it. For example, height and intelligence are approximately normally distributed; measurement errors also often have a normal distribution. [19]
The exponential distribution describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate.
The exponential distribution is popular, for example, in queuing theory when we want to model the time we have to wait until a certain event takes place. Examples include the time until the next client enters the store, the time until a certain company defaults or the time until some machine has a defect. [4]
Student's t-distribution is used in finance as a probabilistic model of asset returns. The density function of the t-distribution is given by the following equation: [4]

f(t) = (Γ((n + 1)/2) / (√(nπ) Γ(n/2))) (1 + t²/n)^(−(n + 1)/2),

where n is the number of degrees of freedom and Γ is the gamma function.
For large values of n, the t-distribution does not differ significantly from a standard normal distribution. Usually, for values n > 30, the t-distribution is considered equal to the standard normal distribution.
It is often possible to model one and the same system by use of completely different world views. Discrete event simulation of a problem as well as continuous event simulation of it (continuous simulation with discrete events that disrupt the continuous flow) may eventually lead to the same answers. Sometimes, however, the techniques can answer different questions about a system. If we need to answer all of these questions, or if we do not know what purposes the model will be used for, it is convenient to apply a combined continuous/discrete methodology. [20] Similar techniques can change from a discrete, stochastic description to a deterministic, continuum description in a time- and space-dependent manner. [21] The use of this technique enables the capturing of noise due to small copy numbers, while being much faster to simulate than the conventional Gillespie algorithm. Furthermore, the use of the deterministic continuum description enables the simulation of arbitrarily large systems.
Monte Carlo is an estimation procedure. The main idea is that if it is necessary to know the average value of some random variable and its distribution cannot be stated, and if it is possible to take samples from the distribution, we can estimate it by taking the samples, independently, and averaging them. If there are sufficient samples, then the law of large numbers says the average must be close to the true value. The central limit theorem says that the average has a Gaussian distribution around the true value. [22]
As a simple example, suppose we need to measure the area of a shape with a complicated, irregular outline. The Monte Carlo approach is to draw a square around the shape and measure the square. Then we throw darts into the square, as uniformly as possible. The fraction of darts falling on the shape gives the ratio of the area of the shape to the area of the square. In fact, it is possible to cast almost any integral problem, or any averaging problem, into this form. It is necessary to have a good way to tell whether a dart lands inside the outline, and a good way to decide how many darts to throw. Last but not least, we need to throw the darts uniformly, i.e., using a good random number generator. [22]
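A sketch of the dart-throwing estimate, here using a disc of known area (π) as a stand-in for the irregular shape; the predicate inside is a hypothetical membership test:

```python
import random

def monte_carlo_area(inside, square_side, n_darts=100_000):
    """Estimate the area of a shape enclosed in a square of the given side,
    given a predicate inside(x, y) that tells whether a point hits the shape."""
    hits = 0
    for _ in range(n_darts):
        x = random.uniform(0.0, square_side)
        y = random.uniform(0.0, square_side)
        if inside(x, y):
            hits += 1
    return (hits / n_darts) * square_side**2

# Example: a disc of radius 1 centred at (1, 1) inside a 2 x 2 square.
area = monte_carlo_area(lambda x, y: (x - 1)**2 + (y - 1)**2 <= 1.0, 2.0)
print(area)   # should be close to pi
```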
The Monte Carlo method has a wide range of possible uses. [1]
For simulation experiments (including Monte Carlo) it is necessary to generate random numbers (as values of variables). The problem is that a computer is a highly deterministic machine: behind each process there is always an algorithm, a deterministic computation changing inputs to outputs. It is therefore not easy to generate uniformly spread random numbers over a defined interval or set. [1]
A random number generator is a device capable of producing a sequence of numbers which cannot be "easily" identified with deterministic properties. This sequence is then called a sequence of stochastic numbers. [23]
The algorithms typically rely on pseudorandom numbers, computer generated numbers mimicking true random numbers, to generate a realization, one possible outcome of a process. [24]
Methods for obtaining random numbers have existed for a long time and are used in many different fields (such as gaming). However, these numbers suffer from a certain bias. Currently the best methods expected to produce truly random sequences are natural methods that take advantage of the random nature of quantum phenomena. [23]
In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable X, or just the distribution function of X, evaluated at x, is the probability that X will take a value less than or equal to x.
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events.
A random variable is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes in a sample space to a measurable space, often the real numbers.
Inverse transform sampling is a basic method for pseudo-random number sampling, i.e., for generating sample numbers at random from any probability distribution given its cumulative distribution function.
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). It is named after the Russian mathematician Andrey Markov.
In probability theory, a compound Poisson distribution is the probability distribution of the sum of a number of independent identically-distributed random variables, where the number of terms to be added is itself a Poisson-distributed variable. The result can be either a continuous or a discrete distribution.
In probability theory and statistics, the continuous uniform distribution or rectangular distribution is a family of symmetric probability distributions. The distribution describes an experiment where there is an arbitrary outcome that lies between certain bounds. The bounds are defined by the parameters, a and b, which are the minimum and maximum values. The interval can either be closed or open. Therefore, the distribution is often abbreviated U(a, b), where U stands for uniform distribution. The difference between the bounds defines the interval length; all intervals of the same length on the distribution's support are equally probable. It is the maximum entropy probability distribution for a random variable X under no constraint other than that it is contained in the distribution's support.
A phase-type distribution is a probability distribution constructed by a convolution or mixture of exponential distributions. It results from a system of one or more inter-related Poisson processes occurring in sequence, or phases. The sequence in which each of the phases occurs may itself be a stochastic process. The distribution can be represented by a random variable describing the time until absorption of a Markov process with one absorbing state. Each of the states of the Markov process represents one of the phases.
In probability theory, the Gillespie algorithm generates a statistically correct trajectory of a stochastic equation system for which the reaction rates are known. It was created by Joseph L. Doob and others, presented by Dan Gillespie in 1976, and popularized in 1977 in a paper where he uses it to simulate chemical or biochemical systems of reactions efficiently and accurately using limited computational power. As computers have become faster, the algorithm has been used to simulate increasingly complex systems. The algorithm is particularly useful for simulating reactions within cells, where the number of reagents is low and keeping track of the position and behaviour of individual molecules is computationally feasible. Mathematically, it is a variant of a dynamic Monte Carlo method and similar to the kinetic Monte Carlo methods. It is used heavily in computational systems biology.
The cross-entropy (CE) method is a Monte Carlo method for importance sampling and optimization. It is applicable to both combinatorial and continuous problems, with either a static or noisy objective.
In probability theory, Dirichlet processes are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a probability distribution whose range is itself a set of probability distributions. It is often used in Bayesian inference to describe the prior knowledge about the distribution of random variables—how likely it is that the random variables are distributed according to one or another particular distribution.
In probability and statistics, the quantile function, associated with a probability distribution of a random variable, specifies the value of the random variable such that the probability of the variable being less than or equal to that value equals the given probability. Intuitively, the quantile function associates with a range at and below a probability input the likelihood that a random variable is realized in that range for some probability distribution. It is also called the percentile function, percent-point function or inverse cumulative distribution function.
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event. It is named after French mathematician Siméon Denis Poisson. The Poisson distribution can also be used for the number of events in other specified interval types such as distance, area, or volume.
In queueing theory, a discipline within the mathematical theory of probability, a fluid queue is a mathematical model used to describe the fluid level in a reservoir subject to randomly determined periods of filling and emptying. The term dam theory was used in earlier literature for these models. The model has been used to approximate discrete models, model the spread of wildfires, in ruin theory and to model high speed data networks. The model applies the leaky bucket algorithm to a stochastic source.
In probability theory, an exponentially modified Gaussian distribution describes the sum of independent normal and exponential random variables. An exGaussian random variable Z may be expressed as Z = X + Y, where X and Y are independent, X is Gaussian with mean μ and variance σ², and Y is exponential of rate λ. It has a characteristic positive skew from the exponential component.
In probability theory, the zero-truncated Poisson (ZTP) distribution is a certain discrete probability distribution whose support is the set of positive integers. This distribution is also known as the conditional Poisson distribution or the positive Poisson distribution. It is the conditional probability distribution of a Poisson-distributed random variable, given that the value of the random variable is not zero. Thus it is impossible for a ZTP random variable to be zero. Consider for example the random variable of the number of items in a shopper's basket at a supermarket checkout line. Presumably a shopper does not stand in line with nothing to buy, so this phenomenon may follow a ZTP distribution.
In probability theory, tau-leaping, or τ-leaping, is an approximate method for the simulation of a stochastic system. It is based on the Gillespie algorithm, performing all reactions for an interval of length tau before updating the propensity functions. By updating the rates less often this sometimes allows for more efficient simulation and thus the consideration of larger systems.
In probability, statistics and related fields, a Poisson point process is a type of random mathematical object that consists of points randomly located on a mathematical space. The Poisson point process is often called simply the Poisson process, but it is also called a Poisson random measure, Poisson random point field or Poisson point field. This point process has convenient mathematical properties, which has led to its being frequently defined in Euclidean space and used as a mathematical model for seemingly random processes in numerous disciplines such as astronomy, biology, ecology, geology, seismology, physics, economics, image processing, and telecommunications.
Hybrid stochastic simulations are a sub-class of stochastic simulations. These simulations combine existing stochastic simulations with other stochastic simulations or algorithms. Generally they are used for physics and physics-related research. The goal of a hybrid stochastic simulation varies based on context, however they typically aim to either improve accuracy or reduce computational complexity. The first hybrid stochastic simulation was developed in 1985.
A mixed Poisson distribution is a univariate discrete probability distribution in stochastics. It results from assuming that the conditional distribution of a random variable, given the value of the rate parameter, is a Poisson distribution, and that the rate parameter itself is considered as a random variable. Hence it is a special case of a compound probability distribution. Mixed Poisson distributions can be found in actuarial mathematics as a general approach for the distribution of the number of claims and is also examined as an epidemiological model. It should not be confused with compound Poisson distribution or compound Poisson process.