Stochastic process rare event sampling

Last updated

Stochastic-process rare event sampling (SPRES) is a rare-event sampling method in computer simulation, designed specifically for non-equilibrium calculations, including those for which the rare-event rates are time-dependent (non-stationary process). To treat systems in which there is time dependence in the dynamics, due either to variation of an external parameter or to evolution of the system itself, the scheme for branching paths must be devised so as to achieve sampling which is distributed evenly in time and which takes account of changing fluxes through different regions of the phase space.

Contents

Algorithm summary

The SPRES algorithm [1] branches simulation paths at fixed time intervals. The process of branching requires that identical paths can be made to diverge from each other, such as by changing the seed in the computer's random number generator. For systems which would be naturally considered as deterministic, it may be possible to inject an element of randomness, for instance by coupling to a fluctuating heat bath or by adding random perturbations to account for some elements of the simulation which are not modelled explicitly but which exist in the real system.

The amount of over or under-sampling (the branching density) is decided based on some system-specific 'progress coordinate' which measures progress toward a rare event of interest. The probability of selecting a configuration as the starting point for a new path segment is conditioned jointly by its probability of appearing in an unbiased simulation and by the local flux forwards in the progress coordinate, with a small flux leading adaptively to a larger oversampling.

The method is designed to allow ready observation of rare events with respect to time. An additional benefit relative to methods which mainly split trajectories based on interfaces in the progress coordinate rather than on time is that over most of the progress coordinate space the coordinate only needs to be evaluated at fixed time intervals (rather than continuously) because the exact time-point at which interfaces other than the final interface are reached is no longer of importance.

See also

Cited references

  1. Berryman, Joshua T.; Schilling, Tanja (2010). "Sampling rare events in nonequilibrium and nonstationary systems". The Journal of Chemical Physics. 133 (24): 244101. arXiv: 1001.2456 . Bibcode:2010JChPh.133x4101B. doi:10.1063/1.3525099. PMID   21197970. S2CID   34154184.


Related Research Articles

<span class="mw-page-title-main">Probability distribution</span> Mathematical function for the probability a given outcome occurs in an experiment

In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events.

<span class="mw-page-title-main">Statistics</span> Study of the collection, analysis, interpretation, and presentation of data

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.

Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. Monte Carlo methods are mainly used in three problem classes: optimization, numerical integration, and generating draws from a probability distribution.

In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample of the desired distribution by recording states from the chain. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Various algorithms exist for constructing chains, including the Metropolis–Hastings algorithm.

Stochastic refers to the property of being well described by a random probability distribution. Although stochasticity and randomness are distinct in that the former refers to a modeling approach and the latter refers to phenomena themselves, these two terms are often used synonymously. Furthermore, in probability theory, the formal concept of a stochastic process is also referred to as a random process.

Global optimization is a branch of applied mathematics and numerical analysis that attempts to find the global minima or maxima of a function or a set of functions on a given set. It is usually described as a minimization problem because the maximization of the real-valued function is equivalent to the minimization of the function .

Importance sampling is a Monte Carlo method for evaluating properties of a particular distribution, while only having samples generated from a different distribution than the distribution of interest. Its introduction in statistics is generally attributed to a paper by Teun Kloek and Herman K. van Dijk in 1978, but its precursors can be found in statistical physics as early as 1949. Importance sampling is also related to umbrella sampling in computational physics. Depending on the application, the term may refer to the process of sampling from this alternative distribution, the process of inference, or both.

<span class="mw-page-title-main">Mathematical statistics</span> Branch of statistics

Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.

Monte Carlo methods are used in corporate finance and mathematical finance to value and analyze (complex) instruments, portfolios and investments by simulating the various sources of uncertainty affecting their value, and then determining the distribution of their value over the range of resultant outcomes. This is usually done by help of stochastic asset models. The advantage of Monte Carlo methods over other techniques increases as the dimensions of the problem increase.

This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics and Glossary of experimental design.

A discrete-event simulation (DES) models the operation of a system as a (discrete) sequence of events in time. Each event occurs at a particular instant in time and marks a change of state in the system. Between consecutive events, no change in the system is assumed to occur; thus the simulation time can directly jump to the occurrence time of the next event, which is called next-event time progression.

A stochastic simulation is a simulation of a system that has variables that can change stochastically (randomly) with individual probabilities.

Algorithmic inference gathers new developments in the statistical inference methods made feasible by the powerful computing devices widely available to any data analyst. Cornerstones in this field are computational learning theory, granular computing, bioinformatics, and, long ago, structural probability . The main focus is on the algorithms which compute statistics rooting the study of a random phenomenon, along with the amount of data they must feed on to produce reliable results. This shifts the interest of mathematicians from the study of the distribution laws to the functional properties of the statistics, and the interest of computer scientists from the algorithms for processing data to the information they process.

Transition path sampling (TPS) is a rare-event sampling method used in computer simulations of rare events: physical or chemical transitions of a system from one stable state to another that occur too rarely to be observed on a computer timescale. Examples include protein folding, chemical reactions and nucleation. Standard simulation tools such as molecular dynamics can generate the dynamical trajectories of all the atoms in the system. However, because of the gap in accessible time-scales between simulation and reality, even present supercomputers might require years of simulations to show an event that occurs once per microsecond without some kind of acceleration.

Non-uniform random variate generation or pseudo-random number sampling is the numerical practice of generating pseudo-random numbers (PRN) that follow a given probability distribution. Methods are typically based on the availability of a uniformly distributed PRN generator. Computational algorithms are then used to manipulate a single random variate, X, or often several such variates, into a new random variate Y such that these values have the required distribution. The first methods were developed for Monte-Carlo simulations in the Manhattan project, published by John von Neumann in the early 1950s.

Probability bounds analysis (PBA) is a collection of methods of uncertainty propagation for making qualitative and quantitative calculations in the face of uncertainties of various kinds. It is used to project partial information about random variables and other quantities through mathematical expressions. For instance, it computes sure bounds on the distribution of a sum, product, or more complex function, given only sure bounds on the distributions of the inputs. Such bounds are called probability boxes, and constrain cumulative probability distributions.

Rare event sampling is an umbrella term for a group of computer simulation methods intended to selectively sample 'special' regions of the dynamic space of systems which are unlikely to visit those special regions through brute-force simulation. A familiar example of a rare event in this context would be nucleation of a raindrop from over-saturated water vapour: although raindrops form every day, relative to the length and time scales defined by the motion of water molecules in the vapour phase, the formation of a liquid droplet is extremely rare.

Subset simulation is a method used in reliability engineering to compute small failure probabilities encountered in engineering systems. The basic idea is to express a small failure probability as a product of larger conditional probabilities by introducing intermediate failure events. This conceptually converts the original rare-event problem into a series of frequent-event problems that are easier to solve. In the actual implementation, samples conditional on intermediate failure events are adaptively generated to gradually populate from the frequent to rare event region. These 'conditional samples' provide information for estimating the complementary cumulative distribution function (CCDF) of the quantity of interest, covering the high as well as the low probability regions. They can also be used for investigating the cause and consequence of failure events. The generation of conditional samples is not trivial but can be performed efficiently using Markov chain Monte Carlo (MCMC).

<span class="mw-page-title-main">Simulation-based optimization</span>

Simulation-based optimization integrates optimization techniques into simulation modeling and analysis. Because of the complexity of the simulation, the objective function may become difficult and expensive to evaluate. Usually, the underlying simulation model is stochastic, so that the objective function must be estimated using statistical estimation techniques.