Nested sampling algorithm

The nested sampling algorithm is a computational approach to the Bayesian statistics problems of comparing models and generating samples from posterior distributions. It was developed in 2004 by physicist John Skilling. [1]

Background

Bayes' theorem can be applied to a pair of competing models $M_1$ and $M_2$ for data $D$, one of which may be true (though which one is unknown) but which both cannot be true simultaneously. The posterior probability for $M_1$ may be calculated as:

$$P(M_1 \mid D) = \frac{P(D \mid M_1)\, P(M_1)}{P(D \mid M_1)\, P(M_1) + P(D \mid M_2)\, P(M_2)}$$
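For example (with illustrative numbers, not taken from the source), if the two models are equally probable a priori and the data favour $M_1$ by a Bayes factor of 3, the posterior probability of $M_1$ works out as:

$$\frac{P(M_1)}{P(M_2)} = 1, \quad \frac{P(D \mid M_1)}{P(D \mid M_2)} = 3 \quad \Longrightarrow \quad P(M_1 \mid D) = \frac{3 \cdot \tfrac{1}{2}}{3 \cdot \tfrac{1}{2} + 1 \cdot \tfrac{1}{2}} = \frac{3}{4}.$$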

The prior probabilities $P(M_1)$ and $P(M_2)$ are already known, as they are chosen by the researcher ahead of time. However, the remaining Bayes factor $P(D \mid M_2) / P(D \mid M_1)$ is not so easy to evaluate, since in general it requires marginalizing nuisance parameters. Generally, $M_1$ has a set of parameters that can be grouped together and called $\theta$, and $M_2$ has its own vector of parameters that may be of different dimensionality, but is still termed $\theta$. The marginalization for $M_1$ is

$$P(D \mid M_1) = \int P(D \mid \theta, M_1)\, P(\theta \mid M_1)\, d\theta$$

and likewise for $M_2$. This integral is often analytically intractable, and in these cases it is necessary to employ a numerical algorithm to find an approximation. The nested sampling algorithm was developed by John Skilling specifically to approximate these marginalization integrals, and it has the added benefit of generating samples from the posterior distribution $P(\theta \mid D, M_1)$. [2] It is an alternative to methods from the Bayesian literature [3] such as bridge sampling and defensive importance sampling.
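As a point of comparison, here is a minimal sketch of the brute-force approach that nested sampling improves upon, for a hypothetical one-dimensional model (the sharply peaked Gaussian likelihood, the Uniform(0, 1) prior, and the sample size are illustrative assumptions, not from the source). The evidence is estimated by averaging the likelihood over draws from the prior, which becomes very inefficient when the likelihood is concentrated in a tiny fraction of the prior volume:

    import numpy as np

    rng = np.random.default_rng(0)

    def likelihood(theta):
        # Hypothetical sharply peaked Gaussian likelihood (width 0.01)
        # sitting inside a broad Uniform(0, 1) prior.
        return np.exp(-0.5 * ((theta - 0.5) / 0.01) ** 2) / (0.01 * np.sqrt(2 * np.pi))

    # Simple Monte Carlo: Z = integral of L(theta) * p(theta) d(theta)
    # is approximated by the mean of L over draws from the prior.
    theta = rng.uniform(0.0, 1.0, size=100_000)
    print(likelihood(theta).mean())  # roughly 1, but noisy: few draws hit the peak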

Here is a simple version of the nested sampling algorithm, followed by a description of how it computes the marginal probability density $Z = P(D \mid M)$, where $M$ is $M_1$ or $M_2$:

    Start with N points θ_1, ..., θ_N sampled from the prior.
    Z := 0;  X_0 := 1
    for i = 1 to j do        % The number of iterations j is chosen by guesswork.
        L_i := min(current likelihood values of the points);
        X_i := exp(-i / N);
        w_i := X_{i-1} - X_i;
        Z := Z + L_i * w_i;
        Save the point with least likelihood as a sample point with weight w_i.
        Update the point with least likelihood with some Markov chain Monte Carlo
        steps according to the prior, accepting only steps that keep the
        likelihood above L_i.
    end
    return Z;

At each iteration, $X_i$ is an estimate of the amount of prior mass covered by the hypervolume in parameter space of all points with likelihood greater than $L_i$. The weight factor $w_i$ is an estimate of the amount of prior mass that lies between the two nested hypersurfaces $\{\theta : P(D \mid \theta, M) = L_{i-1}\}$ and $\{\theta : P(D \mid \theta, M) = L_i\}$. The update step $Z := Z + L_i w_i$ computes the sum over $i$ of $L_i w_i$ to numerically approximate the integral

$$Z = P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta = \int_0^1 L\, dX$$

In the limit $j \to \infty$, this estimator has a positive bias of order $1/N$, [4] which can be removed by using $(1 - 1/N)$ instead of $\exp(-1/N)$ as the shrinkage factor in the above algorithm, i.e. setting $X_i := (1 - 1/N)^i$.

The idea is to subdivide the range of $f(\theta) = P(D \mid \theta, M)$ and estimate, for each interval $[f(\theta_{i-1}), f(\theta_i)]$, how likely it is a priori that a randomly chosen $\theta$ would map to this interval. This can be thought of as a Bayesian's way to numerically implement Lebesgue integration. [5]
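Below is a minimal runnable sketch of the algorithm above, reusing the toy Gaussian likelihood and Uniform(0, 1) prior from the earlier snippet (the number of live points, the iteration count, the number of Metropolis steps, and the step size are illustrative assumptions, not from the source). It uses the bias-corrected $(1 - 1/N)^i$ shrinkage just described:

    import numpy as np

    rng = np.random.default_rng(1)

    def likelihood(theta):
        # Same toy sharp Gaussian likelihood and Uniform(0, 1) prior as above.
        return np.exp(-0.5 * ((theta - 0.5) / 0.01) ** 2) / (0.01 * np.sqrt(2 * np.pi))

    N, j = 100, 1200             # live points; iterations chosen by guesswork
    theta = rng.uniform(0.0, 1.0, size=N)   # start with N points from the prior
    L = likelihood(theta)

    Z, X_prev = 0.0, 1.0
    shrink = 1.0 - 1.0 / N       # bias-corrected shrinkage factor, replacing exp(-1/N)
    dead_theta, dead_L, dead_w = [], [], []

    for i in range(1, j + 1):
        worst = int(np.argmin(L))        # the live point with least likelihood
        L_i = L[worst]
        X_i = shrink ** i                # estimated prior mass still enclosed
        w_i = X_prev - X_i               # prior mass between successive likelihood shells
        Z += L_i * w_i
        dead_theta.append(theta[worst])  # saved sample point
        dead_L.append(L_i)
        dead_w.append(w_i)
        X_prev = X_i

        # Replace the worst point: copy a random surviving live point and evolve
        # it with Metropolis steps w.r.t. the (uniform) prior, accepting only
        # moves that keep the likelihood above L_i.
        k = int(rng.integers(N - 1))
        k = k + 1 if k >= worst else k   # any index except `worst`
        t = theta[k]
        for _ in range(20):
            t_new = t + rng.normal(0.0, 0.05)
            if 0.0 <= t_new <= 1.0 and likelihood(t_new) > L_i:
                t = t_new
        theta[worst], L[worst] = t, likelihood(t)

    Z += L.mean() * X_prev       # contribution of the remaining live points
    print(Z)                     # roughly 1 for this toy problem

The saved points in dead_theta, weighted by $L_i w_i / Z$, form a weighted sample from the posterior distribution.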

Implementations

Example implementations demonstrating the nested sampling algorithm are publicly available for download, written in several programming languages, including Haskell, [7] R, [8] [9] Python, [10] [11] [13] C++, [12] and Julia, [19] along with the MultiNest [15] [16] and PolyChord [17] [18] packages.

Applications

Since nested sampling was proposed in 2004, it has been used in many aspects of the field of astronomy. One paper suggested using nested sampling for cosmological model selection and object detection, as it "uniquely combines accuracy, general applicability and computational feasibility." [20] A refinement of the algorithm to handle multimodal posteriors has been suggested as a means to detect astronomical objects in extant datasets. [15] Nested sampling has also been applied to finite element model updating, where it is used to select an optimal finite element model; this approach was applied to structural dynamics. [21] The method has also been used in materials modeling, where it can be used to learn the partition function from statistical mechanics and to derive thermodynamic properties. [22]

Dynamic nested sampling

Dynamic nested sampling is a generalisation of the nested sampling algorithm in which the number of samples taken in different regions of the parameter space is dynamically adjusted to maximise calculation accuracy. [23] This can lead to large improvements in accuracy and computational efficiency when compared to the original nested sampling algorithm, in which the allocation of samples cannot be changed and often many samples are taken in regions which have little effect on calculation accuracy.
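Here is a minimal sketch of the underlying idea (a hypothetical post-processing helper, not the algorithm of [23]): after a standard nested sampling run, each dead point's contribution to the evidence indicates where additional live points would matter most, so a dynamic run can target the likelihood range holding most of that importance:

    import numpy as np

    def important_likelihood_range(dead_L, dead_w, frac=0.9):
        # dead_L, dead_w: likelihoods and prior-mass weights of the dead points
        # from a completed standard run (the lists from the sketch above).
        contrib = np.asarray(dead_L) * np.asarray(dead_w)   # evidence contributions
        order = np.argsort(contrib)[::-1]                   # most important first
        cum = np.cumsum(contrib[order]) / contrib.sum()
        keep = order[: int(np.searchsorted(cum, frac)) + 1]
        band = np.asarray(dead_L)[keep]
        return band.min(), band.max()   # likelihood band holding ~frac of the evidence

    # A dynamic run would repeatedly add batches of live points whose likelihoods
    # fall inside this band, recompute the weights, and update the evidence estimate.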

Publicly available dynamic nested sampling software packages include:

dynesty, a Python package for estimating Bayesian posteriors and evidences with dynamic nested sampling [24] [25]

dyPolyChord, which performs dynamic nested sampling with PolyChord [26] [27]

Dynamic nested sampling has been applied to a variety of scientific problems, including analysis of gravitational waves, [28] mapping distances in space [29] and exoplanet detection. [30]


References

  1. Skilling, John (2004). "Nested Sampling". AIP Conference Proceedings. 735: 395–405. Bibcode:2004AIPC..735..395S. doi:10.1063/1.1835238.
  2. Skilling, John (2006). "Nested Sampling for General Bayesian Computation". Bayesian Analysis. 1 (4): 833–860. doi:10.1214/06-BA127.
  3. Chen, Ming-Hui; Shao, Qi-Man; Ibrahim, Joseph George (2000). Monte Carlo Methods in Bayesian Computation. Springer. ISBN 978-0-387-98935-8.
  4. Walter, Clement (2017). "Point-process based Monte Carlo estimation". Statistics and Computing. 27: 219–236. arXiv:1412.6368. doi:10.1007/s11222-015-9617-y. S2CID 14639080.
  5. Jasa, Tomislav; Xiang, Ning (2012). "Nested sampling applied in Bayesian room-acoustics decay analysis". Journal of the Acoustical Society of America. 132 (5): 3251–3262. Bibcode:2012ASAJ..132.3251J. doi:10.1121/1.4754550. PMID 23145609. S2CID 20876510.
  6. John Skilling website
  7. Nested sampling algorithm in Haskell at Hackage
  8. Nested sampling algorithm in R on Bojan Nikolic website
  9. Nested sampling algorithm in R on GitHub
  10. Kester, D.; Mueller, M. (2021). "BayesicFitting, a PYTHON toolbox for Bayesian fitting and evidence calculation: Including a Nested Sampling implementation". Astronomy and Computing. 37: 100503. arXiv:2109.11976. doi:10.1016/j.ascom.2021.100503.
  11. Python toolbox containing a Nested sampling algorithm on GitHub
  12. Nested sampling algorithm in C++ on GitHub
  13. Nested sampling algorithm in Python on GitHub
  14. Nested sampling algorithm for materials simulation on GitHub
  15. Feroz, F.; Hobson, M.P. (2008). "Multimodal nested sampling: an efficient and robust alternative to Markov Chain Monte Carlo methods for astronomical data analyses". MNRAS. 384 (2): 449–463. arXiv:0704.3704. Bibcode:2008MNRAS.384..449F. doi:10.1111/j.1365-2966.2007.12353.x. S2CID 14226032.
  16. The MultiNest nested sampling software package on GitHub
  17. The PolyChord nested sampling software package on GitHub
  18. Handley, Will; Hobson, Mike; Lasenby, Anthony (2015). "polychord: next-generation nested sampling". Monthly Notices of the Royal Astronomical Society. 453 (4): 4384–4398. arXiv:1506.00171. Bibcode:2015MNRAS.453.4384H. doi:10.1093/mnras/stv1911. S2CID 118882763.
  19. Implementations of single and multi-ellipsoid nested sampling in Julia on GitHub
  20. Mukherjee, P.; Parkinson, D.; Liddle, A.R. (2006). "A Nested Sampling Algorithm for Cosmological Model Selection". Astrophysical Journal. 638 (2): 51–54. arXiv:astro-ph/0508461. Bibcode:2006ApJ...638L..51M. doi:10.1086/501068. S2CID 6208051.
  21. Mthembu, L.; Marwala, T.; Friswell, M.I.; Adhikari, S. (2011). "Model selection in finite element model updating using the Bayesian evidence statistic". Mechanical Systems and Signal Processing. 25 (7): 2399–2412. Bibcode:2011MSSP...25.2399M. doi:10.1016/j.ymssp.2011.04.001.
  22. Partay, Livia B. (2010). "Efficient Sampling of Atomic Configurational Spaces". The Journal of Physical Chemistry B. 114 (32): 10502–10512. arXiv:0906.3544. doi:10.1021/jp1012973. PMID 20701382. S2CID 16834142.
  23. Higson, Edward; Handley, Will; Hobson, Michael; Lasenby, Anthony (2019). "Dynamic nested sampling: an improved algorithm for parameter estimation and evidence calculation". Statistics and Computing. 29 (5): 891–913. arXiv:1704.03459. Bibcode:2019S&C....29..891H. doi:10.1007/s11222-018-9844-0. S2CID 53514669.
  24. The dynesty nested sampling software package on GitHub
  25. Speagle, Joshua (2020). "dynesty: A Dynamic Nested Sampling Package for Estimating Bayesian Posteriors and Evidences". Monthly Notices of the Royal Astronomical Society. 493 (3): 3132–3158. arXiv:1904.02180. doi:10.1093/mnras/staa278. S2CID 102354337.
  26. Higson, Edward (2018). "dyPolyChord: dynamic nested sampling with PolyChord". Journal of Open Source Software. 3 (29): 965. doi:10.21105/joss.00965.
  27. The dyPolyChord dynamic nested sampling software package on GitHub
  28. Ashton, Gregory; et al. (2019). "Bilby: A User-friendly Bayesian Inference Library for Gravitational-wave Astronomy". The Astrophysical Journal Supplement Series. 241 (2): 27. arXiv:1811.02042. Bibcode:2019ApJS..241...27A. doi:10.3847/1538-4365/ab06fc. S2CID 118677076.
  29. Zucker, Catherine; et al. (2018). "Mapping Distances across the Perseus Molecular Cloud Using CO Observations, Stellar Photometry, and Gaia DR2 Parallax Measurements". The Astrophysical Journal. 869 (1): 83. arXiv:1803.08931. doi:10.3847/1538-4357/aae97c. S2CID 119446622.
  30. Günther, Maximilian; et al. (2019). "A super-Earth and two sub-Neptunes transiting the nearby and quiet M dwarf TOI-270". Nature Astronomy. 3 (12): 1099–1108. arXiv:1903.06107. Bibcode:2019NatAs...3.1099G. doi:10.1038/s41550-019-0845-5. S2CID 119286334.