Stan (software)

Stan
Original author(s): Stan Development Team
Initial release: August 30, 2012
Stable release: 2.35.0 [1] / 3 June 2024
Repository: github.com/stan-dev/stan
Written in: C++
Operating system: Unix-like, Microsoft Windows, macOS
Platform: Intel x86 (32-bit), x86-64
Type: Statistical package
License: New BSD License
Website: mc-stan.org

Stan is a probabilistic programming language for statistical inference written in C++. [2] The Stan language is used to specify a (Bayesian) statistical model with an imperative program calculating the log probability density function. [2]
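
As an illustration, for the linear regression model shown in the example below (which places no explicit priors on its parameters, so they are implicitly uniform), the program accumulates the log posterior density up to an additive constant:

$$\log p(\alpha, \beta, \sigma \mid x, y) = \sum_{n=1}^{N} \log \operatorname{normal}(y_n \mid \alpha + \beta x_n, \sigma) + \mathrm{const}.$$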


Stan is licensed under the New BSD License. It is named in honour of Stanisław Ulam, a pioneer of the Monte Carlo method. [2]

Stan was created by a development team consisting of 34 members [3] that includes Andrew Gelman, Bob Carpenter, Matt Hoffman, and Daniel Lee.

Example

A simple linear regression model can be described as $y_n = \alpha + \beta x_n + \epsilon_n$, where $\epsilon_n \sim \operatorname{normal}(0, \sigma)$. This can also be expressed as $y_n \sim \operatorname{normal}(\alpha + \beta x_n, \sigma)$. The latter form can be written in Stan as follows:

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
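
A minimal usage sketch, not part of the original model: the program above can be compiled and fit through one of the interfaces described below, for example CmdStanPy. The file name linear_regression.stan and the simulated data here are assumptions for illustration.

import numpy as np
from cmdstanpy import CmdStanModel

# Simulate data from a line with known parameters (alpha=1, beta=2, sigma=0.5).
rng = np.random.default_rng(seed=1)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=100)
data = {"N": len(x), "x": x.tolist(), "y": y.tolist()}

# Compile the Stan program (assumed saved as linear_regression.stan)
# and sample from the posterior with Stan's default NUTS sampler.
model = CmdStanModel(stan_file="linear_regression.stan")
fit = model.sample(data=data)
print(fit.summary())  # posterior means, quantiles, R-hat, effective sample size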

Interfaces

The Stan language itself can be accessed through several interfaces:

- CmdStan – a command-line executable for the shell
- CmdStanR and CmdStanPy – lightweight interfaces to CmdStan for R and Python, respectively
- RStan – integration with the R software environment
- PyStan – integration with the Python programming language
- MatlabStan – integration with the MATLAB numerical computing environment
- Stan.jl – integration with the Julia programming language
- StataStan – integration with Stata

In addition, higher-level interfaces are provided with packages using Stan as a backend, primarily in the R language: [4]

- rstanarm, which provides a drop-in replacement for frequentist models fit with base R and lme4, using R formula syntax
- brms, [5] which supports a wide array of linear and nonlinear multilevel models using R formula syntax

Algorithms

Stan implements gradient-based Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference, stochastic, gradient-based variational Bayesian methods for approximate Bayesian inference, and gradient-based optimization for penalized maximum likelihood estimation.

MCMC algorithms:

- Hamiltonian Monte Carlo (HMC)
- No-U-Turn sampler [6] (NUTS), a variant of HMC and Stan's default MCMC engine

Variational inference algorithms:

- Automatic Differentiation Variational Inference (ADVI) [7]

Optimization algorithms:

- Limited-memory BFGS (L-BFGS), Stan's default optimization algorithm
- Broyden–Fletcher–Goldfarb–Shanno algorithm (BFGS)
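
As a sketch of how these algorithms are selected in practice (illustrative only, reusing the linear regression program from the example above; the file name and toy data are assumptions), the CmdStanPy interface exposes one method per inference mode:

from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="linear_regression.stan")
data = {"N": 3, "x": [1.0, 2.0, 3.0], "y": [2.9, 5.1, 7.2]}  # toy data

fit = model.sample(data=data)          # full Bayesian inference via HMC/NUTS (the default)
approx = model.variational(data=data)  # approximate inference via ADVI
mle = model.optimize(data=data)        # penalized maximum likelihood via L-BFGS (the default)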

Automatic differentiation

Stan implements reverse-mode automatic differentiation to calculate gradients of the model, which is required by HMC, NUTS, L-BFGS, BFGS, and variational inference. [2] The automatic differentiation library within Stan (the Stan Math library) can also be used outside of the probabilistic programming language.

Usage

Stan is used in fields including social science, [8] pharmaceutical statistics, [9] market research, [10] and medical imaging. [11]


Related Research Articles

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that is, the Markov chain's equilibrium distribution matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution.

Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials. More concretely, analysis in Bayesian methods codifies prior knowledge in the form of a prior distribution.

In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is an estimate of an unknown quantity that equals the mode of the posterior distribution. The MAP can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to the method of maximum likelihood (ML) estimation, but employs an augmented optimization objective which incorporates a prior distribution over the quantity one wants to estimate. MAP estimation can therefore be seen as a regularization of maximum likelihood estimation.

In numerical optimization, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm is an iterative method for solving unconstrained nonlinear optimization problems. Like the related Davidon–Fletcher–Powell method, BFGS determines the descent direction by preconditioning the gradient with curvature information. It does so by gradually improving an approximation to the Hessian matrix of the loss function, obtained only from gradient evaluations via a generalized secant method.

Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. What kind of graph is used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions.

Limited-memory BFGS is an optimization algorithm in the family of quasi-Newton methods that approximates the Broyden–Fletcher–Goldfarb–Shanno algorithm (BFGS) using a limited amount of computer memory. It is a popular algorithm for parameter estimation in machine learning. The algorithm's target problem is to minimize $f(\mathbf{x})$ over unconstrained values of the real vector $\mathbf{x}$, where $f$ is a differentiable scalar function.

Bayesian inference of phylogeny combines the information in the prior and in the data likelihood to create the so-called posterior probability of trees, which is the probability that the tree is correct given the data, the prior and the likelihood model. Bayesian inference was introduced into molecular phylogenetics in the 1990s by three independent groups: Bruce Rannala and Ziheng Yang in Berkeley, Bob Mau in Madison, and Shuying Li at the University of Iowa, the last two being PhD students at the time. The approach has become very popular since the release of the MrBayes software in 2001, and is now one of the most popular methods in molecular phylogenetics.

Bayesian linear regression is a type of conditional modeling in which the mean of one variable is described by a linear combination of other variables, with the goal of obtaining the posterior probability of the regression coefficients and ultimately allowing the out-of-sample prediction of the regressand conditional on observed values of the regressors. The simplest and most widely used version of this model is the normal linear model, in which $y$ given $X$ is distributed Gaussian. In this model, and under a particular choice of prior probabilities for the parameters (so-called conjugate priors), the posterior can be found analytically. With more arbitrarily chosen priors, the posteriors generally have to be approximated.

GNU MCSim is a suite of simulation software. It allows one to design one's own statistical or simulation models, perform Monte Carlo simulations, and Bayesian inference through (tempered) Markov chain Monte Carlo simulations. The latest version allows parallel computing of Monte Carlo or MCMC simulations.

Probabilistic programming (PP) is a programming paradigm in which probabilistic models are specified and inference for these models is performed automatically. It represents an attempt to unify probabilistic modeling and traditional general purpose programming in order to make the former easier and more widely applicable. It can be used to create systems that help make decisions in the face of uncertainty.


The Hamiltonian Monte Carlo algorithm is a Markov chain Monte Carlo method for obtaining a sequence of random samples which converge to being distributed according to a target probability distribution for which direct sampling is difficult. This sequence can be used to estimate integrals with respect to the target distribution.

Bayesian inference using Gibbs sampling (BUGS) is a statistical software package for performing Bayesian inference using Markov chain Monte Carlo (MCMC) methods. It was developed by David Spiegelhalter at the Medical Research Council Biostatistics Unit in Cambridge in 1989 and released as free software in 1991.


LaplacesDemon is an open-source statistical package that is intended to provide a complete environment for Bayesian inference. LaplacesDemon has been used in numerous fields. The user writes their own model specification function and selects a numerical approximation algorithm to update their Bayesian model. Some numerical approximation families of algorithms include Laplace's method, numerical integration, Markov chain Monte Carlo (MCMC), and variational Bayesian methods.

PyMC is a probabilistic programming language written in Python. It can be used for Bayesian statistical modeling and probabilistic machine learning.



Stochastic gradient Langevin dynamics (SGLD) is an optimization and sampling technique composed of characteristics from Stochastic gradient descent, a Robbins–Monro optimization algorithm, and Langevin dynamics, a mathematical extension of molecular dynamics models. Like stochastic gradient descent, SGLD is an iterative optimization algorithm which uses minibatching to create a stochastic gradient estimator, as used in SGD to optimize a differentiable objective function. Unlike traditional SGD, SGLD can be used for Bayesian learning as a sampling method. SGLD may be viewed as Langevin dynamics applied to posterior distributions, but the key difference is that the likelihood gradient terms are minibatched, like in SGD. SGLD, like Langevin dynamics, produces samples from a posterior distribution of parameters based on available data. First described by Welling and Teh in 2011, the method has applications in many contexts which require optimization, and is most notably applied in machine learning problems.

ArviZ is a Python package for exploratory analysis of Bayesian models. It is specifically designed to work with the output of probabilistic programming libraries like PyMC, Stan, and others by providing a set of tools for summarizing and visualizing the results of Bayesian inference in a convenient and informative way. ArviZ also provides a common data structure for manipulating and storing data commonly arising in Bayesian analysis, like posterior samples or observed data.

Probabilistic numerics is an active field of study at the intersection of applied mathematics, statistics, and machine learning centering on the concept of uncertainty in computation. In probabilistic numerics, tasks in numerical analysis such as finding numerical solutions for integration, linear algebra, optimization and simulation and differential equations are seen as problems of statistical, probabilistic, or Bayesian inference.

Bambi is a high-level Bayesian model-building interface written in Python. It works with the PyMC probabilistic programming framework. Bambi provides an interface to build and solve Bayesian generalized (non-)linear multivariate multilevel models.

References

  1. "Release 2.35.0". 3 June 2024. Retrieved 26 June 2024.
  2. 1 2 3 4 5 Stan Development Team. 2015. Stan Modeling Language User's Guide and Reference Manual, Version 2.9.0
  3. "Development Team". stan-dev.github.io. Retrieved 2018-07-25.
  4. Gabry, Jonah. "The current state of the Stan ecosystem in R". Statistical Modeling, Causal Inference, and Social Science. Retrieved 25 August 2020.
  5. "BRMS: Bayesian Regression Models using 'Stan'". 23 August 2021.
  6. Hoffman, Matthew D.; Gelman, Andrew (April 2014). "The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo". Journal of Machine Learning Research . 15: pp. 1593–1623.
  7. Kucukelbir, Alp; Ranganath, Rajesh; Blei, David M. (June 2015). "Automatic Variational Inference in Stan". 1506 (3431). arXiv: 1506.03431 . Bibcode:2015arXiv150603431K.{{cite journal}}: Cite journal requires |journal= (help)
  8. Goodrich, Benjamin King, Wawro, Gregory and Katznelson, Ira, Designing Quantitative Historical Social Inquiry: An Introduction to Stan (2012). APSA 2012 Annual Meeting Paper. Available at SSRN   2105531
  9. Natanegara, Fanni; Neuenschwander, Beat; Seaman, John W.; Kinnersley, Nelson; Heilmann, Cory R.; Ohlssen, David; Rochester, George (2013). "The current state of Bayesian methods in medical product development: survey results and recommendations from the DIA Bayesian Scientific Working Group". Pharmaceutical Statistics. 13 (1): 3–12. doi:10.1002/pst.1595. ISSN   1539-1612. PMID   24027093. S2CID   19738522.
  10. Feit, Elea (15 May 2017). "Using Stan to Estimate Hierarchical Bayes Models" . Retrieved 19 March 2019.
  11. Gordon, GSD; Joseph, J; Alcolea, MP; Sawyer, T; Macfaden, AJ; Williams, C; Fitzpatrick, CRM; Jones, PH; di Pietro, M; Fitzgerald, RC; Wilkinson, TD; Bohndiek, SE (2019). "Quantitative phase and polarization imaging through an optical fiber applied to detection of early esophageal tumorigenesis". Journal of Biomedical Optics. 24 (12): 1–13. arXiv: 1811.03977 . Bibcode:2019JBO....24l6004G. doi:10.1117/1.JBO.24.12.126004. PMC   7006047 . PMID   31840442.
