Original author(s) | PyMC Development Team |
---|---|
Initial release | May 4, 2013 |
Stable release | |
Repository | https://github.com/pymc-devs/pymc |
Written in | Python |
Operating system | Unix-like, Mac OS X, Microsoft Windows |
Platform | Intel x86 – 32-bit, x64 |
Type | Statistical package |
License | Apache License, Version 2.0 |
Website | www |
PyMC (formerly known as PyMC3) is a probabilistic programming language written in Python. It can be used for Bayesian statistical modeling and probabilistic machine learning.
PyMC performs inference based on advanced Markov chain Monte Carlo and/or variational fitting algorithms. [2] [3] [4] [5] [6] It is a rewrite from scratch of the previous version of the PyMC software. [7] Unlike PyMC2, which had used Fortran extensions for performing computations, PyMC relies on PyTensor, a Python library that allows defining, optimizing, and efficiently evaluating mathematical expressions involving multi-dimensional arrays. From version 3.8 PyMC relies on ArviZ to handle plotting, diagnostics, and statistical checks. PyMC and Stan are the two most popular probabilistic programming tools. [8] PyMC is an open source project, developed by the community and has been fiscally sponsored by NumFOCUS. [9]
PyMC has been used to solve inference problems in several scientific domains, including astronomy, [10] [11] epidemiology, [12] [13] molecular biology, [14] crystallography, [15] [16] chemistry, [17] ecology [18] [19] and psychology. [20] Previous versions of PyMC were also used widely, for example in climate science, [21] public health, [22] neuroscience, [23] and parasitology. [24] [25]
After Theano announced plans to discontinue development in 2017, [26] the PyMC team evaluated TensorFlow Probability as a computational backend, [27] but decided in 2020 to fork Theano under the name Aesara. [28] Large parts of the Theano codebase have been refactored and compilation through JAX [29] and Numba were added. The PyMC team has released the revised computational backend under the name PyTensor and continues the development of PyMC. [30]
PyMC implements non-gradient-based and gradient-based Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference and stochastic, gradient-based variational Bayesian methods for approximate Bayesian inference.
Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle. The name comes from the Monte Carlo Casino in Monaco, where the primary developer of the method, mathematician Stanislaw Ulam, was inspired by his uncle's gambling habits.
A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that is, the Markov chain's equilibrium distribution matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution.
Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials. More concretely, analysis in Bayesian methods codifies prior knowledge in the form of a prior distribution.
Bayesian inference of phylogeny combines the information in the prior and in the data likelihood to create the so-called posterior probability of trees, which is the probability that the tree is correct given the data, the prior and the likelihood model. Bayesian inference was introduced into molecular phylogenetics in the 1990s by three independent groups: Bruce Rannala and Ziheng Yang in Berkeley, Bob Mau in Madison, and Shuying Li in University of Iowa, the last two being PhD students at the time. The approach has become very popular since the release of the MrBayes software in 2001, and is now one of the most popular methods in molecular phylogenetics.
The nested sampling algorithm is a computational approach to the Bayesian statistics problems of comparing models and generating samples from posterior distributions. It was developed in 2004 by physicist John Skilling.
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics that can be used to estimate the posterior distributions of model parameters.
Probabilistic programming (PP) is a programming paradigm in which probabilistic models are specified and inference for these models is performed automatically. It represents an attempt to unify probabilistic modeling and traditional general purpose programming in order to make the former easier and more widely applicable. It can be used to create systems that help make decisions in the face of uncertainty.
Ziheng Yang FRS is a Chinese biologist. He holds the R.A. Fisher Chair of Statistical Genetics at University College London, and is the Director of R.A. Fisher Centre for Computational Biology at UCL. He was elected a Fellow of the Royal Society in 2006.
Theano is a Python library and optimizing compiler for manipulating and evaluating mathematical expressions, especially matrix-valued ones. In Theano, computations are expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or GPU architectures.
Bayesian inference using Gibbs sampling (BUGS) is a statistical software for performing Bayesian inference using Markov chain Monte Carlo (MCMC) methods. It was developed by David Spiegelhalter at the Medical Research Council Biostatistics Unit in Cambridge in 1989 and released as free software in 1991.
Rare event sampling is an umbrella term for a group of computer simulation methods intended to selectively sample 'special' regions of the dynamic space of systems which are unlikely to visit those special regions through brute-force simulation. A familiar example of a rare event in this context would be nucleation of a raindrop from over-saturated water vapour: although raindrops form every day, relative to the length and time scales defined by the motion of water molecules in the vapour phase, the formation of a liquid droplet is extremely rare.
The free energy principle is a theoretical framework suggesting that the brain reduces surprise or uncertainty by making predictions based on internal models and updating them using sensory input. It highlights the brain's objective of aligning its internal model and the external world to enhance prediction accuracy. This principle integrates Bayesian inference with active inference, where actions are guided by predictions and sensory feedback refines them. It has wide-ranging implications for comprehending brain function, perception, and action.
Stan is a probabilistic programming language for statistical inference written in C++. The Stan language is used to specify a (Bayesian) statistical model with an imperative program calculating the log probability density function.
Radford M. Neal is a professor emeritus at the Department of Statistics and Department of Computer Science at the University of Toronto, where he holds a research chair in statistics and machine learning.
Differentiable programming is a programming paradigm in which a numeric computer program can be differentiated throughout via automatic differentiation. This allows for gradient-based optimization of parameters in the program, often via gradient descent, as well as other learning approaches that are based on higher order derivative information. Differentiable programming has found use in a wide variety of areas, particularly scientific computing and machine learning. One of the early proposals to adopt such a framework in a systematic fashion to improve upon learning algorithms was made by the Advanced Concepts Team at the European Space Agency in early 2016.
ArviZ is a Python package for exploratory analysis of Bayesian models. It is specifically designed to work with the output of probabilistic programming libraries like PyMC, Stan, and others by providing a set of tools for summarizing and visualizing the results of Bayesian inference in a convenient and informative way. ArviZ also provides a common data structure for manipulating and storing data commonly arising in Bayesian analysis, like posterior samples or observed data.
This is a comparison of statistical analysis software that allows doing inference with Gaussian processes often using approximations.
Probabilistic numerics is an active field of study at the intersection of applied mathematics, statistics, and machine learning centering on the concept of uncertainty in computation. In probabilistic numerics, tasks in numerical analysis such as finding numerical solutions for integration, linear algebra, optimization and simulation and differential equations are seen as problems of statistical, probabilistic, or Bayesian inference.
Bambi is a high-level Bayesian model-building interface written in Python. It works with the PyMC probabilistic programming framework. Bambi provides an interface to build and solve Bayesian generalized (non-)linear multivariate multilevel models.
{{cite journal}}
: Cite journal requires |journal=
(help)