PyMC

Last updated
PyMC
Original author(s) PyMC Development Team
Initial releaseMay 4, 2013 (2013-05-04)
Stable release
5.18.0 [1]   OOjs UI icon edit-ltr-progressive.svg / 4 November 2024;20 days ago (4 November 2024)
Repository https://github.com/pymc-devs/pymc
Written in Python
Operating system Unix-like, Mac OS X, Microsoft Windows
Platform Intel x86 – 32-bit, x64
Type Statistical package
License Apache License, Version 2.0
Website www.pymc.io

PyMC (formerly known as PyMC3) is a probabilistic programming language written in Python. It can be used for Bayesian statistical modeling and probabilistic machine learning.

Contents

PyMC performs inference based on advanced Markov chain Monte Carlo and/or variational fitting algorithms. [2] [3] [4] [5] [6] It is a rewrite from scratch of the previous version of the PyMC software. [7] Unlike PyMC2, which had used Fortran extensions for performing computations, PyMC relies on PyTensor, a Python library that allows defining, optimizing, and efficiently evaluating mathematical expressions involving multi-dimensional arrays. From version 3.8 PyMC relies on ArviZ to handle plotting, diagnostics, and statistical checks. PyMC and Stan are the two most popular probabilistic programming tools. [8] PyMC is an open source project, developed by the community and has been fiscally sponsored by NumFOCUS. [9]

PyMC has been used to solve inference problems in several scientific domains, including astronomy, [10] [11] epidemiology, [12] [13] molecular biology, [14] crystallography, [15] [16] chemistry, [17] ecology [18] [19] and psychology. [20] Previous versions of PyMC were also used widely, for example in climate science, [21] public health, [22] neuroscience, [23] and parasitology. [24] [25]

After Theano announced plans to discontinue development in 2017, [26] the PyMC team evaluated TensorFlow Probability as a computational backend, [27] but decided in 2020 to fork Theano under the name Aesara. [28] Large parts of the Theano codebase have been refactored and compilation through JAX [29] and Numba were added. The PyMC team has released the revised computational backend under the name PyTensor and continues the development of PyMC. [30]

Inference engines

PyMC implements non-gradient-based and gradient-based Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference and stochastic, gradient-based variational Bayesian methods for approximate Bayesian inference.

See also

Related Research Articles

<span class="mw-page-title-main">Monte Carlo method</span> Probabilistic problem-solving algorithm

Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle. The name comes from the Monte Carlo Casino in Monaco, where the primary developer of the method, mathematician Stanislaw Ulam, was inspired by his uncle's gambling habits.

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that is, the Markov chain's equilibrium distribution matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution.

Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials. More concretely, analysis in Bayesian methods codifies prior knowledge in the form of a prior distribution.

Bayesian inference of phylogeny combines the information in the prior and in the data likelihood to create the so-called posterior probability of trees, which is the probability that the tree is correct given the data, the prior and the likelihood model. Bayesian inference was introduced into molecular phylogenetics in the 1990s by three independent groups: Bruce Rannala and Ziheng Yang in Berkeley, Bob Mau in Madison, and Shuying Li in University of Iowa, the last two being PhD students at the time. The approach has become very popular since the release of the MrBayes software in 2001, and is now one of the most popular methods in molecular phylogenetics.

The nested sampling algorithm is a computational approach to the Bayesian statistics problems of comparing models and generating samples from posterior distributions. It was developed in 2004 by physicist John Skilling.

Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics that can be used to estimate the posterior distributions of model parameters.

Probabilistic programming (PP) is a programming paradigm in which probabilistic models are specified and inference for these models is performed automatically. It represents an attempt to unify probabilistic modeling and traditional general purpose programming in order to make the former easier and more widely applicable. It can be used to create systems that help make decisions in the face of uncertainty.

Ziheng Yang FRS is a Chinese biologist. He holds the R.A. Fisher Chair of Statistical Genetics at University College London, and is the Director of R.A. Fisher Centre for Computational Biology at UCL. He was elected a Fellow of the Royal Society in 2006.

Theano is a Python library and optimizing compiler for manipulating and evaluating mathematical expressions, especially matrix-valued ones. In Theano, computations are expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or GPU architectures.

Bayesian inference using Gibbs sampling (BUGS) is a statistical software for performing Bayesian inference using Markov chain Monte Carlo (MCMC) methods. It was developed by David Spiegelhalter at the Medical Research Council Biostatistics Unit in Cambridge in 1989 and released as free software in 1991.

Rare event sampling is an umbrella term for a group of computer simulation methods intended to selectively sample 'special' regions of the dynamic space of systems which are unlikely to visit those special regions through brute-force simulation. A familiar example of a rare event in this context would be nucleation of a raindrop from over-saturated water vapour: although raindrops form every day, relative to the length and time scales defined by the motion of water molecules in the vapour phase, the formation of a liquid droplet is extremely rare.

The free energy principle is a theoretical framework suggesting that the brain reduces surprise or uncertainty by making predictions based on internal models and updating them using sensory input. It highlights the brain's objective of aligning its internal model and the external world to enhance prediction accuracy. This principle integrates Bayesian inference with active inference, where actions are guided by predictions and sensory feedback refines them. It has wide-ranging implications for comprehending brain function, perception, and action.

<span class="mw-page-title-main">Stan (software)</span> Probabilistic programming language for Bayesian inference

Stan is a probabilistic programming language for statistical inference written in C++. The Stan language is used to specify a (Bayesian) statistical model with an imperative program calculating the log probability density function.

Radford M. Neal is a professor emeritus at the Department of Statistics and Department of Computer Science at the University of Toronto, where he holds a research chair in statistics and machine learning.

Differentiable programming is a programming paradigm in which a numeric computer program can be differentiated throughout via automatic differentiation. This allows for gradient-based optimization of parameters in the program, often via gradient descent, as well as other learning approaches that are based on higher order derivative information. Differentiable programming has found use in a wide variety of areas, particularly scientific computing and machine learning. One of the early proposals to adopt such a framework in a systematic fashion to improve upon learning algorithms was made by the Advanced Concepts Team at the European Space Agency in early 2016.

ArviZ is a Python package for exploratory analysis of Bayesian models. It is specifically designed to work with the output of probabilistic programming libraries like PyMC, Stan, and others by providing a set of tools for summarizing and visualizing the results of Bayesian inference in a convenient and informative way. ArviZ also provides a common data structure for manipulating and storing data commonly arising in Bayesian analysis, like posterior samples or observed data.

This is a comparison of statistical analysis software that allows doing inference with Gaussian processes often using approximations.

Probabilistic numerics is an active field of study at the intersection of applied mathematics, statistics, and machine learning centering on the concept of uncertainty in computation. In probabilistic numerics, tasks in numerical analysis such as finding numerical solutions for integration, linear algebra, optimization and simulation and differential equations are seen as problems of statistical, probabilistic, or Bayesian inference.

Bambi is a high-level Bayesian model-building interface written in Python. It works with the PyMC probabilistic programming framework. Bambi provides an interface to build and solve Bayesian generalized (non-)linear multivariate multilevel models.

References

  1. "Release 5.18.0". 4 November 2024. Retrieved 13 November 2024.
  2. Abril-Pla O, Andreani V, Carroll C, Dong L, Fonnesbeck CJ, Kochurov M, Kumar R, Lao J, Luhmann CC, Martin OA, Osthege M, Vieira R, Wiecki T, Zinkov R. (2023) PyMC: a modern, and comprehensive probabilistic programming framework in Python. PeerJ Comput. Sci. 9:e1516 doi : 10.7717/peerj-cs.1516
  3. Salvatier J, Wiecki TV, Fonnesbeck C. (2016) Probabilistic programming in Python using PyMC3. PeerJ Computer Science 2:e55 doi : 10.7717/peerj-cs.55
  4. Martin, Osvaldo (2024). Bayesian Analysis with Python. Packt Publishing Ltd. ISBN   9781805127161 . Retrieved 24 February 2024.
  5. Martin, Osvaldo; Kumar, Ravin; Lao, Junpeng (2021). Bayesian Modeling and Computation in Python. CRC-press. pp. 1–420. ISBN   9780367894368 . Retrieved 7 July 2022.
  6. Davidson-Pilon, Cameron (2015-09-30). Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference. Addison-Wesley Professional. ISBN   9780133902921.
  7. "documentation" . Retrieved 2017-09-20.
  8. "The Algorithms Behind Probabilistic Programming" . Retrieved 2017-03-10.
  9. "NumFOCUS Announces New Fiscally Sponsored Project: PyMC3". NumFOCUS | Open Code = Better Science. Retrieved 2017-03-10.
  10. Greiner, J.; Burgess, J. M.; Savchenko, V.; Yu, H.-F. (2016). "On the Fermi-GBM Event 0.4 s after GW150914". The Astrophysical Journal Letters. 827 (2): L38. arXiv: 1606.00314 . Bibcode:2016ApJ...827L..38G. doi: 10.3847/2041-8205/827/2/L38 . ISSN   2041-8205. S2CID   3529170.
  11. Hilbe, Joseph M.; Souza, Rafael S. de; Ishida, Emille E. O. (2017-04-30). Bayesian Models for Astrophysical Data: Using R, JAGS, Python, and Stan. Cambridge University Press. ISBN   9781108210744.
  12. Brauner, Jan M.; Mindermann, Sören; Sharma, Mrinank; Johnston, David; Salvatier, John; Gavenčiak, Tom; Stephenson, Anna B.; Leech, Gavin; Altman, George; Mikulik, Vladimir; Norman, Alexander John; Monrad, Joshua Teperowski; Besiroglu, Tamay; Ge, Hong; Hartwick, Meghan A.; Teh, Yee Whye; Chindelevitch, Leonid; Gal, Yarin; Kulveit, Jan (2020-12-15). "Inferring the effectiveness of government interventions against COVID-19". Science. 371 (6531): eabd9338. doi: 10.1126/science.abd9338 . PMC   7877495 . PMID   33323424.
  13. Systrom, Kevin; Vladek, Thomas; Krieger, Mike. "Rt.live Github repository". Rt.live. Retrieved 10 January 2021.
  14. Wagner, Stacey D.; Struck, Adam J.; Gupta, Riti; Farnsworth, Dylan R.; Mahady, Amy E.; Eichinger, Katy; Thornton, Charles A.; Wang, Eric T.; Berglund, J. Andrew (2016-09-28). "Dose-Dependent Regulation of Alternative Splicing by MBNL Proteins Reveals Biomarkers for Myotonic Dystrophy". PLOS Genetics. 12 (9): e1006316. doi: 10.1371/journal.pgen.1006316 . ISSN   1553-7404. PMC   5082313 . PMID   27681373.
  15. Sharma, Amit; Johansson, Linda; Dunevall, Elin; Wahlgren, Weixiao Y.; Neutze, Richard; Katona, Gergely (2017-03-01). "Asymmetry in serial femtosecond crystallography data". Acta Crystallographica Section A. 73 (2): 93–101. Bibcode:2017AcCry..73...93S. doi:10.1107/s2053273316018696. ISSN   2053-2733. PMC   5332129 . PMID   28248658.
  16. Katona, Gergely; Garcia-Bonete, Maria-Jose; Lundholm, Ida (2016-05-01). "Estimating the difference between structure-factor amplitudes using multivariate Bayesian inference". Acta Crystallographica Section A. 72 (3): 406–411. Bibcode:2016AcCry..72..406K. doi:10.1107/S2053273316003430. ISSN   2053-2733. PMC   4850660 . PMID   27126118.
  17. Garay, Pablo G.; Martin, Osvaldo A.; Scheraga, Harold A.; Vila, Jorge A. (2016-07-21). "Detection of methylation, acetylation and glycosylation of protein residues by monitoring13C chemical-shift changes: A quantum-chemical study". PeerJ. 4: e2253. doi: 10.7717/peerj.2253 . ISSN   2167-8359. PMC   4963218 . PMID   27547559.
  18. Wang, Yan; Huang, Hong; Huang, Lida; Ristic, Branko (2017). "Evaluation of Bayesian source estimation methods with Prairie Grass observations and Gaussian plume model: A comparison of likelihood functions and distance measures". Atmospheric Environment. 152: 519–530. Bibcode:2017AtmEn.152..519W. doi:10.1016/j.atmosenv.2017.01.014.
  19. MacNeil, M. Aaron; Chong-Seng, Karen M.; Pratchett, Deborah J.; Thompson, Casssandra A.; Messmer, Vanessa; Pratchett, Morgan S. (2017-03-14). "Age and Growth of An Outbreaking Acanthaster cf. solaris Population within the Great Barrier Reef" (PDF). Diversity. 9 (1): 18. doi: 10.3390/d9010018 .
  20. Tünnermann, Jan; Scharlau, Ingrid (2016). "Peripheral Visual Cues: Their Fate in Processing and Effects on Attention and Temporal-Order Perception". Frontiers in Psychology. 7: 1442. doi: 10.3389/fpsyg.2016.01442 . ISSN   1664-1078. PMC   5052275 . PMID   27766086.
  21. Graham, Nicholas A. J.; Jennings, Simon; MacNeil, M. Aaron; Mouillot, David; Wilson, Shaun K. (2015). "Predicting climate-driven regime shifts versus rebound potential in coral reefs". Nature. 518 (7537): 94–97. Bibcode:2015Natur.518...94G. doi:10.1038/nature14140. PMID   25607371. S2CID   4453338.
  22. Mascarenhas, Maya N.; Flaxman, Seth R.; Boerma, Ties; Vanderpoel, Sheryl; Stevens, Gretchen A. (2012-12-18). "National, Regional, and Global Trends in Infertility Prevalence Since 1990: A Systematic Analysis of 277 Health Surveys". PLOS Medicine. 9 (12): e1001356. doi: 10.1371/journal.pmed.1001356 . ISSN   1549-1676. PMC   3525527 . PMID   23271957.
  23. Cavanagh, James F; Wiecki, Thomas V; Cohen, Michael X; Figueroa, Christina M; Samanta, Johan; Sherman, Scott J; Frank, Michael J (2011). "Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold". Nature Neuroscience. 14 (11): 1462–1467. doi:10.1038/nn.2925. PMC   3394226 . PMID   21946325.
  24. Gething, Peter W.; Elyazar, Iqbal R. F.; Moyes, Catherine L.; Smith, David L.; Battle, Katherine E.; Guerra, Carlos A.; Patil, Anand P.; Tatem, Andrew J.; Howes, Rosalind E. (2012-09-06). "A Long Neglected World Malaria Map: Plasmodium vivax Endemicity in 2010". PLOS Neglected Tropical Diseases. 6 (9): e1814. Bibcode:2012PNTDi...6.1814G. doi: 10.1371/journal.pntd.0001814 . ISSN   1935-2735. PMC   3435256 . PMID   22970336.
  25. Pullan, Rachel L.; Smith, Jennifer L.; Jasrasaria, Rashmi; Brooker, Simon J. (2014-01-21). "Global numbers of infection and disease burden of soil transmitted helminth infections in 2010". Parasites & Vectors. 7: 37. Bibcode:2014PVec....7...37P. doi: 10.1186/1756-3305-7-37 . ISSN   1756-3305. PMC   3905661 . PMID   24447578.
  26. Lamblin, Pascal (28 September 2017). "MILA and the future of Theano". theano-users (Mailing list). Retrieved 28 September 2017.
  27. Developers, PyMC (2018-05-17). "Theano, TensorFlow and the Future of PyMC". PyMC Developers. Retrieved 2019-01-25.
  28. "The Future of PyMC3, or: Theano is Dead, Long Live Theano". PyMC Developers. 27 October 2020. Retrieved 10 January 2021.
  29. Bradbury, James; Frostig, Roy; Hawkins, Peter; James, Matthew James; Leary, Chris; Maclaurin, Dougal; Necula, George; Paszke, Adam; VanderPlas, Jake; Wanderman-Milne, Skye; Zhang, Qiao. "JAX". GitHub . Retrieved 10 January 2021.
  30. "PyMC Timeline". PyMC Timeline. Retrieved 10 January 2021.
  31. Hoffman, Matthew D.; Gelman, Andrew (April 2014). "The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo". Journal of Machine Learning Research . 15: pp. 1593–1623.
  32. Kucukelbir, Alp; Ranganath, Rajesh; Blei, David M. (June 2015). "Automatic Variational Inference in Stan". 1506 (3431). arXiv: 1506.03431 . Bibcode:2015arXiv150603431K.{{cite journal}}: Cite journal requires |journal= (help)

Further reading