Radford M. Neal

Born: September 12, 1956 [1]
Citizenship: Canadian
Education: University of Calgary; University of Toronto
Scientific career
Fields: Statistics, Machine Learning, Artificial Intelligence
Institutions: University of Toronto
Thesis: Bayesian Learning for Neural Networks (1995)
Doctoral advisor: Geoffrey Hinton
Other academic advisors: David Hill
Website: www.cs.utoronto.ca/~radford/

Radford M. Neal is a professor emeritus in the Department of Statistics and the Department of Computer Science at the University of Toronto, where he held a research chair in statistics and machine learning.

Education and career

Neal studied computer science at the University of Calgary, where he received his B.Sc. in 1977 and M.Sc. in 1980, with thesis work supervised by David Hill. He worked for several years as a sessional instructor at the University of Calgary and as a statistical consultant in industry before returning to academia. Neal continued his studies at the University of Toronto, where he received his Ph.D. in 1995 under the supervision of Geoffrey Hinton. [2] Neal became an assistant professor at the University of Toronto in 1995, an associate professor in 1999, and a full professor in 2001. He held the Canada Research Chair in Statistics and Machine Learning from 2003 to 2016 and retired in 2017.

Neal has made significant contributions to machine learning and statistics. He is particularly well known for his work on Markov chain Monte Carlo, [3] [4] error-correcting codes [5] and Bayesian learning for neural networks. [6] He is also known for his blog [7] and as the developer of pqR, a new version of the R interpreter. [8]

Bibliography

Books and chapters

Selected papers

Related Research Articles

A hidden Markov model (HMM) is a Markov model in which the observations are dependent on a latent Markov process X. An HMM requires that there be an observable process Y whose outcomes depend on the outcomes of X in a known way. Since X cannot be observed directly, the goal is to learn about the state of X by observing Y. By definition of being a Markov model, an HMM has an additional requirement that the outcome of Y at time t must be "influenced" exclusively by the outcome of X at time t, and that the outcomes of X and Y at earlier times must be conditionally independent of Y at time t given X at time t. Estimation of the parameters in an HMM can be performed using maximum likelihood. For linear chain HMMs, the Baum–Welch algorithm can be used to estimate the parameters.
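
As an illustration, the likelihood of an observation sequence under a small linear-chain HMM can be computed with the forward recursion. The Python sketch below uses a hypothetical two-state, two-symbol model; the matrices are illustrative values, not taken from any source above.

```python
# Minimal sketch of the forward algorithm for a linear-chain HMM.
# The two-state model below is hypothetical; the numbers are illustrative only.
import numpy as np

def forward(obs, pi, A, B):
    """Return p(observations) given initial dist pi, transitions A, emissions B."""
    alpha = pi * B[:, obs[0]]             # joint prob. of first state and first symbol
    for y in obs[1:]:
        alpha = (alpha @ A) * B[:, y]     # propagate the hidden state, absorb next symbol
    return alpha.sum()

pi = np.array([0.6, 0.4])                 # initial state distribution
A = np.array([[0.7, 0.3],                 # state transition probabilities
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],                 # emission probabilities per state
              [0.3, 0.7]])

print(forward([0, 1, 1, 0], pi, A, B))    # likelihood of the symbol sequence 0, 1, 1, 0
```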

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
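
A hypothetical two-node network (Disease influencing Symptom) makes the disease-and-symptom example concrete; the probabilities below are made up purely for illustration.

```python
# Minimal sketch of exact inference in a two-node Bayesian network
# Disease -> Symptom; all probabilities are illustrative, not from any source.
p_disease = 0.01                              # prior P(D = 1)
p_symptom_given_d = {1: 0.90, 0: 0.05}        # P(S = 1 | D = d)

# Joint P(D = d, S = 1) for each disease state, then condition on the symptom.
joint = {d: (p_disease if d else 1 - p_disease) * p_symptom_given_d[d] for d in (0, 1)}
p_disease_given_symptom = joint[1] / (joint[0] + joint[1])   # Bayes' rule
print(p_disease_given_symptom)                # about 0.15 with these numbers
```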

In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample of the desired distribution by recording states from the chain. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Various algorithms exist for constructing chains, including the Metropolis–Hastings algorithm.
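
For example, a minimal random-walk Metropolis–Hastings sampler targeting a standard normal distribution can be sketched in a few lines of Python; the target, proposal scale, and chain length are illustrative choices.

```python
# Minimal sketch of random-walk Metropolis-Hastings targeting N(0, 1).
# Target, proposal scale, and chain length are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    return -0.5 * x**2                    # unnormalised log density of N(0, 1)

x, samples = 0.0, []
for _ in range(10_000):
    proposal = x + rng.normal(scale=1.0)  # symmetric random-walk proposal
    # Accept with probability min(1, target(proposal) / target(x)).
    if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
        x = proposal
    samples.append(x)                     # record the current state of the chain

print(np.mean(samples), np.var(samples))  # should approach 0 and 1
```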

Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation that views probability as the limit of the relative frequency of an event after many trials. More concretely, analysis in Bayesian methods codifies prior knowledge in the form of a prior distribution.
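
A conjugate Beta-Binomial model is perhaps the simplest concrete example of codifying prior knowledge in a prior distribution; the prior and data below (7 heads in 10 flips) are hypothetical.

```python
# Minimal sketch of a conjugate Beta-Binomial update for a coin's bias.
# Prior and data are hypothetical, chosen only to illustrate the mechanics.
a_prior, b_prior = 1.0, 1.0              # Beta(1, 1): a uniform prior belief
heads, tails = 7, 3                      # observed data: 7 heads in 10 flips

a_post, b_post = a_prior + heads, b_prior + tails   # posterior is Beta(8, 4)
posterior_mean = a_post / (a_post + b_post)
print(posterior_mean)                    # 0.667: pulled from the prior mean 0.5 toward 0.7
```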

In statistics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for sampling from a specified multivariate probability distribution when direct sampling from the joint distribution is difficult, but sampling from each conditional distribution is more practical. The algorithm produces a sequence of samples by repeatedly drawing each variable from its distribution conditioned on the current values of the other variables. This sequence can be used to approximate the joint distribution; to approximate the marginal distribution of one of the variables, or some subset of the variables; or to compute an integral. Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled.
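
As a small illustration, Gibbs sampling from a bivariate normal distribution with correlation rho alternates between the two univariate conditional normals; the correlation and chain length below are illustrative.

```python
# Minimal sketch of Gibbs sampling from a bivariate normal with correlation rho.
# Each full conditional is a univariate normal; the numbers are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
x, y = 0.0, 0.0
samples = []
for _ in range(10_000):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))   # draw x from p(x | y)
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))   # draw y from p(y | x)
    samples.append((x, y))

print(np.corrcoef(np.array(samples).T)[0, 1])      # should approach rho = 0.8
```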

The Helmholtz machine is a type of artificial neural network that can account for the hidden structure of a set of data by being trained to create a generative model of the original set of data. The hope is that by learning economical representations of the data, the underlying structure of the generative model should reasonably approximate the hidden structure of the data set. A Helmholtz machine contains two networks, a bottom-up recognition network that takes the data as input and produces a distribution over hidden variables, and a top-down "generative" network that generates values of the hidden variables and the data itself. When introduced, Helmholtz machines were one of a handful of learning architectures that used feedback as well as feedforward connections to ensure the quality of learned models.

Peter Dayan (researcher in computational neuroscience)

Peter Dayan is a British neuroscientist and computer scientist who is director at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany, along with Ivan De Araujo. He is co-author of Theoretical Neuroscience, an influential textbook on computational neuroscience. He is known for applying Bayesian methods from machine learning and artificial intelligence to understand neural function and is particularly recognized for relating neurotransmitter levels to prediction errors and Bayesian uncertainties. He has pioneered the field of reinforcement learning (RL) where he helped develop the Q-learning algorithm, and made contributions to unsupervised learning, including the wake-sleep algorithm for neural networks and the Helmholtz machine.

Bayesian inference of phylogeny combines the information in the prior and in the data likelihood to create the so-called posterior probability of trees, which is the probability that the tree is correct given the data, the prior and the likelihood model. Bayesian inference was introduced into molecular phylogenetics in the 1990s by three independent groups: Bruce Rannala and Ziheng Yang in Berkeley, Bob Mau in Madison, and Shuying Li at the University of Iowa, the last two being PhD students at the time. The approach has become very popular since the release of the MrBayes software in 2001, and is now one of the most popular methods in molecular phylogenetics.

WinBUGS (statistical software for Bayesian analysis)

WinBUGS is statistical software for Bayesian analysis using Markov chain Monte Carlo (MCMC) methods.

Just another Gibbs sampler (JAGS) is a program for simulation from Bayesian hierarchical models using Markov chain Monte Carlo (MCMC), developed by Martyn Plummer. JAGS has been employed for statistical work in many fields, for example ecology, management, and genetics.

The Hamiltonian Monte Carlo algorithm is a Markov chain Monte Carlo method for obtaining a sequence of random samples which converge to being distributed according to a target probability distribution for which direct sampling is difficult. This sequence can be used to estimate integrals with respect to the target distribution.
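
A minimal sketch of the algorithm for a one-dimensional standard normal target is shown below; the leapfrog step size and trajectory length are illustrative tuning choices, not prescribed values.

```python
# Minimal sketch of Hamiltonian Monte Carlo for a standard normal target.
# Step size and trajectory length are illustrative tuning choices.
import numpy as np

rng = np.random.default_rng(0)

def grad_log_target(q):
    return -q                                   # gradient of log N(0, 1) density

def hmc_step(q, step=0.1, n_leapfrog=20):
    p = rng.normal()                            # resample an auxiliary momentum
    q_new = q
    p_new = p + 0.5 * step * grad_log_target(q_new)      # initial half step for momentum
    for _ in range(n_leapfrog):
        q_new = q_new + step * p_new                     # full step for position
        p_new = p_new + step * grad_log_target(q_new)    # full step for momentum
    p_new = p_new - 0.5 * step * grad_log_target(q_new)  # trim last update to a half step
    # Metropolis accept/reject on the change in total energy H = potential + kinetic.
    h_old = 0.5 * q**2 + 0.5 * p**2
    h_new = 0.5 * q_new**2 + 0.5 * p_new**2
    return q_new if np.log(rng.uniform()) < h_old - h_new else q

q, samples = 0.0, []
for _ in range(5_000):
    q = hmc_step(q)
    samples.append(q)

print(np.mean(samples), np.var(samples))        # should approach 0 and 1
```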

Wake-sleep algorithm (unsupervised learning algorithm)

The wake-sleep algorithm is an unsupervised learning algorithm for deep generative models, especially Helmholtz machines. The algorithm is similar to the expectation-maximization algorithm, and optimizes the model likelihood for observed data. The name of the algorithm derives from its use of two learning phases, the “wake” phase and the “sleep” phase, which are performed alternately. It can be conceived as a model for learning in the brain, but it is also applied in machine learning.

Stuart Geman (American mathematician)

Stuart Alan Geman is an American mathematician, known for influential contributions to computer vision, statistics, probability theory, machine learning, and the neurosciences. He and his brother, Donald Geman, are well known for proposing the Gibbs sampler, and for the first proof of convergence of the simulated annealing algorithm.

Stan (probabilistic programming language for Bayesian inference)

Stan is a probabilistic programming language for statistical inference written in C++. The Stan language is used to specify a (Bayesian) statistical model with an imperative program calculating the log probability density function.

PyMC is a probabilistic programming language written in Python. It can be used for Bayesian statistical modeling and probabilistic machine learning.
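
For instance, a small model for the mean of normally distributed data might look like the sketch below (assuming a PyMC 4+ style API; the priors and data are hypothetical).

```python
# Minimal sketch of a PyMC model: inferring the mean of synthetic normal data.
# Assumes a PyMC 4+ style API; priors and data are hypothetical.
import numpy as np
import pymc as pm

data = np.random.default_rng(0).normal(loc=2.0, scale=1.0, size=50)  # synthetic data

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)            # prior on the unknown mean
    pm.Normal("obs", mu=mu, sigma=1.0, observed=data)   # likelihood of the observations
    trace = pm.sample(1000, tune=1000)                  # MCMC (NUTS by default)

print(float(trace.posterior["mu"].mean()))              # posterior mean, near 2.0
```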

Yee-Whye Teh is a professor of statistical machine learning in the Department of Statistics, University of Oxford. Prior to 2012 he was a reader at the Gatsby Charitable Foundation computational neuroscience unit at University College London. His work is primarily in machine learning, artificial intelligence, statistics and computer science.

Éric Moulines (French researcher in statistical learning)

Éric Moulines is a French researcher in statistical learning and signal processing. He received the silver medal from the CNRS in 2010 and the France Télécom prize, awarded in collaboration with the French Academy of Sciences, in 2011. He was appointed a Fellow of the European Association for Signal Processing in 2012 and of the Institute of Mathematical Statistics in 2016. He is a General Engineer of the Corps des Mines (X81).

ArviZ is a Python package for exploratory analysis of Bayesian models.

An energy-based model (EBM) is a form of generative model (GM) imported directly from statistical physics to learning.

Siddhartha Chib is an econometrician and statistician, the Harry C. Hartkopf Professor of Econometrics and Statistics at Washington University in St. Louis. His work is primarily in Bayesian statistics, econometrics, and Markov chain Monte Carlo methods.

References

  1. "Radford M. Neal Curriculum Vitae" (PDF). User radford at cs.utoronto.ca. Retrieved 4 May 2015.
  2. Neal, Radford M. (2022-05-31). "Curriculum Vitae" (PDF).
  3. Neal, Radford (1993). Probabilistic Inference Using Markov Chain Monte Carlo Methods (PDF) (Report). Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto. p. 144. Retrieved 9 May 2015.
  4. Neal, Radford M. (2011). "MCMC Using Hamiltonian Dynamics" (PDF). In Steve Brooks; Andrew Gelman; Galin L. Jones; Xiao-Li Meng (eds.). Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC. ISBN 978-0470177938.
  5. MacKay, D. J. C.; Neal, R. M. (1996). "Near Shannon limit performance of low density parity check codes". Electronics Letters. 32 (18): 1645. Bibcode:1996ElL....32.1645M. doi:10.1049/el:19961141.
  6. Neal, R. M. (1996). Bayesian Learning for Neural Networks. Lecture Notes in Statistics. Vol. 118. doi:10.1007/978-1-4612-0745-0. ISBN 978-0-387-94724-2.
  7. "Radford Neal's blog" . Retrieved 9 May 2015.
  8. "pqR - a pretty quick version of R" . Retrieved 9 May 2015.