Hamiltonian Monte Carlo

Last updated January 11, 2025

The Hamiltonian Monte Carlo algorithm (originally known as hybrid Monte Carlo) is a Markov chain Monte Carlo method for obtaining a sequence of random samples whose distribution converges to a target probability distribution that is difficult to sample directly. This sequence can be used to estimate integrals of the target distribution, such as expected values and moments.

Hamiltonian Monte Carlo corresponds to an instance of the Metropolis–Hastings algorithm, with a Hamiltonian dynamics evolution simulated using a time-reversible and volume-preserving numerical integrator (typically the leapfrog integrator) to propose a move to a new point in the state space. Compared to using a Gaussian random walk proposal distribution in the Metropolis–Hastings algorithm, Hamiltonian Monte Carlo reduces the correlation between successive sampled states by proposing moves to distant states which maintain a high probability of acceptance due to the approximate energy conserving properties of the simulated Hamiltonian dynamic when using a symplectic integrator. The reduced correlation means fewer Markov chain samples are needed to approximate integrals with respect to the target probability distribution for a given Monte Carlo error.

The algorithm was originally proposed by Simon Duane, Anthony Kennedy, Brian Pendleton and Duncan Roweth in 1987 for calculations in lattice quantum chromodynamics.^[1] In 1996, Radford M. Neal showed how the method could be used for a broader class of statistical problems, in particular artificial neural networks.^[2] However, the burden of having to provide gradients of the Bayesian network delayed the wider adoption of the algorithm in statistics and other quantitative disciplines, until in the mid-2010s the developers of Stan implemented HMC in combination with automatic differentiation.^[3]

Algorithm

Suppose the target distribution to sample is $f(\mathbf {x} )$ for $\mathbf {x} \in \mathbb {R} ^{d}$ ( $d\geq 1$ ) and a chain of samples $\mathbf {X} _{0},\mathbf {X} _{1},\mathbf {X} _{2},\ldots$ is required.

The Hamilton's equations are

{\frac {{\text{d}}x_{i}}{{\text{d}}t}}={\frac {\partial H}{\partial p_{i}}}\quad {\text{and}}\quad {\dfrac {{\text{d}}p_{i}}{{\text{d}}t}}=-{\dfrac {\partial H}{\partial x_{i}}}

where $x_{i}$ and $p_{i}$ are the $i$ th component of the position and momentum vector respectively and $H$ is the Hamiltonian. Let $M$ be a mass matrix which is symmetric and positive definite, then the Hamiltonian is

H(\mathbf {x} ,\mathbf {p} )=U(\mathbf {x} )+{\dfrac {1}{2}}\mathbf {p} ^{\text{T}}M^{-1}\mathbf {p}

where $U(\mathbf {x} )$ is the potential energy. The potential energy for a target is given as

U(\mathbf {x} )=-\ln f(\mathbf {x} )

which comes from the Boltzmann's factor. Note that the Hamiltonian $H$ is dimensionless in this formulation because the exponential probability weight $\exp \left(-H\right)$ has to be well defined. For example, in simulations at finite temperature $T$ the factor $k_{\text{B}}T$ (with the Boltzmann constant $k_{\text{B}}$ ) is directly absorbed into $U$ and $M$ .

The algorithm requires a positive integer for number of leapfrog steps $L$ and a positive number for the step size $\Delta t$ . Suppose the chain is at $\mathbf {X} _{n}=\mathbf {x} _{n}$ . Let $\mathbf {x} _{n}(0)=\mathbf {x} _{n}$ . First, a random Gaussian momentum $\mathbf {p} _{n}(0)$ is drawn from ${\text{N}}\left(\mathbf {0} ,M\right)$ . Next, the particle will run under Hamiltonian dynamics for time $L\Delta t$ , this is done by solving the Hamilton's equations numerically using the leapfrog algorithm. The position and momentum vectors after time $\Delta t$ using the leapfrog algorithm are:^[4]

\mathbf {p} _{n}\left(t+{\dfrac {\Delta t}{2}}\right)=\mathbf {p} _{n}(t)-{\dfrac {\Delta t}{2}}\nabla \left.U(\mathbf {x} )\right|_{\mathbf {x} =\mathbf {x} _{n}(t)}

\mathbf {x} _{n}(t+\Delta t)=\mathbf {x} _{n}(t)+\Delta tM^{-1}\mathbf {p} _{n}\left(t+{\dfrac {\Delta t}{2}}\right)

\mathbf {p} _{n}(t+\Delta t)=\mathbf {p} _{n}\left(t+{\dfrac {\Delta t}{2}}\right)-{\dfrac {\Delta t}{2}}\nabla \left.U(\mathbf {x} )\right|_{\mathbf {x} =\mathbf {x} _{n}(t+\Delta t)}

These equations are to be applied to $\mathbf {x} _{n}(0)$ and $\mathbf {p} _{n}(0)$ $L$ times to obtain $\mathbf {x} _{n}(L\Delta t)$ and $\mathbf {p} _{n}(L\Delta t)$ .

The leapfrog algorithm is an approximate solution to the motion of non-interacting classical particles. If exact, the solution will never change the initial randomly-generated energy distribution, as energy is conserved for each particle in the presence of a classical potential energy field. In order to reach a thermodynamic equilibrium distribution, particles must have some sort of interaction with, for example, a surrounding heat bath, so that the entire system can take on different energies with probabilities according to the Boltzmann distribution.

One way to move the system towards a thermodynamic equilibrium distribution is to change the state of the particles using the Metropolis–Hastings algorithm. So first, one applies the leapfrog step, then a Metropolis-Hastings step.

The transition from $\mathbf {X} _{n}=\mathbf {x} _{n}$ to $\mathbf {X} _{n+1}$ is

\mathbf {X} _{n+1}|\mathbf {X} _{n}=\mathbf {x} _{n}={\begin{cases}\mathbf {x} _{n}(L\Delta t)&{\text{with probability }}\alpha \left(\mathbf {x} _{n}(0),\mathbf {x} _{n}(L\Delta t)\right)\\\mathbf {x} _{n}(0)&{\text{otherwise}}\end{cases}}

where

\alpha \left(\mathbf {x} _{n}(0),\mathbf {x} _{n}(L\Delta t)\right)={\text{min}}\left(1,{\dfrac {\exp \left[-H(\mathbf {x} _{n}(L\Delta t),\mathbf {p} _{n}(L\Delta t))\right]}{\exp \left[-H(\mathbf {x} _{n}(0),\mathbf {p} _{n}(0))\right]}}\right).

A full update consists of first randomly sampling the momenta $\mathbf {p}$ (independently of the previous iterations), then integrating the equations of motion (e.g. with leapfrog), and finally obtaining the new configuration from the Metropolis-Hastings accept/reject step. This updating mechanism is repeated to obtain $\mathbf {X} _{n+1},\mathbf {X} _{n+2},\mathbf {X} _{n+3},\ldots$ .

No U-Turn Sampler

The No U-Turn Sampler (NUTS)^[5] is an extension by controlling the number of steps $L$ automatically. Tuning $L$ is critical. For example, in the one dimensional ${\text{N}}(0,1/{\sqrt {k}})$ case, the potential is $U(x)=kx^{2}/2$ which corresponds to the potential of a simple harmonic oscillator. For $L$ too large, the particle will oscillate and thus waste computational time. For $L$ too small, the particle will behave like a random walk.

Loosely, NUTS runs the Hamiltonian dynamics both forwards and backwards in time randomly until a U-Turn condition is satisfied. When that happens, a random point from the path is chosen for the MCMC sample and the process is repeated from that new point.

In detail, a binary tree is constructed to trace the path of the leap frog steps. To produce a MCMC sample, an iterative procedure is conducted. A slice variable $U_{n}\sim {\text{Uniform}}(0,\exp(-H[\mathbf {x} _{n}(0),\mathbf {p} _{n}(0)]))$ is sampled. Let $\mathbf {x} _{n}^{+}$ and $\mathbf {p} _{n}^{+}$ be the position and momentum of the forward particle respectively. Similarly, $\mathbf {x} _{n}^{-}$ and $\mathbf {p} _{n}^{-}$ for the backward particle. In each iteration, the binary tree selects at random uniformly to move the forward particle forwards in time or the backward particle backwards in time. Also for each iteration, the number of leap frog steps increase by a factor of 2. For example, in the first iteration, the forward particle moves forwards in time using 1 leap frog step. In the next iteration, the backward particle moves backwards in time using 2 leap frog steps.

The iterative procedure continues until the U-Turn condition is met, that is

(\mathbf {x} _{n}^{+}-\mathbf {x} _{n}^{-})\cdot \mathbf {p} _{n}^{-}<0\quad {\text{or}}\quad .(\mathbf {x} _{n}^{+}-\mathbf {x} _{n}^{-})\cdot \mathbf {p} _{n}^{+}<0

or when the Hamiltonian becomes inaccurate

\exp \left[-H(\mathbf {x} _{n}^{+},\mathbf {p} _{n}^{+})+\delta \right]<U_{n}

or

\exp \left[-H(\mathbf {x} _{n}^{-},\mathbf {p} _{n}^{-})+\delta \right]<U_{n}

where, for example, $\delta =1000$ .

Once the U-Turn condition is met, the next MCMC sample, $\mathbf {x} _{n+1}$ , is obtained by sampling uniformly the leap frog path traced out by the binary tree $\{\mathbf {x} _{n}^{-},\ldots ,\mathbf {x} _{n}(-\Delta t),\mathbf {x} _{n}(0),\mathbf {x} _{n}(\Delta t),\ldots ,\mathbf {x} _{n}^{+}\}$ which satisfies

U_{n}<\exp \left[-H(\mathbf {x_{n+1}} ,\mathbf {p_{n+1})} \right]

This is usually satisfied if the remaining HMC parameters are sensible.

Related Research Articles

<span class="mw-page-title-main">Heat equation</span> Partial differential equation describing the evolution of temperature in a region

In mathematics and physics, the heat equation is a certain partial differential equation. The theory of the heat equation was first developed by Joseph Fourier in 1822 for the purpose of modeling how a quantity such as heat diffuses through a given region. Since then, the heat equation and its variants have been found to be fundamental in many parts of both pure and applied mathematics.

In physics, a Langevin equation is a stochastic differential equation describing how a system evolves when subjected to a combination of deterministic and fluctuating ("random") forces. The dependent variables in a Langevin equation typically are collective (macroscopic) variables changing only slowly in comparison to the other (microscopic) variables of the system. The fast (microscopic) variables are responsible for the stochastic nature of the Langevin equation. One application is to Brownian motion, which models the fluctuating motion of a small particle in a fluid.

In thermodynamics, the Helmholtz free energy is a thermodynamic potential that measures the useful work obtainable from a closed thermodynamic system at a constant temperature (isothermal). The change in the Helmholtz energy during a process is equal to the maximum amount of work that the system can perform in a thermodynamic process in which temperature is held constant. At constant temperature, the Helmholtz free energy is minimized at equilibrium.

<span class="mw-page-title-main">Path integral formulation</span> Formulation of quantum mechanics

The path integral formulation is a description in quantum mechanics that generalizes the stationary action principle of classical mechanics. It replaces the classical notion of a single, unique classical trajectory for a system with a sum, or functional integral, over an infinity of quantum-mechanically possible trajectories to compute a quantum amplitude.

In atomic physics, hyperfine structure is defined by small shifts in otherwise degenerate electronic energy levels and the resulting splittings in those electronic energy levels of atoms, molecules, and ions, due to electromagnetic multipole interaction between the nucleus and electron clouds.

In numerical analysis, stochastic tunneling (STUN) is an approach to global optimization based on the Monte Carlo method-sampling of the function to be objective minimized in which the function is nonlinearly transformed to allow for easier tunneling among regions containing function minima. Easier tunneling allows for faster exploration of sample space and faster convergence to a good solution.

In plasma physics, the particle-in-cell (PIC) method refers to a technique used to solve a certain class of partial differential equations. In this method, individual particles in a Lagrangian frame are tracked in continuous phase space, whereas moments of the distribution such as densities and currents are computed simultaneously on Eulerian (stationary) mesh points.

In physics, the S-matrix or scattering matrix is a matrix which relates the initial state and the final state of a physical system undergoing a scattering process. It is used in quantum mechanics, scattering theory and quantum field theory (QFT).

Verlet integration is a numerical method used to integrate Newton's equations of motion. It is frequently used to calculate trajectories of particles in molecular dynamics simulations and computer graphics. The algorithm was first used in 1791 by Jean Baptiste Delambre and has been rediscovered many times since then, most recently by Loup Verlet in the 1960s for use in molecular dynamics. It was also used by P. H. Cowell and A. C. C. Crommelin in 1909 to compute the orbit of Halley's Comet, and by Carl Størmer in 1907 to study the trajectories of electrical particles in a magnetic field . The Verlet integrator provides good numerical stability, as well as other properties that are important in physical systems such as time reversibility and preservation of the symplectic form on phase space, at no significant additional computational cost over the simple Euler method.

In physics, the Hamilton–Jacobi equation, named after William Rowan Hamilton and Carl Gustav Jacob Jacobi, is an alternative formulation of classical mechanics, equivalent to other formulations such as Newton's laws of motion, Lagrangian mechanics and Hamiltonian mechanics.

<span class="mw-page-title-main">Monte Carlo integration</span> Numerical technique

In mathematics, Monte Carlo integration is a technique for numerical integration using random numbers. It is a particular Monte Carlo method that numerically computes a definite integral. While other algorithms usually evaluate the integrand at a regular grid, Monte Carlo randomly chooses points at which the integrand is evaluated. This method is particularly useful for higher-dimensional integrals.

The isothermal–isobaric ensemble is a statistical mechanical ensemble that maintains constant temperature $and constant pressure applied. It is also called the -ensemble, where the number of particles is also kept as a constant. This ensemble plays an important role in chemistry as chemical reactions are usually carried out under constant pressure condition. The NPT ensemble is also useful for measuring the equation of state of model systems whose virial expansion for pressure cannot be evaluated, or systems near first-order phase transitions.$

In condensed matter physics and crystallography, the static structure factor is a mathematical description of how a material scatters incident radiation. The structure factor is a critical tool in the interpretation of scattering patterns obtained in X-ray, electron and neutron diffraction experiments.

The Gross–Pitaevskii equation describes the ground state of a quantum system of identical bosons using the Hartree–Fock approximation and the pseudopotential interaction model.

In thermodynamics, the excess chemical potential is defined as the difference between the chemical potential of a given species and that of an ideal gas under the same conditions . The chemical potential of a particle species $is therefore given by an ideal part and an excess part.$

The Swendsen–Wang algorithm is the first non-local or cluster algorithm for Monte Carlo simulation for large systems near criticality. It has been introduced by Robert Swendsen and Jian-Sheng Wang in 1987 at Carnegie Mellon.

In statistical mechanics, the mean squared displacement is a measure of the deviation of the position of a particle with respect to a reference position over time. It is the most common measure of the spatial extent of random motion, and can be thought of as measuring the portion of the system "explored" by the random walker. In the realm of biophysics and environmental engineering, the Mean Squared Displacement is measured over time to determine if a particle is spreading slowly due to diffusion, or if an advective force is also contributing. Another relevant concept, the variance-related diameter, is also used in studying the transportation and mixing phenomena in the realm of environmental engineering. It prominently appears in the Debye–Waller factor and in the Langevin equation.

Symmetries in quantum mechanics describe features of spacetime and particles which are unchanged under some transformation, in the context of quantum mechanics, relativistic quantum mechanics and quantum field theory, and with applications in the mathematical formulation of the standard model and condensed matter physics. In general, symmetry in physics, invariance, and conservation laws, are fundamentally important constraints for formulating physical theories and models. In practice, they are powerful methods for solving problems and predicting what can happen. While conservation laws do not always give the answer to the problem directly, they form the correct constraints and the first steps to solving a multitude of problems. In application, understanding symmetries can also provide insights on the eigenstates that can be expected. For example, the existence of degenerate states can be inferred by the presence of non commuting symmetry operators or that the non degenerate states are also eigenvectors of symmetry operators.

In quantum mechanics, a translation operator is defined as an operator which shifts particles and fields by a certain amount in a certain direction. It is a special case of the shift operator from functional analysis.

Stochastic gradient Langevin dynamics (SGLD) is an optimization and sampling technique composed of characteristics from Stochastic gradient descent, a Robbins–Monro optimization algorithm, and Langevin dynamics, a mathematical extension of molecular dynamics models. Like stochastic gradient descent, SGLD is an iterative optimization algorithm which uses minibatching to create a stochastic gradient estimator, as used in SGD to optimize a differentiable objective function. Unlike traditional SGD, SGLD can be used for Bayesian learning as a sampling method. SGLD may be viewed as Langevin dynamics applied to posterior distributions, but the key difference is that the likelihood gradient terms are minibatched, like in SGD. SGLD, like Langevin dynamics, produces samples from a posterior distribution of parameters based on available data. First described by Welling and Teh in 2011, the method has applications in many contexts which require optimization, and is most notably applied in machine learning problems.

References

↑ Duane, Simon; Kennedy, Anthony D.; Pendleton, Brian J.; Roweth, Duncan (1987). "Hybrid Monte Carlo". Physics Letters B. 195 (2): 216–222. Bibcode:1987PhLB..195..216D. doi:10.1016/0370-2693(87)91197-X.
↑ Neal, Radford M. (1996). "Monte Carlo Implementation". Bayesian Learning for Neural Networks. Lecture Notes in Statistics. Vol. 118. Springer. pp. 55–98. doi:10.1007/978-1-4612-0745-0_3. ISBN 0-387-94724-8.
↑ Gelman, Andrew; Lee, Daniel; Guo, Jiqiang (2015). "Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization". Journal of Educational and Behavioral Statistics. 40 (5): 530–543. doi:10.3102/1076998615606113. S2CID 18351694.
↑ Betancourt, Michael (2018-07-15). "A Conceptual Introduction to Hamiltonian Monte Carlo". arXiv: 1701.02434 [stat.ME].
↑ Hoffman, Matthew D; Gelman, Andrew (2014). "The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo". Journal of Machine Learning Research. 15 (1): 1593–1623. Retrieved 2024-03-28.

External links

Betancourt, Michael. "Efficient Bayesian inference with Hamiltonian Monte Carlo". MLSS Iceland 2014– via YouTube.
McElreath, Richard. "Markov chain Monte Carlo". Statistical Rethinking 2022– via YouTube.
Hamiltonian Monte Carlo from scratch
Optimization and Monte Carlo Methods

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Duane, Simon; Kennedy, Anthony D.; Pendleton, Brian J.; Roweth, Duncan (1987). "Hybrid Monte Carlo". Physics Letters B. 195 (2): 216–222. Bibcode:1987PhLB..195..216D. doi:10.1016/0370-2693(87)91197-X.

[2] Neal, Radford M. (1996). "Monte Carlo Implementation". Bayesian Learning for Neural Networks. Lecture Notes in Statistics. Vol. 118. Springer. pp. 55–98. doi:10.1007/978-1-4612-0745-0_3. ISBN 0-387-94724-8.

[3] Gelman, Andrew; Lee, Daniel; Guo, Jiqiang (2015). "Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization". Journal of Educational and Behavioral Statistics. 40 (5): 530–543. doi:10.3102/1076998615606113. S2CID 18351694.

[4] Betancourt, Michael (2018-07-15). "A Conceptual Introduction to Hamiltonian Monte Carlo". arXiv: 1701.02434 [stat.ME].

[5] Hoffman, Matthew D; Gelman, Andrew (2014). "The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo". Journal of Machine Learning Research. 15 (1): 1593–1623. Retrieved 2024-03-28.

[1]

[2]

[3]

[4]

[5]

Hamiltonian Monte Carlo

Contents

Algorithm

No U-Turn Sampler

See also

Related Research Articles

References

Further reading

External links