Multicanonical ensemble


In statistics and physics, the multicanonical ensemble (also called multicanonical sampling or flat histogram) is a Markov chain Monte Carlo sampling technique that uses the Metropolis–Hastings algorithm to compute integrals where the integrand has a rough landscape with multiple local minima. It samples states according to the inverse of the density of states, [1] which has to be known a priori or be computed using other techniques like the Wang and Landau algorithm. [2] Multicanonical sampling is an important technique for spin systems like the Ising model or spin glasses. [1] [3] [4]


Motivation

In systems with a large number of degrees of freedom, like spin systems, Monte Carlo integration is required. In this integration, importance sampling, and in particular the Metropolis algorithm, is a very important technique. [3] However, the Metropolis algorithm samples states according to $\exp(-\beta E)$, where $\beta = 1/k_B T$ is the inverse of the temperature. This means that an energy barrier of $\Delta E$ on the energy spectrum is exponentially difficult to overcome, since the probability of crossing it is suppressed by a factor $\exp(-\beta \Delta E)$. [1] Systems with multiple local energy minima, like the Potts model, become hard to sample as the algorithm gets stuck in the system's local minima. [3] This motivates other approaches, namely, other sampling distributions.

Overview

The multicanonical ensemble uses the Metropolis–Hastings algorithm with a sampling distribution given by the inverse of the density of states of the system, in contrast to the Boltzmann sampling distribution of the Metropolis algorithm. [1] With this choice, on average, the number of states sampled at each energy is constant, i.e. it is a simulation with a "flat histogram" on energy. This leads to an algorithm for which the energy barriers are no longer difficult to overcome. Another advantage over the Metropolis algorithm is that the sampling is independent of the temperature of the system, which means that one simulation allows the estimation of thermodynamical variables for all temperatures (thus the name "multicanonical": several temperatures). This is a great improvement in the study of first-order phase transitions. [1]
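For concreteness, a minimal sketch of a single multicanonical update is given below, written here for a one-dimensional Ising chain with unit couplings; the lattice, the single-spin-flip proposal and the name `log_dos` (an estimate of $\ln \rho(E)$) are illustrative assumptions rather than details taken from the cited references.

```python
import numpy as np

def multicanonical_step(spins, energy, log_dos, rng):
    """One single-spin-flip update with multicanonical weights w(E) ~ 1/rho(E).

    spins   : array of +-1 for a periodic 1D Ising chain with J = 1 (assumed)
    energy  : current energy of `spins`
    log_dos : callable returning an estimate of ln rho(E)
    """
    i = rng.integers(len(spins))
    dE = 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % len(spins)])
    # accept with min(1, rho(E_old) / rho(E_new)) -> flat histogram in energy
    if np.log(rng.random()) < log_dos(energy) - log_dos(energy + dE):
        spins[i] *= -1
        energy += dE
    return energy
```

With an accurate `log_dos`, repeating this step produces the flat histogram in energy described above, and the same chain can later be reweighted to any temperature.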

The biggest problem in performing a multicanonical simulation is that the density of states has to be known a priori. [2] [3] One important contribution to multicanonical sampling was the Wang and Landau algorithm, which asymptotically converges to a multicanonical ensemble while calculating the density of states during the convergence. [2]
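The following is a compact sketch of the Wang and Landau iteration under the usual conventions (initial modification factor $\ln f = 1$, a histogram-flatness check, halving of $\ln f$); the generic `propose` function and the discrete energy grid are assumptions of the sketch, not specifics from the cited paper.

```python
import numpy as np

def wang_landau(state, energy, propose, levels, flatness=0.8,
                log_f_final=1e-6, rng=None):
    """Sketch of the Wang-Landau estimate of ln rho(E) on discrete `levels`.

    propose(state, rng) must return (new_state, new_energy); the flatness
    threshold and the stopping value of ln f are conventional choices.
    """
    rng = rng or np.random.default_rng()
    idx = {e: k for k, e in enumerate(levels)}
    log_g = np.zeros(len(levels))        # running estimate of ln rho(E)
    hist = np.zeros(len(levels))
    log_f = 1.0                          # modification factor, ln f
    while log_f > log_f_final:
        new_state, new_energy = propose(state, rng)
        # accept with probability min(1, g(E_old) / g(E_new))
        if np.log(rng.random()) < log_g[idx[energy]] - log_g[idx[new_energy]]:
            state, energy = new_state, new_energy
        log_g[idx[energy]] += log_f      # update the level the walker sits on
        hist[idx[energy]] += 1
        if hist.min() > 0 and hist.min() >= flatness * hist.mean():
            hist[:] = 0                  # histogram is "flat": refine ln f
            log_f /= 2
    return log_g                         # known up to an additive constant
```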

The multicanonical ensemble is not restricted to physical systems. It can be employed on abstract systems which have a cost function F. By using the density of states with respect to F, the method becomes general for computing higher-dimensional integrals or finding local minima. [5]

Motivation

Consider a system and its phase-space $\Omega$, characterized by a configuration $\sigma \in \Omega$ and a "cost" function F from the system's phase-space to a one-dimensional space $\Gamma$: $F \colon \Omega \to \Gamma = [\Gamma_{\min}, \Gamma_{\max}]$, the spectrum of F.

example:

The Ising model with N sites is an example of such a system; the phase-space is a discrete phase-space defined by all possible spin configurations $\sigma = (\sigma_1, \ldots, \sigma_N)$, where $\sigma_i \in \{-1, +1\}$. The cost function is the Hamiltonian of the system:

$F(\sigma) = -\sum_{\langle i,j \rangle} J_{ij}\, \sigma_i \sigma_j$

where $\sum_{\langle i,j \rangle}$ is the sum over pairs of neighboring sites and $J_{ij}$ is the interaction matrix.

The energy spectrum is $\Gamma = [E_{\min}, E_{\max}]$, which, in this case, depends on the particular $J_{ij}$ used. If all $J_{ij}$ are 1 (the ferromagnetic Ising model), $E_{\min}$ is attained when all spins are aligned (e.g. all spins are 1) and $E_{\max}$ when half the spins are up and half are down in an alternating pattern. Also notice that in this system, the number of configurations is $|\Omega| = 2^N$.
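A short sketch evaluating this cost function for a configuration of the ferromagnetic model is shown below; the periodic two-dimensional square lattice is an assumption made only for concreteness.

```python
import numpy as np

def ising_energy(spins):
    """F(sigma) = -sum over nearest-neighbour pairs of s_i s_j for a periodic
    LxL square lattice of +-1 spins (ferromagnetic couplings J_ij = 1)."""
    right = np.roll(spins, -1, axis=1)   # neighbour to the right of each site
    down = np.roll(spins, -1, axis=0)    # neighbour below each site
    return -np.sum(spins * (right + down))

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(8, 8))
print(ising_energy(spins))   # the ground state (all spins +1) would give -2N = -128
```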

The computation of an average quantity over the phase-space requires the evaluation of an integral:

$\langle Q \rangle = \int_{\Omega} Q(\sigma)\, P(\sigma)\, d\sigma$

where $P(\sigma)$ is the weight of each state (e.g. $P(\sigma) = 1/|\Omega|$ corresponds to uniformly distributed states).

When Q does not depend on the particular state but only on the value $f = F(\sigma)$ of the state $\sigma$, the formula for $\langle Q \rangle$ can be integrated over f by inserting a Dirac delta function and written as

$\langle Q \rangle = \int_{\Gamma} \int_{\Omega} Q(f)\, P(\sigma)\, \delta(f - F(\sigma))\, d\sigma\, df = \int_{\Gamma} Q(f)\, P(f)\, df$

where

$P(f) = \int_{\Omega} P(\sigma)\, \delta(f - F(\sigma))\, d\sigma$

is the marginal distribution of F.

example:

A system in contact with a heat bath at inverse temperature $\beta = 1/k_B T$ is an example for computing this kind of integral. For instance, the mean energy of the system, with each state weighted by its Boltzmann factor, is

$\langle E \rangle = \int_{\Gamma} E\, P(E)\, dE$

where

$P(\sigma) = \frac{e^{-\beta E(\sigma)}}{Z}\,, \qquad Z = \int_{\Omega} e^{-\beta E(\sigma)}\, d\sigma$

The marginal distribution of the energy is given by

$P(E) = \frac{\rho(E)\, e^{-\beta E}}{Z}$

where $\rho(E)$ is the density of states.

The average energy is then given by

$\langle E \rangle = \frac{1}{Z} \int_{\Gamma} E\, \rho(E)\, e^{-\beta E}\, dE$
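When the density of states is known on a discrete set of energy levels, the integral above becomes a weighted sum. The sketch below illustrates this, using a pair of Ising spins joined by a single ferromagnetic bond (for which $\rho(-1) = \rho(+1) = 2$ and $\langle E \rangle = -\tanh\beta$) purely as a check; the function name and example values are assumptions of the sketch.

```python
import numpy as np

def canonical_average_energy(levels, dos, beta):
    """<E> = sum_E E rho(E) exp(-beta E) / sum_E rho(E) exp(-beta E)."""
    levels = np.asarray(levels, dtype=float)
    weights = np.asarray(dos, dtype=float) * np.exp(-beta * levels)
    return np.dot(levels, weights) / weights.sum()

# Two Ising spins with one ferromagnetic bond: E in {-1, +1}, rho(-1) = rho(+1) = 2,
# so the exact result is <E> = -tanh(beta).
print(canonical_average_energy([-1, 1], [2, 2], beta=0.7))   # ~ -0.604
print(-np.tanh(0.7))                                         # ~ -0.604
```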

When the system has a large number of degrees of freedom, an analytical expression for $\rho(E)$ is often hard to obtain, and Monte Carlo integration is typically employed in the computation of $\langle Q \rangle$. In the simplest formulation, the method chooses N uniformly distributed states $\sigma_i \in \Omega$ and uses the estimator

$\overline{Q}_N = \frac{\sum_{i=1}^{N} Q(\sigma_i)\, P(\sigma_i)}{\sum_{i=1}^{N} P(\sigma_i)}$

for computing $\langle Q \rangle$, because $\overline{Q}_N$ converges almost surely to $\langle Q \rangle$ by the strong law of large numbers:

$\lim_{N \to \infty} \overline{Q}_N = \langle Q \rangle$

One typical problem of this convergence is that the variance of the estimator can be very high, which leads to a high computational effort to achieve reasonable results.
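A direct transcription of the uniform-sampling estimator is sketched below in its self-normalised form, so that the weight P only needs to be known up to a constant; the callables `sample_state`, `Q` and `weight` are placeholders assumed for illustration. For a spin system, `weight` would return the Boltzmann factor $e^{-\beta E(\sigma)}$.

```python
import numpy as np

def uniform_sampling_average(sample_state, Q, weight, n, rng=None):
    """Estimate <Q> by drawing n states uniformly from the phase-space.

    sample_state(rng) returns a uniformly distributed state, Q(s) is the
    observable, and weight(s) is P(s), known only up to normalisation.
    """
    rng = rng or np.random.default_rng()
    num = den = 0.0
    for _ in range(n):
        s = sample_state(rng)
        w = weight(s)                    # e.g. a Boltzmann factor exp(-beta * E(s))
        num += Q(s) * w
        den += w
    return num / den                     # self-normalised, so Z cancels
```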

example:

In the previous example, the states that contribute most to the integral are the ones with low energy. If the states are sampled uniformly then, on average, the number of sampled states with energy E is given by the density of states. This density of states can be centered far away from the energy minima, and thus the average can be difficult to obtain.

To improve this convergence, the Metropolis–Hastings algorithm was proposed. Generally, the idea of Monte Carlo methods is to use importance sampling to improve the convergence of the estimator by sampling states according to an arbitrary distribution $\pi(\sigma)$ and using the appropriate estimator:

$\overline{Q}_N = \frac{\sum_{i=1}^{N} Q(\sigma_i)\, P(\sigma_i)/\pi(\sigma_i)}{\sum_{i=1}^{N} P(\sigma_i)/\pi(\sigma_i)}$

This estimator generalizes the estimator of the mean to samples drawn from an arbitrary distribution. Therefore, when $\pi$ is the uniform distribution, it reduces to the one used for uniform sampling above.
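A sketch of this generalised estimator is given below, assuming the states have already been drawn from $\pi$ (for example by a Markov chain) and that both weights are known only up to normalisation.

```python
import numpy as np

def importance_sampling_average(samples, Q, weight, proposal):
    """Estimate <Q> from states drawn from an arbitrary distribution pi.

    samples  : states already drawn from pi (e.g. by a Markov chain)
    weight   : P(s), the target weight, known up to normalisation
    proposal : pi(s), the sampling density, known up to normalisation
    """
    w = np.array([weight(s) / proposal(s) for s in samples])
    q = np.array([Q(s) for s in samples])
    return np.dot(q, w) / w.sum()        # reduces to a plain mean when pi = P
```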

When the system is a physical system in contact with a heat bath, each state $\sigma$ is weighted according to its Boltzmann factor, $P(\sigma) \propto e^{-\beta E(\sigma)}$. In Monte Carlo, the canonical ensemble is defined by choosing $\pi$ to be proportional to $P$. In this situation, the estimator corresponds to a simple arithmetic average:

$\overline{Q}_N = \frac{1}{N} \sum_{i=1}^{N} Q(\sigma_i)$

Historically, this occurred because the original idea [6] was to use the Metropolis–Hastings algorithm to compute averages on a system in contact with a heat bath, where the weight is given by the Boltzmann factor, $P(\sigma) \propto e^{-\beta E(\sigma)}$. [3]

While it is often the case that the sampling distribution $\pi$ is chosen to be the weight distribution $P$, this does not need to be so. One situation where the canonical ensemble is not an efficient choice is when it takes an arbitrarily long time to converge. [1] This happens when the function F has multiple local minima: the computational cost for the algorithm to leave a region containing a local minimum grows exponentially with the depth of that minimum. That is, the deeper the minimum, the more time the algorithm spends there and the harder it is to leave.

One way to avoid becoming stuck in local minima of the cost function is to make the sampling technique "invisible" to local minima. This is the basis of the multicanonical ensemble.

Multicanonical ensemble

The multicanonical ensemble is defined by choosing the sampling distribution to be

$\pi(\sigma) \propto \frac{P(\sigma)}{P(F(\sigma))}$

where $P(f)$ is the marginal distribution of F defined above. The consequence of this choice is that the average number of samples with a given value of f, m(f), is given by

$m(f) \propto \int_{\Omega} \delta(f - F(\sigma))\, \pi(\sigma)\, d\sigma \propto \frac{1}{P(f)} \int_{\Omega} \delta(f - F(\sigma))\, P(\sigma)\, d\sigma = 1$

that is, the average number of samples does not depend on f: all costs f are equally sampled regardless of whether they are more or less probable. This motivates the name "flat histogram". For systems in contact with a heat bath, $P(\sigma) \propto e^{-\beta E(\sigma)}$ and $P(E) \propto \rho(E)\, e^{-\beta E}$, so that $\pi(\sigma) \propto 1/\rho(E(\sigma))$: the sampling is independent of the temperature and one simulation allows the study of all temperatures.
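In practice this means that one stored multicanonical run can be reweighted to any inverse temperature $\beta$: each sample carries the weight $P(\sigma_i)/\pi(\sigma_i) \propto \rho(E_i)\, e^{-\beta E_i}$. The sketch below assumes the sample energies, the observable values and the $\ln\rho$ estimate used in the run are stored as arrays; the function name is illustrative.

```python
import numpy as np

def reweight_to_beta(energies, log_dos_at_samples, values, beta):
    """Canonical average <Q>_beta from one stored multicanonical run.

    energies           : E_i of each multicanonical sample (array)
    log_dos_at_samples : ln rho(E_i), the same estimate used in the run
    values             : Q(sigma_i) for each sample
    """
    # weight of sample i at inverse temperature beta: rho(E_i) * exp(-beta * E_i)
    log_w = np.asarray(log_dos_at_samples) - beta * np.asarray(energies)
    log_w -= log_w.max()                 # shift for numerical stability
    w = np.exp(log_w)
    return np.dot(values, w) / w.sum()

# e.g. mean energy at several temperatures from the same stored run:
# for beta in (0.2, 0.5, 1.0):
#     print(beta, reweight_to_beta(E, lnrho_E, E, beta))
```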

example:

In the ferromagnetic Ising model with N sites (exemplified in the previous section), the density of states can be computed analytically. In this case, a multicanonical ensemble can be used to compute any other quantity Q by sampling the system according to $\pi(\sigma) \propto 1/\rho(E(\sigma))$ and using the proper estimator defined in the previous section.

Tunneling time and critical slowing down

Like in any other Monte Carlo method, there are correlations between the samples drawn from $\pi(\sigma)$. A typical measure of the correlation is the tunneling time. The tunneling time is defined as the number of Markov steps (of the Markov chain) the simulation needs to perform a round trip between the minimum and maximum of the spectrum of F. One motivation to use the tunneling time is that when the walk crosses the spectrum, it passes through the region of the maximum of the density of states, thus de-correlating the process. On the other hand, using round trips ensures that the system visits the entire spectrum.
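Measuring the tunneling time only requires the recorded energy trace of the chain; a minimal sketch is given below, where `emin` and `emax` are assumed to bracket the part of the spectrum of interest.

```python
def count_round_trips(energy_trace, emin, emax):
    """Count traversals of the full range: reach emax, then return to emin."""
    trips, target = 0, emax              # first leg: climb to the top of the spectrum
    for e in energy_trace:
        if target == emax and e >= emax:
            target = emin                # top reached, now head back down
        elif target == emin and e <= emin:
            target = emax
            trips += 1                   # one full traversal completed
    return trips

# mean tunneling time ~ len(energy_trace) / count_round_trips(trace, emin, emax)
```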

Because the histogram is flat in the variable F, a multicanonical ensemble can be seen as a diffusion process (i.e. a random walk) on the one-dimensional line of F values. Detailed balance of the process dictates that there is no drift in the process. [7] This implies that the tunneling time, in local dynamics, should scale as a diffusion process, and thus the tunneling time should scale quadratically with the size of the spectrum, N:

$\tau_{tt} \propto N^2$

However, in some systems (the Ising model being the most paradigmatic), the scaling suffers from critical slowing down: the tunneling time scales as $O(N^{2+z})$, where $z > 0$ depends on the particular system. [4]

Non-local dynamics were developed to restore the quadratic scaling [8] (see the Wolff algorithm), beating the critical slowing down. However, it is still an open question whether there is a local dynamics that does not suffer from critical slowing down in spin systems like the Ising model.


References

  1. Berg, B.; Neuhaus, T. (1992). "Multicanonical ensemble: A new approach to simulate first-order phase transitions". Physical Review Letters. 68 (1): 9–12. arXiv:hep-lat/9202004. Bibcode:1992PhRvL..68....9B. doi:10.1103/PhysRevLett.68.9. PMID 10045099. S2CID 19478641.
  2. Wang, F.; Landau, D. (2001). "Efficient, Multiple-Range Random Walk Algorithm to Calculate the Density of States". Physical Review Letters. 86 (10): 2050–2053. arXiv:cond-mat/0011174. Bibcode:2001PhRvL..86.2050W. doi:10.1103/PhysRevLett.86.2050. PMID 11289852. S2CID 2941153.
  3. Newman, M. E. J.; Barkema, G. T. (2002). Monte Carlo Methods in Statistical Physics. USA: Oxford University Press. ISBN 0198517971.
  4. Dayal, P.; Trebst, S.; Wessel, S.; Würtz, D.; Troyer, M.; Sabhapandit, S.; Coppersmith, S. (2004). "Performance Limitations of Flat-Histogram Methods". Physical Review Letters. 92 (9): 097201. arXiv:cond-mat/0306108. Bibcode:2004PhRvL..92i7201D. doi:10.1103/PhysRevLett.92.097201. PMID 15089505. S2CID 1128445.
  5. Lee, J.; Choi, M. (1994). "Optimization by multicanonical annealing and the traveling salesman problem". Physical Review E. 50 (2): R651–R654. Bibcode:1994PhRvE..50..651L. doi:10.1103/PhysRevE.50.R651. PMID 9962167.
  6. Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. (1953). "Equation of State Calculations by Fast Computing Machines". The Journal of Chemical Physics. 21 (6): 1087. Bibcode:1953JChPh..21.1087M. doi:10.1063/1.1699114. OSTI 4390578. S2CID 1046577.
  7. Robert, Christian; Casella, George (2004). Monte Carlo Statistical Methods. Springer. ISBN 978-0-387-21239-5.
  8. Wolff, U. (1989). "Collective Monte Carlo Updating for Spin Systems". Physical Review Letters. 62 (4): 361–364. Bibcode:1989PhRvL..62..361W. doi:10.1103/PhysRevLett.62.361. PMID 10040213.