Markov chain central limit theorem

Last updated

In the mathematical theory of random processes, the Markov chain central limit theorem has a conclusion somewhat similar in form to that of the classic central limit theorem (CLT) of probability theory, but the quantity in the role taken by the variance in the classic CLT has a more complicated definition. See also the general form of Bienaymé's identity.

Contents

Statement

Suppose that:

Now let [1] [2] [3]

Then as we have [4]

where the decorated arrow indicates convergence in distribution.

Monte Carlo Setting

The Markov chain central limit theorem can be guaranteed for functionals of general state space Markov chains under certain conditions. In particular, this can be done with a focus on Monte Carlo settings. An example of the application in a MCMC (Markov Chain Monte Carlo) setting is the following:

Consider a simple hard spheres model on a grid. Suppose . A proper configuration on consists of coloring each point either black or white in such a way that no two adjacent points are white. Let denote the set of all proper configurations on , be the total number of proper configurations and π be the uniform distribution on so that each proper configuration is equally likely. Suppose our goal is to calculate the typical number of white points in a proper configuration; that is, if is the number of white points in then we want the value of

If and are even moderately large then we will have to resort to an approximation to . Consider the following Markov chain on . Fix and set where is an arbitrary proper configuration. Randomly choose a point and independently draw . If and all of the adjacent points are black then color white leaving all other points alone. Otherwise, color black and leave all other points alone. Call the resulting configuration . Continuing in this fashion yields a Harris ergodic Markov chain having as its invariant distribution. It is now a simple matter to estimate with . Also, since is finite (albeit potentially large) it is well known that will converge exponentially fast to which implies that a CLT holds for .

Implications

Not taking into account the additional terms in the variance which stem from correlations (e.g. serial correlations in markov chain monte carlo simulations) can result in the problem of pseudoreplication when computing e.g. the confidence intervals for the sample mean.

Related Research Articles

<span class="mw-page-title-main">Autocorrelation</span> Correlation of a signal with a time-shifted copy of itself, as a function of shift

Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations of a random variable as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

<span class="mw-page-title-main">Variance</span> Statistical measure of how far values spread from their average

In probability theory and statistics, variance is the squared deviation from the mean of a random variable. The variance is also often defined as the square of the standard deviation. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. It is the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by , , , , or .

In probability theory, the central limit theorem (CLT) establishes that, in many situations, for independent and identically distributed random variables, the sampling distribution of the standardized sample mean tends towards the standard normal distribution even if the original variables themselves are not normally distributed.

<span class="mw-page-title-main">Chi-squared distribution</span> Probability distribution and special case of gamma distribution

In probability theory and statistics, the chi-squared distribution with degrees of freedom is the distribution of a sum of the squares of independent standard normal random variables. The chi-squared distribution is a special case of the gamma distribution and is one of the most widely used probability distributions in inferential statistics, notably in hypothesis testing and in construction of confidence intervals. This distribution is sometimes called the central chi-squared distribution, a special case of the more general noncentral chi-squared distribution.

<span class="mw-page-title-main">Wiener process</span> Stochastic process generalizing Brownian motion

In mathematics, the Wiener process is a real-valued continuous-time stochastic process named in honor of American mathematician Norbert Wiener for his investigations on the mathematical properties of the one-dimensional Brownian motion. It is often also called Brownian motion due to its historical connection with the physical process of the same name originally observed by Scottish botanist Robert Brown. It is one of the best known Lévy processes and occurs frequently in pure and applied mathematics, economics, quantitative finance, evolutionary biology, and physics.

<span class="mw-page-title-main">Law of large numbers</span> Averages of repeated trials converge to the expected value

In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and tends to become closer to the expected value as more trials are performed.

<span class="mw-page-title-main">Beta distribution</span> Probability distribution

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] in terms of two positive parameters, denoted by alpha (α) and beta (β), that appear as exponents of the variable and its complement to 1, respectively, and control the shape of the distribution.

In probability theory and statistics, the cumulantsκn of a probability distribution are a set of quantities that provide an alternative to the moments of the distribution. Any two probability distributions whose moments are identical will have identical cumulants as well, and vice versa.

<span class="mw-page-title-main">Polylogarithm</span> Special mathematical function

In mathematics, the polylogarithm (also known as Jonquière's function, for Alfred Jonquière) is a special function Lis(z) of order s and argument z. Only for special values of s does the polylogarithm reduce to an elementary function such as the natural logarithm or a rational function. In quantum statistics, the polylogarithm function appears as the closed form of integrals of the Fermi–Dirac distribution and the Bose–Einstein distribution, and is also known as the Fermi–Dirac integral or the Bose–Einstein integral. In quantum electrodynamics, polylogarithms of positive integer order arise in the calculation of processes represented by higher-order Feynman diagrams.

Stein's lemma, named in honor of Charles Stein, is a theorem of probability theory that is of interest primarily because of its applications to statistical inference — in particular, to James–Stein estimation and empirical Bayes methods — and its applications to portfolio choice theory. The theorem gives a formula for the covariance of one random variable with the value of a function of another, when the two random variables are jointly normally distributed.

In probability theory, a compound Poisson distribution is the probability distribution of the sum of a number of independent identically-distributed random variables, where the number of terms to be added is itself a Poisson-distributed variable. The result can be either a continuous or a discrete distribution.

Renewal theory is the branch of probability theory that generalizes the Poisson process for arbitrary holding times. Instead of exponentially distributed holding times, a renewal process may have any independent and identically distributed (IID) holding times that have finite mean. A renewal-reward process additionally has a random sequence of rewards incurred at each holding time, which are IID but need not be independent of the holding times.

Probability theory and statistics have some commonly used conventions, in addition to standard mathematical notation and mathematical symbols.

<span class="mw-page-title-main">Variance reduction</span>

In mathematics, more specifically in the theory of Monte Carlo methods, variance reduction is a procedure used to increase the precision of the estimates obtained for a given simulation or computational effort. Every output random variable from the simulation is associated with a variance which limits the precision of the simulation results. In order to make a simulation statistically efficient, i.e., to obtain a greater precision and smaller confidence intervals for the output random variable of interest, variance reduction techniques can be used. The main ones are common random numbers, antithetic variates, control variates, importance sampling, stratified sampling, moment matching, conditional Monte Carlo and quasi random variables. For simulation with black-box models subset simulation and line sampling can also be used. Under these headings are a variety of specialized techniques; for example, particle transport simulations make extensive use of "weight windows" and "splitting/Russian roulette" techniques, which are a form of importance sampling.

In statistics, the Chapman–Robbins bound or Hammersley–Chapman–Robbins bound is a lower bound on the variance of estimators of a deterministic parameter. It is a generalization of the Cramér–Rao bound; compared to the Cramér–Rao bound, it is both tighter and applicable to a wider range of problems. However, it is usually more difficult to compute.

In mathematics, especially measure theory, a set function is a function whose domain is a family of subsets of some given set and that (usually) takes its values in the extended real number line which consists of the real numbers and

<span class="mw-page-title-main">Distance correlation</span>

In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables.

In representation theory of mathematics, the Waldspurger formula relates the special values of two L-functions of two related admissible irreducible representations. Let k be the base field, f be an automorphic form over k, π be the representation associated via the Jacquet–Langlands correspondence with f. Goro Shimura (1976) proved this formula, when and f is a cusp form; Günter Harder made the same discovery at the same time in an unpublished paper. Marie-France Vignéras (1980) proved this formula, when and f is a newform. Jean-Loup Waldspurger, for whom the formula is named, reproved and generalized the result of Vignéras in 1985 via a totally different method which was widely used thereafter by mathematicians to prove similar formulas.

In mathematics, nuclear operators are an important class of linear operators introduced by Alexander Grothendieck in his doctoral dissertation. Nuclear operators are intimately tied to the projective tensor product of two topological vector spaces (TVSs).

In mathematics, Rathjen's  psi function is an ordinal collapsing function developed by Michael Rathjen. It collapses weakly Mahlo cardinals to generate large countable ordinals. A weakly Mahlo cardinal is a cardinal such that the set of regular cardinals below is closed under . Rathjen uses this to diagonalise over the weakly inaccessible hierarchy.

References

  1. On the Markov Chain Central Limit Theorem, Galin L. Jones, https://arxiv.org/pdf/math/0409112.pdf
  2. Markov Chain Monte Carlo Lecture Notes Charles J. Geyer https://www.stat.umn.edu/geyer/f05/8931/n1998.pdf page 9
  3. Note that the equation for starts from Bienaymé's identity and then assumes that which is the Cesàro summation, see Greyer, Markov Chain Monte Carlo Lecture Notes https://www.stat.umn.edu/geyer/f05/8931/n1998.pdf page 9
  4. Geyer, Charles J. (2011). Introduction to Markov Chain Monte Carlo. In Handbook of MarkovChain Monte Carlo. Edited by S. P. Brooks, A. E. Gelman, G. L. Jones, and X. L. Meng. Chapman & Hall/CRC, Boca Raton, FL, Section 1.8. http://www.mcmchandbook.net/HandbookChapter1.pdf

Sources