Martingale (probability theory)

In probability theory, a martingale is a sequence of random variables (i.e., a stochastic process) for which, at a particular time, the conditional expectation of the next value in the sequence is equal to the present value, regardless of all prior values.

Stopped Brownian motion is an example of a martingale. It can model an even coin-toss betting game with the possibility of bankruptcy.

History

Originally, martingale referred to a class of betting strategies that was popular in 18th-century France. [1] [2] The simplest of these strategies was designed for a game in which the gambler wins their stake if a coin comes up heads and loses it if the coin comes up tails. The strategy had the gambler double their bet after every loss so that the first win would recover all previous losses plus win a profit equal to the original stake. As the gambler's wealth and available time jointly approach infinity, their probability of eventually flipping heads approaches 1, which makes the martingale betting strategy seem like a sure thing. However, the exponential growth of the bets eventually bankrupts its users due to finite bankrolls. Stopped Brownian motion, which is a martingale process, can be used to model the trajectory of such games.
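
The arithmetic behind the doubling strategy can be sketched in a few lines of Python. The function name `doubling_cycle` and its parameters are illustrative, not taken from any source. The sketch assumes a fair coin and a gambler who can afford at most `max_losses` consecutive losses before the bankroll is exhausted, and it computes the exact outcome of one betting cycle with rational arithmetic.

```python
from fractions import Fraction

def doubling_cycle(max_losses, base_stake=1):
    """Exact outcome of one doubling cycle on a fair coin.

    The gambler bets base_stake, doubles the bet after each loss, and stops
    at the first win or after max_losses consecutive losses.
    """
    p_ruin = Fraction(1, 2) ** max_losses               # every toss is tails
    p_win = 1 - p_ruin                                  # at least one heads
    loss_if_ruin = base_stake * (2 ** max_losses - 1)   # 1 + 2 + ... + 2^(k-1)
    gain_if_win = base_stake                            # losses recovered plus one stake
    expected_profit = p_win * gain_if_win - p_ruin * loss_if_ruin
    return {
        "p_win": p_win,
        "gain_if_win": gain_if_win,
        "p_ruin": p_ruin,
        "loss_if_ruin": loss_if_ruin,
        "expected_profit": expected_profit,
    }
```

With `max_losses=10`, for instance, the gambler wins one unit with probability 1023/1024 but loses 1023 units otherwise, so the expected profit of the cycle is exactly zero.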

The concept of martingale in probability theory was introduced by Paul Lévy in 1934, though he did not name it. The term "martingale" was introduced later by Ville (1939), who also extended the definition to continuous martingales. Much of the original development of the theory was done by Joseph Leo Doob among others. Part of the motivation for that work was to show the impossibility of successful betting strategies in games of chance.

Definitions

A basic definition of a discrete-time martingale is a discrete-time stochastic process (i.e., a sequence of random variables) $X_1, X_2, X_3, \ldots$ that satisfies, for any time $n$,

$$E[|X_n|] < \infty$$

$$E[X_{n+1} \mid X_1, \ldots, X_n] = X_n.$$

That is, the conditional expected value of the next observation, given all the past observations, is equal to the most recent observation.
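
As a minimal sketch of this definition, the following Python code checks the conditional-expectation property on every possible history of a walk that moves up or down by 1 at each step. The helper names and the parameter `p_up` are assumptions made for illustration. Integrability is automatic here because the walk is bounded after finitely many steps.

```python
from fractions import Fraction
from itertools import product

def conditional_expectation_next(history, p_up=Fraction(1, 2)):
    """E[X_{n+1} | X_1..X_n] for a walk with X_{n+1} = X_n + 1 (prob. p_up) or X_n - 1."""
    current = history[-1]
    return p_up * (current + 1) + (1 - p_up) * (current - 1)

def walk_histories(n):
    """All histories (X_1, ..., X_n) of a +/-1 walk started from X_0 = 0."""
    for steps in product((+1, -1), repeat=n):
        position, history = 0, []
        for step in steps:
            position += step
            history.append(position)
        yield tuple(history)

def is_martingale(n_max, p_up=Fraction(1, 2)):
    """Check E[X_{n+1} | X_1..X_n] == X_n on every history of length <= n_max."""
    return all(
        conditional_expectation_next(history, p_up) == history[-1]
        for n in range(1, n_max + 1)
        for history in walk_histories(n)
    )
```

Under these assumptions `is_martingale(6)` holds for the fair walk, while a biased walk such as `is_martingale(6, Fraction(2, 3))` fails the check because each step has positive expected drift.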

Martingale sequences with respect to another sequence

More generally, a sequence $Y_1, Y_2, Y_3, \ldots$ is said to be a martingale with respect to another sequence $X_1, X_2, X_3, \ldots$ if for all $n$

$$E[|Y_n|] < \infty$$

$$E[Y_{n+1} \mid X_1, \ldots, X_n] = Y_n.$$

Similarly, a continuous-time martingale with respect to the stochastic process $X_t$ is a stochastic process $Y_t$ such that for all $t$

$$E[|Y_t|] < \infty$$

$$E[Y_t \mid \{X_\tau : \tau \le s\}] = Y_s \quad \text{for all } s \le t.$$

This expresses the property that the conditional expectation of an observation at time $t$, given all the observations up to time $s$, is equal to the observation at time $s$ (provided that $s \le t$). The second property implies that $Y_n$ is measurable with respect to $X_1, \ldots, X_n$.

General definition

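In full generality, a stochastic process $Y : T \times \Omega \to S$ taking values in a Banach space $S$ with norm $\|\cdot\|_S$ is a martingale with respect to a filtration $\Sigma_*$ and probability measure $P$ if

  1. $\Sigma_*$ is a filtration of the underlying probability space $(\Omega, \Sigma, P)$;
  2. $Y$ is adapted to the filtration $\Sigma_*$, i.e., for each $t$ in the index set $T$, the random variable $Y_t$ is a $\Sigma_t$-measurable function;
  3. for each $t$, $Y_t$ lies in the space $L^1(\Omega, \Sigma_t, P; S)$, i.e., $E_P[\|Y_t\|_S] < +\infty$;
  4. for all $s$ and $t$ with $s < t$ and all $F \in \Sigma_s$, $E_P[(Y_t - Y_s)\chi_F] = 0$,

where $\chi_F$ denotes the indicator function of the event $F$. In Grimmett and Stirzaker's Probability and Random Processes, this last condition is denoted as

$$Y_s = E_P[Y_t \mid \Sigma_s],$$

which is a general form of conditional expectation. [3]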
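
The indicator-function condition can be checked directly on a small finite probability space. The sketch below is an illustration under stated assumptions, not part of the cited texts: the sample space is all sequences of four fair coin tosses (coded as +1 and -1), $\Sigma_s$ is generated by the first $s$ tosses, and $Y_t$ is the running sum. Because every event in $\Sigma_s$ is a disjoint union of atoms, it suffices to verify $E_P[(Y_t - Y_s)\chi_F] = 0$ on the atoms $F$.

```python
from fractions import Fraction
from itertools import product

N_TOSSES = 4
OMEGA = list(product((+1, -1), repeat=N_TOSSES))   # finite sample space
PROB = Fraction(1, len(OMEGA))                     # uniform probability measure P

def Y(t, omega):
    """Position of the walk after the first t tosses of outcome omega."""
    return sum(omega[:t])

def atoms(s):
    """Atoms of Sigma_s: groups of outcomes that share their first s tosses."""
    groups = {}
    for omega in OMEGA:
        groups.setdefault(omega[:s], []).append(omega)
    return list(groups.values())

def indicator_condition_holds():
    """Check E_P[(Y_t - Y_s) * chi_F] == 0 for all s < t and every atom F of Sigma_s."""
    return all(
        sum(PROB * (Y(t, omega) - Y(s, omega)) for omega in F) == 0
        for s in range(N_TOSSES)
        for t in range(s + 1, N_TOSSES + 1)
        for F in atoms(s)
    )
```

Replacing the uniform weights by a biased measure would make the same check fail, which is one concrete way to see that the martingale property depends on the measure as well as on the filtration.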

It is important to note that the property of being a martingale involves both the filtration and the probability measure (with respect to which the expectations are taken). It is possible that Y could be a martingale with respect to one measure but not another one; the Girsanov theorem offers a way to find a measure with respect to which an Itō process is a martingale.

In the Banach space setting, the conditional expectation is also denoted in operator notation as $E^{\Sigma_s} Y_t$, where $\Sigma_s$ is the σ-algebra of the filtration at time $s$. [4]

Examples of martingales

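de Moivre's martingale: Suppose a coin is biased, coming up heads with probability $p$ and tails with probability $q = 1 - p$. Let

$$X_{n+1} = X_n \pm 1$$

with "+" in case of "heads" and "−" in case of "tails". Let

$$Y_n = (q/p)^{X_n}.$$

Then $\{Y_n : n = 1, 2, 3, \ldots\}$ is a martingale with respect to $\{X_n : n = 1, 2, 3, \ldots\}$. To show this,

$$\begin{aligned}
E[Y_{n+1} \mid X_1, \ldots, X_n] &= p\,(q/p)^{X_n + 1} + q\,(q/p)^{X_n - 1} \\
&= p\,(q/p)\,(q/p)^{X_n} + q\,(p/q)\,(q/p)^{X_n} \\
&= q\,(q/p)^{X_n} + p\,(q/p)^{X_n} = (q/p)^{X_n} = Y_n.
\end{aligned}$$

Likelihood-ratio testing: A random variable $X$ is thought to be distributed according to either the probability density $f$ or a different probability density $g$. A random sample $X_1, \ldots, X_n$ is taken, and $Y_n$ is the likelihood ratio

$$Y_n = \prod_{i=1}^{n} \frac{g(X_i)}{f(X_i)}.$$

If $X$ is actually distributed according to the density $f$ rather than according to $g$, then $\{Y_n : n = 1, 2, 3, \ldots\}$ is a martingale with respect to $\{X_n : n = 1, 2, 3, \ldots\}$.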
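
The likelihood-ratio statement can be illustrated with a small exact computation. This is a sketch under assumptions: the two probability mass functions `f` and `g` on the outcomes 0, 1, 2 are hypothetical, the samples are taken to be independent, and $Y_n$ is taken to be the product of $g(X_i)/f(X_i)$ over the sample. The code checks $E[Y_{n+1} \mid X_1, \ldots, X_n] = Y_n$ on every possible sample when the next observation is drawn from a given distribution.

```python
from fractions import Fraction
from itertools import product

# Two hypothetical probability mass functions on the outcomes {0, 1, 2}.
f = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
g = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}

def likelihood_ratio(sample):
    """Y_n: the product of g(x) / f(x) over the observations in sample."""
    ratio = Fraction(1)
    for x in sample:
        ratio *= g[x] / f[x]
    return ratio

def expected_next_ratio(sample, sampling_pmf):
    """E[Y_{n+1} | X_1..X_n = sample] when X_{n+1} is drawn from sampling_pmf."""
    return sum(
        prob * likelihood_ratio(sample + (x,))
        for x, prob in sampling_pmf.items()
    )

def is_martingale_under(sampling_pmf, n_max=4):
    """Check E[Y_{n+1} | history] == Y_n on every sample of length <= n_max."""
    return all(
        expected_next_ratio(sample, sampling_pmf) == likelihood_ratio(sample)
        for n in range(n_max + 1)
        for sample in product(f, repeat=n)
    )
```

With these example distributions the check passes when the data come from `f` and fails when they come from `g`, in line with the statement above.
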
Software-created martingale series

Submartingales, supermartingales, and relationship to harmonic functions

There are two popular generalizations of a martingale that also include cases when the current observation $X_n$ is not necessarily equal to the future conditional expectation $E[X_{n+1} \mid X_1, \ldots, X_n]$ but is instead an upper or lower bound on the conditional expectation. These definitions reflect a relationship between martingale theory and potential theory, which is the study of harmonic functions. Just as a continuous-time martingale satisfies $E[X_t \mid \{X_\tau : \tau \le s\}] - X_s = 0$ for all $s \le t$, a harmonic function $f$ satisfies the partial differential equation $\Delta f = 0$, where $\Delta$ is the Laplacian operator. Given a Brownian motion process $W_t$ and a harmonic function $f$, the resulting process $f(W_t)$ is also a martingale.
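
A discrete analogue of this relationship is easy to verify exactly. The sketch below swaps Brownian motion for the simple random walk on the integer lattice $\mathbb{Z}^2$ and the Laplacian for its discrete counterpart (the average over the four neighbours minus the value at the point); both substitutions are assumptions made for illustration. For a function whose discrete Laplacian vanishes, the one-step conditional drift of the function along the walk is zero.

```python
from fractions import Fraction

NEIGHBOR_STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def mean_over_neighbors(func, x, y):
    """E[func(W_{n+1}) | W_n = (x, y)] for the simple random walk on Z^2."""
    return sum(Fraction(func(x + dx, y + dy)) for dx, dy in NEIGHBOR_STEPS) / 4

def harmonic(x, y):
    """x^2 - y^2: its discrete Laplacian is zero everywhere."""
    return x * x - y * y

def not_harmonic(x, y):
    """x^2 + y^2: its discrete Laplacian is strictly positive."""
    return x * x + y * y

def drift(func, radius=5):
    """Set of values E[func(W_{n+1}) | W_n] - func(W_n) over a grid of lattice points."""
    return {
        mean_over_neighbors(func, x, y) - func(x, y)
        for x in range(-radius, radius + 1)
        for y in range(-radius, radius + 1)
    }
```

Here `drift(harmonic)` contains only 0, so `harmonic` applied to the walk is a martingale, whereas `drift(not_harmonic)` contains only 1, so that process gains one unit per step on average.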

A discrete-time submartingale is a sequence $X_1, X_2, X_3, \ldots$ of integrable random variables satisfying

$$E[X_{n+1} \mid X_1, \ldots, X_n] \ge X_n.$$

Likewise, a continuous-time submartingale satisfies

$$E[X_t \mid \{X_\tau : \tau \le s\}] \ge X_s \quad \text{for all } s \le t.$$

In potential theory, a subharmonic function $f$ satisfies $\Delta f \ge 0$. Any subharmonic function that is bounded above by a harmonic function for all points on the boundary of a ball is bounded above by the harmonic function for all points inside the ball. Similarly, if a submartingale and a martingale have equivalent expectations for a given time, the history of the submartingale tends to be bounded above by the history of the martingale. Roughly speaking, the prefix "sub-" is consistent because the current observation $X_n$ is less than (or equal to) the conditional expectation $E[X_{n+1} \mid X_1, \ldots, X_n]$. Consequently, the current observation provides support from below the future conditional expectation, and the process tends to increase in future time.
Analogously, a discrete-time supermartingale is a sequence $X_1, X_2, X_3, \ldots$ of integrable random variables satisfying

$$E[X_{n+1} \mid X_1, \ldots, X_n] \le X_n.$$

Likewise, a continuous-time supermartingale satisfies

$$E[X_t \mid \{X_\tau : \tau \le s\}] \le X_s \quad \text{for all } s \le t.$$

In potential theory, a superharmonic function $f$ satisfies $\Delta f \le 0$. Any superharmonic function that is bounded below by a harmonic function for all points on the boundary of a ball is bounded below by the harmonic function for all points inside the ball. Similarly, if a supermartingale and a martingale have equivalent expectations for a given time, the history of the supermartingale tends to be bounded below by the history of the martingale. Roughly speaking, the prefix "super-" is consistent because the current observation $X_n$ is greater than (or equal to) the conditional expectation $E[X_{n+1} \mid X_1, \ldots, X_n]$. Consequently, the current observation provides support from above the future conditional expectation, and the process tends to decrease in future time.
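
One simple way to see the three cases side by side is a gambler's fortune on a possibly biased coin. The sketch below is illustrative only; the function names and the parameter `p_up` are assumptions. The fortune moves up by 1 with probability `p_up` and down by 1 otherwise, so the sign of the expected one-step increment decides the classification.

```python
from fractions import Fraction

def expected_increment(p_up):
    """E[X_{n+1} | X_1..X_n] - X_n for a walk stepping +1 w.p. p_up and -1 otherwise."""
    return p_up * (+1) + (1 - p_up) * (-1)

def classify_walk(p_up):
    """Classify the +/-1 walk by the sign of its expected one-step increment."""
    increment = expected_increment(p_up)
    if increment == 0:
        return "martingale"        # both a submartingale and a supermartingale
    if increment > 0:
        return "submartingale"     # tends to increase
    return "supermartingale"       # tends to decrease
```

A coin favouring the gambler (`p_up` above 1/2) gives a submartingale, a coin favouring the house gives a supermartingale, and the fair coin gives a martingale, which satisfies both inequalities at once.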

Examples of submartingales and supermartingales

Martingales and stopping times

A stopping time with respect to a sequence of random variables $X_1, X_2, X_3, \ldots$ is a random variable $\tau$ with the property that for each $t$, the occurrence or non-occurrence of the event $\tau = t$ depends only on the values of $X_1, X_2, X_3, \ldots, X_t$. The intuition behind the definition is that at any particular time $t$, you can look at the sequence so far and tell if it is time to stop. A real-life example might be the time at which a gambler leaves the gambling table, which might be a function of their previous winnings (for example, they might leave only when they go broke), but they cannot choose to go or stay based on the outcome of games that have not been played yet.

In some contexts the concept of stopping time is defined by requiring only that the occurrence or non-occurrence of the event $\tau = t$ is probabilistically independent of $X_{t+1}, X_{t+2}, \ldots$ but not that it is completely determined by the history of the process up to time $t$. That is a weaker condition than the one appearing in the paragraph above, but is strong enough to serve in some of the proofs in which stopping times are used.
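
The first (stronger) definition can be tested mechanically on short walks. The sketch below is illustrative and its helper names are assumptions: it enumerates all paths of a walk that moves up or down by 1 and checks whether the event that a rule stops at time $t$ is decided by the first $t$ values alone. The first time a level is reached passes the check, while the time at which the whole path attains its maximum does not, because it requires knowledge of the future.

```python
from itertools import accumulate, product

def first_hitting_time(path, level):
    """Smallest 1-based t with path[t-1] >= level, or None if the level is never reached."""
    for t, value in enumerate(path, start=1):
        if value >= level:
            return t
    return None

def time_of_maximum(path):
    """1-based time at which the whole path first attains its maximum."""
    return path.index(max(path)) + 1

def is_stopping_rule(rule, n_steps):
    """Check that the event {rule(path) == t} depends only on the first t values of the path."""
    paths = [tuple(accumulate(steps)) for steps in product((+1, -1), repeat=n_steps)]
    for t in range(1, n_steps + 1):
        decision_by_prefix = {}
        for path in paths:
            stopped_now = rule(path) == t
            # Two paths with the same first t values must agree on whether tau == t.
            if decision_by_prefix.setdefault(path[:t], stopped_now) != stopped_now:
                return False
    return True
```

For example, `is_stopping_rule(lambda path: first_hitting_time(path, 2), 5)` succeeds, whereas `is_stopping_rule(time_of_maximum, 5)` fails.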

One of the basic properties of martingales is that, if $(X_t)_{t>0}$ is a (sub-/super-) martingale and $\tau$ is a stopping time, then the corresponding stopped process $(X_t^\tau)_{t>0}$ defined by $X_t^\tau := X_{\min\{\tau, t\}}$ is also a (sub-/super-) martingale.

The concept of a stopped martingale leads to a series of important theorems, including, for example, the optional stopping theorem, which states that, under certain conditions, the expected value of a martingale at a stopping time is equal to its initial value.
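
Both the stopped process and the conclusion of the optional stopping theorem can be illustrated with an exact computation. The sketch below makes several assumptions for illustration: the process is a walk started at 0 that moves up or down by 1, the stopping time $\tau$ is the first time the walk hits one of two barriers `lower < 0 < upper`, and the function names are hypothetical. It computes the full distribution of the stopped value $X_{\min\{\tau, n\}}$ by dynamic programming.

```python
from fractions import Fraction

def stopped_walk_distribution(n_steps, lower, upper, p_up=Fraction(1, 2)):
    """Distribution of X_{min(tau, n_steps)} for a +/-1 walk started at 0,
    where tau is the first time the walk hits `lower` or `upper`."""
    dist = {0: Fraction(1)}
    for _ in range(n_steps):
        new_dist = {}
        for position, prob in dist.items():
            if position in (lower, upper):
                # The walk has already stopped: its value stays frozen.
                new_dist[position] = new_dist.get(position, 0) + prob
            else:
                new_dist[position + 1] = new_dist.get(position + 1, 0) + prob * p_up
                new_dist[position - 1] = new_dist.get(position - 1, 0) + prob * (1 - p_up)
        dist = new_dist
    return dist

def mean(dist):
    """Expected value of a finite distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())
```

For the fair walk the mean of the stopped process equals the initial value 0 at every time `n_steps`, as the stopped-martingale property requires. For a walk biased upward the mean is positive, consistent with a stopped submartingale remaining a submartingale.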

Notes

  1. Balsara, N. J. (1992). Money Management Strategies for Futures Traders. Wiley Finance. p. 122. ISBN 978-0-471-52215-7.
  2. Mansuy, Roger (June 2009). "The origins of the Word 'Martingale'" (PDF). Electronic Journal for History of Probability and Statistics. 5 (1). Archived (PDF) from the original on 2012-01-31. Retrieved 2011-10-22.
  3. Grimmett, G.; Stirzaker, D. (2001). Probability and Random Processes (3rd ed.). Oxford University Press. ISBN 978-0-19-857223-7.
  4. Bogachev, Vladimir (1998). Gaussian Measures. American Mathematical Society. pp. 372–373. ISBN 978-1470418694.
