Probability distribution of extreme points of a Wiener stochastic process

Last updated April 07, 2023

In the mathematical theory of probability, the Wiener process, named after Norbert Wiener, is a stochastic process used in modeling various phenomena, including Brownian motion and fluctuations in financial markets. A formula for the conditional probability distribution of the extremum of the Wiener process and a sketch of its proof appears in work of H. J. Kusher (appendix 3, page 106) published in 1964.^[1] a detailed constructive proof appears in work of Dario Ballabio in 1978.^[2] This result was developed within a research project about Bayesian optimization algorithms.

In some global optimization problems the analytical definition of the objective function is unknown and it is only possible to get values at fixed points. There are objective functions in which the cost of an evaluation is very high, for example when the evaluation is the result of an experiment or a particularly onerous measurement. In these cases, the search of the global extremum (maximum or minimum) can be carried out using a methodology named "Bayesian optimization", which tend to obtain a priori the best possible result with a predetermined number of evaluations. In summary it is assumed that outside the points in which it has already been evaluated, the objective function has a pattern which can be represented by a stochastic process with appropriate characteristics. The stochastic process is taken as a model of the objective function, assuming that the probability distribution of its extrema gives the best indication about extrema of the objective function. In the simplest case of the one-dimensional optimization, given that the objective function has been evaluated in a number of points, there is the problem to choose in which of the intervals thus identified is more appropriate to invest in a further evaluation. If a Wiener stochastic process is chosen as a model for the objective function, it is possible to calculate the probability distribution of the model extreme points inside each interval, conditioned by the known values at the interval boundaries. The comparison of the obtained distributions provides a criterion for selecting the interval in which the process should be iterated. The probability value of having identified the interval in which falls the global extremum point of the objective function can be used as a stopping criterion. Bayesian optimization is not an efficient method for the accurate search of local extrema so, once the search range has been restricted, depending on the characteristics of the problem, a specific local optimization method can be used.

Proposition

Let $X(t)$ be a Wiener stochastic process on an interval $[a,b]$ with initial value $X(a)=X_{a}.$

By definition of Wiener process, increments have a normal distribution:

{\text{for }}a\leq t_{1}<t_{2}\leq b,\qquad X(t_{2})-X(t_{1})\sim N(0,\sigma ^{2}(t_{2}-t_{1})).

Let

F(z)=\Pr(\min _{a\leq t\leq b}X(t)\leq z\mid X(b)=X_{b})

be the cumulative probability distribution function of the minimum value of the $X(t)$ function on interval $[a,b]$ conditioned by the value $X(b)=X_{b}.$

It is shown that:^[1]^[3]^{[note 1]}

F(z)={\begin{cases}1&{\text{for }}z\geq \min\{X_{a},X_{b}\},\\\exp \left(-2{\dfrac {(z-X_{b})(z-X_{a})}{\sigma ^{2}(b-a)}}\right)&{\text{for }}z<\min(X_{a},X_{b}).\end{cases}}

Constructive proof

Case $z\geq \min(X_{a},X_{b})$ is an immediate consequence of the minimum definition, in the following it will always be assumed $z<\min(X_{a},X_{b})$ and also corner case $\min _{a\leq t\leq b}X(t)=\min(X_{a},X_{b})$ will be excluded.

Let' s assume $X(t)$ defined in a finite number of points $t_{k}\in [a,b],\ \ 0\leq k\leq n,\ \ t_{0}=a$ .

Let $T_{n}\ \ {\overset {\underset {\mathrm {def} }{}}{=}}\ \ \{t_{k},\ \ 0\leq k\leq n,\}$ by varying the integer $n$ be a sequence of sets $\{T_{n}\}$ such that $T_{n}\subset T_{n+1}$ and $\bigcup _{n=0}^{+\infty }T_{n}$ be a dense set in $[a,b]$ ,

hence every neighbourhood of each point in $[a,b]$ contains an element of one of the sets $T_{n}$ .

Let $\Delta z$ be a real positive number such that $z+\Delta z<\min(X_{a},X_{b}).$

Let the event $E$ be defined as: $E\ \ {\overset {\underset {\mathrm {def} }{}}{=}}\ \ (\min _{a\leq t\leq b}X(t)<z+\Delta z)$ $\Longleftrightarrow$ $(\exists \,t\in [a,b]:X(t)<z+\Delta z)$ .

Having excluded corner case $\min _{a\leq t\leq b}X(t)=\min(X_{a},X_{b})$ , it is surely $P(E)>0$ .

Let $E_{n},\ \ n=0,1,2,\ldots$ be the events defined as: $E_{n}\ \ {\overset {\underset {\mathrm {def} }{}}{=}}\ \ (\exists \,t_{k}\in T_{n}:z<X(t_{k})<z+\Delta z)$ and let $\nu$ be the first k among the $t_{k}\in T_{n}$ which define $E_{n}$ .

Since $T_{n}\subset T_{n+1}$ it is evidently $E_{n}\subset E_{n+1}$ . Now equation (2.1) will be proved.

(2.1) $\ \ \ \ E=\bigcup _{n=0}^{+\infty }E_{n}$

By the $E_{n}$ events definition, $\forall \,n\ \ E_{n}\Rightarrow E$ , hence $\bigcup _{n=0}^{+\infty }E_{n}\subset E$ . It will now be verified the relation $E\subset \bigcup _{n=0}^{+\infty }E_{n}$ hence (2.1) will be proved.

The definition of $E$ , the continuity of $X(t)$ and the hypothesis $z<X_{a}=X(a)$ imply, by the intermediate value theorem, $(\exists \,{\bar {t}}\in [a,b]:z<X({\bar {t}})<z+\Delta z)$ .

By the continuity of $X(t)$ and the hypothesis that $\bigcup _{n=0}^{+\infty }T_{n}$ is dense in $[a,b]$ it is deducted that $\exists \,{\bar {n}}$ such that for $t_{\nu }\in T_{\bar {n}}$ it must be $z<X(t_{\nu })<z+\Delta z$ ,

hence $E\subset E_{\bar {n}}\subset \bigcup _{n=0}^{+\infty }E_{n}$ which implies (2.1).

(2.2) $\ \ \ \ P(E)=\lim _{n\rightarrow +\infty }P(E_{n})$

(2.2) is deducted from (2.1), considering that $E_{n}\Rightarrow E_{n+1}$ implies that the sequence of probabilities $P(E_{n})$ is monotone non decreasing and hence it converges to its supremum. The definition of events $E_{n}$ implies $\forall n\ \ P(E_{n})>0\Rightarrow P(E_{n})=P(E_{\nu })$ and (2.2) implies $P(E)=P(E_{\nu })$ .

In the following it will always be assumed $n\geq \nu$ , so $t_{\nu }$ is well defined.

(2.3) $\ \ \ \ P(X(b)\leqslant -X_{b}+2z)\leqslant P(X(b)-X(t_{\nu })<-X_{b}+z)$

In fact, by definition of $E_{n}$ it is $z<X(t_{\nu })$ , so $(X(b)\leqslant -X_{b}+2z)\Rightarrow (X(b)-X(t_{\nu })<-X_{b}+z)$ .

In a similar way, since by definition of $E_{n}$ it is $z<X(t_{\nu })$ , (2.4) is valid:

(2.4) $\ \ \ \ P(X(b)-X(t_{\nu })>X_{b}-z)\leqslant P(X(b)>X_{b})$

(2.5) $\ \ \ \ P(X(b)-X(t_{\nu })<-X_{b}+z)=P(X(b)-X(t_{\nu })>X_{b}-z)$

The above is explained by the fact that the random variable ${\displaystyle (X(b)-X(t_{\nu }))\thicksim N(\varnothing$ has a symmetric probability density compared to its mean which is zero.

By applying in sequence relationships (2.3), (2.5) and (2.4) we get (2.6) :

(2.6) $\ \ \ \ P(X(b)\leqslant -X_{b}+2z)\leqslant P(X(b)>X_{b})$

With the same procedure used to obtain (2.3), (2.4) and (2.5) taking advantage this time by the relationship $X(t_{\nu })<z+\Delta z$ we get (2.7):

(2.7) $\ \ \ \ P(X(b)>X_{b})\leqslant P(X(b)-X(t_{\nu })>X_{b}-z-\Delta z)\ \$ $=\ \ P(X(b)-X(t_{\nu })<-X_{b}+z+\Delta z)\leqslant P(X(b)<-X_{b}+2z+2\Delta z)$

By applying in sequence (2.6) and (2.7) we get:

(2.8) $P(X(b)\leqslant -X_{b}+2z)\leqslant P(X(b)>X_{b})$ $\leqslant P(X(b)<-X_{b}+2z+2\Delta z)$

From $X_{b}>z+\Delta z>z$ , considering the continuity of $X(t)$ and the intermediate value theorem we get $X(b)>X_{b}>z+\Delta z>z\Rightarrow E_{n}$ ,

which implies $P(X(b)>X_{b})=P(E_{n},X(b)>X_{b})$ .

Replacing the above in (2.8) and passing to the limits: $\lim _{n\rightarrow +\ \infty }\ \ E_{n}(\Delta z)\rightarrow E(\Delta z)$ and for $\Delta z\rightarrow 0$ , event $E(\Delta z)$ converges to $\min _{a\leq t\leq b}X(t)\leqslant z$

(2.9) $\ \ \ \ P(X(b)\leqslant -X_{b}+2z)=$ $P(\min _{a\leq t\leq b}X(t)\leqslant z,\ \ X(b)>X_{b})$

$\forall \,dX_{b}>0$ , by substituting $(X_{b})$ with $(X_{b}-dX_{b})$ in (2.9) we get the equivalent relationship:

(2.10) $\ \ \ \ P(X(b)\leqslant -X_{b}+2z+dX_{b})=$ $P(\min _{a\leq t\leq b}X(t)\leqslant z,\ \ X(b)>X_{b}-dX_{b})$

Applying the Bayes' theorem to the joint event $(\min _{a\leq t\leq b}X(t)\leqslant z,\ \ X_{b}-dX_{b}<X(b)\leqslant X_{b})$

(2.11) $\ \ \ \ P(\min _{a\leq t\leq b}X(t)\leqslant z\mid X_{b}-dX_{b}<X(b)\leqslant X_{b})=$ $P(\min _{a\leq t\leq b}X(t)\leqslant z,\ \ X_{b}-dX_{b}<X(b)\leqslant X_{b})$ $/\ \ P(X_{b}-dX_{b}<X(b)\leqslant X_{b})$

Let: $B\ {\overset {\underset {\mathrm {def} }{}}{=}}\ \{X(b)>X_{b}\},\ C\ {\overset {\underset {\mathrm {def} }{}}{=}}\ \{X_{b}-dX_{b}<X(b)\leq X_{b}\},\ D\ {\overset {\underset {\mathrm {def} }{}}{=}}\ \{X(b)>X_{b}-dX_{b}\},\ A\ {\overset {\underset {\mathrm {def} }{}}{=}}\ \ \{\min _{a\leq t\leq b}X(t)\leqslant z\}$ From the above definitions it follows:

$D=B\cup C\Rightarrow \ P(A,D)=P(A,B\cup C)=P(A,B)+P(A,C)\Rightarrow P(A,C)=P(A,D)-P(A,B)$

(2.12) $\ \ \ \ P(A,C)=P(A,D)-P(A,B)$

Substituting (2.12) into (2.11), we get the equivalent:

(2.13) $P(\min _{a\leq t\leq b}X(t)\leqslant z\mid X_{b}-dX_{b}<X(b)\leqslant X_{b})=(P(\min _{a\leqslant t\leqslant b}X(t)\leq z,\ \ X(b)>X_{b}-dX_{b})-P(\min _{a\leqslant t\leqslant b}X(t)\leq z,\ \ X(b)>X_{b}))\ \ /\ \ P(X_{b}-dX_{b}<X(b)\leqslant X_{b})$

Substituting (2.9) and (2.10) into (2.13):

(2.14) $\ \ \ \ P(\min _{a\leq t\leq b}X(t)\leqslant z\mid X_{b}-dX_{b}<X(b)\leqslant X_{b})=$ $(P(X(b)\leqslant -X_{b}+2z+dX_{b})-P(X(b)\leqslant -X_{b}+2z)$ $/\ \ P(X_{b}-dX_{b}<X(b)\leqslant X_{b})$

It can be observed that in the second member of (2.14) appears the probability distribution of the random variable $X(b)$ , normal with mean $X_{a}$ e variance $\sigma ^{2}(b-a)$ .

The realizations $X_{b}$ and $-X_{b}+2z$ of the random variable $X(b)$ match respectively the probability densities:

(2.15) $\ \ \ \ P(X_{b})\,dX_{b}={\frac {1}{\sigma {\sqrt {2\pi (b-a)}}}}\exp {\biggl (}-{\frac {1}{2}}{\frac {(X_{b}-X_{a})^{2}}{\sigma ^{2}(b-a)}}{\biggr )}\,dX_{b}$

(2.16) $\ \ \ \ P(-X_{b}+2z)\,dX_{b}={\frac {1}{\sigma {\sqrt {2\pi (b-a)}}}}\exp {\biggl (}-{\frac {1}{2}}{\frac {(-X_{b}+2z-X_{a})^{2}}{\sigma ^{2}(b-a)}}{\biggr )}\,dX_{b}$

Substituting (2.15) e (2.16) into (2.14) and taking the limit for $dX_{b}\rightarrow 0$ the thesis is proved:

$F(z)=P(\min _{a\leq t\leq b}X(t)\leq z\ \ |\ \ X(b)=X_{b})=$

$={\frac {1}{\sigma {\sqrt {2\pi (b-a)}}}}\exp {\biggl (}-{\frac {1}{2}}{\frac {(-X_{b}+2z-X_{a})^{2}}{\sigma ^{2}(b-a)}}{\biggr )}\,dX_{b}$ $\ \ \diagup \ \ {\frac {1}{\sigma {\sqrt {2\pi (b-a)}}}}\exp {\biggl (}-{\frac {1}{2}}{\frac {(X_{b}-X_{a})^{2}}{\sigma ^{2}(b-a)}}{\biggr )}\,dX_{b}=$

$=\exp {\biggl (}-{\frac {1}{2}}{\frac {(-X_{b}+2z-X_{a})^{2}-(X_{b}-X_{a})^{2}}{\sigma ^{2}(b-a)}}{\biggr )}=$ $\ \ \exp {\biggl (}-2\ \ {\frac {(z-X_{b})(z-X_{a})}{\sigma ^{2}(b-a)}}{\biggr )}$

Bibliography

A versatile stochastic model of a function of unknown and time varying form - Harold J Kushner - Journal of Mathematical Analysis and Applications Volume 5, Issue 1, August 1962, Pages 150-167.
The Application of Bayesian Methods for Seeking the Extremum - J. Mockus, J. Tiesis, A. Zilinskas - IFIP Congress 1977, August 8–12 Toronto.

Notes

↑ The theorem, as set out and shown for the case of the minimum of the Wiener process, also applies to the maximum.

Related Research Articles

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is

In probability and statistics, Student's t-distribution is any member of a family of continuous probability distributions that arise when estimating the mean of a normally distributed population in situations where the sample size is small and the population's standard deviation is unknown. It was developed by English statistician William Sealy Gosset under the pseudonym "Student".

In mathematics, the Kronecker delta is a function of two variables, usually just non-negative integers. The function is 1 if the variables are equal, and 0 otherwise:

In materials science and solid mechanics, Poisson's ratio $(nu) is a measure of the Poisson effect, the deformation of a material in directions perpendicular to the specific direction of loading. The value of Poisson's ratio is the negative of the ratio of transverse strain to axial strain. For small values of these changes, is the amount of transversal elongation divided by the amount of axial compression. Most materials have Poisson's ratio values ranging between 0.0 and 0.5. For soft materials, such as rubber, where the bulk modulus is much higher than the shear modulus, Poisson's ratio is near 0.5. For open-cell polymer foams, Poisson's ratio is near zero, since the cells tend to collapse in compression. Many typical solids have Poisson's ratios in the range of 0.2-0.3. The ratio is named after the French mathematician and physicist Siméon Poisson.$

In physics, the Polyakov action is an action of the two-dimensional conformal field theory describing the worldsheet of a string in string theory. It was introduced by Stanley Deser and Bruno Zumino and independently by L. Brink, P. Di Vecchia and P. S. Howe in 1976, and has become associated with Alexander Polyakov after he made use of it in quantizing the string in 1981. The action reads

The scaled inverse chi-squared distribution is the distribution for x = 1/s², where s² is a sample mean of the squares of ν independent normal random variables that have mean 0 and inverse variance 1/σ² = τ². The distribution is therefore parametrised by the two quantities ν and τ², referred to as the number of chi-squared degrees of freedom and the scaling parameter, respectively.

In probability theory, the Rice distribution or Rician distribution is the probability distribution of the magnitude of a circularly-symmetric bivariate normal random variable, possibly with non-zero mean (noncentral). It was named after Stephen O. Rice (1907–1986).

The noncentral t-distribution generalizes Student's t-distribution using a noncentrality parameter. Whereas the central probability distribution describes how a test statistic t is distributed when the difference tested is null, the noncentral distribution describes how t is distributed when the null is false. This leads to its use in statistics, especially calculating statistical power. The noncentral t-distribution is also known as the singly noncentral t-distribution, and in addition to its primary use in statistical inference, is also used in robust modeling for data.

In mathematics, the theory of optimal stopping or early stopping is concerned with the problem of choosing a time to take a particular action, in order to maximise an expected reward or minimise an expected cost. Optimal stopping problems can be found in areas of statistics, economics, and mathematical finance. A key example of an optimal stopping problem is the secretary problem. Optimal stopping problems can often be written in the form of a Bellman equation, and are therefore often solved using dynamic programming.

In differential geometry, normal coordinates at a point p in a differentiable manifold equipped with a symmetric affine connection are a local coordinate system in a neighborhood of p obtained by applying the exponential map to the tangent space at p. In a normal coordinate system, the Christoffel symbols of the connection vanish at the point p, thus often simplifying local calculations. In normal coordinates associated to the Levi-Civita connection of a Riemannian manifold, one can additionally arrange that the metric tensor is the Kronecker delta at the point p, and that the first partial derivatives of the metric at p vanish.

In statistics, the multivariate t-distribution is a multivariate probability distribution. It is a generalization to random vectors of the Student's t-distribution, which is a distribution applicable to univariate random variables. While the case of a random matrix could be treated within this structure, the matrix t-distribution is distinct and makes particular use of the matrix structure.

Expected shortfall (ES) is a risk measure—a concept used in the field of financial risk measurement to evaluate the market risk or credit risk of a portfolio. The "expected shortfall at q% level" is the expected return on the portfolio in the worst $of cases. ES is an alternative to value at risk that is more sensitive to the shape of the tail of the loss distribution.$

Contact mechanics is the study of the deformation of solids that touch each other at one or more points. A central distinction in contact mechanics is between stresses acting perpendicular to the contacting bodies' surfaces and frictional stresses acting tangentially between the surfaces. Normal contact mechanics or frictionless contact mechanics focuses on normal stresses caused by applied normal forces and by the adhesion present on surfaces in close contact, even if they are clean and dry. Frictional contact mechanics emphasizes the effect of friction forces.

<span class="mw-page-title-main">Variance gamma process</span>

In the theory of stochastic processes, a part of the mathematical theory of probability, the variance gamma process (VG), also known as Laplace motion, is a Lévy process determined by a random time change. The process has finite moments distinguishing it from many Lévy processes. There is no diffusion component in the VG process and it is thus a pure jump process. The increments are independent and follow a variance-gamma distribution, which is a generalization of the Laplace distribution.

The Brownian motion models for financial markets are based on the work of Robert C. Merton and Paul A. Samuelson, as extensions to the one-period market models of Harold Markowitz and William F. Sharpe, and are concerned with defining the concepts of financial assets and markets, portfolios, gains and wealth in terms of continuous-time stochastic processes.

A product distribution is a probability distribution constructed as the distribution of the product of random variables having two other known distributions. Given two statistically independent random variables X and Y, the distribution of the random variable Z that is formed as the product $is a product distribution .$

In statistics, the matrix t-distribution is the generalization of the multivariate t-distribution from vectors to matrices. The matrix t-distribution shares the same relationship with the multivariate t-distribution that the matrix normal distribution shares with the multivariate normal distribution. For example, the matrix t-distribution is the compound distribution that results from sampling from a matrix normal distribution having sampled the covariance matrix of the matrix normal from an inverse Wishart distribution.

Stochastic portfolio theory (SPT) is a mathematical theory for analyzing stock market structure and portfolio behavior introduced by E. Robert Fernholz in 2002. It is descriptive as opposed to normative, and is consistent with the observed behavior of actual markets. Normative assumptions, which serve as a basis for earlier theories like modern portfolio theory (MPT) and the capital asset pricing model (CAPM), are absent from SPT.

Nabarro–Herring creep is a mode of deformation of crystalline materials that occurs at low stresses and held at elevated temperatures in fine-grained materials. In Nabarro–Herring creep, atoms diffuse through the crystals, and the creep rate varies inversely with the square of the grain size so fine-grained materials creep faster than coarser-grained ones. NH creep is solely controlled by diffusional mass transport. This type of creep results from the diffusion of vacancies from regions of high chemical potential at grain boundaries subjected to normal tensile stresses to regions of lower chemical potential where the average tensile stresses across the grain boundaries are zero. Self-diffusion within the grains of a polycrystalline solid can cause the solid to yield to an applied shearing stress, the yielding being caused by a diffusional flow of matter within each crystal grain away from boundaries where there is a normal pressure and toward those where there is a normal tension. Atoms migrating in the opposite direction account for the creep strain. The creep strain rate is derived in the next section. NH creep is more important in ceramics than metals as dislocation motion is more difficult to effect in ceramics.

An additive process, in probability theory, is a cadlag, continuous in probability stochastic process with independent increments. An additive process is the generalization of a Lévy process. An example of an additive process is a Brownian motion with a time-dependent drift. The additive process was introduced by Paul Lévy in 1937.

References

1 2 H. J. Kushner, "A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise", J. Basic Eng 86(1), 97–106 (Mar 01, 1964).
↑ Dario Ballabio, "Una nuova classe di algoritmi stocastici per l'ottimizzazione globale" (A new class of stochastic algorithms for global optimization), University of Milan, Institute of Mathematics, doctoral dissertation presented on July 12th 1978, pp. 29–33.
↑ János D. Pintér, Global Optimization in Action: Continuous and Lipschitz Optimization, 1996 Springer Science & Business Media, page 57.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[4] The theorem, as set out and shown for the case of the minimum of the Wiener process, also applies to the maximum.

[:0-1] 1 2 H. J. Kushner, "A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise", J. Basic Eng 86(1), 97–106 (Mar 01, 1964).

[:1-2] Dario Ballabio, "Una nuova classe di algoritmi stocastici per l'ottimizzazione globale" (A new class of stochastic algorithms for global optimization), University of Milan, Institute of Mathematics, doctoral dissertation presented on July 12th 1978, pp. 29–33.

[3] János D. Pintér, Global Optimization in Action: Continuous and Lipschitz Optimization, 1996 Springer Science & Business Media, page 57.

[1]

[2]

[3]

[note 1]