Wald's equation

In probability theory, Wald's equation, Wald's identity [1] or Wald's lemma [2] is an important identity that simplifies the calculation of the expected value of the sum of a random number of random quantities. In its simplest form, it relates the expectation of a sum of randomly many finite-mean, independent and identically distributed random variables to the expected number of terms in the sum and the random variables' common expectation under the condition that the number of terms in the sum is independent of the summands.

The equation is named after the mathematician Abraham Wald. An identity for the second moment is given by the Blackwell–Girshick equation. [3]

Basic version

Let (Xn)n∈ℕ be a sequence of real-valued, independent and identically distributed random variables and let N ≥ 0 be an integer-valued random variable that is independent of the sequence (Xn)n∈ℕ. Suppose that N and the Xn have finite expectations. Then

E[X1 + . . . + XN] = E[N] E[X1].

Example

Roll a six-sided die. Take the number shown (call it N) and roll that number of six-sided dice to get the numbers X1, . . . , XN, and add up their values. By Wald's equation, the resulting value on average is

E[N] E[X1] = 3.5 × 3.5 = 12.25.
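
This value can also be checked empirically. The following short Monte Carlo sketch (an illustration using Python's standard random module; the trial count is arbitrary) estimates the average of SN for this dice experiment:

```python
import random

def dice_trial() -> int:
    # Roll one die to get N, then roll N further dice and sum them.
    n = random.randint(1, 6)
    return sum(random.randint(1, 6) for _ in range(n))

trials = 200_000
estimate = sum(dice_trial() for _ in range(trials)) / trials
print(estimate)  # close to E[N] * E[X1] = 3.5 * 3.5 = 12.25
```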

General version

Let (Xn)n∈ℕ be an infinite sequence of real-valued random variables and let N be a nonnegative integer-valued random variable.

Assume that:

1. (Xn)n∈ℕ are all integrable (finite-mean) random variables,
2. E[Xn 1{N ≥ n}] = E[Xn] P(N ≥ n) for every natural number n, and
3. the infinite series satisfies ∑n≥1 E[|Xn| 1{N ≥ n}] < ∞.

Then the random sums

SN := X1 + . . . + XN  and  TN := E[X1] + . . . + E[XN]

are integrable and

E[SN] = E[TN].

If, in addition,

4. (Xn)n∈ℕ all have the same expectation, and
5. N has finite expectation,

then

E[SN] = E[N] E[X1].

Remark: Usually, the name Wald's equation refers to this last equality.

Discussion of assumptions

Clearly, assumption ( 1 ) is needed to formulate assumption ( 2 ) and Wald's equation. Assumption ( 2 ) controls the amount of dependence allowed between the sequence (Xn)n∈ℕ and the number N of terms; see the counterexample below for the necessity. Note that assumption ( 2 ) is satisfied when N is a stopping time for a sequence of independent random variables (Xn)n∈ℕ. Assumption ( 3 ) is of a more technical nature, implying absolute convergence and therefore allowing arbitrary rearrangement of an infinite series in the proof.

If assumption ( 5 ) is satisfied, then assumption ( 3 ) can be strengthened to the simpler condition

6. there exists a real constant C such that E[|Xn| 1{N ≥ n}] ≤ C P(N ≥ n) for all natural numbers n.

Indeed, using assumption ( 6 ),

∑n≥1 E[|Xn| 1{N ≥ n}] ≤ C ∑n≥1 P(N ≥ n),

and the last series equals the expectation of N, which is finite by assumption ( 5 ). Therefore, ( 5 ) and ( 6 ) imply assumption ( 3 ).
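
The identity ∑n≥1 P(N ≥ n) = E[N] for an ℕ0-valued random variable N can also be checked numerically; the sketch below uses a Poisson distribution for N purely as an illustrative assumption:

```python
from math import exp

lam, cutoff = 2.5, 200  # illustrative Poisson parameter and truncation point
pmf = [exp(-lam)]
for k in range(1, cutoff):
    pmf.append(pmf[-1] * lam / k)  # Poisson recursion: p_k = p_{k-1} * lam / k

expectation = sum(k * p for k, p in enumerate(pmf))
tail_sum = sum(sum(pmf[n:]) for n in range(1, cutoff))  # sum over n >= 1 of P(N >= n)
print(expectation, tail_sum)  # both approximately equal to lam = 2.5
```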

Assume in addition to ( 1 ) and ( 5 ) that

7. N is independent of the sequence (Xn)n∈ℕ and
8. there exists a constant C such that E[|Xn|] ≤ C for all natural numbers n.

Then all the assumptions ( 1 ), ( 2 ), ( 5 ) and ( 6 ), hence also ( 3 ), are satisfied. In particular, the conditions ( 4 ) and ( 8 ) are satisfied if

9. the random variables (Xn)n∈ℕ all have the same distribution.

Note that the random variables of the sequence (Xn)n∈ℕ do not need to be independent.

The interesting point is to admit some dependence between the random number N of terms and the sequence (Xn)n∈ℕ. A standard version is to assume ( 1 ), ( 5 ), ( 8 ) and the existence of a filtration (Fn)n∈ℕ0 such that

10. N is a stopping time with respect to the filtration, and
11. Xn and Fn–1 are independent for every n.

Then ( 10 ) implies that the event {N ≥ n} = {N ≤ n – 1}ᶜ is in Fn–1, hence by ( 11 ) independent of Xn. This implies ( 2 ), and together with ( 8 ) it implies ( 6 ).

For convenience (see the proof below using the optional stopping theorem) and to specify the relation between the sequence (Xn)n∈ℕ and the filtration (Fn)n∈ℕ0, the following additional assumption is often imposed:

12. the sequence (Xn)n∈ℕ is adapted to the filtration (Fn)n∈ℕ0, meaning that Xn is Fn-measurable for every n ∈ ℕ.

Note that ( 11 ) and ( 12 ) together imply that the random variables (Xn)n∈ℕ are independent.

Application

An application is in actuarial science when considering the total claim amount, which follows a compound Poisson process,

SN = X1 + . . . + XN,

within a certain time period, say one year, arising from a random number N of individual insurance claims, whose sizes are described by the random variables (Xn)n∈ℕ. Under the above assumptions, Wald's equation can be used to calculate the expected total claim amount when information about the average claim number per year and the average claim size is available. Under stronger assumptions and with more information about the underlying distributions, Panjer's recursion can be used to calculate the distribution of SN.
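
As a sketch (the Poisson claim count and the exponential claim-size distribution below are assumptions made only for this illustration), the expected total claim amount E[SN] = E[N] E[X1] can be compared with a simulation:

```python
import math
import random

lam = 50.0           # assumed average number of claims per year, N ~ Poisson(lam)
mean_claim = 2000.0  # assumed average size of an individual claim (exponential model)

def sample_poisson(lam: float) -> int:
    # Knuth's method: count uniform factors until their product drops below e^(-lam).
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p < threshold:
            return k
        k += 1

def yearly_total() -> float:
    n = sample_poisson(lam)
    return sum(random.expovariate(1.0 / mean_claim) for _ in range(n))

years = 20_000
estimate = sum(yearly_total() for _ in range(years)) / years
print(estimate)  # close to E[N] * E[X1] = 50 * 2000 = 100000
```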

Examples

Example with dependent terms

Let N be an integrable, ℕ0-valued random variable which is independent of the integrable, real-valued random variable Z with E[Z] = 0. Define Xn := (–1)ⁿ Z for all n ∈ ℕ. Then assumptions ( 1 ), ( 5 ), ( 7 ), and ( 8 ) with C := E[|Z|] are satisfied, hence also ( 2 ) and ( 6 ), and Wald's equation applies. If the distribution of Z is not symmetric, then ( 9 ) does not hold. Note that, when Z is not almost surely equal to the zero random variable, then ( 11 ) and ( 12 ) cannot hold simultaneously for any filtration (Fn)n∈ℕ, because Z cannot be independent of itself, as E[Z²] = (E[Z])² = 0 is impossible.
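
A short simulation of this example (with an illustrative non-symmetric choice of Z and an illustrative geometric choice of N, both assumptions of the sketch) confirms that the empirical mean of SN is close to E[N] E[X1] = 0:

```python
import random

def sample_Z() -> float:
    # Non-symmetric with mean zero: 2 with probability 1/3, -1 with probability 2/3.
    return 2.0 if random.random() < 1 / 3 else -1.0

def sample_N() -> int:
    # Independent of Z; geometric number of terms (illustrative choice).
    n = 0
    while random.random() < 0.5:
        n += 1
    return n

def trial() -> float:
    z, n = sample_Z(), sample_N()
    return sum((-1) ** k * z for k in range(1, n + 1))  # X_k = (-1)^k Z

trials = 200_000
print(sum(trial() for _ in range(trials)) / trials)  # close to 0
```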

Example where the number of terms depends on the sequence

Let (Xn)n∈ℕ be a sequence of independent, symmetric, {–1, +1}-valued random variables. For every n ∈ ℕ let Fn be the σ-algebra generated by X1, . . . , Xn and define N as the first n such that Xn = +1. Note that P(N = n) = 1/2ⁿ, hence E[N] < ∞ by the ratio test. The assumptions ( 1 ), ( 5 ) and ( 9 ), hence ( 4 ) and ( 8 ) with C = 1, as well as ( 10 ), ( 11 ), and ( 12 ) hold, hence also ( 2 ) and ( 6 ), and Wald's equation applies. However, ( 7 ) does not hold, because N is defined in terms of the sequence (Xn)n∈ℕ. Intuitively, one might expect to have E[SN] > 0 in this example, because the summation stops right after a +1, thereby apparently creating a positive bias. However, Wald's equation shows that this intuition is misleading: since E[X1] = 0, it gives E[SN] = E[N] E[X1] = 0.
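
The following sketch simulates this stopping rule; despite the apparent positive bias, the empirical mean of SN stays close to E[N] E[X1] = 0:

```python
import random

def trial() -> int:
    # Add fair +1/-1 steps until the first +1 appears, then stop and return the sum.
    total = 0
    while True:
        x = 1 if random.random() < 0.5 else -1
        total += x
        if x == 1:
            return total

trials = 500_000
print(sum(trial() for _ in range(trials)) / trials)  # close to 0, as Wald's equation predicts
```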

Counterexamples

A counterexample illustrating the necessity of assumption ( 2 )

Consider a sequence (Xn)n∈ℕ of i.i.d. (independent and identically distributed) random variables, each taking the two values 0 and 1 with probability 1/2 (actually, only X1 is needed in the following). Define N := 1 – X1. Then SN is identically equal to zero, hence E[SN] = 0, but E[X1] = 1/2 and E[N] = 1/2, and therefore Wald's equation does not hold. Indeed, the assumptions ( 1 ), ( 3 ), ( 4 ) and ( 5 ) are satisfied; however, the equation in assumption ( 2 ) holds for all n ∈ ℕ except for n = 1.
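
A simulation of this counterexample (a sketch; the sample size is arbitrary) shows the gap between E[SN] = 0 and E[N] E[X1] = 1/4:

```python
import random

trials = 200_000
sum_s = sum_n = sum_x = 0
for _ in range(trials):
    x1 = random.randint(0, 1)  # X1 is 0 or 1, each with probability 1/2
    n = 1 - x1                 # N = 1 - X1 depends on the first summand
    s = x1 if n >= 1 else 0    # S_N equals X1 when N = 1 and 0 when N = 0
    sum_s += s
    sum_n += n
    sum_x += x1

print(sum_s / trials, (sum_n / trials) * (sum_x / trials))  # about 0.0 versus about 0.25
```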

A counterexample illustrating the necessity of assumption ( 3 )

Very similar to the second example above, let (Xn)n∈ℕ be a sequence of independent, symmetric random variables, where Xn takes each of the values 2ⁿ and –2ⁿ with probability 1/2. Let N be the first n such that Xn = 2ⁿ. Then, as above, N has finite expectation, hence assumption ( 5 ) holds. Since E[Xn] = 0 for all n ∈ ℕ, assumptions ( 1 ) and ( 4 ) hold. However, since SN = 2 almost surely while E[N] E[X1] = 0, Wald's equation cannot hold.

Since N is a stopping time with respect to the filtration generated by (Xn)n∈ℕ, assumption ( 2 ) holds, see above. Therefore, only assumption ( 3 ) can fail, and indeed, since

{N ≥ n} = {X1 = –2, X2 = –2², . . . , Xn–1 = –2ⁿ⁻¹}

and therefore P(N ≥ n) = 1/2ⁿ⁻¹ for every n ∈ ℕ, it follows that

∑n≥1 E[|Xn| 1{N ≥ n}] = ∑n≥1 2ⁿ · 1/2ⁿ⁻¹ = ∑n≥1 2 = ∞.
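
A sketch of this construction makes the failure concrete: every simulated run returns SN = 2, although E[N] E[X1] = 0:

```python
import random

def trial() -> int:
    # X_n = +2^n or -2^n with equal probability; stop at the first n with X_n = +2^n.
    total, n = 0, 0
    while True:
        n += 1
        x = 2 ** n if random.random() < 0.5 else -(2 ** n)
        total += x
        if x > 0:
            return total

print({trial() for _ in range(10_000)})  # prints {2}: the sum is 2 in every run
```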

A proof using the optional stopping theorem

Assume ( 1 ), ( 5 ), ( 8 ), ( 10 ), ( 11 ) and ( 12 ). Using assumption ( 1 ), define the sequence of random variables

Mn := X1 + . . . + Xn – (E[X1] + . . . + E[Xn]) for n ∈ ℕ, and M0 := 0.

Assumption ( 11 ) implies that the conditional expectation of Xn given Fn–1 equals E[Xn] almost surely for every n ∈ ℕ, hence (Mn)n∈ℕ0 is a martingale with respect to the filtration (Fn)n∈ℕ0 by assumption ( 12 ). Assumptions ( 5 ), ( 8 ) and ( 10 ) make sure that we can apply the optional stopping theorem, hence MN = SN – TN is integrable and

E[SN – TN] = E[MN] = E[M0] = 0.     ( 13 )

Due to assumption ( 8 ),

|TN| = |E[X1] + . . . + E[XN]| ≤ E[|X1|] + . . . + E[|XN|] ≤ C N,

and due to assumption ( 5 ) this upper bound is integrable. Hence we can add the expectation of TN to both sides of Equation ( 13 ) and obtain by linearity

E[SN] = E[TN].

Remark: Note that this proof does not cover the above example with dependent terms.

General proof

This proof uses only Lebesgue's monotone and dominated convergence theorems. We prove the statement as given above in three steps.

Step 1: Integrability of the random sum SN

We first show that the random sum SN is integrable. Define the partial sums

Si := X1 + . . . + Xi for i ∈ ℕ, and S0 := 0.     ( 14 )

Since N takes its values in ℕ0 and since S0 = 0, it follows that

|SN| = ∑i≥1 |Si| 1{N = i}.

The Lebesgue monotone convergence theorem implies that

E[|SN|] = ∑i≥1 E[|Si| 1{N = i}].

By the triangle inequality,

|Si| ≤ |X1| + . . . + |Xi|  for all i ∈ ℕ.

Using this upper estimate and changing the order of summation (which is permitted because all terms are non-negative), we obtain

E[|SN|] ≤ ∑i≥1 ∑n=1,…,i E[|Xn| 1{N = i}] = ∑n≥1 ∑i≥n E[|Xn| 1{N = i}] = ∑n≥1 E[|Xn| 1{N ≥ n}],     ( 15 )

where the last equality follows from the monotone convergence theorem, because summing the indicators 1{N = i} over i ≥ n gives 1{N ≥ n}. By assumption ( 3 ), the infinite series on the right-hand side of ( 15 ) converges, hence SN is integrable.

Step 2: Integrability of the random sum TN

We now show that the random sum TN is integrable. Define the partial sums

Ti := E[X1] + . . . + E[Xi] for i ∈ ℕ, and T0 := 0,     ( 16 )

of real numbers. Since N takes its values in ℕ0 and since T0 = 0, it follows that

|TN| = ∑i≥1 |Ti| 1{N = i}.

As in step 1, the Lebesgue monotone convergence theorem implies that

E[|TN|] = ∑i≥1 E[|Ti| 1{N = i}] = ∑i≥1 |Ti| P(N = i).

By the triangle inequality,

|Ti| ≤ |E[X1]| + . . . + |E[Xi]|  for all i ∈ ℕ.

Using this upper estimate and changing the order of summation (which is permitted because all terms are non-negative), we obtain

E[|TN|] ≤ ∑i≥1 ∑n=1,…,i |E[Xn]| P(N = i) = ∑n≥1 |E[Xn]| ∑i≥n P(N = i) = ∑n≥1 |E[Xn]| P(N ≥ n).     ( 17 )

By assumption ( 2 ),

|E[Xn]| P(N ≥ n) = |E[Xn 1{N ≥ n}]| ≤ E[|Xn| 1{N ≥ n}]  for every n ∈ ℕ.

Substituting this into ( 17 ) yields

E[|TN|] ≤ ∑n≥1 E[|Xn| 1{N ≥ n}],

which is finite by assumption ( 3 ), hence TN is integrable.

Step 3: Proof of the identity

To prove Wald's equation, we essentially go through the same steps again without the absolute value, making use of the integrability of the random sums SN and TN in order to show that they have the same expectation.

Using the dominated convergence theorem with dominating random variable |SN| and the definition of the partial sum Si given in ( 14 ), it follows that

E[SN] = ∑i≥1 E[Si 1{N = i}] = ∑i≥1 ∑n=1,…,i E[Xn 1{N = i}].

Due to the absolute convergence proved in ( 15 ) above using assumption ( 3 ), we may rearrange the summation and obtain that

E[SN] = ∑n≥1 ∑i≥n E[Xn 1{N = i}] = ∑n≥1 E[Xn 1{N ≥ n}],

where we used assumption ( 1 ) and the dominated convergence theorem with dominating random variable |Xn| for the second equality. Due to assumption ( 2 ) and the σ-additivity of the probability measure,

E[Xn 1{N ≥ n}] = E[Xn] P(N ≥ n) = ∑i≥n E[Xn] P(N = i)  for every n ∈ ℕ.

Substituting this result into the previous equation, rearranging the summation (which is permitted due to absolute convergence, see ( 15 ) above), using linearity of expectation and the definition of the partial sum Ti of expectations given in ( 16 ),

E[SN] = ∑i≥1 ∑n=1,…,i E[Xn] P(N = i) = ∑i≥1 Ti P(N = i) = ∑i≥1 E[Ti 1{N = i}].

By using dominated convergence again with dominating random variable |TN|,

E[SN] = E[TN].

If assumptions ( 4 ) and ( 5 ) are satisfied, then by linearity of expectation,

E[SN] = E[TN] = E[N] E[X1].

This completes the proof.

Further generalizations

Notes

  1. Janssen, Jacques; Manca, Raimondo (2006). "Renewal Theory". Applied Semi-Markov Processes. Springer. pp. 45–104. doi:10.1007/0-387-29548-8_2. ISBN 0-387-29547-X.
  2. Bruss, F. Thomas; Robertson, J. B. (1991). "'Wald's Lemma' for Sums of Order Statistics of i.i.d. Random Variables". Advances in Applied Probability. 23 (3): 612–623. doi:10.2307/1427625. JSTOR 1427625. S2CID 120678340.
  3. Blackwell, D.; Girshick, M. A. (1946). "On functions of sequences of independent chance vectors with applications to the problem of the 'random walk' in k dimensions". Ann. Math. Statist. 17 (3): 310–317. doi:10.1214/aoms/1177730943.
