In probability theory, Wald's equation, Wald's identity [1] or Wald's lemma [2] is an important identity that simplifies the calculation of the expected value of the sum of a random number of random quantities. In its simplest form, it relates the expectation of a sum of randomly many finite-mean, independent and identically distributed random variables to the expected number of terms in the sum and the random variables' common expectation under the condition that the number of terms in the sum is independent of the summands.
The equation is named after the mathematician Abraham Wald. An identity for the second moment is given by the Blackwell–Girshick equation. [3]
Let (Xn)n∈ℕ be a sequence of real-valued, independent and identically distributed random variables and let N ≥ 0 be an integer-valued random variable that is independent of the sequence (Xn)n∈ℕ. Suppose that N and the Xn have finite expectations. Then

E[X_1 + . . . + X_N] = E[N] E[X_1].
Roll a six-sided die. Take the number shown on the die (call it N) and roll that number of six-sided dice to get the numbers X1, . . . , XN, and add up their values. By Wald's equation, the resulting value on average is

E[N] E[X_1] = 3.5 × 3.5 = 12.25.
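The following short Python snippet (an illustrative sketch, not part of the original presentation; the helper names roll and random_sum are ad hoc) checks this value by simulation:

    import random

    def roll():
        # one fair six-sided die
        return random.randint(1, 6)

    def random_sum():
        n = roll()                              # N: how many further dice to roll
        return sum(roll() for _ in range(n))    # S_N = X_1 + ... + X_N

    trials = 100_000
    print(sum(random_sum() for _ in range(trials)) / trials)  # typically close to 12.25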
Let (Xn)n∈ℕ be an infinite sequence of real-valued random variables and let N be a nonnegative integer-valued random variable.

Assume that:

(1) the random variables (Xn)n∈ℕ are all integrable (finite-mean),
(2) E[X_n 1{N ≥ n}] = E[X_n] P(N ≥ n) for every n ∈ ℕ, and
(3) the infinite series satisfies ∑_{n=1}^∞ E[ |X_n| 1{N ≥ n} ] < ∞,

where 1{N ≥ n} denotes the indicator function of the event {N ≥ n}. Then the random sums

S_N := ∑_{n=1}^N X_n,   T_N := ∑_{n=1}^N E[X_n]

are integrable and

E[S_N] = E[T_N].

If, in addition,

(4) the random variables (Xn)n∈ℕ all have the same expectation, and
(5) N has finite expectation,

then

E[S_N] = E[N] E[X_1].
Remark: Usually, the name Wald's equation refers to this last equality.
Clearly, assumption (1) is needed to formulate assumption (2) and Wald's equation. Assumption (2) controls the amount of dependence allowed between the sequence (Xn)n∈ℕ and the number N of terms; see the counterexample below for the necessity. Note that assumption (2) is satisfied when N is a stopping time for a sequence of independent random variables (Xn)n∈ℕ.[citation needed] Assumption (3) is of a more technical nature, implying absolute convergence and therefore allowing arbitrary rearrangement of an infinite series in the proof.
If assumption (5) is satisfied, then assumption (3) can be strengthened to the simpler condition

(6) there exists a real constant C such that E[ |X_n| 1{N ≥ n} ] ≤ C P(N ≥ n) for all n ∈ ℕ.

Indeed, using assumption (6),

∑_{n=1}^∞ E[ |X_n| 1{N ≥ n} ] ≤ C ∑_{n=1}^∞ P(N ≥ n),
and the last series equals the expectation of N (a short derivation is sketched below), which is finite by assumption (5). Therefore, (5) and (6) imply assumption (3).
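The identity used here is the standard tail-sum formula for a nonnegative integer-valued random variable; a short derivation (interchanging the order of summation, which is allowed because all terms are non-negative) reads:

∑_{n=1}^∞ P(N ≥ n) = ∑_{n=1}^∞ ∑_{m=n}^∞ P(N = m) = ∑_{m=1}^∞ ∑_{n=1}^m P(N = m) = ∑_{m=1}^∞ m P(N = m) = E[N].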
Assume in addition to (1) and (5) that

(7) N is independent of the sequence (Xn)n∈ℕ, and
(8) there exists a real constant C such that E[ |X_n| ] ≤ C for all n ∈ ℕ.

Then all the assumptions (1), (2), (5) and (6), hence also (3), are satisfied. In particular, the conditions (4) and (8) are satisfied if

(9) the random variables (Xn)n∈ℕ all have the same distribution.
Note that the random variables of the sequence (Xn)n∈ℕ don't need to be independent.
The interesting point is to admit some dependence between the random number N of terms and the sequence (Xn)n∈ℕ. A standard version is to assume (1), (5), (8) and the existence of a filtration (Fn)n∈ℕ0 such that

(10) N is a stopping time with respect to the filtration (Fn)n∈ℕ0, i.e. the event {N ≤ n} is in Fn for every n ∈ ℕ0, and
(11) Xn and Fn–1 are independent for every n ∈ ℕ.
Then (10) implies that the event {N ≥ n} = {N ≤ n – 1}^c is in Fn–1, hence by (11) independent of Xn. This implies (2), and together with (8) it implies (6).
For convenience (see the proof below using the optional stopping theorem) and to specify the relation of the sequence (Xn)n∈ℕ and the filtration (Fn)n∈ℕ0, the following additional assumption is often imposed:

(12) the sequence (Xn)n∈ℕ is adapted to the filtration (Fn)n∈ℕ0, meaning that Xn is Fn-measurable for every n ∈ ℕ.
Note that (11) and (12) together imply that the random variables (Xn)n∈ℕ are independent.
An application is in actuarial science, when considering the total claim amount of a compound Poisson process within a certain time period, say one year,

S_N = ∑_{n=1}^N X_n,

arising from a random number N of individual insurance claims, whose sizes are described by the random variables (Xn)n∈ℕ. Under the above assumptions, Wald's equation can be used to calculate the expected total claim amount when information about the average claim number per year and the average claim size is available. Under stronger assumptions and with more information about the underlying distributions, Panjer's recursion can be used to calculate the distribution of S_N.
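As a rough illustration (a sketch with made-up parameters, not taken from the article): if the yearly claim count N has mean 120 and the individual claim sizes have mean 2500, Wald's equation immediately gives an expected total of 120 × 2500 = 300000. The snippet below cross-checks this by simulating a Poisson claim count and exponentially distributed claim sizes; the parameter values and helper names are hypothetical.

    import random

    lam = 120.0    # assumed mean number of claims per year (hypothetical)
    mu = 2500.0    # assumed mean individual claim size (hypothetical)
    print("Wald's equation:", lam * mu)   # E[S_N] = E[N] E[X_1]

    def poisson(rate):
        # draw a Poisson(rate) variate by counting exponential inter-arrival times in [0, 1)
        count, t = 0, random.expovariate(rate)
        while t < 1.0:
            count += 1
            t += random.expovariate(rate)
        return count

    def total_claims():
        n = poisson(lam)
        return sum(random.expovariate(1.0 / mu) for _ in range(n))   # claim sizes with mean mu

    trials = 20_000
    print("Simulation:   ", sum(total_claims() for _ in range(trials)) / trials)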
Let N be an integrable, ℕ0-valued random variable, which is independent of the integrable, real-valued random variable Z with E[Z] = 0. Define Xn = (–1)^n Z for all n ∈ ℕ. Then assumptions (1), (5), (7), and (8) with C := E[|Z|] are satisfied, hence also (2) and (6), and Wald's equation applies. If the distribution of Z is not symmetric, then (9) does not hold. Note that, when Z is not almost surely equal to the zero random variable, then (11) and (12) cannot hold simultaneously for any filtration (Fn)n∈ℕ0, because Z cannot be independent of itself as E[Z^2] = (E[Z])^2 = 0 is impossible.
Let (Xn)n∈ℕ be a sequence of independent, symmetric, {–1, +1}-valued random variables. For every n ∈ ℕ let Fn be the σ-algebra generated by X1, . . . , Xn and define N as the index of the first random variable taking the value +1. Note that P(N = n) = 1/2^n, hence E[N] < ∞ by the ratio test. The assumptions (1), (5) and (9), hence (4) and (8) with C = 1, as well as (10), (11), and (12) hold, hence also (2) and (6), and Wald's equation applies. However, (7) does not hold, because N is defined in terms of the sequence (Xn)n∈ℕ. Intuitively, one might expect to have E[S_N] > 0 in this example, because the summation stops right after a one, thereby apparently creating a positive bias. However, Wald's equation shows that this intuition is misleading.
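A quick simulation (an illustrative sketch, not part of the original text) makes the point concrete: here E[N] = 2 and E[X1] = 0, so Wald's equation predicts E[S_N] = 0 despite the stopping rule.

    import random

    def sample_S_N():
        total = 0
        while True:
            x = random.choice((-1, 1))   # fair ±1 step
            total += x
            if x == 1:                   # stop right after the first +1
                return total

    trials = 200_000
    print(sum(sample_S_N() for _ in range(trials)) / trials)   # close to 0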
Consider a sequence (Xn)n∈ℕ of i.i.d. (independent and identically distributed) random variables, taking each of the two values 0 and 1 with probability 1/2 (actually, only X1 is needed in the following). Define N = 1 – X1. Then S_N is identically equal to zero, hence E[S_N] = 0, but E[X1] = 1/2 and E[N] = 1/2, and therefore Wald's equation does not hold. Indeed, the assumptions (1), (3), (4) and (5) are satisfied; however, the equation in assumption (2) holds for all n ∈ ℕ except for n = 1.[citation needed]
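In code (a sketch for illustration only): whenever X1 = 1 the number of terms is N = 0 and the sum is empty, and whenever X1 = 0 the single term is 0, so S_N is always 0 while E[N] E[X1] = 1/4.

    import random

    trials = 100_000
    total = 0
    for _ in range(trials):
        x1 = random.randint(0, 1)
        n = 1 - x1
        total += x1 if n == 1 else 0      # S_N: empty sum when N = 0
    print(total / trials)                 # exactly 0, not 1/4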
Very similar to the second example above, let (Xn)n∈ℕ be a sequence of independent, symmetric random variables, where Xn takes each of the values 2^n and –2^n with probability 1/2. Let N be the first n ∈ ℕ such that Xn = 2^n. Then, as above, N has finite expectation, hence assumption (5) holds. Since E[Xn] = 0 for all n ∈ ℕ, assumptions (1) and (4) hold. However, since S_N = 2^N – (2^1 + 2^2 + . . . + 2^{N–1}) = 2 almost surely, Wald's equation cannot hold.
Since N is a stopping time with respect to the filtration generated by (Xn)n∈ℕ, assumption (2) holds, see above. Therefore, only assumption (3) can fail, and indeed, since

{N ≥ n} = {X_1 = –2^1, . . . , X_{n–1} = –2^{n–1}}

and therefore P(N ≥ n) = 1/2^{n–1} for every n ∈ ℕ, it follows that

∑_{n=1}^∞ E[ |X_n| 1{N ≥ n} ] = ∑_{n=1}^∞ 2^n P(N ≥ n) = ∑_{n=1}^∞ 2 = ∞,

so that assumption (3) does not hold.
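A small simulation sketch (not from the article) illustrates the failure: every sample path gives S_N = 2, while E[Xn] = 0 for every n, so the value E[N] E[X1] = 0 predicted by Wald's equation cannot be the expectation of S_N.

    import random

    def sample_S_N():
        total, n = 0, 0
        while True:
            n += 1
            x = random.choice((-1, 1)) * 2 ** n   # X_n = ±2^n, symmetric
            total += x
            if x > 0:                             # N: first n with X_n = +2^n
                return total

    print({sample_S_N() for _ in range(10_000)})  # prints {2}: S_N equals 2 on every path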
Assume (1), (5), (8), (10), (11) and (12). Using assumption (1), define the sequence of random variables

M_n := ∑_{i=1}^n (X_i – E[X_i]),   n ∈ ℕ0.
Assumption (11) implies that the conditional expectation of Xn given Fn–1 equals E[Xn] almost surely for every n ∈ ℕ, hence (Mn)n∈ℕ0 is a martingale with respect to the filtration (Fn)n∈ℕ0 by assumption (12). Assumptions (5), (8) and (10) make sure that we can apply the optional stopping theorem, hence M_N = S_N – T_N is integrable and

E[S_N – T_N] = E[M_N] = E[M_0] = 0.     (13)
Due to assumption (8),

|T_N| = | ∑_{i=1}^N E[X_i] | ≤ ∑_{i=1}^N E[ |X_i| ] ≤ C N,

and due to assumption (5) this upper bound is integrable. Hence we can add the expectation of T_N to both sides of Equation (13) and obtain by linearity

E[S_N] = E[T_N].
Remark: Note that this proof does not cover the above example with dependent terms.
This proof uses only Lebesgue's monotone and dominated convergence theorems. We prove the statement as given above in three steps.
We first show that the random sum S_N is integrable. Define the partial sums

S_i = ∑_{n=1}^i X_n,   i ∈ ℕ0.     (14)

Since N takes its values in ℕ0 and since S_0 = 0, it follows that

S_N = ∑_{i=1}^∞ S_i 1{N = i}.

The Lebesgue monotone convergence theorem implies that

E[ |S_N| ] = ∑_{i=1}^∞ E[ |S_i| 1{N = i} ].
By the triangle inequality,

|S_i| ≤ ∑_{n=1}^i |X_n|,   i ∈ ℕ.

Using this upper estimate and changing the order of summation (which is permitted because all terms are non-negative), we obtain

E[ |S_N| ] ≤ ∑_{i=1}^∞ ∑_{n=1}^i E[ |X_n| 1{N = i} ] = ∑_{n=1}^∞ ∑_{i=n}^∞ E[ |X_n| 1{N = i} ] = ∑_{n=1}^∞ E[ |X_n| 1{N ≥ n} ],     (15)

where the last equality follows from the monotone convergence theorem. By assumption (3), the infinite series on the right-hand side of (15) converges, hence S_N is integrable.
We now show that the random sum T_N is integrable. Define the partial sums

T_i = ∑_{n=1}^i E[X_n],   i ∈ ℕ0,     (16)

of real numbers. Since N takes its values in ℕ0 and since T_0 = 0, it follows that

T_N = ∑_{i=1}^∞ T_i 1{N = i}.

As in step 1, the Lebesgue monotone convergence theorem implies that

E[ |T_N| ] = ∑_{i=1}^∞ |T_i| P(N = i).
By the triangle inequality,

|T_i| ≤ ∑_{n=1}^i |E[X_n]|,   i ∈ ℕ.

Using this upper estimate and changing the order of summation (which is permitted because all terms are non-negative), we obtain

E[ |T_N| ] ≤ ∑_{n=1}^∞ |E[X_n]| P(N ≥ n).     (17)
By assumption (2),

|E[X_n]| P(N ≥ n) = | E[X_n 1{N ≥ n}] | ≤ E[ |X_n| 1{N ≥ n} ],   n ∈ ℕ.

Substituting this into (17) yields

E[ |T_N| ] ≤ ∑_{n=1}^∞ E[ |X_n| 1{N ≥ n} ],

which is finite by assumption (3), hence T_N is integrable.
To prove Wald's equation, we essentially go through the same steps again without the absolute value, making use of the integrability of the random sums SN and TN in order to show that they have the same expectation.
Using the dominated convergence theorem with dominating random variable |S_N| and the definition of the partial sum S_i given in (14), it follows that

E[S_N] = ∑_{i=1}^∞ E[ S_i 1{N = i} ] = ∑_{i=1}^∞ ∑_{n=1}^i E[ X_n 1{N = i} ].
Due to the absolute convergence proved in (15) above using assumption (3), we may rearrange the summation and obtain that

∑_{i=1}^∞ ∑_{n=1}^i E[ X_n 1{N = i} ] = ∑_{n=1}^∞ ∑_{i=n}^∞ E[ X_n 1{N = i} ] = ∑_{n=1}^∞ E[ X_n 1{N ≥ n} ],

where we used assumption (1) and the dominated convergence theorem with dominating random variable |X_n| for the second equality. Due to assumption (2) and the σ-additivity of the probability measure,

E[ X_n 1{N ≥ n} ] = E[X_n] P(N ≥ n) = ∑_{i=n}^∞ E[X_n] P(N = i).
Substituting this result into the previous equation, rearranging the summation (which is permitted due to absolute convergence, see (15) above), using linearity of expectation and the definition of the partial sum T_i of expectations given in (16),

E[S_N] = ∑_{n=1}^∞ ∑_{i=n}^∞ E[X_n] P(N = i) = ∑_{i=1}^∞ ∑_{n=1}^i E[X_n] P(N = i) = ∑_{i=1}^∞ T_i P(N = i).
By using dominated convergence again with dominating random variable |T_N|,

∑_{i=1}^∞ T_i P(N = i) = ∑_{i=1}^∞ E[ T_i 1{N = i} ] = E[T_N],

hence E[S_N] = E[T_N].
If assumptions (4) and (5) are satisfied, then by linearity of expectation,

E[T_N] = E[ N E[X_1] ] = E[N] E[X_1],

hence E[S_N] = E[N] E[X_1].
This completes the proof.