Memorylessness

In probability and statistics, memorylessness is a property of certain probability distributions. It describes situations where the time you've already waited for an event doesn't affect how much longer you'll have to wait. To model memoryless situations accurately, we have to disregard the past state of the system – the probabilities remain unaffected by the history of the process. [1]

Only two kinds of distributions are memoryless: the geometric distributions on the non-negative integers and the exponential distributions on the non-negative real numbers.

In the context of Markov processes, memorylessness refers to the Markov property, [2] an even stronger assumption which implies that the properties of random variables related to the future depend only on relevant information about the current time, not on information from further in the past. The present article describes the use outside the Markov property.

Waiting time examples

With memory

Most phenomena are not memoryless, which means that observers will obtain information about them over time. For example, suppose that X is a random variable, the lifetime of a car engine, expressed in terms of "number of miles driven until the engine breaks down". It is clear, based on our intuition, that an engine which has already been driven for 300,000 miles will have a much lower X than would a second (equivalent) engine which has only been driven for 1,000 miles. Hence, this random variable would not have the memorylessness property.
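To make the intuition concrete, here is a minimal Python sketch. It is illustrative only: the uniform lifetime model and the 300,000-mile cap are assumptions, not claims from this article. It shows that, for a distribution with memory, the expected remaining lifetime shrinks as mileage accumulates:

```python
import random

# Illustrative assumption: engine lifetimes uniform on [0, 300_000] miles.
# For a distribution WITH memory, the expected remaining life depends on
# how many miles have already been driven.
random.seed(1)
lifetimes = [random.uniform(0, 300_000) for _ in range(500_000)]

def mean_remaining(samples, miles_driven):
    """Average remaining miles among engines still running at miles_driven."""
    remaining = [x - miles_driven for x in samples if x > miles_driven]
    return sum(remaining) / len(remaining)

print(mean_remaining(lifetimes, 1_000))    # ~149,500 miles of life left
print(mean_remaining(lifetimes, 200_000))  # ~50,000 miles of life left
```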

Without memory

In contrast, let us examine a situation which would exhibit memorylessness. Imagine a long hallway, lined on one wall with thousands of safes. Each safe has a dial with 500 positions, and each has been assigned an opening position at random. Imagine that an eccentric person walks down the hallway, stopping once at each safe to make a single random attempt to open it. In this case, we might define random variable X as the lifetime of their search, expressed in terms of "number of attempts the person must make until they successfully open a safe". In this case, E[X] will always be equal to the value of 500, regardless of how many attempts have already been made. Each new attempt has a (1/500) chance of succeeding, so the person is likely to open exactly one safe sometime in the next 500 attempts but with each new failure they make no "progress" toward ultimately succeeding. Even if the safe-cracker has just failed 499 consecutive times (or 4,999 times), we expect to wait 500 more attempts until we observe the next success. If, instead, this person focused their attempts on a single safe, and "remembered" their previous attempts to open it, they would be guaranteed to open the safe after, at most, 500 attempts (and, in fact, at onset would only expect to need 250 attempts, not 500).
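A short simulation sketch of the safe-cracker example (the 1/500 chance per attempt comes from the text above; the sample size and seed are arbitrary):

```python
import random

# Each attempt independently opens a safe with probability 1/500, so the
# number of attempts until the first success is geometrically distributed.
random.seed(0)
p = 1 / 500

def attempts_until_success():
    attempts = 0
    while True:
        attempts += 1
        if random.random() < p:
            return attempts

waits = [attempts_until_success() for _ in range(200_000)]
print(sum(waits) / len(waits))  # ~500 attempts on average

# Condition on 499 consecutive failures: the expected REMAINING wait is
# still ~500 attempts, i.e. no "progress" has been made.
remaining = [w - 499 for w in waits if w > 499]
print(sum(remaining) / len(remaining))  # ~500 again
```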

Real-life examples of memorylessness include the universal law of radioactive decay, which describes the time until a given radioactive particle decays, and, potentially, the time until the discovery of a new Bitcoin block, though this has been put in question. [3] An often used (theoretical) example of memorylessness in queueing theory is the time a storekeeper must wait before the arrival of the next customer.

Discrete memorylessness

Suppose X is a discrete random variable whose values lie in the set {0, 1, 2, ...}. The probability distribution of X is memoryless precisely if for any m and n in {0, 1, 2, ...}, we have

$$ \Pr(X > m + n \mid X \ge m) = \Pr(X > n). $$

Here, Pr(X > m + n | X ≥ m) denotes the conditional probability that the value of X is greater than m + n given that it is greater than or equal to m.

The only memoryless discrete probability distributions are the geometric distributions, which count the number of failures in a sequence of independent, identically distributed Bernoulli trials before the first "success"; that is, the geometric distribution supported on {0, 1, 2, 3, ...}. Note that the geometric distribution supported on {1, 2, ...} is not memoryless under this definition.

Note that the above definition applies to the geometric distribution with support {0, 1, 2, ...}. The alternative parameterization, with support {1, 2, ...}, corresponds to a slightly different definition of discrete memorylessness: namely, that

$$ \Pr(X > m + n \mid X > m) = \Pr(X > n). $$
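Both definitions can be checked numerically from the closed-form tail probabilities of the two parameterizations. The following sketch is illustrative; the success probability p = 0.2 and the values of m and n are arbitrary choices, not from the article:

```python
p = 0.2
m, n = 3, 5

# Geometric on {0, 1, 2, ...}: X = number of failures before the first success.
tail0 = lambda k: (1 - p) ** (k + 1)    # Pr(X > k)
tail0_ge = lambda k: (1 - p) ** k       # Pr(X >= k)

# Memoryless under the first definition:
print(tail0(m + n) / tail0_ge(m), tail0(n))   # equal

# Geometric on {1, 2, ...}: Y = number of the trial of the first success.
tail1 = lambda k: (1 - p) ** k          # Pr(Y > k); note Pr(Y >= k) = tail1(k - 1)

# Satisfies the alternative definition ...
print(tail1(m + n) / tail1(m), tail1(n))      # equal
# ... but not the first one:
print(tail1(m + n) / tail1(m - 1), tail1(n))  # NOT equal
```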
A common misunderstanding

"Memorylessness" of the probability distribution of the number of failures X before the first success means that, for example,

It does not mean that

which would be true only if the events X > 40 and X ≥ 30 were independent, i.e.
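These three probabilities are easy to compare numerically; the sketch below uses an assumed success probability p = 0.05 (any p in (0, 1) shows the same pattern):

```python
p = 0.05  # assumed success probability, for illustration

tail = lambda k: (1 - p) ** (k + 1)   # Pr(X > k), failures before first success
tail_ge = lambda k: (1 - p) ** k      # Pr(X >= k)

print(tail(40) / tail_ge(30))  # Pr(X > 40 | X >= 30) ~ 0.5688
print(tail(10))                # Pr(X > 10)           ~ 0.5688 (equal)
print(tail(40))                # Pr(X > 40)           ~ 0.1221 (not equal)
```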

Continuous memorylessness

Suppose X is a continuous random variable whose values lie in the non-negative real numbers [0, ∞). The probability distribution of X is memoryless precisely if for any non-negative real numbers t and s, we have

$$ \Pr(X > t + s \mid X > t) = \Pr(X > s). $$
This is similar to the discrete version, except that s and t are constrained only to be non-negative real numbers instead of integers. Rather than counting trials until the first "success", for example, we may be marking time until the arrival of the first phone call at a switchboard.
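For the exponential distribution this identity holds exactly, as the following sketch verifies both in closed form and by simulation (the rate and the values of t and s are arbitrary choices):

```python
import math
import random

lam, t, s = 0.5, 2.0, 3.0             # arbitrary rate and times
S = lambda x: math.exp(-lam * x)      # exponential survival function Pr(X > x)

# Closed form: Pr(X > t + s | X > t) equals Pr(X > s).
print(S(t + s) / S(t), S(s))          # both ~ 0.2231

# The same check, empirically:
random.seed(0)
samples = [random.expovariate(lam) for _ in range(1_000_000)]
alive_at_t = [x for x in samples if x > t]
print(sum(x > t + s for x in alive_at_t) / len(alive_at_t))  # ~ 0.223
```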

The memoryless distribution is an exponential distribution

The only memoryless continuous probability distribution is the exponential distribution, so memorylessness completely characterizes the exponential distribution among all continuous ones. This can be shown as follows.

To see this, first define the survival function, S, as

$$ S(t) = \Pr(X > t). $$

Note that S(t) is then monotonically decreasing. From the relation

$$ \Pr(X > t + s \mid X > t) = \Pr(X > s) $$

and the definition of conditional probability, it follows that

$$ \frac{\Pr(X > t + s)}{\Pr(X > t)} = \Pr(X > s). $$
This gives the functional equation (which is a result of the memorylessness property):

$$ S(t + s) = S(t)\,S(s). $$

From this, we must have, for example:

$$ S(2) = S(1)^2 \quad\text{and}\quad S\!\left(\tfrac{1}{2}\right) = S(1)^{1/2}. $$

In general:

$$ S(a) = S(1)^a. $$
The only continuous function that will satisfy this equation for every positive rational a is:

$$ S(a) = e^{-\lambda a}, $$

where

$$ \lambda = -\ln S(1). $$

Therefore, since S(a) is a probability and must satisfy 0 ≤ S(a) ≤ 1, we need λ ≥ 0, and any memoryless survival function must be an exponential.

Put a different way, S is a monotone decreasing function (meaning that for times x ≤ y, we have S(x) ≥ S(y)).

The functional equation alone will imply that S restricted to rational multiples of any particular number is an exponential function. Combined with the fact that S is monotone, this implies that S over its whole domain is an exponential function.
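The steps of the argument can be illustrated numerically for an exponential survival function (the rate λ = 1.7 is an arbitrary choice):

```python
import math

lam = 1.7                          # arbitrary rate
S = lambda t: math.exp(-lam * t)   # survival function of an exponential

t, s = 0.3, 1.1
print(S(t + s), S(t) * S(s))       # functional equation S(t+s) = S(t)S(s)
print(S(2.0), S(1.0) ** 2)         # S(2) = S(1)^2
print(S(0.5), S(1.0) ** 0.5)       # S(1/2) = S(1)^(1/2)
print(-math.log(S(1.0)))           # recovers lam = 1.7
```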

Notes

  1. "Notes on Memoryless Random Variables" (PDF).
  2. "Markov Chains and Random Walks" (PDF).
  3. Bowden, Rory; Keeler, Holger Paul; Krzesinski, Anthony E.; Taylor, Peter G. (2018). "Block arrivals in the Bitcoin blockchain". arXiv:1801.07447 [cs.CR].
