Memorylessness

In probability and statistics, memorylessness is a property of certain probability distributions. It describes situations where the time already spent waiting for an event does not affect how much longer the wait will be. To model memoryless situations accurately, we have to disregard the past state of the system – the probabilities remain unaffected by the history of the process. [1]

Only two kinds of distributions are memoryless: geometric and exponential probability distributions.

Waiting time examples

With memory

Most phenomena are not memoryless, which means that observers will obtain information about them over time. For example, suppose that X is a random variable, the lifetime of a car engine, expressed in terms of "number of miles driven until the engine breaks down". Intuitively, an engine which has already been driven for 300,000 miles will have a much lower X (far fewer remaining miles) than would a second, equivalent engine which has been driven for only 1,000 miles. Hence, this random variable would not have the memorylessness property.
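
To make the contrast concrete, here is a minimal Python sketch (the uniform-lifetime model and the mileage figures are illustrative assumptions, not from the source) showing that for a non-memoryless lifetime the expected remaining life depends on how long the engine has already run:

    import numpy as np

    # Illustrative assumption: engine lifetimes uniform on [0, 400,000] miles.
    rng = np.random.default_rng(0)
    lifetimes = rng.uniform(0, 400_000, size=1_000_000)

    for driven in (1_000, 300_000):
        survivors = lifetimes[lifetimes > driven]
        # expected remaining miles, given the engine has survived this long
        print(driven, survivors.mean() - driven)
    # Roughly 199,500 remaining miles after 1,000 miles driven, but only about
    # 50,000 after 300,000 miles: the past carries information, so this
    # lifetime is not memoryless.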

Without memory

In contrast, let us examine a situation which would exhibit memorylessness. Imagine a long hallway, lined on one wall with thousands of safes. Each safe has a dial with 500 positions, and each has been assigned an opening position at random. Imagine that an eccentric person walks down the hallway, stopping once at each safe to make a single random attempt to open it. In this case, we might define the random variable X as the lifetime of their search, expressed in terms of "number of attempts the person must make until they successfully open a safe". In this case, E[X] will always equal 500, regardless of how many attempts have already been made. Each new attempt has a 1/500 chance of succeeding, so the person is likely to open exactly one safe sometime in the next 500 attempts, but with each new failure they make no "progress" toward ultimately succeeding. Even if the safe-cracker has just failed 499 consecutive times (or 4,999 times), we expect to wait 500 more attempts until we observe the next success. If, instead, this person focused their attempts on a single safe and "remembered" their previous attempts to open it, they would be guaranteed to open the safe after at most 500 attempts (and, in fact, at the outset would expect to need only about 250 attempts, not 500).
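
The safe-cracker's number of attempts is geometric with success probability 1/500, so it can be simulated directly; a short Python check (illustrative, with numpy as an assumed dependency) shows that the expected number of additional attempts stays near 500 no matter how many failures have already occurred:

    import numpy as np

    rng = np.random.default_rng(0)
    p = 1 / 500
    # number of attempts until the first success, values in {1, 2, 3, ...}
    attempts = rng.geometric(p, size=10_000_000)

    print(attempts.mean())            # ~500 attempts on average

    for failed in (499, 1_999):
        # condition on having already failed `failed` times; the expected
        # number of *additional* attempts is still ~500
        remaining = attempts[attempts > failed] - failed
        print(failed, remaining.mean())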

The universal law of radioactive decay, which describes the time until a given radioactive particle decays, is a real-life example of memorylessness. An often used (theoretical) example of memorylessness in queueing theory is the time a storekeeper must wait before the arrival of the next customer.

Discrete memorylessness

If a discrete random variable X is memoryless, then it satisfies

Pr(X > m + n | X > m) = Pr(X > n)

where m and n are natural numbers. The equality is still true when ≥ is substituted for > on the left-hand side of the equation. [2]

The only discrete random variable that is memoryless is the geometric random variable taking values in {1, 2, 3, ...}. [3] This random variable describes when the first success in an infinite sequence of independent and identically distributed Bernoulli trials occurs. [4] The memorylessness property asserts that the number of previously failed trials has no effect on the number of future trials needed for a success.
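
The identity can be checked directly from the geometric tail probabilities, Pr(X > k) = (1 − p)^k for X taking values in {1, 2, 3, ...}; the following short Python check (the value of p is an arbitrary illustration) confirms it for a few choices of m and n:

    p = 0.3

    def tail(k):
        # Pr(X > k) for a geometric variable on {1, 2, 3, ...}
        return (1 - p) ** k

    for m, n in [(2, 5), (7, 1), (4, 4)]:
        lhs = tail(m + n) / tail(m)   # Pr(X > m + n | X > m)
        rhs = tail(n)                 # Pr(X > n)
        print(m, n, lhs, rhs)         # the two values agree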

Geometric random variables can also be defined as taking values in {0, 1, 2, ...}, which describes the number of failed trials before the first success in a sequence of independent and identically distributed Bernoulli trials. These random variables do not satisfy the memoryless condition stated above; however, they do satisfy a slightly modified memoryless condition: [5]

Pr(X > m + n | X ≥ m) = Pr(X > n)

Similar to the first definition, the only discrete random variables that satisfy this memoryless condition are geometric random variables taking values in {0, 1, 2, ...}. In the continuous case, these two definitions of memorylessness are equivalent.
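
A similar quick check (again with an arbitrary illustrative p) verifies the modified condition for the geometric variable on {0, 1, 2, ...}, whose tail probabilities are Pr(X > k) = (1 − p)^(k + 1) and Pr(X ≥ k) = (1 − p)^k:

    p = 0.3

    def tail(k):        # Pr(X > k) for a geometric variable on {0, 1, 2, ...}
        return (1 - p) ** (k + 1)

    def at_least(k):    # Pr(X >= k)
        return (1 - p) ** k

    for m, n in [(2, 5), (7, 1), (4, 4)]:
        lhs = tail(m + n) / at_least(m)   # Pr(X > m + n | X >= m)
        rhs = tail(n)                     # Pr(X > n)
        print(m, n, lhs, rhs)             # the two values agree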

Continuous memorylessness

If a continuous random variable X is memoryless, then it satisfies

Pr(X > s + t | X > s) = Pr(X > t)

where s and t are nonnegative real numbers. [6] The equality is still true when ≥ is substituted for >. [7]

The only continuous random variable that is memoryless is the exponential random variable. It models random processes like time between consecutive events. [8] The memorylessness property asserts that the amount of time since the previous event has no effect on the future time until the next event occurs.
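
Since the exponential survival function is Pr(X > t) = e^(−λt), the continuous condition can be verified in a few lines of Python (the rate λ = 0.5 is an arbitrary illustration):

    import math

    lam = 0.5
    S = lambda t: math.exp(-lam * t)    # Pr(X > t) for an exponential variable

    for s, t in [(1.0, 2.0), (3.5, 0.25), (10.0, 4.0)]:
        lhs = S(s + t) / S(s)   # Pr(X > s + t | X > s)
        rhs = S(t)              # Pr(X > t)
        print(s, t, lhs, rhs)   # equal up to floating-point rounding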

Exponential distribution and memorylessness proof

The only memoryless continuous probability distribution is the exponential distribution, as shown in the following proof: [9]

First, define S(t) = Pr(X > t), also known as the distribution's survival function. From the memorylessness property and the definition of conditional probability, it follows that

Pr(X > t + s | X > t) = Pr(X > t + s) / Pr(X > t) = Pr(X > s), that is, S(t + s) / S(t) = S(s).

This gives the functional equation S(t + s) = S(t) S(s), which implies S(nt) = S(t)^n where n is a natural number. Similarly, S(t/n) = S(t)^(1/n) where n is a natural number, excluding 0. Therefore, all rational numbers q satisfy S(qt) = S(t)^q. Since S is continuous and the set of rational numbers is dense in the set of real numbers, S(xt) = S(t)^x where x is a nonnegative real number. When t = 1, S(x) = S(1)^x = e^(x ln S(1)). As a result, S(x) = e^(−λx) where λ = −ln S(1) ≥ 0, which is the survival function of an exponential distribution.
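
The final step can be illustrated numerically (the value of S(1) is an arbitrary assumption for the sketch): once S(x) = S(1)^x is known, setting λ = −ln S(1) recovers the exponential survival function.

    import math

    S1 = 0.6                 # assumed value of S(1) = Pr(X > 1)
    lam = -math.log(S1)      # rate of the resulting exponential distribution

    for x in (0.5, 1.0, 2.0, 3.7):
        print(x, S1 ** x, math.exp(-lam * x))   # the two columns agree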

References

  1. "Notes on Memoryless Random Variables" (PDF).
  2. Chattamvelli, Rajan; Shanmugam, Ramalingam (2020). Discrete Distributions in Engineering and the Applied Sciences. Synthesis Lectures on Mathematics & Statistics. Cham: Springer International Publishing. p. 71. doi:10.1007/978-3-031-02425-2. ISBN 978-3-031-01297-6.
  3. Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hendrik Paul; Meester, Ludolf Erwin (2005). A Modern Introduction to Probability and Statistics. Springer Texts in Statistics. London: Springer London. p. 50. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.
  4. Nagel, Werner; Steyer, Rolf (2017-04-04). Probability and Conditional Expectation: Fundamentals for the Empirical Sciences. Wiley Series in Probability and Statistics (1st ed.). Wiley. pp. 260–261. doi:10.1002/9781119243496. ISBN 978-1-119-24352-6.
  5. Weisstein, Eric W. "Memoryless". mathworld.wolfram.com. Retrieved 2024-07-25.
  6. Pitman, Jim (1993). Probability. New York, NY: Springer New York. p. 279. doi:10.1007/978-1-4612-4374-8. ISBN 978-0-387-94594-1.
  7. Brémaud, Pierre (2024). An Introduction to Applied Probability. Texts in Applied Mathematics. Vol. 77. Cham: Springer International Publishing. p. 84. doi:10.1007/978-3-031-49306-5. ISBN 978-3-031-49305-8.
  8. Bas, Esra (2019). Basics of Probability and Stochastic Processes. Cham: Springer International Publishing. p. 74. doi:10.1007/978-3-030-32323-3. ISBN 978-3-030-32322-6.
  9. Riposo, Julien (2023). Some Fundamentals of Mathematics of Blockchain. Cham: Springer Nature Switzerland. pp. 8–9. doi:10.1007/978-3-031-31323-3. ISBN 978-3-031-31322-6.