In probability and statistics, memorylessness is a property of certain probability distributions. It describes situations where the time already spent waiting for an event does not affect how much longer the wait will be. To model memoryless situations accurately, we have to disregard the past state of the system – the probabilities remain unaffected by the history of the process. [1]
Only two kinds of distributions are memoryless: geometric and exponential probability distributions.
Most phenomena are not memoryless, which means that observers will obtain information about them over time. For example, suppose that X is a random variable, the lifetime of a car engine, expressed in terms of "number of miles driven until the engine breaks down". It is clear, based on our intuition, that an engine which has already been driven for 300,000 miles will have a much lower X than would a second (equivalent) engine which has only been driven for 1,000 miles. Hence, this random variable would not have the memorylessness property.
In contrast, let us examine a situation which would exhibit memorylessness. Imagine a long hallway, lined on one wall with thousands of safes. Each safe has a dial with 500 positions, and each has been assigned an opening position at random. Imagine that an eccentric person walks down the hallway, stopping once at each safe to make a single random attempt to open it. In this case, we might define random variable X as the lifetime of their search, expressed in terms of "number of attempts the person must make until they successfully open a safe". In this case, E[X] will always be equal to the value of 500, regardless of how many attempts have already been made. Each new attempt has a (1/500) chance of succeeding, so the person is likely to open exactly one safe sometime in the next 500 attempts – but with each new failure they make no "progress" toward ultimately succeeding. Even if the safe-cracker has just failed 499 consecutive times (or 4,999 times), we expect to wait 500 more attempts until we observe the next success. If, instead, this person focused their attempts on a single safe, and "remembered" their previous attempts to open it, they would be guaranteed to open the safe after, at most, 500 attempts (and, in fact, at onset would only expect to need 250 attempts, not 500).
The universal law of radioactive decay, which describes the time until a given radioactive particle decays, is a real-life example of memorylessness. An often used (theoretical) example of memorylessness in queueing theory is the time a storekeeper must wait before the arrival of the next customer.
If a discrete random variable is memoryless, then it satisfies where and are natural numbers. The equality is still true when is substituted for on the left hand side of the equation. [2]
The only discrete random variable that is memoryless is the geometric random variable taking values in . [3] This random variable describes when the first success in an infinite sequence of independent and identically distributed Bernoulli trials occurs. [4] The memorylessness property asserts that the number of previously failed trials has no effect on the number of future trials needed for a success.
Geometric random variables can also be defined as taking values in , which describes the number of failed trials before the first success in a sequence of independent and identically distributed bernoulli trials. These random variables do not satisfy the memoryless condition stated above; however they do satisfy a slightly modified memoryless condition: [5]
Similar to the first definition, only discrete random variables that satisfy this memoryless condition are geometric random variables taking values in . In the continuous case, these two definitions of memorylessness are equivalent.
If a continuous random variable is memoryless, then it satisfieswhere and are nonnegative real numbers. [6] The equality is still true when is substituted. [7]
The only continuous random variable that is memoryless is the exponential random variable. It models random processes like time between consecutive events. [8] The memorylessness property asserts that the amount of time since the previous event has no effect on the future time until the next event occurs.
The only memoryless continuous probability distribution is the exponential distribution, shown in the following proof: [9]
First, define , also known as the distribution's survival function. From the memorylessness property and the definition of conditional probability, it follows that
This gives the functional equation which implies where is a natural number. Similarly, where is a natural number, excluding . Therefore, all rational numbers satisfySince is continuous and the set of rational numbers is dense in the set of real numbers, where is a nonnegative real number. When , As a result, where .
In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success or failure. A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes is called a Bernoulli process; for a single trial, i.e., n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events.
In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the distance between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate; the distance parameter could be any meaningful mono-dimensional measure of the process, such as time between production errors, or length along a roll of fabric in the weaving manufacturing process. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being used for the analysis of Poisson point processes it is found in various other contexts.
In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions:
A Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). Markov processes are named in honor of the Russian mathematician Andrey Markov.
In probability and statistics, a probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value. Sometimes it is also known as the discrete probability density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete.
In probability theory, Chebyshev's inequality provides an upper bound on the probability of deviation of a random variable from its mean. More specifically, the probability that a random variable deviates from its mean by more than is at most , where is any positive constant and is the standard deviation.
In probability theory, Markov's inequality gives an upper bound on the probability that a non-negative random variable is greater than or equal to some positive constant. Markov's inequality is tight in the sense that for each chosen positive constant, there exists a random variable such that the inequality is in fact an equality.
In probability theory, a martingale is a sequence of random variables for which, at a particular time, the conditional expectation of the next value in the sequence is equal to the present value, regardless of all prior values.
In probability theory, a Chernoff bound is an exponentially decreasing upper bound on the tail of a random variable based on its moment generating function. The minimum of all such exponential bounds forms the Chernoff or Chernoff-Cramér bound, which may decay faster than exponential. It is especially useful for sums of independent random variables, such as sums of Bernoulli random variables.
A continuous-time Markov chain (CTMC) is a continuous stochastic process in which, for each state, the process will change state according to an exponential random variable and then move to a different state as specified by the probabilities of a stochastic matrix. An equivalent formulation describes the process as changing state according to the least value of a set of exponential random variables, one for each possible state it can move to, with the parameters determined by the current state.
In probability theory and statistics, the continuous uniform distributions or rectangular distributions are a family of symmetric probability distributions. Such a distribution describes an experiment where there is an arbitrary outcome that lies between certain bounds. The bounds are defined by the parameters, and which are the minimum and maximum values. The interval can either be closed or open. Therefore, the distribution is often abbreviated where stands for uniform distribution. The difference between the bounds defines the interval length; all intervals of the same length on the distribution's support are equally probable. It is the maximum entropy probability distribution for a random variable under no constraint other than that it is contained in the distribution's support.
In information theory, the Rényi entropy is a quantity that generalizes various notions of entropy, including Hartley entropy, Shannon entropy, collision entropy, and min-entropy. The Rényi entropy is named after Alfréd Rényi, who looked for the most general way to quantify information while preserving additivity for independent events. In the context of fractal dimension estimation, the Rényi entropy forms the basis of the concept of generalized dimensions.
In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, if nothing is known about a distribution except that it belongs to a certain class, then the distribution with the largest entropy should be chosen as the least-informative default. The motivation is twofold: first, maximizing entropy minimizes the amount of prior information built into the distribution; second, many physical systems tend to move towards maximal entropy configurations over time.
In probability theory and statistics, the Rademacher distribution is a discrete probability distribution where a random variate X has a 50% chance of being +1 and a 50% chance of being −1.
In probability theory, a probability distribution is infinitely divisible if it can be expressed as the probability distribution of the sum of an arbitrary number of independent and identically distributed (i.i.d.) random variables. The characteristic function of any infinitely divisible distribution is then called an infinitely divisible characteristic function.
In probability theory and statistics, a stochastic order quantifies the concept of one random variable being "bigger" than another. These are usually partial orders, so that one random variable may be neither stochastically greater than, less than, nor equal to another random variable . Many different orders exist, which have different applications.
Inequalities are very important in the study of information theory. There are a number of different contexts in which these inequalities appear.
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time if these events occur with a known constant mean rate and independently of the time since the last event. It can also be used for the number of events in other types of intervals than time, and in dimension greater than 1.
In probability theory, concentration inequalities provide mathematical bounds on the probability of a random variable deviating from some value. The deviation or other function of the random variable can be thought of as a secondary random variable. The simplest example of the concentration of such a secondary random variable is the CDF of the first random variable which concentrates the probability to unity. If an analytic form of the CDF is available this provides a concentration equality that provides the exact probability of concentration. It is precisely when the CDF is difficult to calculate or even the exact form of the first random variable is unknown that the applicable concentration inequalities provide useful insight.