Markov reward model

Last updated September 07, 2022

In probability theory, a Markov reward model or Markov reward process is a stochastic process which extends either a Markov chain or continuous-time Markov chain by adding a reward rate to each state. An additional variable records the reward accumulated up to the current time.^[1] Features of interest in the model include expected reward at a given time and expected time to accumulate a given reward.^[2] The model appears in Ronald A. Howard's book.^[3] The models are often studied in the context of Markov decision processes where a decision strategy can impact the rewards received.

Markov chain

See Markov Chain

See Markov chain Monte Carlo

Continuous-time Markov chain

The accumulated reward at a time t can be computed numerically over the time domain or by evaluating the linear hyperbolic system of equations which describe the accumulated reward using transform methods or finite difference methods.^[4]

Related Research Articles

Stochastic process Collection of random variables

In probability theory and related fields, a stochastic or random process is a mathematical object usually defined as a family of random variables. Stochastic processes are widely used as mathematical models of systems and phenomena that appear to vary in a random manner. Examples include the growth of a bacterial population, an electrical current fluctuating due to thermal noise, or the movement of a gas molecule. Stochastic processes have applications in many disciplines such as biology, chemistry, ecology, neuroscience, physics, image processing, signal processing, control theory, information theory, computer science, cryptography and telecommunications. Furthermore, seemingly random changes in financial markets have motivated the extensive use of stochastic processes in finance.

Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. Monte Carlo methods are mainly used in three problem classes: optimization, numerical integration, and generating draws from a probability distribution.

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). It is named after the Russian mathematician Andrey Markov.

In mathematics, a stochastic matrix is a square matrix used to describe the transitions of a Markov chain. Each of its entries is a nonnegative real number representing a probability. It is also called a probability matrix, transition matrix, substitution matrix, or Markov matrix. The stochastic matrix was first developed by Andrey Markov at the beginning of the 20th century, and has found use throughout a wide variety of scientific fields, including probability theory, statistics, mathematical finance and linear algebra, as well as computer science and population genetics. There are several different definitions and types of stochastic matrices:

In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample of the desired distribution by recording states from the chain. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Various algorithms exist for constructing chains, including the Metropolis–Hastings algorithm.

In mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming. MDPs were known at least as early as the 1950s; a core body of research on Markov decision processes resulted from Ronald Howard's 1960 book, Dynamic Programming and Markov Processes. They are used in many disciplines, including robotics, automatic control, economics and manufacturing. The name of MDPs comes from the Russian mathematician Andrey Markov as they are an extension of Markov chains.

A mathematical or physical process is time-reversible if the dynamics of the process remain well-defined when the sequence of time-states is reversed.

In probability theory and machine learning, the multi-armed bandit problem is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from imagining a gambler at a row of slot machines, who has to decide which machines to play, how many times to play each machine and in which order to play them, and whether to continue with the current machine or try a different machine. The multi-armed bandit problem also falls into the broad category of stochastic scheduling.

A phase-type distribution is a probability distribution constructed by a convolution or mixture of exponential distributions. It results from a system of one or more inter-related Poisson processes occurring in sequence, or phases. The sequence in which each of the phases occurs may itself be a stochastic process. The distribution can be represented by a random variable describing the time until absorption of a Markov process with one absorbing state. Each of the states of the Markov process represents one of the phases.

The Gittins index is a measure of the reward that can be achieved through a given stochastic process with certain properties, namely: the process has an ultimate termination state and evolves with an option, at each intermediate state, of terminating. Upon terminating at a given state, the reward achieved is the sum of the probabilistic expected rewards associated with every state from the actual terminating state to the ultimate terminal state, inclusive. The index is a real scalar.

In queueing models, a discipline within the mathematical theory of probability, the quasi-birth–death process describes a generalisation of the birth–death process. As with the birth-death process it moves up and down between levels one at a time, but the time between these transitions can have a more complicated distribution encoded in the blocks.

In queueing theory, a discipline within the mathematical theory of probability, an M/M/1 queue represents the queue length in a system having a single server, where arrivals are determined by a Poisson process and job service times have an exponential distribution. The model name is written in Kendall's notation. The model is the most elementary of queueing models and an attractive object of study as closed-form expressions can be obtained for many metrics of interest in this model. An extension of this model with more than one server is the M/M/c queue.

In probability theory, a product-form solution is a particularly efficient form of solution for determining some metric of a system with distinct sub-components, where the metric for the collection of components can be written as a product of the metric across the different components. Using capital Pi notation a product-form solution has algebraic form

In probability theory, a Markov model is a stochastic model used to model pseudo-randomly changing systems. It is assumed that future states depend only on the current state, not on the events that occurred before it. Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. For this reason, in the fields of predictive modelling and probabilistic forecasting, it is desirable for a given model to exhibit the Markov property.

In queueing theory, a discipline within the mathematical theory of probability, a fork–join queue is a queue where incoming jobs are split on arrival for service by numerous servers and joined before departure. The model is often used for parallel computations or systems where products need to be obtained simultaneously from different suppliers. The key quantity of interest in this model is usually the time taken to service a complete job. The model has been described as a "key model for the performance analysis of parallel and distributed systems." Few analytical results exist for fork–join queues, but various approximations are known.

In probability theory, uniformization method, is a method to compute transient solutions of finite state continuous-time Markov chains, by approximating the process by a discrete-time Markov chain. The original chain is scaled by the fastest transition rate γ, so that transitions occur at the same rate in every state, hence the name. The method is simple to program and efficiently calculates an approximation to the transient distribution at a single point in time. The method was first introduced by Winfried Grassmann in 1977.

In queueing theory, a discipline within the mathematical theory of probability, an M/G/1 queue is a queue model where arrivals are Markovian, service times have a General distribution and there is a single server. The model name is written in Kendall's notation, and is an extension of the M/M/1 queue, where service times must be exponentially distributed. The classic application of the M/G/1 queue is to model performance of a fixed head hard disk.

In queueing theory, a discipline within the mathematical theory of probability, a fluid queue is a mathematical model used to describe the fluid level in a reservoir subject to randomly determined periods of filling and emptying. The term dam theory was used in earlier literature for these models. The model has been used to approximate discrete models, model the spread of wildfires, in ruin theory and to model high speed data networks. The model applies the leaky bucket algorithm to a stochastic source.

In probability theory, the matrix analytic method is a technique to compute the stationary probability distribution of a Markov chain which has a repeating structure and a state space which grows unboundedly in no more than one dimension. Such models are often described as M/G/1 type Markov chains because they can describe transitions in an M/G/1 queue. The method is a more complicated version of the matrix geometric method and is the classical solution method for M/G/1 chains.

In queueing theory, a discipline within the mathematical theory of probability, the G/G/1 queue represents the queue length in a system with a single server where interarrival times have a general distribution and service times have a (different) general distribution. The evolution of the queue can be described by the Lindley equation.

References

↑ Begain, K.; Bolch, G.; Herold, H. (2001). "Theoretical Background". Practical Performance Modeling . pp. 9. doi:10.1007/978-1-4615-1387-2_2. ISBN 978-1-4613-5528-1.
↑ Li, Q. L. (2010). "Markov Reward Processes". Constructive Computation in Stochastic Models with Applications. pp. 526–573. doi:10.1007/978-3-642-11492-2_10. ISBN 978-3-642-11491-5.
↑ Howard, R.A. (1971). Dynamic Probabilistic Systems, Vol II: Semi-Markov and Decision Processes . New York: Wiley. ISBN 0471416657.
↑ Reibman, A.; Smith, R.; Trivedi, K. (1989). "Markov and Markov reward model transient analysis: An overview of numerical approaches" (PDF). European Journal of Operational Research. 40 (2): 257. doi:10.1016/0377-2217(89)90335-4.

This probability-related article is a stub. You can help Wikipedia by expanding it.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Begain, K.; Bolch, G.; Herold, H. (2001). "Theoretical Background". Practical Performance Modeling . pp. 9. doi:10.1007/978-1-4615-1387-2_2. ISBN 978-1-4613-5528-1.

[2] Li, Q. L. (2010). "Markov Reward Processes". Constructive Computation in Stochastic Models with Applications. pp. 526–573. doi:10.1007/978-3-642-11492-2_10. ISBN 978-3-642-11491-5.

[3] Howard, R.A. (1971). Dynamic Probabilistic Systems, Vol II: Semi-Markov and Decision Processes . New York: Wiley. ISBN 0471416657.

[4] Reibman, A.; Smith, R.; Trivedi, K. (1989). "Markov and Markov reward model transient analysis: An overview of numerical approaches" (PDF). European Journal of Operational Research. 40 (2): 257. doi:10.1016/0377-2217(89)90335-4.

[1]

[2]

[3]

[4]

Markov reward model

Contents

Markov chain

Continuous-time Markov chain

Related Research Articles

References