Sequential decision making

Sequential decision making is a concept from control theory and operations research that involves making a series of decisions over time to optimize an objective function, such as maximizing cumulative rewards or minimizing costs. In this framework, each decision influences subsequent choices and system outcomes, taking into account the current state, the available actions, and the probabilistic nature of state transitions. [1] The framework is used to model and regulate dynamic systems, especially under uncertainty, and is commonly addressed with methods such as Markov decision processes (MDPs) and dynamic programming. [2]
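
As a concrete sketch (not taken from the cited sources), the value-iteration form of dynamic programming can be run on a toy MDP; the two states, two actions, transition probabilities, and rewards below are hypothetical:

```python
# Minimal value iteration sketch for a hypothetical two-state, two-action MDP.
# States, actions, transition probabilities, and rewards are made up for illustration.

# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 1.0)], "go": [(0.8, 1, 0.0), (0.2, 0, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(0.9, 0, 0.0), (0.1, 1, 2.0)]},
}
gamma = 0.9           # discount factor
V = {0: 0.0, 1: 0.0}  # initial value estimates

for _ in range(200):  # repeat Bellman backups until (approximately) converged
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in actions.values()
        )
        for s, actions in P.items()
    }

print(V)  # approximate optimal value of each state
```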

Related Research Articles

Stochastic process: collection of random variables

In probability theory and related fields, a stochastic or random process is a mathematical object usually defined as a family of random variables in a probability space, where the index of the family often has the interpretation of time. Stochastic processes are widely used as mathematical models of systems and phenomena that appear to vary in a random manner. Examples include the growth of a bacterial population, an electrical current fluctuating due to thermal noise, or the movement of a gas molecule. Stochastic processes have applications in many disciplines such as biology, chemistry, ecology, neuroscience, physics, image processing, signal processing, control theory, information theory, computer science, and telecommunications. Furthermore, seemingly random changes in financial markets have motivated the extensive use of stochastic processes in finance.

Markov chain: random process independent of past history

In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). Markov processes are named in honor of the Russian mathematician Andrey Markov.
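
A minimal sketch of simulating a discrete-time Markov chain, using a hypothetical two-state weather model for the transition probabilities:

```python
import random

# Hypothetical two-state weather chain: transition probabilities out of each state.
# Each row is a probability distribution over the next state (rows sum to 1).
transition = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step(state):
    """Draw the next state using only the current state (the Markov property)."""
    states, probs = zip(*transition[state].items())
    return random.choices(states, weights=probs)[0]

state = "sunny"
path = [state]
for _ in range(10):        # simulate 10 steps of the chain
    state = step(state)
    path.append(state)
print(path)
```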

Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
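
As an illustrative sketch of one common reinforcement learning method (not the only one), tabular Q-learning updates an action-value estimate toward the observed reward plus the discounted value of the best next action; the states, actions, and numbers below are made up:

```python
# Tabular Q-learning update; environment, states, and actions here are hypothetical.
from collections import defaultdict

alpha, gamma = 0.1, 0.95            # learning rate and discount factor
Q = defaultdict(float)              # Q[(state, action)] -> estimated value

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example of a single update with made-up transition data.
q_update(state=0, action="right", reward=1.0, next_state=1, actions=["left", "right"])
print(Q[(0, "right")])
```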

A hidden Markov model (HMM) is a Markov model in which the observations are dependent on a latent Markov process, commonly denoted X. An HMM requires that there be an observable process Y whose outcomes depend on the outcomes of X in a known way. Since X cannot be observed directly, the goal is to learn about the state of X by observing Y. By the definition of a Markov model, an HMM has the additional requirement that the outcome of Y at time t = t0 must be "influenced" exclusively by the outcome of X at t = t0, and that the outcomes of X and Y at t < t0 must be conditionally independent of Y at t = t0 given X at time t = t0. Estimation of the parameters in an HMM can be performed using maximum likelihood estimation. For linear chain HMMs, the Baum–Welch algorithm can be used to estimate parameters.
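
A minimal sketch of the forward algorithm for a linear-chain HMM, which computes the likelihood of an observation sequence; the hidden states, emission probabilities, and observations below are assumed for illustration:

```python
# Forward algorithm for a hypothetical two-state HMM with two possible observations.
start = {"hot": 0.6, "cold": 0.4}                # initial distribution over hidden states
trans = {"hot": {"hot": 0.7, "cold": 0.3},
         "cold": {"hot": 0.4, "cold": 0.6}}      # hidden-state transition probabilities
emit = {"hot": {"high": 0.8, "low": 0.2},
        "cold": {"high": 0.3, "low": 0.7}}       # observation probabilities given hidden state

def forward(observations):
    """Return P(observations) by summing over all hidden-state paths."""
    alpha = {s: start[s] * emit[s][observations[0]] for s in start}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[prev] * trans[prev][s] for prev in alpha) * emit[s][obs]
                 for s in trans}
    return sum(alpha.values())

print(forward(["high", "high", "low"]))
```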

In mathematics, a stochastic matrix is a square matrix used to describe the transitions of a Markov chain. Each of its entries is a nonnegative real number representing a probability. It is also called a probability matrix, transition matrix, substitution matrix, or Markov matrix. The stochastic matrix was first developed by Andrey Markov at the beginning of the 20th century, and has found use throughout a wide variety of scientific fields, including probability theory, statistics, mathematical finance and linear algebra, as well as computer science and population genetics. There are several different definitions and types of stochastic matrices: a right stochastic matrix has each row summing to 1, a left stochastic matrix has each column summing to 1, and a doubly stochastic matrix has every row and every column summing to 1.
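
A small sketch, with a made-up matrix, distinguishing these types by which sums equal 1:

```python
# Classify a matrix as right, left, and/or doubly stochastic (example matrix is made up).
M = [[0.5, 0.5],
     [0.3, 0.7]]

rows_sum_to_one = all(abs(sum(row) - 1.0) < 1e-9 for row in M)
cols_sum_to_one = all(abs(sum(col) - 1.0) < 1e-9 for col in zip(*M))

print("right stochastic:", rows_sum_to_one)   # rows are probability distributions
print("left stochastic:", cols_sum_to_one)    # columns are probability distributions
print("doubly stochastic:", rows_sum_to_one and cols_sum_to_one)
```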

In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that is, the Markov chain's equilibrium distribution matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution.
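
A brief sketch of one standard MCMC method, random-walk Metropolis, targeting a standard normal distribution; the target density and proposal width are chosen only for illustration:

```python
import math, random

def target_density(x):
    """Unnormalized standard normal density (illustrative target)."""
    return math.exp(-0.5 * x * x)

x = 0.0
samples = []
for _ in range(10_000):
    proposal = x + random.gauss(0.0, 1.0)                 # random-walk proposal
    accept_prob = min(1.0, target_density(proposal) / target_density(x))
    if random.random() < accept_prob:                     # accept or keep current state
        x = proposal
    samples.append(x)

# After enough steps, the empirical distribution approximates the target.
print(sum(samples) / len(samples))   # should be near 0
```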

Stochastic is the property of being well-described by a random probability distribution. Stochasticity and randomness are technically distinct concepts: the former refers to a modeling approach, while the latter describes phenomena; in everyday conversation, however, these terms are often used interchangeably. In probability theory, the formal concept of a stochastic process is also referred to as a random process.

Markov property: memoryless property of a stochastic process

In probability theory and statistics, the term Markov property refers to the memoryless property of a stochastic process, which means that its future evolution is independent of its history given its present state. It is named after the Russian mathematician Andrey Markov. The term strong Markov property is similar to the Markov property, except that the meaning of "present" is defined in terms of a random variable known as a stopping time.
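
In symbols, for a discrete-time process, the Markov property can be written as:

```latex
% Discrete-time Markov property: the future depends on the past only through the present.
P(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \ldots, X_0 = x_0)
  = P(X_{n+1} = x \mid X_n = x_n)
```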

A Markov decision process (MDP), also called a stochastic dynamic program or stochastic control problem, is a model for sequential decision making when outcomes are uncertain.

A mathematical or physical process is time-reversible if the dynamics of the process remain well-defined when the sequence of time-states is reversed.

The Gittins index is a measure of the reward that can be achieved through a given stochastic process with certain properties, namely: the process has an ultimate termination state and evolves with an option, at each intermediate state, of terminating. Upon terminating at a given state, the reward achieved is the sum of the probabilistic expected rewards associated with every state from the actual terminating state to the ultimate terminal state, inclusive. The index is a real scalar.

In probability theory, a Markov model is a stochastic model used to model pseudo-randomly changing systems. It is assumed that future states depend only on the current state, not on the events that occurred before it. Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. For this reason, in the fields of predictive modelling and probabilistic forecasting, it is desirable for a given model to exhibit the Markov property.

Mark Herbert Ainsworth Davis was Professor of Mathematics at Imperial College London. He made fundamental contributions to the theory of stochastic processes, stochastic control and mathematical finance.

In probability theory, a piecewise-deterministic Markov process (PDMP) is a process whose behaviour is governed by random jumps at points in time, but whose evolution is deterministically governed by an ordinary differential equation between those times. The class of models is "wide enough to include as special cases virtually all the non-diffusion models of applied probability." The process is defined by three quantities: the flow, the jump rate, and the transition measure.

Cyrus Derman: American mathematician

Cyrus Derman was an American mathematician and amateur musician who did research in Markov decision processes, stochastic processes, operations research, statistics and a variety of other fields.

In probability theory, a Markov reward model or Markov reward process is a stochastic process which extends either a Markov chain or continuous-time Markov chain by adding a reward rate to each state. An additional variable records the reward accumulated up to the current time. Features of interest in the model include the expected reward at a given time and the expected time to accumulate a given reward. The model appears in Ronald A. Howard's 1960 book Dynamic Programming and Markov Processes. The models are often studied in the context of Markov decision processes, where a decision strategy can impact the rewards received.
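
A small sketch, with made-up transition probabilities and reward rates, of computing the expected reward accumulated over a finite horizon in a discrete-time Markov reward process:

```python
# Expected accumulated reward for a hypothetical two-state Markov reward process.
P = [[0.9, 0.1],        # transition matrix (rows sum to 1)
     [0.2, 0.8]]
r = [1.0, 5.0]          # reward received in each state
dist = [1.0, 0.0]       # start in state 0 with probability 1

expected_total = 0.0
for _ in range(10):     # accumulate expected reward over 10 steps
    expected_total += sum(d * ri for d, ri in zip(dist, r))
    dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]

print(expected_total)   # expected reward accumulated over the horizon
```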

The following outline is provided as an overview of, and topical guide to, machine learning.

Richard H. Stockbridge

Richard H. Stockbridge is a Distinguished Professor Emeritus of Mathematics at the University of Wisconsin-Milwaukee. His contributions to research primarily involve stochastic control theory, optimal stopping and mathematical finance. Most notably, alongside Professors Thomas G. Kurtz, Kurt Helmes, and Chao Zhu, he developed the methodology of using linear programming to solve stochastic control problems.

Stochastic scheduling concerns scheduling problems involving random attributes, such as random processing times, random due dates, random weights, and stochastic machine breakdowns. Major applications arise in manufacturing systems, computer systems, communication systems, logistics and transportation, and machine learning, among others.

Eugene A. Feinberg: American applied mathematician

Eugene A. Feinberg is an American mathematician and distinguished professor of applied mathematics and statistics at Stony Brook University. He is noted for his work in probability theory, real analysis, and Markov decision processes.

References

  1. Puterman, Martin L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics Section. New York: Wiley. pp. 1–2. ISBN 978-0-471-61977-2.
  2. Bellman, Richard (1958-09-01). "Dynamic programming and stochastic control processes". Information and Control. 1 (3): 228–239. doi:10.1016/S0019-9958(58)80003-0. ISSN 0019-9958.