Markov strategy

In game theory, a Markov strategy [1] is one that depends only on state variables that summarize the history of the game in one way or another. [2] For instance, a state variable can be the current play in a repeated game, or it can be any interpretation of a recent sequence of play.

A profile of Markov strategies [3] is a Markov perfect equilibrium if it is a Nash equilibrium in every state of the game. The concept is named after the mathematician Andrey Markov, by analogy with Markov chains, in which the next state depends only on the current state. [4]

Definition

To formally define a Markov strategy, we first need the definitions of a stochastic game and of a behavioral strategy. Intuitively speaking, a stochastic game is a collection of normal-form games; the agents repeatedly play games from this collection, and the particular game played at any given iteration depends probabilistically on the previous game played and on the actions taken by all agents in that game. [5]

(Stochastic game). A stochastic game (also known as a Markov game) is a tuple $(Q, N, A, P, r)$, where:

- $Q$ is a finite set of games (states);
- $N$ is a finite set of $n$ players;
- $A = A_1 \times \cdots \times A_n$, where $A_i$ is a finite set of actions available to player $i$;
- $P : Q \times A \times Q \to [0, 1]$ is the transition probability function: $P(q, a, \hat{q})$ is the probability of transitioning from state $q$ to state $\hat{q}$ after the action profile $a$;
- $r = (r_1, \ldots, r_n)$, where $r_i : Q \times A \to \mathbb{R}$ is a real-valued payoff function for player $i$.
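
The tuple can be mirrored directly as a small data structure. The following Python sketch is purely illustrative (the type aliases and field names are not from the article); it only shows how the five components $Q$, $N$, $A$, $P$, and $r$ line up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

# Illustrative aliases (hypothetical): a state is a string label, an action
# profile is a tuple with one action per player, payoffs are real numbers.
State = str
Action = str
ActionProfile = Tuple[Action, ...]

@dataclass
class StochasticGame:
    states: Set[State]                                            # Q
    n_players: int                                                # N = {1, ..., n}
    actions: Dict[int, Set[Action]]                               # A_i for player i
    transition: Callable[[State, ActionProfile, State], float]    # P(q, a, q')
    payoffs: Dict[int, Callable[[State, ActionProfile], float]]   # r_i(q, a)
```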

Now let us consider the strategy space for stochastic games. As in other game forms, an agent's strategy can consist of any mixture over deterministic strategies. However, several restricted classes of strategies are of particular interest. The first restriction is that the mixing takes place at each history independently, which leads to the notion of a behavioral strategy.

(Behavioral strategy). A behavioral strategy $s_i(h_t, a_{i_j})$ returns the probability of playing action $a_{i_j}$ for history $h_t$.

(Markov strategy). A Markov strategy is a behavioral strategy in which $s_i(h_t, a_{i_j}) = s_i(h_t', a_{i_j})$ if $q_t = q_t'$, where $q_t$ and $q_t'$ are the final states of $h_t$ and $h_t'$, respectively.

A Markov strategy thus further restricts a behavioral strategy so that, at any given time t, the distribution over actions depends only on the current state. In particular, in a repeated game whose state records the previous round's action profile, a Markov strategy considers only the actions of the last round when choosing its action for the current round, as the sketch below illustrates.
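
As a rough illustration of the restriction (with hypothetical state names "q1", "q2" and actions "C", "D"), the sketch below contrasts a behavioral strategy, which may read the entire history, with a Markov strategy, which reads only its final state.

```python
import random

# A history is the list of states visited so far, most recent last.

def behavioral_strategy(history):
    """May depend on the whole history, e.g. on how often q2 was visited."""
    visits_to_q2 = sum(1 for q in history if q == "q2")
    return {"C": 1.0, "D": 0.0} if visits_to_q2 >= 2 else {"C": 0.5, "D": 0.5}

def markov_strategy(history):
    """Depends only on the current (final) state of the history."""
    return {"C": 0.9, "D": 0.1} if history[-1] == "q1" else {"C": 0.2, "D": 0.8}

def sample(distribution):
    """Draw one action from a probability distribution over actions."""
    actions = list(distribution)
    return random.choices(actions, weights=[distribution[a] for a in actions])[0]

print(sample(markov_strategy(["q1", "q2"])))   # e.g. "D"
```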

Example

The repeated Prisoner's dilemma is a very special case of a stochastic game, in which the state can be identified with the action profile played in the previous round. It is then easy to see that Tit-for-Tat is a Markov strategy, while the grim-trigger strategy is not a Markov strategy, since its action in the current stage is not determined solely by the last round.
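
A minimal sketch of this contrast, assuming the state is simply the pair of actions played in the previous round, with "C" for cooperate and "D" for defect:

```python
def tit_for_tat(state):
    """Markov: the choice depends only on the current state (the last round)."""
    _my_last, opponent_last = state
    return opponent_last                      # copy the opponent's last move

def grim_trigger(history):
    """Not Markov: the choice depends on whether the opponent has EVER
    defected, which the last round alone does not reveal."""
    ever_defected = any(opp == "D" for _me, opp in history)
    return "D" if ever_defected else "C"

# Example: the opponent defected two rounds ago, then cooperated.
history = [("C", "D"), ("D", "C"), ("C", "C")]
print(tit_for_tat(history[-1]))   # "C": only the last round matters
print(grim_trigger(history))      # "D": the old defection is never forgiven
```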

In the more general case, consider a "real" stochastic game with two stages, $q_1$ and $q_2$. The normal form of stage $q_1$ is:

P1 \ P2      C         D
C          (2, 2)    (0, 3)
D          (3, 0)    (1, 1)

The normal form of stage $q_2$ is:

P1 \ P2      C         D
C          (5, 5)    (2, 1)
D          (1, 2)    (1, 1)

The transition probability function for stage $q_1$ is given below; each entry lists the probabilities of moving to $q_1$ and $q_2$, respectively:

P1 \ P2      C             D
C          (0.5, 0.5)    (1, 0)
D          (1, 0)        (1, 0)
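
The three tables can be combined into a short simulation. The sketch below assumes each transition entry is the pair (probability of moving to $q_1$, probability of moving to $q_2$), and, because no transition table is given for $q_2$, it assumes the game returns to $q_1$ from $q_2$ with probability 1; both assumptions are illustrative only.

```python
import random

PAYOFFS = {
    "q1": {("C", "C"): (2, 2), ("C", "D"): (0, 3),
           ("D", "C"): (3, 0), ("D", "D"): (1, 1)},
    "q2": {("C", "C"): (5, 5), ("C", "D"): (2, 1),
           ("D", "C"): (1, 2), ("D", "D"): (1, 1)},
}

# Probabilities of moving to (q1, q2) from stage q1, taken from the table above.
TRANSITIONS_FROM_Q1 = {
    ("C", "C"): (0.5, 0.5), ("C", "D"): (1.0, 0.0),
    ("D", "C"): (1.0, 0.0), ("D", "D"): (1.0, 0.0),
}

def step(state, a1, a2):
    """Play one round: return the payoff pair and a randomly drawn next state."""
    payoff = PAYOFFS[state][(a1, a2)]
    if state == "q1":
        weights = TRANSITIONS_FROM_Q1[(a1, a2)]
        next_state = random.choices(["q1", "q2"], weights=weights)[0]
    else:
        next_state = "q1"   # assumed, see note above
    return payoff, next_state

# Example run: both players always cooperate (a Markov strategy) for 5 rounds.
state = "q1"
for _ in range(5):
    (u1, u2), state = step(state, "C", "C")
    print(u1, u2, "-> next state:", state)
```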

Related Research Articles

In game theory, the Nash equilibrium, named after the mathematician John Nash, is the most common way to define the solution of a non-cooperative game involving two or more players. In a Nash equilibrium, each player is assumed to know the equilibrium strategies of the other players, and no one has anything to gain by changing only one's own strategy. The principle of Nash equilibrium dates back to the time of Cournot, who in 1838 applied it to competing firms choosing outputs.

Stochastic process

In probability theory and related fields, a stochastic or random process is a mathematical object usually defined as a family of random variables. Stochastic processes are widely used as mathematical models of systems and phenomena that appear to vary in a random manner. Examples include the growth of a bacterial population, an electrical current fluctuating due to thermal noise, or the movement of a gas molecule. Stochastic processes have applications in many disciplines such as biology, chemistry, ecology, neuroscience, physics, image processing, signal processing, control theory, information theory, computer science, cryptography and telecommunications. Furthermore, seemingly random changes in financial markets have motivated the extensive use of stochastic processes in finance.

Markov chain

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). It is named after the Russian mathematician Andrey Markov.

Quantum superposition

Quantum superposition is a fundamental principle of quantum mechanics. It states that, much like waves in classical physics, any two quantum states can be added together ("superposed") and the result will be another valid quantum state; and conversely, that every quantum state can be represented as a sum of two or more other distinct states. Mathematically, it refers to a property of solutions to the Schrödinger equation; since the Schrödinger equation is linear, any linear combination of solutions will also be a solution.

This article contains examples of Markov chains and Markov processes in action.

In mathematics, a stochastic matrix is a square matrix used to describe the transitions of a Markov chain. Each of its entries is a nonnegative real number representing a probability. It is also called a probability matrix, transition matrix, substitution matrix, or Markov matrix. The stochastic matrix was first developed by Andrey Markov at the beginning of the 20th century, and has found use throughout a wide variety of scientific fields, including probability theory, statistics, mathematical finance and linear algebra, as well as computer science and population genetics. There are several different definitions and types of stochastic matrices, including right stochastic matrices (each row sums to 1), left stochastic matrices (each column sums to 1), and doubly stochastic matrices (both rows and columns sum to 1).

In the field of mathematical optimization, stochastic programming is a framework for modeling optimization problems that involve uncertainty. A stochastic program is an optimization problem in which some or all problem parameters are uncertain, but follow known probability distributions. This framework contrasts with deterministic optimization, in which all problem parameters are assumed to be known exactly. The goal of stochastic programming is to find a decision which both optimizes some criteria chosen by the decision maker, and appropriately accounts for the uncertainty of the problem parameters. Because many real-world decisions involve uncertainty, stochastic programming has found applications in a broad range of areas ranging from finance to transportation to energy optimization.

A continuous-time Markov chain (CTMC) is a continuous stochastic process in which, for each state, the process will change state according to an exponential random variable and then move to a different state as specified by the probabilities of a stochastic matrix. An equivalent formulation describes the process as changing state according to the least value of a set of exponential random variables, one for each possible state it can move to, with the parameters determined by the current state.

In mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming. MDPs were known at least as early as the 1950s; a core body of research on Markov decision processes resulted from Ronald Howard's 1960 book, Dynamic Programming and Markov Processes. They are used in many disciplines, including robotics, automatic control, economics and manufacturing. The name of MDPs comes from the Russian mathematician Andrey Markov, as they are an extension of Markov chains.

Renewal theory is the branch of probability theory that generalizes the Poisson process for arbitrary holding times. Instead of exponentially distributed holding times, a renewal process may have any independent and identically distributed (IID) holding times that have finite mean. A renewal-reward process additionally has a random sequence of rewards incurred at each holding time, which are IID but need not be independent of the holding times.

Multi-armed bandit

In probability theory and machine learning, the multi-armed bandit problem is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from imagining a gambler at a row of slot machines, who has to decide which machines to play, how many times to play each machine and in which order to play them, and whether to continue with the current machine or try a different machine. The multi-armed bandit problem also falls into the broad category of stochastic scheduling.

Discrete-time Markov chain

In probability, a discrete-time Markov chain (DTMC) is a sequence of random variables, known as a stochastic process, in which the value of the next variable depends only on the value of the current variable, and not any variables in the past. For instance, a machine may have two states, A and E. When it is in state A, there is a 40% chance of it moving to state E and a 60% chance of it remaining in state A. When it is in state E, there is a 70% chance of it moving to A and a 30% chance of it staying in E. The sequence of states of the machine is a Markov chain. If we denote the chain by $(X_0, X_1, X_2, \ldots)$, then $X_0$ is the state in which the machine starts and $X_{10}$ is the random variable describing its state after 10 transitions. The process continues forever, indexed by the natural numbers.
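
As a small illustration, the two-state machine described above can be simulated in a few lines (a sketch only):

```python
import random

# Transition probabilities of the machine described above.
TRANSITIONS = {"A": {"A": 0.6, "E": 0.4}, "E": {"A": 0.7, "E": 0.3}}

def simulate(start="A", steps=10):
    """Return the realized chain X_0, X_1, ..., X_steps."""
    chain = [start]
    for _ in range(steps):
        current = TRANSITIONS[chain[-1]]
        chain.append(random.choices(list(current), weights=list(current.values()))[0])
    return chain

print(simulate())   # the last element is the state X_10 after 10 transitions
```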

In mathematics, ergodicity expresses the idea that a point of a moving system, either a dynamical system or a stochastic process, will eventually visit all parts of the space that the system moves in, in a uniform and random sense. This implies that the average behavior of the system can be deduced from the trajectory of a "typical" point. Equivalently, a sufficiently large collection of random samples from a process can represent the average statistical properties of the entire process. Ergodicity is a property of the system; it is a statement that the system cannot be reduced or factored into smaller components. Ergodic theory is the study of systems possessing ergodicity.

A number of different Markov models of DNA sequence evolution have been proposed. These substitution models differ in terms of the parameters used to describe the rates at which one nucleotide replaces another during evolution. These models are frequently used in molecular phylogenetic analyses. In particular, they are used during the calculation of likelihood of a tree and they are used to estimate the evolutionary distance between sequences from the observed differences between the sequences.

In game theory, a stochastic game, introduced by Lloyd Shapley in the early 1950s, is a repeated game with probabilistic transitions played by one or more players. The game is played in a sequence of stages. At the beginning of each stage the game is in some state. The players select actions and each player receives a payoff that depends on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions chosen by the players. The procedure is repeated at the new state and play continues for a finite or infinite number of stages. The total payoff to a player is often taken to be the discounted sum of the stage payoffs or the limit inferior of the averages of the stage payoffs.

In probability theory, Kolmogorov's criterion, named after Andrey Kolmogorov, is a theorem giving a necessary and sufficient condition for a Markov chain or continuous-time Markov chain to be stochastically identical to its time-reversed version.

In the mathematical study of stochastic processes, a Harris chain is a Markov chain where the chain returns to a particular part of the state space an unbounded number of times. Harris chains are regenerative processes and are named after Theodore Harris. The theory of Harris chains and Harris recurrence is useful for treating Markov chains on general state spaces.

In probability theory, a Markov kernel is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finite state space.

Mean-field game theory is the study of strategic decision making by small interacting agents in very large populations. It lies at the intersection of game theory with stochastic analysis and control theory. The use of the term "mean field" is inspired by mean-field theory in physics, which considers the behavior of systems of large numbers of particles where individual particles have negligible impacts upon the system. In other words, each agent acts according to its own minimization or maximization problem, taking into account other agents' decisions; because the population is large, we can assume the number of agents goes to infinity and a representative agent exists.

Stochastic chains with memory of variable length are a family of stochastic chains of finite order in a finite alphabet, such that, at each time step, only a finite suffix of the past, called the context, is necessary to predict the next symbol. These models were introduced in the information theory literature by Jorma Rissanen in 1983 as a universal tool for data compression, but have recently been used to model data in different areas such as biology, linguistics and music.

References

  1. "First Links in the Markov Chain". American Scientist. 2017-02-06. Retrieved 2017-02-06.
  2. Fudenberg, Drew; Tirole, Jean (1995). Game Theory. Cambridge, MA: The MIT Press. pp. 501–540. ISBN 0-262-06141-4.
  3. "Markov Strategy" . Retrieved 2017-11-17.
  4. Sack, Harald (2022-06-14). "Andrey Markov and the Markov Chains". SciHi Blog. Retrieved 2017-11-23.
  5. "Essentials of Game Theory: A Concise Multidisciplinary Introduction | Synthesis Lectures on Artificial Intelligence and Machine Learning". doi:10.2200/s00108ed1v01y200802aim003?casa_token=omxkyngxikcaaaaa:hgkme_qlkinn0ph7hddzcqnodzg53-vjmg0yek0rrynkfdzzrkdif0lkkw-txvjrqfwf7nln96g.{{cite journal}}: Cite journal requires |journal= (help)