Stochastic game

Last updated

In game theory, a stochastic game (or Markov game), introduced by Lloyd Shapley in the early 1950s, [1] is a repeated game with probabilistic transitions played by one or more players. The game is played in a sequence of stages. At the beginning of each stage the game is in some state. The players select actions and each player receives a payoff that depends on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions chosen by the players. The procedure is repeated at the new state and play continues for a finite or infinite number of stages. The total payoff to a player is often taken to be the discounted sum of the stage payoffs or the limit inferior of the averages of the stage payoffs.

Contents

Stochastic games generalize Markov decision processes to multiple interacting decision makers, as well as strategic-form games to dynamic situations in which the environment changes in response to the players’ choices. [2]

Two-player games

Stochastic two-player games on directed graphs are widely used for modeling and analysis of discrete systems operating in an unknown (adversarial) environment[ citation needed ]. Possible configurations of a system and its environment are represented as vertices, and the transitions correspond to actions of the system, its environment, or "nature". A run of the system then corresponds to an infinite path in the graph. Thus, a system and its environment can be seen as two players with antagonistic objectives, where one player (the system) aims at maximizing the probability of "good" runs, while the other player (the environment) aims at the opposite.

In many cases, there exists an equilibrium value of this probability, but optimal strategies for both players may not exist.

A modified game of rock paper scissors played with dice, shown generally in the first graph and more strictly in the second.

Each player has a set of 6 dice, where 4 sides of each die has rock, paper or scissors, and 2 sides have one of the other options, so that the set of 6 dice contains every combination. "RP" means a die with 4 rock and 2 paper, and although both players may not choose this die, any two combinations are equivalent to a combination of any basis die and some other die, so BR is used as a basis. Ties let the players choose a new die if they would like.

The more expansive graph is needed because the probability of the players winning is influenced by which die they and their opponents choose. The probability of each outcome is therefore a function of the action profiles of the players during the "choose and roll" stage. Stochastic game 2 graphs cropped.png
A modified game of rock paper scissors played with dice, shown generally in the first graph and more strictly in the second.

Each player has a set of 6 dice, where 4 sides of each die has rock, paper or scissors, and 2 sides have one of the other options, so that the set of 6 dice contains every combination. "RP" means a die with 4 rock and 2 paper, and although both players may not choose this die, any two combinations are equivalent to a combination of any basis die and some other die, so BR is used as a basis. Ties let the players choose a new die if they would like.

The more expansive graph is needed because the probability of the players winning is influenced by which die they and their opponents choose. The probability of each outcome is therefore a function of the action profiles of the players during the "choose and roll" stage.

We introduce basic concepts and algorithmic questions studied in this area, and we mention some long-standing open problems. Then, we mention selected recent results.

Theory

The ingredients of a stochastic game are: a finite set of players ; a state space (either a finite set or a measurable space ); for each player , an action set (either a finite set or a measurable space ); a transition probability from , where is the action profiles, to , where is the probability that the next state is in given the current state and the current action profile ; and a payoff function from to , where the -th coordinate of , , is the payoff to player as a function of the state and the action profile .

The game starts at some initial state . At stage , players first observe , then simultaneously choose actions , then observe the action profile , and then nature selects according to the probability . A play of the stochastic game, , defines a stream of payoffs , where .

The discounted game with discount factor () is the game where the payoff to player is . The -stage game is the game where the payoff to player is .

The value , respectively , of a two-person zero-sum stochastic game , respectively , with finitely many states and actions exists, and Truman Bewley and Elon Kohlberg (1976) proved that converges to a limit as goes to infinity and that converges to the same limit as goes to .

The "undiscounted" game is the game where the payoff to player is the "limit" of the averages of the stage payoffs. Some precautions are needed in defining the value of a two-person zero-sum and in defining equilibrium payoffs of a non-zero-sum . The uniform value of a two-person zero-sum stochastic game exists if for every there is a positive integer and a strategy pair of player 1 and of player 2 such that for every and and every the expectation of with respect to the probability on plays defined by and is at least , and the expectation of with respect to the probability on plays defined by and is at most . Jean-François Mertens and Abraham Neyman (1981) proved that every two-person zero-sum stochastic game with finitely many states and actions has a uniform value. [3]

If there is a finite number of players and the action sets and the set of states are finite, then a stochastic game with a finite number of stages always has a Nash equilibrium. The same is true for a game with infinitely many stages if the total payoff is the discounted sum.

The non-zero-sum stochastic game has a uniform equilibrium payoff if for every there is a positive integer and a strategy profile such that for every unilateral deviation by a player , i.e., a strategy profile with for all , and every the expectation of with respect to the probability on plays defined by is at least , and the expectation of with respect to the probability on plays defined by is at most . Nicolas Vieille has shown that all two-person stochastic games with finite state and action spaces have a uniform equilibrium payoff. [4]

The non-zero-sum stochastic game has a limiting-average equilibrium payoff if for every there is a strategy profile such that for every unilateral deviation by a player , the expectation of the limit inferior of the averages of the stage payoffs with respect to the probability on plays defined by is at least , and the expectation of the limit superior of the averages of the stage payoffs with respect to the probability on plays defined by is at most . Jean-François Mertens and Abraham Neyman (1981) proves that every two-person zero-sum stochastic game with finitely many states and actions has a limiting-average value, [3] and Nicolas Vieille has shown that all two-person stochastic games with finite state and action spaces have a limiting-average equilibrium payoff. [4] In particular, these results imply that these games have a value and an approximate equilibrium payoff, called the liminf-average (respectively, the limsup-average) equilibrium payoff, when the total payoff is the limit inferior (or the limit superior) of the averages of the stage payoffs.

Whether every stochastic game with finitely many players, states, and actions, has a uniform equilibrium payoff, or a limiting-average equilibrium payoff, or even a liminf-average equilibrium payoff, is a challenging open question.

A Markov perfect equilibrium is a refinement of the concept of sub-game perfect Nash equilibrium to stochastic games.

Stochastic games have been combined with Bayesian games to model uncertainty over player strategies. [5] The resulting stochastic Bayesian game model is solved via a recursive combination of the Bayesian Nash equilibrium equation and the Bellman optimality equation.

Applications

Stochastic games have applications in economics, evolutionary biology and computer networks. [6] [7] They are generalizations of repeated games which correspond to the special case where there is only one state.

See also

Notes

  1. Shapley, L. S. (1953). "Stochastic games". PNAS . 39 (10): 1095–1100. Bibcode:1953PNAS...39.1095S. doi: 10.1073/pnas.39.10.1095 . PMC   1063912 . PMID   16589380.
  2. Solan, Eilon; Vieille, Nicolas (2015). "Stochastic Games". PNAS. 112 (45): 13743–13746. doi: 10.1073/pnas.1513508112 . PMC   4653174 . PMID   26556883.
  3. 1 2 Mertens, J. F. & Neyman, A. (1981). "Stochastic Games". International Journal of Game Theory. 10 (2): 53–66. doi:10.1007/BF01769259. S2CID   189830419.
  4. 1 2 Vieille, N. (2002). "Stochastic games: Recent results". Handbook of Game Theory. Amsterdam: Elsevier Science. pp. 1833–1850. ISBN   0-444-88098-4.
  5. Albrecht, Stefano; Crandall, Jacob; Ramamoorthy, Subramanian (2016). "Belief and Truth in Hypothesised Behaviours". Artificial Intelligence . 235: 63–94. arXiv: 1507.07688 . doi:10.1016/j.artint.2016.02.004. S2CID   2599762.
  6. Constrained Stochastic Games in Wireless Networks by E.Altman, K.Avratchenkov, N.Bonneau, M.Debbah, R.El-Azouzi, D.S.Menasche
  7. Djehiche, Boualem; Tcheukam, Alain; Tembine, Hamidou (2017-09-27). "Mean-Field-Type Games in Engineering". AIMS Electronics and Electrical Engineering. 1: 18–73. arXiv: 1605.03281 . doi:10.3934/ElectrEng.2017.1.18. S2CID   16055840.

Further reading

Related Research Articles

In game theory, the Nash equilibrium is the most commonly-used solution concept for non-cooperative games. A Nash equilibrium is a situation where no player could gain by changing their own strategy. The idea of Nash equilibrium dates back to the time of Cournot, who in 1838 applied it to his model of competition in an oligopoly.

<span class="mw-page-title-main">Fokker–Planck equation</span> Partial differential equation

In statistical mechanics and information theory, the Fokker–Planck equation is a partial differential equation that describes the time evolution of the probability density function of the velocity of a particle under the influence of drag forces and random forces, as in Brownian motion. The equation can be generalized to other observables as well. The Fokker–Planck equation has multiple applications in information theory, graph theory, data science, finance, economics etc.

<span class="mw-page-title-main">Dirichlet convolution</span> Mathematical operation on arithmetical functions

In mathematics, the Dirichlet convolution is a binary operation defined for arithmetic functions; it is important in number theory. It was developed by Peter Gustav Lejeune Dirichlet.

In calculus and real analysis, absolute continuity is a smoothness property of functions that is stronger than continuity and uniform continuity. The notion of absolute continuity allows one to obtain generalizations of the relationship between the two central operations of calculus—differentiation and integration. This relationship is commonly characterized in the framework of Riemann integration, but with absolute continuity it may be formulated in terms of Lebesgue integration. For real-valued functions on the real line, two interrelated notions appear: absolute continuity of functions and absolute continuity of measures. These two notions are generalized in different directions. The usual derivative of a function is related to the Radon–Nikodym derivative, or density, of a measure. We have the following chains of inclusions for functions over a compact subset of the real line:

The Ising model, named after the physicists Ernst Ising and Wilhelm Lenz, is a mathematical model of ferromagnetism in statistical mechanics. The model consists of discrete variables that represent magnetic dipole moments of atomic "spins" that can be in one of two states. The spins are arranged in a graph, usually a lattice, allowing each spin to interact with its neighbors. Neighboring spins that agree have a lower energy than those that disagree; the system tends to the lowest energy but heat disturbs this tendency, thus creating the possibility of different structural phases. The model allows the identification of phase transitions as a simplified model of reality. The two-dimensional square-lattice Ising model is one of the simplest statistical models to show a phase transition.

In mathematics, the Radon–Nikodym theorem is a result in measure theory that expresses the relationship between two measures defined on the same measurable space. A measure is a set function that assigns a consistent magnitude to the measurable subsets of a measurable space. Examples of a measure include area and volume, where the subsets are sets of points; or the probability of an event, which is a subset of possible outcomes within a wider probability space.

The Laplace–Stieltjes transform, named for Pierre-Simon Laplace and Thomas Joannes Stieltjes, is an integral transform similar to the Laplace transform. For real-valued functions, it is the Laplace transform of a Stieltjes measure, however it is often defined for functions with values in a Banach space. It is useful in a number of areas of mathematics, including functional analysis, and certain areas of theoretical and applied probability.

<span class="mw-page-title-main">Transmittance</span> Effectiveness of a material in transmitting radiant energy

In optical physics, transmittance of the surface of a material is its effectiveness in transmitting radiant energy. It is the fraction of incident electromagnetic power that is transmitted through a sample, in contrast to the transmission coefficient, which is the ratio of the transmitted to incident electric field.

In mathematics, the Poisson summation formula is an equation that relates the Fourier series coefficients of the periodic summation of a function to values of the function's continuous Fourier transform. Consequently, the periodic summation of a function is completely defined by discrete samples of the original function's Fourier transform. And conversely, the periodic summation of a function's Fourier transform is completely defined by discrete samples of the original function. The Poisson summation formula was discovered by Siméon Denis Poisson and is sometimes called Poisson resummation.

<span class="mw-page-title-main">Granular material</span> Conglomeration of discrete solid, macroscopic particles

A granular material is a conglomeration of discrete solid, macroscopic particles characterized by a loss of energy whenever the particles interact. The constituents that compose granular material are large enough such that they are not subject to thermal motion fluctuations. Thus, the lower size limit for grains in granular material is about 1 μm. On the upper size limit, the physics of granular materials may be applied to ice floes where the individual grains are icebergs and to asteroid belts of the Solar System with individual grains being asteroids.

In probability theory and related fields, Malliavin calculus is a set of mathematical techniques and ideas that extend the mathematical field of calculus of variations from deterministic functions to stochastic processes. In particular, it allows the computation of derivatives of random variables. Malliavin calculus is also called the stochastic calculus of variations. P. Malliavin first initiated the calculus on infinite dimensional space. Then, the significant contributors such as S. Kusuoka, D. Stroock, J-M. Bismut, Shinzo Watanabe, I. Shigekawa, and so on finally completed the foundations.

Game theory is the branch of mathematics in which games are studied: that is, models describing human behaviour. This is a glossary of some terms of the subject.

In game theory, trembling hand perfect equilibrium is a type of refinement of a Nash equilibrium that was first proposed by Reinhard Selten. A trembling hand perfect equilibrium is an equilibrium that takes the possibility of off-the-equilibrium play into account by assuming that the players, through a "slip of the hand" or tremble, may choose unintended strategies, albeit with negligible probability.

In mathematics, the Weyl character formula in representation theory describes the characters of irreducible representations of compact Lie groups in terms of their highest weights. It was proved by Hermann Weyl. There is a closely related formula for the character of an irreducible representation of a semisimple Lie algebra. In Weyl's approach to the representation theory of connected compact Lie groups, the proof of the character formula is a key step in proving that every dominant integral element actually arises as the highest weight of some irreducible representation. Important consequences of the character formula are the Weyl dimension formula and the Kostant multiplicity formula.

The Newman–Penrose (NP) formalism is a set of notation developed by Ezra T. Newman and Roger Penrose for general relativity (GR). Their notation is an effort to treat general relativity in terms of spinor notation, which introduces complex forms of the usual variables used in GR. The NP formalism is itself a special case of the tetrad formalism, where the tensors of the theory are projected onto a complete vector basis at each point in spacetime. Usually this vector basis is chosen to reflect some symmetry of the spacetime, leading to simplified expressions for physical observables. In the case of the NP formalism, the vector basis chosen is a null tetrad: a set of four null vectors—two real, and a complex-conjugate pair. The two real members often asymptotically point radially inward and radially outward, and the formalism is well adapted to treatment of the propagation of radiation in curved spacetime. The Weyl scalars, derived from the Weyl tensor, are often used. In particular, it can be shown that one of these scalars— in the appropriate frame—encodes the outgoing gravitational radiation of an asymptotically flat system.

In mathematical analysis, the Russo–Vallois integral is an extension to stochastic processes of the classical Riemann–Stieltjes integral

In game theory, an epsilon-equilibrium, or near-Nash equilibrium, is a strategy profile that approximately satisfies the condition of Nash equilibrium. In a Nash equilibrium, no player has an incentive to change his behavior. In an approximate Nash equilibrium, this requirement is weakened to allow the possibility that a player may have a small incentive to do something different. This may still be considered an adequate solution concept, assuming for example status quo bias. This solution concept may be preferred to Nash equilibrium due to being easier to compute, or alternatively due to the possibility that in games of more than 2 players, the probabilities involved in an exact Nash equilibrium need not be rational numbers.

Statistical Football prediction is a method used in sports betting, to predict the outcome of football matches by means of statistical tools. The goal of statistical match prediction is to outperform the predictions of bookmakers, who use them to set odds on the outcome of football matches.

In probability theory, the Palm–Khintchine theorem, the work of Conny Palm and Aleksandr Khinchin, expresses that a large number of renewal processes, not necessarily Poissonian, when combined ("superimposed") will have Poissonian properties.

Stochastic chains with memory of variable length are a family of stochastic chains of finite order in a finite alphabet, such as, for every time pass, only one finite suffix of the past, called context, is necessary to predict the next symbol. These models were introduced in the information theory literature by Jorma Rissanen in 1983, as a universal tool to data compression, but recently have been used to model data in different areas such as biology, linguistics and music.