Stochastic game

Last updated

In game theory, a stochastic game, introduced by Lloyd Shapley in the early 1950s, is a dynamic game with probabilistic transitions played by one or more players. The game is played in a sequence of stages. At the beginning of each stage the game is in some state. The players select actions and each player receives a payoff that depends on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions chosen by the players. The procedure is repeated at the new state and play continues for a finite or infinite number of stages. The total payoff to a player is often taken to be the discounted sum of the stage payoffs or the limit inferior of the averages of the stage payoffs.

Contents

Stochastic games generalize both Markov decision processes and repeated games.

Two-player games

Stochastic two-player games on directed graphs are widely used for modeling and analysis of discrete systems operating in an unknown (adversarial) environment. Possible configurations of a system and its environment are represented as vertices, and the transitions correspond to actions of the system, its environment, or "nature". A run of the system then corresponds to an infinite path in the graph. Thus, a system and its environment can be seen as two players with antagonistic objectives, where one player (the system) aims at maximizing the probability of "good" runs, while the other player (the environment) aims at the opposite.

In many cases, there exists an equilibrium value of this probability, but optimal strategies for both players may not exist.

We introduce basic concepts and algorithmic questions studied in this area, and we mention some long-standing open problems. Then, we mention selected recent results.

Theory

The ingredients of a stochastic game are: a finite set of players ; a state space (either a finite set or a measurable space ); for each player , an action set (either a finite set or a measurable space ); a transition probability from , where is the action profiles, to , where is the probability that the next state is in given the current state and the current action profile ; and a payoff function from to , where the -th coordinate of , , is the payoff to player as a function of the state and the action profile .

The game starts at some initial state . At stage , players first observe , then simultaneously choose actions , then observe the action profile , and then nature selects according to the probability . A play of the stochastic game, , defines a stream of payoffs , where .

The discounted game with discount factor () is the game where the payoff to player is . The -stage game is the game where the payoff to player is .

The value , respectively , of a two-person zero-sum stochastic game , respectively , with finitely many states and actions exists, and Truman Bewley and Elon Kohlberg (1976) proved that converges to a limit as goes to infinity and that converges to the same limit as goes to .

The "undiscounted" game is the game where the payoff to player is the "limit" of the averages of the stage payoffs. Some precautions are needed in defining the value of a two-person zero-sum and in defining equilibrium payoffs of a non-zero-sum . The uniform value of a two-person zero-sum stochastic game exists if for every there is a positive integer and a strategy pair of player 1 and of player 2 such that for every and and every the expectation of with respect to the probability on plays defined by and is at least , and the expectation of with respect to the probability on plays defined by and is at most . Jean-François Mertens and Abraham Neyman (1981) proved that every two-person zero-sum stochastic game with finitely many states and actions has a uniform value.

If there is a finite number of players and the action sets and the set of states are finite, then a stochastic game with a finite number of stages always has a Nash equilibrium. The same is true for a game with infinitely many stages if the total payoff is the discounted sum.

The non-zero-sum stochastic game has a uniform equilibrium payoff if for every there is a positive integer and a strategy profile such that for every unilateral deviation by a player , i.e., a strategy profile with for all , and every the expectation of with respect to the probability on plays defined by is at least , and the expectation of with respect to the probability on plays defined by is at most . Nicolas Vieille has shown that all two-person stochastic games with finite state and action spaces have a uniform equilibrium payoff.

The non-zero-sum stochastic game has a limiting-average equilibrium payoff if for every there is a strategy profile such that for every unilateral deviation by a player , the expectation of the limit inferior of the averages of the stage payoffs with respect to the probability on plays defined by is at least , and the expectation of the limit superior of the averages of the stage payoffs with respect to the probability on plays defined by is at most . Jean-François Mertens and Abraham Neyman (1981) proves that every two-person zero-sum stochastic game with finitely many states and actions has a limiting-average value, and Nicolas Vieille has shown that all two-person stochastic games with finite state and action spaces have a limiting-average equilibrium payoff. In particular, these results imply that these games have a value and an approximate equilibrium payoff, called the liminf-average (respectively, the limsup-average) equilibrium payoff, when the total payoff is the limit inferior (or the limit superior) of the averages of the stage payoffs.

Whether every stochastic game with finitely many players, states, and actions, has a uniform equilibrium payoff, or a limiting-average equilibrium payoff, or even a liminf-average equilibrium payoff, is a challenging open question.


A Markov perfect equilibrium is a refinement of the concept of sub-game perfect Nash equilibrium to stochastic games.

Applications

Stochastic games have applications in economics, evolutionary biology and computer networks. [1] [2] They are generalizations of repeated games which correspond to the special case where there is only one state.

See also

Notes

  1. Constrained Stochastic Games in Wireless Networks by E.Altman, K.Avratchenkov, N.Bonneau, M.Debbah, R.El-Azouzi, D.S.Menasche
  2. Djehiche, Boualem; Tcheukam, Alain; Tembine, Hamidou (2017-09-27). "Mean-Field-Type Games in Engineering". AIMS Electronics and Electrical Engineering. 1: 18–73. arXiv: 1605.03281 . doi:10.3934/ElectrEng.2017.1.18.

Further reading

Related Research Articles

In mathematics, the Lp spaces are function spaces defined using a natural generalization of the p-norm for finite-dimensional vector spaces. They are sometimes called Lebesgue spaces, named after Henri Lebesgue, although according to the Bourbaki group they were first introduced by Frigyes Riesz. Lp spaces form an important class of Banach spaces in functional analysis, and of topological vector spaces. Because of their key role in the mathematical analysis of measure and probability spaces, Lebesgue spaces are used also in the theoretical discussion of problems in physics, statistics, finance, engineering, and other disciplines.

In game theory, the Nash equilibrium, named after the mathematician John Forbes Nash Jr., is a proposed solution of a non-cooperative game involving two or more players in which each player is assumed to know the equilibrium strategies of the other players, and no player has anything to gain by changing only their own strategy.

In mathematics, an infinite series of numbers is said to converge absolutely if the sum of the absolute values of the summands is finite. More precisely, a real or complex series is said to converge absolutely if for some real number . Similarly, an improper integral of a function, , is said to converge absolutely if the integral of the absolute value of the integrand is finite—that is, if

In calculus, absolute continuity is a smoothness property of functions that is stronger than continuity and uniform continuity. The notion of absolute continuity allows one to obtain generalizations of the relationship between the two central operations of calculus—differentiation and integration. This relationship is commonly characterized in the framework of Riemann integration, but with absolute continuity it may be formulated in terms of Lebesgue integration. For real-valued functions on the real line, two interrelated notions appear: absolute continuity of functions and absolute continuity of measures. These two notions are generalized in different directions. The usual derivative of a function is related to the Radon–Nikodym derivative, or density, of a measure.

In mathematics, the Radon–Nikodym theorem is a result in measure theory. It involves a measurable space on which two σ-finite measures are defined, and . It states that, if , then there is a measurable function , such that for any measurable set ,

The Laplace–Stieltjes transform, named for Pierre-Simon Laplace and Thomas Joannes Stieltjes, is an integral transform similar to the Laplace transform. For real-valued functions, it is the Laplace transform of a Stieltjes measure, however it is often defined for functions with values in a Banach space. It is useful in a number of areas of mathematics, including functional analysis, and certain areas of theoretical and applied probability.

In mathematics, the spectral radius of a square matrix or a bounded linear operator is the largest absolute value of its eigenvalues. It is sometimes denoted by ρ(·).

In the theory of stochastic processes, the Karhunen–Loève theorem, also known as the Kosambi–Karhunen–Loève theorem is a representation of a stochastic process as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded interval. The transformation is also known as Hotelling transform and eigenvector transform, and is closely related to principal component analysis (PCA) technique widely used in image processing and in data analysis in many fields.

In statistics, econometrics and signal processing, an autoregressive (AR) model is a representation of a type of random process; as such, it is used to describe certain time-varying processes in nature, economics, etc. The autoregressive model specifies that the output variable depends linearly on its own previous values and on a stochastic term ; thus the model is in the form of a stochastic difference equation. Together with the moving-average (MA) model, it is a special case and key component of the more general autoregressive–moving-average (ARMA) and autoregressive integrated moving average (ARIMA) models of time series, which have a more complicated stochastic structure; it is also a special case of the vector autoregressive model (VAR), which consists of a system of more than one interlocking stochastic difference equation in more than one evolving random variable.

Game theory is the branch of mathematics in which games are studied: that is, models describing human behaviour. This is a glossary of some terms of the subject.

In probability theory, fractional Brownian motion (fBm), also called a fractal Brownian motion, is a generalization of Brownian motion. Unlike classical Brownian motion, the increments of fBm need not be independent. fBm is a continuous-time Gaussian process BH(t) on [0, T], that starts at zero, has expectation zero for all t in [0, T], and has the following covariance function:

Vertex model

A vertex model is a type of statistical mechanics model in which the Boltzmann weights are associated with a vertex in the model. This contrasts with a nearest-neighbour model, such as the Ising model, in which the energy, and thus the Boltzmann weight of a statistical microstate is attributed to the bonds connecting two neighbouring particles. The energy associated with a vertex in the lattice of particles is thus dependent on the state of the bonds which connect it to adjacent vertices. It turns out that every solution of the Yang–Baxter equation with spectral parameters in a tensor product of vector spaces yields an exactly-solvable vertex model.

Newman–Penrose formalism Notation in general relativity

The Newman–Penrose (NP) formalism is a set of notation developed by Ezra T. Newman and Roger Penrose for general relativity (GR). Their notation is an effort to treat general relativity in terms of spinor notation, which introduces complex forms of the usual variables used in GR. The NP formalism is itself a special case of the tetrad formalism, where the tensors of the theory are projected onto a complete vector basis at each point in spacetime. Usually this vector basis is chosen to reflect some symmetry of the spacetime, leading to simplified expressions for physical observables. In the case of the NP formalism, the vector basis chosen is a null tetrad: a set of four null vectors—two real, and a complex-conjugate pair. The two real members asymptotically point radially inward and radially outward, and the formalism is well adapted to treatment of the propagation of radiation in curved spacetime. The Weyl scalars, derived from the Weyl tensor, are often used. In particular, it can be shown that one of these scalars— in the appropriate frame—encodes the outgoing gravitational radiation of an asymptotically flat system.

In the Newman–Penrose (NP) formalism of general relativity, Weyl scalars refer to a set of five complex scalars which encode the ten independent components of the Weyl tensor of a four-dimensional spacetime.

In mathematical analysis, the Russo–Vallois integral is an extension to stochastic processes of the classical Riemann–Stieltjes integral

In game theory, an epsilon-equilibrium, or near-Nash equilibrium, is a strategy profile that approximately satisfies the condition of Nash equilibrium. In a Nash equilibrium, no player has an incentive to change his behavior. In an approximate Nash equilibrium, this requirement is weakened to allow the possibility that a player may have a small incentive to do something different. This may still be considered an adequate solution concept, assuming for example status quo bias. This solution concept may be preferred to Nash equilibrium due to being easier to compute, or alternatively due to the possibility that in games of more than 2 players, the probabilities involved in an exact Nash equilibrium need not be rational numbers.

Statistical Football prediction is a method used in sports betting, to predict the outcome of football matches by means of statistical tools. The goal of statistical match prediction is to outperform the predictions of bookmakers, who use them to set odds on the outcome of football matches.

In probability theory, the Palm–Khintchine theorem, the work of Conny Palm and Aleksandr Khinchin, expresses that a large number of renewal processes, not necessarily Poissonian, when combined ("superimposed") will have Poissonian properties.

In the Newman–Penrose (NP) formalism of general relativity, independent components of the Ricci tensors of a four-dimensional spacetime are encoded into seven Ricci scalars which consist of three real scalars , three complex scalars and the NP curvature scalar . Physically, Ricci-NP scalars are related with the energy–momentum distribution of the spacetime due to Einstein's field equation.

Stochastic portfolio theory (SPT) is a mathematical theory for analyzing stock market structure and portfolio behavior introduced by E. Robert Fernholz in 2002. It is descriptive as opposed to normative, and is consistent with the observed behavior of actual markets. Normative assumptions, which serve as a basis for earlier theories like modern portfolio theory (MPT) and the capital asset pricing model (CAPM), are absent from SPT.