Program equilibrium

Last updated

Program equilibrium is a game-theoretic solution concept for a scenario in which players submit computer programs to play the game on their behalf and the programs can read each other's source code. The term was introduced by Moshe Tennenholtz in 2004. [1] The same setting had previously been studied by R. Preston McAfee, [2] J. V. Howard [3] and Ariel Rubinstein. [4]

Contents

Setting and definition

The program equilibrium literature considers the following setting. Consider a normal-form game as a base game. For simplicity, consider a two-player game in which and are the sets of available strategies and and are the players' utility functions. Then we construct a new (normal-form) program game in which each player chooses a computer program . The payoff (utility) for the players is then determined as follows. Each player's program is run with the other program as input and outputs a strategy for Player . For convenience one also often imagines that programs can access their own source code. [nb 1] Finally, the utilities for the players are given by for , i.e., by applying the utility functions for the base game to the chosen strategies.

One has to further deal with the possibility that one of the programs doesn't halt. One way to deal with this is to restrict both players' sets of available programs to prevent non-halting programs. [1] [5]

A program equilibrium is a pair of programs that constitute a Nash equilibrium of the program game. In other words, is a program equilibrium if neither player can deviate to an alternative program such that their utility is higher in than in .

Instead of programs, some authors have the players submit other kinds of objects, such as logical formulas specifying what action to play depending on an encoding of the logical formula submitted by the opponent. [6] [7]

Different mechanisms for achieving cooperative program equilibrium in the Prisoner's Dilemma

Various authors have proposed ways to achieve cooperative program equilibrium in the Prisoner's Dilemma.

Cooperation based on syntactic comparison

Multiple authors have independently proposed the following program for the Prisoner's Dilemma: [1] [3] [2]

algorithm CliqueBot(opponent_program):     if opponent_program == this_program thenreturn Cooperate     elsereturn Defect

If both players submit this program, then the if-clause will resolve to true in the execution of both programs. As a result, both programs will cooperate. Moreover, (CliqueBot,CliqueBot) is an equilibrium. If either player deviates to some other program that is different from CliqueBot, then the opponent will defect. Therefore, deviating to can at best result in the payoff of mutual defection, which is worse than the payoff of mutual cooperation.

This approach has been criticized for being fragile. [5] If the players fail to coordinate on the exact source code they submit (for example, if one player adds an extra space character), both programs will defect. The development of the techniques below is in part motivated by this fragility issue.

Proof-based cooperation

Another approach is based on letting each player's program try to prove something about the opponent's program or about how the two programs relate. [6] [8] [9] [10] One example of such a program is the following:

algorithm FairBot(opponent_program):     if there is a proof that opponent_program(this_program) = Cooperate thenreturn Cooperate     elsereturn Defect

Using Löb's theorem it can be shown that when both players submit this program, they cooperate against each other. [8] [9] [10] Moreover, if one player were to instead submit a program that defects against the above program, then (assuming consistency of the proof system is used) the if-condition would resolve to false and the above program would defect. Therefore, (FairBot,FairBot) is a program equilibrium as well.

Cooperating with ε-grounded simulation

Another proposed program is the following: [5] [11]

algorithmGroundedFairBot(opponent_program):     With probability :         return Cooperate     return opponent_program(this_program)

Here is a small positive number.

If both players submit this program, then they terminate almost surely and cooperate. The expected number of steps to termination is given by the geometric series. Moreover, if both players submit this program, neither can profitably deviate, assuming is sufficiently small, because defecting with probability would cause the opponent to defect with probability .

Folk theorem

We here give a theorem that characterizes what payoffs can be achieved in program equilibrium.

The theorem uses the following terminology: A pair of payoffs is called feasible if there is a pair of (potentially mixed) strategies such that for both players . That is, a pair of payoffs is called feasible if it is achieved in some strategy profile. A payoff is called individually rational if it is better than that player's minimax payoff; that is, if , where the minimum is over all mixed strategies for Player . [nb 2]

Theorem (folk theorem for program equilibrium): [4] [1] Let G be a base game. Let be a pair of real-valued payoffs. Then the following two claims are equivalent:

The result is referred to as a folk theorem in reference to the so-called folk theorems (game theory) for repeated games, which use the same conditions on equilibrium payoffs .

See also

Notes

  1. It is not necessary for programs in the program game to be given access to their own source code. By the diagonalization lemma, one can use quining to enable programs to refer to their source code. [2] [3] [4]
  2. Equivalently, (by von Neumann's minimax theorem), , where the maximum is over all mixed strategies for Player .

Related Research Articles

The prisoner's dilemma is a game theory thought experiment that involves two rational agents, each of whom can cooperate for mutual benefit or betray their partner ("defect") for individual reward. This dilemma was originally framed by Merrill Flood and Melvin Dresher in 1950 while they worked at the RAND Corporation. Albert W. Tucker later formalized the game by structuring the rewards in terms of prison sentences and named it the "prisoner's dilemma".

In game theory, the Nash equilibrium, named after the mathematician John Nash, is the most common way to define the solution of a non-cooperative game involving two or more players. In a Nash equilibrium, each player is assumed to know the equilibrium strategies of the other players, and no one has anything to gain by changing only one's own strategy. The principle of Nash equilibrium dates back to the time of Cournot, who in 1838 applied it to competing firms choosing outputs.

In game theory, the best response is the strategy which produces the most favorable outcome for a player, taking other players' strategies as given. The concept of a best response is central to John Nash's best-known contribution, the Nash equilibrium, the point at which each player in a game has selected the best response to the other players' strategies.

In mechanics, virtual work arises in the application of the principle of least action to the study of forces and movement of a mechanical system. The work of a force acting on a particle as it moves along a displacement is different for different displacements. Among all the possible displacements that a particle may follow, called virtual displacements, one will minimize the action. This displacement is therefore the displacement followed by the particle according to the principle of least action.

The work of a force on a particle along a virtual displacement is known as the virtual work.

In game theory, the centipede game, first introduced by Robert Rosenthal in 1981, is an extensive form game in which two players take turns choosing either to take a slightly larger share of an increasing pot, or to pass the pot to the other player. The payoffs are arranged so that if one passes the pot to one's opponent and the opponent takes the pot on the next round, one receives slightly less than if one had taken the pot on this round, but after an additional switch the potential payoff will be higher. Therefore, although at each round a player has an incentive to take the pot, it would be better for them to wait. Although the traditional centipede game had a limit of 100 rounds, any game with this structure but a different number of rounds is called a centipede game.

Game theory is the branch of mathematics in which games are studied: that is, models describing human behaviour. This is a glossary of some terms of the subject.

In game theory, grim trigger is a trigger strategy for a repeated game.

In game theory, normal form is a description of a game. Unlike extensive form, normal-form representations are not graphical per se, but rather represent the game by way of a matrix. While this approach can be of greater use in identifying strictly dominated strategies and Nash equilibria, some information is lost as compared to extensive-form representations. The normal-form representation of a game includes all perceptible and conceivable strategies, and their corresponding payoffs, for each player.

In game theory, folk theorems are a class of theorems describing an abundance of Nash equilibrium payoff profiles in repeated games. The original Folk Theorem concerned the payoffs of all the Nash equilibria of an infinitely repeated game. This result was called the Folk Theorem because it was widely known among game theorists in the 1950s, even though no one had published it. Friedman's (1971) Theorem concerns the payoffs of certain subgame-perfect Nash equilibria (SPE) of an infinitely repeated game, and so strengthens the original Folk Theorem by using a stronger equilibrium concept: subgame-perfect Nash equilibria rather than Nash equilibria.

In game theory, the purification theorem was contributed by Nobel laureate John Harsanyi in 1973. The theorem aims to justify a puzzling aspect of mixed strategy Nash equilibria: that each player is wholly indifferent amongst each of the actions he puts non-zero weight on, yet he mixes them so as to make every other player also indifferent.

Proper equilibrium is a refinement of Nash Equilibrium by Roger B. Myerson. Proper equilibrium further refines Reinhard Selten's notion of a trembling hand perfect equilibrium by assuming that more costly trembles are made with significantly smaller probability than less costly ones.

In game theory, an epsilon-equilibrium, or near-Nash equilibrium, is a strategy profile that approximately satisfies the condition of Nash equilibrium. In a Nash equilibrium, no player has an incentive to change his behavior. In an approximate Nash equilibrium, this requirement is weakened to allow the possibility that a player may have a small incentive to do something different. This may still be considered an adequate solution concept, assuming for example status quo bias. This solution concept may be preferred to Nash equilibrium due to being easier to compute, or alternatively due to the possibility that in games of more than 2 players, the probabilities involved in an exact Nash equilibrium need not be rational numbers.

In game theory, a stochastic game, introduced by Lloyd Shapley in the early 1950s, is a repeated game with probabilistic transitions played by one or more players. The game is played in a sequence of stages. At the beginning of each stage the game is in some state. The players select actions and each player receives a payoff that depends on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions chosen by the players. The procedure is repeated at the new state and play continues for a finite or infinite number of stages. The total payoff to a player is often taken to be the discounted sum of the stage payoffs or the limit inferior of the averages of the stage payoffs.

A continuous game is a mathematical concept, used in game theory, that generalizes the idea of an ordinary game like tic-tac-toe or checkers (draughts). In other words, it extends the notion of a discrete game, where the players choose from a finite set of pure strategies. The continuous game concepts allows games to include more general sets of pure strategies, which may be uncountably infinite.

Antiplane shear or antiplane strain is a special state of strain in a body. This state of strain is achieved when the displacements in the body are zero in the plane of interest but nonzero in the direction perpendicular to the plane. For small strains, the strain tensor under antiplane shear can be written as

In game theory, Mertens stability is a solution concept used to predict the outcome of a non-cooperative game. A tentative definition of stability was proposed by Elon Kohlberg and Jean-François Mertens for games with finite numbers of players and strategies. Later, Mertens proposed a stronger definition that was elaborated further by Srihari Govindan and Mertens. This solution concept is now called Mertens stability, or just stability.

Network games of incomplete information represent strategic network formation when agents do not know in advance their neighbors, i.e. the network structure and the value stemming from forming links with neighboring agents. In such a setting, agents have prior beliefs about the value of attaching to their neighbors; take their action based on their prior belief and update their belief based on the history of the game. While games with a fully known network structure are widely applicable, there are many applications when players act without fully knowing with whom they interact or what their neighbors’ action will be.

Subjective expected relative similarity (SERS) is a normative and descriptive theory that predicts and explains cooperation levels in a family of games termed Similarity Sensitive Games (SSG), among them the well-known Prisoner's Dilemma game (PD). SERS was originally developed in order to (i) provide a new rational solution to the PD game and (ii) to predict human behavior in single-step PD games. It was further developed to account for: (i) repeated PD games, (ii) evolutionary perspectives and, as mentioned above, (iii) the SSG subgroup of 2×2 games. SERS predicts that individuals cooperate whenever their subjectively perceived similarity with their opponent exceeds a situational index derived from the game's payoffs, termed the similarity threshold of the game. SERS proposes a solution to the rational paradox associated with the single step PD and provides accurate behavioral predictions. The theory was developed by Prof. Ilan Fischer at the University of Haifa.

The Berge equilibrium is a game theory solution concept named after the mathematician Claude Berge. It is similar to the standard Nash equilibrium, except that it aims to capture a type of altruism rather than purely non-cooperative play. Whereas a Nash equilibrium is a situation in which each player of a strategic game ensures that they personally will receive the highest payoff given other players' strategies, in a Berge equilibrium every player ensures that all other players will receive the highest payoff possible. Although Berge introduced the intuition for this equilibrium notion in 1957, it was only formally defined by Vladislav Iosifovich Zhukovskii in 1985, and it was not in widespread use until half a century after Berge originally developed it.

In game theory, a satisfaction equilibrium is a solution concept for a class of non-cooperative games, namely games in satisfaction form. Games in satisfaction form model situations in which players aim at satisfying a given individual constraint, e.g., a performance metric must be smaller or bigger than a given threshold. When a player satisfies its own constraint, the player is said to be satisfied. A satisfaction equilibrium, if it exists, arises when all players in the game are satisfied.

References

  1. 1 2 3 4 Tennenholtz, M. (November 2004). "Program equilibrium". Games and Economic Behavior . Elsevier. 49 (2): 363–373. doi:10.1016/j.geb.2004.02.002. ISSN   0899-8256.
  2. 1 2 3 McAfee, R. P. (May 1984). Effective Computability in Economic Decisions (PDF) (Technical report). University of Western Ontario.
  3. 1 2 3 Howard, J. V. (May 1988). "Cooperation in the Prisoner's Dilemma". Theory and Decision . Kluwer Academic Publishers. 24 (3): 203–213. doi:10.1007/BF00148954. S2CID   121119727.
  4. 1 2 3 Rubinstein, A. (1998). "Ch. 10.4". Modeling Bounded Rationality. MIT Press. ISBN   978-0262681001.
  5. 1 2 3 Oesterheld, C. (February 2019). "Robust Program Equilibrium". Theory and Decision . Springer. 86: 143–159. doi: 10.1007/s11238-018-9679-3 . S2CID   255103752.
  6. 1 2 van der Hoek, W.; Witteveen, C.; Wooldridge, M. (2013). "Program equilibrium—a program reasoning approach". International Journal of Game Theory. Springer. 42 (3): 639–671. CiteSeerX   10.1.1.228.6517 . doi:10.1007/s00182-011-0314-6. S2CID   253720520.
  7. Peters, Michael; Szentes, Balázs (January 2012). "Definable and Contractible Contracts" (PDF). Econometrica . The Econometric Society. 80 (1): 363–411. doi:10.3982/ECTA8375.
  8. 1 2 Barasz, M.; Christiano, P.; Fallenstein, B.; Herreshoff, M.; LaVictoire, P.; Yudkowsky, E. (2014). "Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic". arXiv: 1401.5577 [cs.GT].
  9. 1 2 Critch, A. (2019). "A Parametric, Resource-Bounded Generalization of Löb's Theorem, and a Robust Cooperation Criterion for Open-Source Game Theory". Journal of Symbolic Logic . Cambridge University Press. 84 (4): 1368–1381. doi:10.1017/jsl.2017.42. S2CID   133348715.
  10. 1 2 Critch, A.; Dennis, M.; Russell, S. (2022). "Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory". arXiv: 2208.07006 [cs.GT].
  11. DiGiovanni, A.; Clifton, J. (2023). "Commitment games with conditional information disclosure". Proceedings of the AAAI Conference on Artificial Intelligence . arXiv: 2204.03484 .