# Backward induction

Backward induction is the process of reasoning backwards in time, from the end of a problem or situation, to determine a sequence of optimal actions. It proceeds by first considering the last time a decision might be made and choosing what to do in any situation at that time. Using this information, one can then determine what to do at the second-to-last time of decision. This process continues backwards until one has determined the best action for every possible situation (i.e. for every possible information set) at every point in time. It was first used by Zermelo in 1913, to prove that chess has pure optimal strategies. [1] [2]

In the mathematical optimization method of dynamic programming, backward induction is one of the main methods for solving the Bellman equation. [3] [4] In game theory, backward induction is a method used to compute subgame perfect equilibria in sequential games. [5] The only difference is that optimization involves just one decision maker, who chooses what to do at each point of time, whereas game theory analyzes how the decisions of several players interact. That is, by anticipating what the last player will do in each situation, it is possible to determine what the second-to-last player will do, and so on. In the related fields of automated planning and scheduling and automated theorem proving, the method is called backward search or backward chaining. In chess it is called retrograde analysis.

Backward induction has been used to solve games as long as the field of game theory has existed. John von Neumann and Oskar Morgenstern suggested solving zero-sum, two-person games by backward induction in their Theory of Games and Economic Behavior (1944), the book which established game theory as a field of study. [6] [2]

## Backward induction in decision making: an optimal-stopping problem

Consider an unemployed person who will be able to work for ten more years t = 1,2,...,10. Suppose that each year in which he remains unemployed, he may be offered a 'good' job that pays \$100, or a 'bad' job that pays \$44, with equal probability (50/50). Once he accepts a job, he will remain in that job for the rest of the ten years. (Assume for simplicity that he cares only about his monetary earnings, and that he values earnings at different times equally, i.e., the discount rate is zero.)

Should this person accept bad jobs? To answer this question, we can reason backwards from time t = 10.

• At time 10, the value of accepting a good job is \$100; the value of accepting a bad job is \$44; the value of rejecting the job that is available is zero. Therefore, if he is still unemployed in the last period, he should accept whatever job he is offered at that time.
• At time 9, the value of accepting a good job is \$200 (because that job will last for two years); the value of accepting a bad job is 2*\$44 = \$88. The value of rejecting a job offer is \$0 now, plus the value of waiting for the next job offer, which will either be \$44 with 50% probability or \$100 with 50% probability, for an average ('expected') value of 0.5*(\$100+\$44) = \$72. Therefore, regardless of whether the job available at time 9 is good or bad, it is better to accept that offer than wait for a better one.
• At time 8, the value of accepting a good job is \$300 (it will last for three years); the value of accepting a bad job is 3*\$44 = \$132. The value of rejecting a job offer is \$0 now, plus the value of waiting for a job offer at time 9. Since we have already concluded that offers at time 9 should be accepted, the expected value of waiting for a job offer at time 9 is 0.5*(\$200+\$88) = \$144. Therefore, at time 8, it is more valuable to wait for the next offer than to accept a bad job.

It can be verified by continuing to work backwards that bad offers should only be accepted if one is still unemployed at times 9 or 10; they should be rejected at all times up to t = 8. The intuition is that if one expects to work in a job for a long time, this makes it more valuable to be picky about what job to accept.
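
The recursion described above can be sketched in a few lines of Python. This is a minimal illustration of the example's arithmetic, not a general search-theory model; the function name `solve_job_search` and its parameters are assumptions introduced here.

```python
def solve_job_search(horizon=10, wages=(100, 44), probs=(0.5, 0.5)):
    """Backward induction for the job-offer example (zero discount rate).

    Returns (wait, accept):
      wait[t]      expected value of entering year t still unemployed,
                   before that year's offer is seen
      accept[t][w] True if an offer paying w per year should be accepted in year t
    """
    wait = {horizon + 1: 0.0}  # nothing can be earned after the last year
    accept = {}
    for t in range(horizon, 0, -1):  # t = 10, 9, ..., 1
        years_left = horizon - t + 1
        reject_value = wait[t + 1]  # earn 0 now, then face next year's offer
        accept[t] = {}
        expected = 0.0
        for wage, prob in zip(wages, probs):
            accept_value = wage * years_left  # the job lasts until year 10
            accept[t][wage] = accept_value >= reject_value
            expected += prob * max(accept_value, reject_value)
        wait[t] = expected
    return wait, accept


wait, accept = solve_job_search()
bad_job_years = [t for t in sorted(accept) if accept[t][44]]
print(bad_job_years)  # years in which a bad job should be accepted
```

With the numbers from the example, the value of waiting at time 8 is `wait[9]`, which the code computes as 144, and the bad job is accepted only in years 9 and 10.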

A dynamic optimization problem of this kind is called an optimal stopping problem, because the issue at hand is when to stop waiting for a better offer. Search theory is the field of microeconomics that applies problems of this type to contexts like shopping, job search, and marriage.

## Backward induction in game theory

In game theory, backward induction is a solution concept. It is a refinement of the rationality concept that is sensitive to individual information sets in the extensive-form representation of a game. [7] The idea of backward induction utilizes sequential rationality by identifying an optimal action for each information set in a given game tree.

In “Strategy: An Introduction to Game Theory” by Joel Watson, the backward induction procedure is defined as: “The process of analyzing a game from the end to the beginning. At each decision node, one strikes from consideration any actions that are dominated, given the terminal nodes that can be reached through the play of the actions identified at successor nodes.” [8]

One drawback of the backward induction procedure is that it can be applied only to limited classes of games. The procedure is well defined for any game of perfect information with no ties in utility. It is also well defined and meaningful for games of perfect information with ties, but in that case it yields more than one strategy profile. The procedure can be applied to some games with nontrivial information sets, but it is unreliable in general. It is best suited to solving games with perfect information. [9]

The backward induction procedure can be demonstrated with a simple example.

## Backward induction in game theory: Multi-stage game

The proposed game is a multi-stage game involving 2 players. Players are planning to go to a movie. Currently, there are 2 movies that are very popular, Joker and Terminator. Player 1 wants to watch Terminator and Player 2 wants to watch Joker. The Player 1 will buy a ticket first and tell Player 2 about her choice. Then, Player 2 will buy his ticket. Once they both observe the choices, they will make choices on whether to go to the movie or stay home. Just like the first stage, Player 1 chooses first. Player 2 then makes his choice after observing Player 1’s choice.

For this example, we assume payoffs are added across different stages. The game is a perfect information game.

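Normal-form Matrix (each cell lists Player 1's payoff, then Player 2's):

Stage 1

| Player 1 (rows) \ Player 2 (columns) | Joker | Terminator |
| --- | --- | --- |
| Joker | 3, 5 | 0, 0 |
| Terminator | 1, 1 | 5, 3 |

Stage 2

| Player 1 (rows) \ Player 2 (columns) | Go to Movie | Stay Home |
| --- | --- | --- |
| Go to Movie | 6, 6 | 4, 2 |
| Stay Home | -2, 4 | -2, -2 |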

Steps for solving this multi-stage game from its extensive-form representation (the game tree):

1. Backward induction starts to solve the game from the final nodes.
2. Player 2 will observe 8 subgames from the final nodes to choose to “Go to Movie” or “Stay Home”.
    1. Player 2 will make 8 comparisons in total, one per subgame. He will choose the option with the higher payoff.
    2. For example, in the first subgame, a payoff of 11 is higher than 7. Therefore, Player 2 chooses to “Go to Movie”.
    3. The method continues for every subgame.
3. Once Player 2 completes his choices, Player 1 will make her choices based on the selected subgames.
    1. The process is similar to Step 2. Player 1 compares her payoffs in order to make her choices.
    2. Subgames not selected by Player 2 in the previous step are no longer considered by either player because they are not optimal.
    3. For example, the choice to “Go to Movie” offers a payoff of 9 (9, 11) and the choice to “Stay Home” offers a payoff of 1 (1, 9). Player 1 will choose to “Go to Movie”.
4. The process repeats for each player until the initial node is reached.
    1. For example, after Player 1 chooses “Joker”, Player 2 will choose “Joker” because its payoff of 11 (9, 11) is greater than that of “Terminator”, with a payoff of 6 (6, 6).
    2. For example, Player 1, at the initial node, will select “Terminator” because it offers the higher payoff of 11. Terminator: (11, 9) > Joker: (9, 11).
5. To identify the subgame perfect equilibrium, we need to identify a route that selects the optimal subgame at each information set.
    1. In this example, Player 1 chooses “Terminator” and Player 2 also chooses “Terminator”. Then, they both choose to “Go to Movie”.
    2. The subgame perfect equilibrium leads to a payoff of (11, 9).
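
The steps above can be checked with a short Python sketch. It hard-codes the stage payoffs from the two matrices and solves the four decision levels from last to first; the function and variable names are illustrative assumptions, not part of the source.

```python
MOVIES = ("Joker", "Terminator")
ACTIONS = ("Go to Movie", "Stay Home")

# Stage payoffs as (Player 1, Player 2), keyed by (Player 1 choice, Player 2 choice).
STAGE1 = {("Joker", "Joker"): (3, 5), ("Joker", "Terminator"): (0, 0),
          ("Terminator", "Joker"): (1, 1), ("Terminator", "Terminator"): (5, 3)}
STAGE2 = {("Go to Movie", "Go to Movie"): (6, 6), ("Go to Movie", "Stay Home"): (4, 2),
          ("Stay Home", "Go to Movie"): (-2, 4), ("Stay Home", "Stay Home"): (-2, -2)}


def total(m1, m2, a1, a2):
    """Payoffs are added across the two stages."""
    s1, s2 = STAGE1[(m1, m2)], STAGE2[(a1, a2)]
    return (s1[0] + s2[0], s1[1] + s2[1])


def p2_stage2(m1, m2, a1):
    # Last mover: Player 2 compares his own total payoff at a final decision node.
    return max(ACTIONS, key=lambda a2: total(m1, m2, a1, a2)[1])


def p1_stage2(m1, m2):
    # Player 1 anticipates Player 2's reply to each of her stage-2 actions.
    return max(ACTIONS, key=lambda a1: total(m1, m2, a1, p2_stage2(m1, m2, a1))[0])


def outcome_after_stage1(m1, m2):
    a1 = p1_stage2(m1, m2)
    a2 = p2_stage2(m1, m2, a1)
    return (a1, a2), total(m1, m2, a1, a2)


def p2_stage1(m1):
    # Player 2 picks the movie whose continuation gives him the most.
    return max(MOVIES, key=lambda m2: outcome_after_stage1(m1, m2)[1][1])


def solve():
    m1 = max(MOVIES, key=lambda m: outcome_after_stage1(m, p2_stage1(m))[1][0])
    m2 = p2_stage1(m1)
    (a1, a2), payoff = outcome_after_stage1(m1, m2)
    return (m1, m2, a1, a2), payoff


print(solve())
```

Under these payoffs the sketch returns the path Terminator, Terminator, Go to Movie, Go to Movie with total payoff (11, 9), matching the equilibrium found by hand. No ties occur here, so the choice at each node is unique.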

## Backward induction in game theory: the ultimatum game

Backward induction is ‘the process of analyzing a game from the end to the beginning. As with solving for other Nash Equilibria, rationality of players and complete knowledge is assumed. The concept of backwards induction corresponds to this assumption that it is common knowledge that each player will act rationally with each decision node when she chooses an option — even if her rationality would imply that such a node will not be reached.’ [10]

In order to solve for a Subgame Perfect Equilibrium with backwards induction, the game should be written out in extensive form and then divided into subgames. Starting with the subgame furthest from the initial node, or starting point, the expected payoffs listed for this subgame are weighed and the rational player will select the option with the higher payoff for themselves. The highest payoff vector is selected and marked. Solve for the subgame perfect equilibrium by continually working backwards from subgame to subgame until arriving at the starting point. The marked path of vectors is the subgame perfect equilibrium. [11]
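
The marking procedure just described can be written as a short recursive function. The sketch below assumes a finite game of perfect information; the tree encoding (payoff tuples for terminal nodes, dictionaries for decision nodes) and the example tree are hypothetical, chosen only for illustration.

```python
def backward_induction(node):
    """Solve a finite perfect-information game tree from the leaves upward.

    A terminal node is a tuple of payoffs, one entry per player.
    A decision node is a dict: {"player": index, "moves": {label: child}}.
    Returns (payoff vector, list of move labels on the marked path).
    """
    if isinstance(node, tuple):  # terminal node: nothing left to decide
        return node, []
    player = node["player"]
    best = None
    for label, child in node["moves"].items():
        payoffs, path = backward_induction(child)  # solve the subgame first
        # Keep the move that gives the deciding player the higher payoff.
        # Ties keep the first listed move; with ties the solution is not unique.
        if best is None or payoffs[player] > best[0][player]:
            best = (payoffs, [label] + path)
    return best


# Hypothetical two-player tree: player 0 moves first, then player 1.
tree = {"player": 0, "moves": {
    "L": {"player": 1, "moves": {"l": (2, 1), "r": (0, 0)}},
    "R": {"player": 1, "moves": {"l": (1, 2), "r": (3, 1)}},
}}

print(backward_induction(tree))
```

In this tree player 1 would answer `L` with `l` and `R` with `l`, so player 0 compares payoffs 2 and 1 and chooses `L`; the terminal payoff (3, 1) is never reached even though it is player 0's best outcome.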

Backward Induction Applied to the Ultimatum Game

Think of a game between two players where player 1 proposes to split a dollar with player 2. This famous asymmetric game, played sequentially, is called the ultimatum game. Player 1 acts first by splitting the dollar however they see fit. Player 2 can then either accept the portion offered by player 1 or reject the split. If player 2 accepts the split, both players get the payoffs according to that split. If player 2 rejects player 1’s offer, both players get nothing. In other words, player 2 has veto power over player 1’s proposed allocation, but applying the veto eliminates any reward for both players. [12] The strategy profile for this game can therefore be written as pairs (x, f(x)) for all x between 0 and 1, where f(x) is a two-valued function expressing whether x is accepted or not.

Consider the choice and response of player 2 given any arbitrary proposal by player 1, assuming that the offer is larger than \$0. Using backward induction, we would expect player 2 to accept any payoff that is greater than or equal to \$0, since rejecting yields nothing. Accordingly, player 1 ought to propose giving player 2 as little as possible in order to gain the largest portion of the split. Player 1 giving player 2 the smallest unit of money and keeping the rest is the unique subgame perfect equilibrium. The ultimatum game does have several other Nash equilibria which are not subgame perfect and therefore do not arise from backward induction.
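
A discretized version of this argument can be sketched in Python. The sketch assumes the dollar is split in whole cents and makes the responder's behavior at an offer of exactly zero an explicit switch, because the responder is indifferent there; the function name and the `accept_when_indifferent` flag are assumptions introduced for illustration.

```python
def solve_ultimatum(total_cents=100, accept_when_indifferent=False):
    """Backward induction on a discretized ultimatum game.

    The proposer offers an integer number of cents to the responder.
    Returns (offer chosen by the proposer, whether the responder accepts it).
    """
    def responder_accepts(offer):
        # Last mover: accepting pays `offer`, rejecting pays 0.
        if offer > 0:
            return True
        return accept_when_indifferent  # offer == 0: assumption, not derived

    def proposer_payoff(offer):
        # The proposer anticipates the responder's decision.
        return total_cents - offer if responder_accepts(offer) else 0

    best_offer = max(range(total_cents + 1), key=proposer_payoff)
    return best_offer, responder_accepts(best_offer)


print(solve_ultimatum())                              # responder rejects a zero offer
print(solve_ultimatum(accept_when_indifferent=True))  # responder accepts a zero offer
```

When the responder turns down an offer of zero, the proposer offers the smallest unit of money (one cent), as in the text. When the responder accepts zero, the proposer keeps everything, which shows how the prediction depends on how indifference is resolved.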

The ultimatum game illustrates the usefulness of backward induction when considering infinite games; however, the game’s theoretically predicted results are criticized. Experimental evidence has shown that the proposer very rarely offers \$0 and that player 2 sometimes rejects even offers greater than \$0, presumably on grounds of fairness. What player 2 deems fair varies by context, and the pressure or presence of other players can mean that the game-theoretic model cannot necessarily predict what real people will choose.

In practice, subgame perfect equilibrium is not always achieved. According to Camerer, an American behavioral economist, player 2 “rejects offers of less than 20 percent of X about half the time, even though they end up with nothing.” [13] While backward induction would predict that the responder accepts any offer equal to or greater than zero, responders in reality are not rational players and therefore seem to care more about the ‘fairness’ of an offer than about potential monetary gains.

## Backward induction in economics: the entry-decision problem

Consider a dynamic game in which the players are an incumbent firm in an industry and a potential entrant to that industry. As it stands, the incumbent has a monopoly over the industry and does not want to lose some of its market share to the entrant. If the entrant chooses not to enter, the payoff to the incumbent is high (it maintains its monopoly) and the entrant neither loses nor gains (its payoff is zero). If the entrant enters, the incumbent can "fight" or "accommodate" the entrant. It will fight by lowering its price, running the entrant out of business (the entrant incurs exit costs, a negative payoff) and damaging its own profits. If it accommodates the entrant it will lose some of its sales, but a high price will be maintained and it will receive greater profits than by lowering its price (but lower than monopoly profits).

Suppose the incumbent's best response, if the entrant enters, is to accommodate. If the incumbent accommodates, the best response of the entrant is to enter (and gain profit). Hence the strategy profile in which the entrant enters and the incumbent accommodates if the entrant enters is a Nash equilibrium consistent with backward induction. However, if the incumbent is going to fight, the best response of the entrant is to not enter, and if the entrant does not enter, it does not matter what the incumbent chooses to do in the hypothetical case that the entrant does enter. Hence the strategy profile in which the incumbent fights if the entrant enters, but the entrant does not enter, is also a Nash equilibrium. However, were the entrant to deviate and enter, the incumbent's best response would be to accommodate, so the threat of fighting is not credible. This second Nash equilibrium can therefore be eliminated by backward induction.
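
The contrast between the two Nash equilibria can be made concrete with a small Python sketch. The text gives no numbers, so the payoffs below are hypothetical; they are chosen only to respect the ordering described above (monopoly profit > accommodation profit > price-war profit for the incumbent, and a loss for the entrant if fought).

```python
ENTRANT = ("Enter", "Stay out")
INCUMBENT = ("Fight", "Accommodate")  # the incumbent's plan in case of entry


def payoff(entrant, incumbent):
    """Hypothetical payoffs as (entrant, incumbent)."""
    if entrant == "Stay out":
        return (0, 3)   # the incumbent keeps its monopoly
    if incumbent == "Fight":
        return (-1, 0)  # price war: entrant loses, incumbent's profit is damaged
    return (1, 1)       # accommodation: both earn moderate profits


def is_nash(entrant, incumbent):
    # Neither player can gain by changing only their own strategy.
    u_e, u_i = payoff(entrant, incumbent)
    return (all(payoff(e, incumbent)[0] <= u_e for e in ENTRANT)
            and all(payoff(entrant, i)[1] <= u_i for i in INCUMBENT))


def is_subgame_perfect(entrant, incumbent):
    # The incumbent's plan must also be optimal in the subgame after entry.
    best_reply = max(INCUMBENT, key=lambda i: payoff("Enter", i)[1])
    return is_nash(entrant, incumbent) and incumbent == best_reply


nash = [(e, i) for e in ENTRANT for i in INCUMBENT if is_nash(e, i)]
spe = [profile for profile in nash if is_subgame_perfect(*profile)]
print(nash)
print(spe)
```

With these payoffs both (Enter, Accommodate) and (Stay out, Fight) pass the Nash check, but only (Enter, Accommodate) survives the subgame check, since fighting is not the incumbent's best reply once entry has occurred.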

Finding a Nash equilibrium in each decision-making process (subgame) yields a subgame perfect equilibrium. Strategy profiles that are subgame perfect therefore exclude actions such as non-credible threats that are used to "scare off" an entrant. If the incumbent threatens to start a price war with an entrant, it is threatening to lower its price from the monopoly price to slightly below the entrant's. This threat is impractical, and not credible, if the entrant knows that a price war would not actually happen because it would result in losses for both parties. Unlike a single-agent optimization, which may include equilibria that are not feasible or optimal, a subgame perfect equilibrium accounts for the actions of the other player, thus ensuring that no player reaches a subgame mistakenly. In this case, backward induction yielding a subgame perfect equilibrium ensures that the entrant will not be convinced by the incumbent's threat, knowing that carrying it out is not a best response in the strategy profile. [14]

## Backward induction paradox: the unexpected hanging

The unexpected hanging paradox is a paradox related to backward induction. Suppose a prisoner is told that she will be hanged sometime between Monday and Friday of next week. However, the exact day will be a surprise (i.e. she will not know the night before that she will be executed the next day). The prisoner, interested in outsmarting her executioner, attempts to determine which day the execution will occur.

She reasons that it cannot occur on Friday, since if it had not occurred by the end of Thursday, she would know the execution would be on Friday. Therefore, she can eliminate Friday as a possibility. With Friday eliminated, she decides that it cannot occur on Thursday, since if it had not occurred on Wednesday, she would know that it had to be on Thursday. Therefore, she can eliminate Thursday. This reasoning proceeds until she has eliminated all possibilities. She concludes that she will not be hanged next week.

To her surprise, she is hanged on Wednesday. She made the mistake of assuming that she knew definitively whether the unknown future factor that would cause her execution was one that she could reason about.

Here the prisoner reasons by backward induction, but seems to come to a false conclusion. Note, however, that the description of the problem assumes it is possible to surprise someone who is performing backward induction. The mathematical theory of backward induction does not make this assumption, so the paradox does not call into question the results of this theory. Nonetheless, this paradox has received some substantial discussion by philosophers.

## Backward induction and common knowledge of rationality

Backward induction works only if both players are rational, i.e., always select an action that maximizes their payoff. However, rationality is not enough: each player should also believe that all other players are rational. Even this is not enough: each player should believe that all other players know that all other players are rational. And so on ad infinitum. In other words, rationality should be common knowledge. [15]

## Notes

1. Zermelo, E. (1913). "Über eine Anwendung der Mengenlehre auf die Theorie des Schachspiels" (PDF). www.ethz.ch. Retrieved 2018-12-31.
2. Mathematics of Chess, webpage by John MacQuarrie.
3. Jerome Adda and Russell Cooper, "Dynamic Economics: Quantitative Methods and Applications", Section 3.2.1, page 28. MIT Press, 2003.
4. Mario Miranda and Paul Fackler, "Applied Computational Economics and Finance", Section 7.3.1, page 164. MIT Press, 2002.
5. Drew Fudenberg and Jean Tirole, "Game Theory", Section 3.5, page 92. MIT Press, 1991.
6. John von Neumann and Oskar Morgenstern, "Theory of Games and Economic Behavior", Section 15.3.1. Princeton University Press. Third edition, 1953. (First edition, 1944.)
7. Watson, Joel (2002). Strategy: an introduction to game theory (3 ed.). New York: W.W. Norton & Company. p. 63.
8. Watson, Joel (2002). Strategy: an introduction to game theory (3 ed.). New York: W.W. Norton & Company. p. 186–187.
9. Watson, Joel (2002). Strategy: an introduction to game theory (3 ed.). New York: W.W. Norton & Company. p. 188.
10. Watson, Joel (2013). Strategy: An Introduction to Game Theory, 3rd Edition. New York, NY: Norton & Company. pp. 183–203. ISBN 9780393918380.
11. Kamiński, Marek M. (2017). "Backward Induction: Merits And Flaws". Studies in Logic, Grammar and Rhetoric. 50 (1): 9–24. doi:10.1515/slgr-2017-0016.
12. Camerer, Colin F. (1997). "Progress in Behavioral Game Theory" (PDF). The Journal of Economic Perspectives. 11 (4): 167–188. doi:10.1257/jep.11.4.167. ISSN 0895-3309. JSTOR 2138470.
13. Rust J. (2008) Dynamic Programming. In: Palgrave Macmillan (eds) The New Palgrave Dictionary of Economics. Palgrave Macmillan, London
14. Yisrael Aumann (1995-01-01). "Backward induction and common knowledge of rationality". Games and Economic Behavior. 8 (1): 6–19. doi:10.1016/S0899-8256(05)80015-6. ISSN 0899-8256.
