Evidential decision theory

Evidential decision theory (EDT) is a school of thought within decision theory which states that a rational agent confronted with a set of possible actions should select the action with the highest news value, that is, the action which would be indicative of the best outcome in expectation if one received the "news" that it had been taken. In other words, it recommends that one "do what you most want to learn that you will do." [1] :7

EDT contrasts with causal decision theory (CDT), which prescribes taking the action that will causally produce the best outcome. While these two theories agree in many cases, they give different verdicts in certain philosophical thought experiments. For example, EDT prescribes taking only one box in Newcomb's paradox, while CDT recommends taking both boxes. [1] :22–26

Formal description

In a 1976 paper, Allan Gibbard and William Harper distinguished between two kinds of expected utility maximization. EDT proposes to maximize the expected utility of actions computed using conditional probabilities, namely

$$V(A) = \sum_j P(O_j \mid A)\, D(O_j),$$

where $D(O_j)$ is the desirability of outcome $O_j$ and $P(O_j \mid A)$ is the conditional probability of $O_j$ given that action $A$ occurs. [2] This is in contrast to the counterfactual formulation of expected utility used by causal decision theory,

$$U(A) = \sum_j P(A \,\Box\!\!\rightarrow O_j)\, D(O_j),$$

where the expression $P(A \,\Box\!\!\rightarrow O_j)$ indicates the probability of outcome $O_j$ in the counterfactual situation in which action $A$ is performed. Since $P(A \,\Box\!\!\rightarrow O_j)$ and $P(O_j \mid A)$ are not always equal, these formulations of expected utility are not equivalent, [2] leading to differences in actions prescribed by EDT and CDT.
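The difference between the two formulations can be illustrated with a small Python sketch. Both values are the same kind of sum, a probability-weighted total of desirabilities; they differ only in which probabilities are plugged in. The outcome names and all numbers below are hypothetical and chosen purely for illustration, and the function name `expected_utility` is not taken from any source.

```python
from typing import Mapping


def expected_utility(prob: Mapping[str, float],
                     desirability: Mapping[str, float]) -> float:
    """Sum of P(outcome) * D(outcome) over all outcomes."""
    return sum(p * desirability[outcome] for outcome, p in prob.items())


# Hypothetical desirabilities D(O_j) for two outcomes.
desirability = {"good": 10.0, "bad": 0.0}

# P(O_j | A): probability of each outcome given the news that A was taken.
conditional = {"good": 0.8, "bad": 0.2}

# P(A []-> O_j): probability each outcome would occur if A were performed.
counterfactual = {"good": 0.5, "bad": 0.5}

v = expected_utility(conditional, desirability)     # evidential value V(A)
u = expected_utility(counterfactual, desirability)  # causal value U(A)
print(v, u)
```

With these made-up numbers the action looks better as news than as a cause, which is the kind of gap that lets EDT and CDT rank actions differently.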

Thought experiments

Decision theories are often compared by the actions they recommend in thought experiments.

Newcomb's paradox

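In Newcomb's paradox, there is a predictor, a player, and two boxes designated A and B. The predictor is able to reliably predict the player's choices, say with 99% accuracy. The player is given a choice between taking only box B, or taking both boxes A and B. The player knows the following: [3] box A is transparent and always contains a visible $1,000; box B is opaque, and its content has already been set by the predictor. If the predictor predicted that the player would take both boxes, box B contains nothing; if the predictor predicted that the player would take only box B, box B contains $1,000,000.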

The player does not know what the predictor predicted or what box B contains while making the choice. Should the player take both boxes, or only box B?

Evidential decision theory recommends taking only box B in this scenario, because taking only box B is strong evidence that the predictor anticipated that the player would only take box B, and therefore it is very likely that box B contains $1,000,000. Conversely, choosing to take both boxes is strong evidence that the predictor knew that the player would take both boxes; therefore we should expect that box B contains nothing. [1] :22

Causal decision theory (CDT), by contrast, recommends that the player take both boxes: the predictor has already made its prediction by the time the player chooses, so the player's action cannot affect what box B contains.

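Formally, with a 99% accurate predictor and the standard $1,000 in box A, the expected utilities in EDT are

$$V(\text{take only B}) = P(\text{B full} \mid \text{take only B}) \times \$1{,}000{,}000 + P(\text{B empty} \mid \text{take only B}) \times \$0 = 0.99 \times \$1{,}000{,}000 + 0.01 \times \$0 = \$990{,}000$$

$$V(\text{take both}) = P(\text{B full} \mid \text{take both}) \times \$1{,}001{,}000 + P(\text{B empty} \mid \text{take both}) \times \$1{,}000 = 0.01 \times \$1{,}001{,}000 + 0.99 \times \$1{,}000 = \$11{,}000$$

Since $V(\text{take only B}) > V(\text{take both})$, EDT recommends taking only box B.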
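The two verdicts can be checked numerically. The sketch below assumes the standard formulation of the problem ($1,000 in box A, $1,000,000 in box B if one-boxing was predicted) and the 99% accuracy used in the text; the function names and the parameter `p_full` are illustrative, not from any source. For CDT, box B's content is treated as already fixed with some probability `p_full` that does not depend on the action.

```python
ACCURACY = 0.99         # predictor accuracy used in the text
BOX_A = 1_000           # assumed: the standard visible amount in box A
BOX_B_FULL = 1_000_000  # box B's content if one-boxing was predicted


def edt_value(action: str, accuracy: float = ACCURACY) -> float:
    """V(action): treat the action as evidence about the prediction."""
    if action == "one_box":
        p_full, extra = accuracy, 0
    else:  # "two_box"
        p_full, extra = 1 - accuracy, BOX_A
    return p_full * (BOX_B_FULL + extra) + (1 - p_full) * extra


def cdt_value(action: str, p_full: float) -> float:
    """U(action): box B is full with probability p_full, whatever is chosen."""
    extra = BOX_A if action == "two_box" else 0
    return p_full * BOX_B_FULL + extra


# EDT: one-boxing has the higher news value.
print(round(edt_value("one_box")), round(edt_value("two_box")))  # 990000 11000

# CDT: two-boxing is ahead by the content of box A for every value of p_full.
for p in (0.0, 0.5, 1.0):
    print(p, cdt_value("two_box", p) - cdt_value("one_box", p))
```

Under these assumptions the causal advantage of two-boxing is always exactly the content of box A, which is why CDT's recommendation does not depend on how likely box B is to be full.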

Twin prisoner's dilemma

In this variation on the Prisoner's Dilemma thought experiment, an agent must choose whether to cooperate or defect against her psychological twin, whose reasoning processes are exactly analogous to her own.

Aomame and her psychological twin are put in separate rooms and cannot communicate. If they both cooperate, they each get $5. If they both defect, they each get $1. If one cooperates and the other defects, the defector gets $10 and the cooperator gets $0. Assuming Aomame only cares about her individual payout, what should she do? [4]

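Evidential decision theory recommends cooperating in this situation, because Aomame's decision to cooperate is strong evidence that her psychological twin will also cooperate, meaning that her expected payoff is $5. On the other hand, if Aomame defects, this would be strong evidence that her twin will also defect, resulting in an expected payoff of $1. Formally, the expected utilities are

$$V(\text{cooperate}) = P(\text{twin cooperates} \mid \text{cooperate}) \times \$5 + P(\text{twin defects} \mid \text{cooperate}) \times \$0 \approx \$5$$

$$V(\text{defect}) = P(\text{twin defects} \mid \text{defect}) \times \$1 + P(\text{twin cooperates} \mid \text{defect}) \times \$10 \approx \$1$$

Since $V(\text{cooperate}) > V(\text{defect})$, EDT recommends cooperating.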
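How strong the evidence has to be can be made concrete with a short sketch. It assumes that in the mixed case the defector receives $10 and the cooperator $0, and it introduces a hypothetical parameter `p_same`, the probability that the twin makes the same choice as Aomame; neither the parameter nor the 99% figure below comes from the source. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

# Aomame's payout in dollars for (her action, twin's action).
# Assumed: in the mixed case the defector gets $10 and the cooperator $0.
PAYOFF = {
    ("C", "C"): 5, ("C", "D"): 0,
    ("D", "C"): 10, ("D", "D"): 1,
}


def edt_value(action: str, p_same: Fraction) -> Fraction:
    """V(action) when the twin mirrors Aomame's action with probability p_same."""
    other = "D" if action == "C" else "C"
    return (p_same * PAYOFF[(action, action)]
            + (1 - p_same) * PAYOFF[(action, other)])


p = Fraction(99, 100)  # hypothetical: the twin mirrors her 99% of the time
print(edt_value("C", p), edt_value("D", p))  # 99/20 109/100

# Solving 5p = p + 10(1 - p) gives the break-even correlation p = 5/7.
THRESHOLD = Fraction(5, 7)
print(edt_value("C", THRESHOLD) == edt_value("D", THRESHOLD))  # True
```

With these payoffs, cooperating has the higher evidential value whenever `p_same` exceeds 5/7, so the "strong evidence" in the text does not need to amount to certainty.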

Other supporting arguments

Even if one assigns lower credence to evidential decision theory than to causal decision theory, it may be reasonable to act as if EDT were true. Because EDT takes into account the actions of many correlated decision-makers, the stakes under EDT may be higher than under CDT, and so EDT's recommendations may take priority. [5]

Criticism

David Lewis has characterized evidential decision theory as promoting "an irrational policy of managing the news". [6] James M. Joyce asserted, "Rational agents choose acts on the basis of their causal efficacy, not their auspiciousness; they act to bring about good results even when doing so might betoken bad news." [7]

References

  1. Ahmed, Arif (2021). Evidential Decision Theory. Cambridge University Press. ISBN 9781108607865.
  2. Gibbard, A.; Harper, W. L. (1976). Counterfactuals and Two Kinds of Expected Utility. pp. 7–8.
  3. Wolpert, D. H.; Benford, G. (June 2013). "The lesson of Newcomb's paradox". Synthese. 190 (9): 1637–1646. doi:10.1007/s11229-011-9899-3. JSTOR 41931515. S2CID 113227.
  4. Greene, P.; Levinstein, B. (2020). "Act Consequentialism without Free Rides". Philosophical Perspectives. 34 (1): 100–101. doi:10.1111/phpe.12138. S2CID 211161349.
  5. MacAskill, William; Vallinder, Aron; Österheld, Caspar; Shulman, Carl; Treutlein, Johannes (2021). The Evidentialist's Wager.
  6. Lewis, D. (1981). "Causal decision theory". Australasian Journal of Philosophy. 59 (1): 5–30. doi:10.1080/00048408112340011. Retrieved 2009-05-29.
  7. Joyce, J. M. (1999). The Foundations of Causal Decision Theory. p. 146.