Functional Decision Theory (FDT) is a school of thought within decision theory which states that, when a rational agent is confronted with a set of possible actions, she should select the decision procedure (a fixed mathematical decision function, as opposed to a singular act) that leads to the best outcome. [1] [2] It aims to provide a more reliable method of maximizing utility (the measure of how much an outcome satisfies an agent's preferences) than the more prominent decision theories, Causal Decision Theory (CDT) and Evidential Decision Theory (EDT).
In general, CDT states that the agent should consider the causal effects of her actions in order to maximize utility; [3] in other words, it prescribes acting in the way that will produce the best consequences given the situation at hand. [4] EDT states that the agent should look at how likely certain outcomes are given her actions and observations (regardless of causality); in other words, it advises an agent to ‘do what you most want to learn that you will do.’ [5] Many proponents of FDT argue that, since there are some scenarios in which CDT, EDT, or both fail to prescribe the most rational choice, both theories are incorrect. [1] [6]
FDT was first proposed by Eliezer Yudkowsky and Nate Soares in a 2017 research paper supported by the Machine Intelligence Research Institute (MIRI). [1] Prior to this publication, Yudkowsky had proposed another, similar decision theory, which he named Timeless Decision Theory (TDT). [7] Roughly speaking, Timeless Decision Theory states that, rather than acting as if you are determining an individual decision, you should act as if you are determining the output of an abstract computation. [8] The original paper and the idea behind TDT were seen as a work in progress, and drew much criticism for their vagueness. [9]
Broadly, FDT can be seen as a replacement for TDT, [10] and a generalization of Wei Dai's Updateless Decision Theory (UDT). [11] [12]
Informally, Functional Decision Theory recommends that the agent select the decision procedure that produces the best outcome. It assumes that the agent possesses a model of her own decision procedure (which a reliable predictor must also know with high certainty), and that she can alter it accordingly. As an analogy, when you type "2 + 2" into a calculator and receive the answer "4", you conclude that 2 + 2 = 4 because the calculator runs the same function that you would compute yourself. Similarly, a predictor, such as the one in Newcomblike problems, runs the agent's decision function in order to predict her actions. [2]
Parfit's Hitchhiker is a scenario in which FDT is claimed to outperform both CDT and EDT simultaneously.[citation needed] It is stated as follows: “An agent is dying in the desert. A driver comes along who offers to give the agent a ride into the city, but only if the agent will agree to visit an ATM once they arrive and give the driver $1,000. The driver will have no way to enforce this after they arrive, but he does have an extraordinary ability to detect lies with 99% accuracy. Being left to die causes the agent to lose the equivalent of $1,000,000. In the case where the agent gets to the city, should she proceed to visit the ATM and pay the driver?” [13]
The CDT agent says no. Given that she has safely arrived in the city, she sees nothing further to gain by paying the driver. The EDT agent agrees: on the assumption that she is already in the city, it would be bad news for her to learn that she was out $1,000. Assuming that the CDT and EDT agents are smart enough to know what they would do upon arriving in the city, this means that neither can honestly claim that they would pay. The driver, detecting the lie, leaves them in the desert to die. The prescriptions of CDT and EDT here run contrary to many people’s intuitions, which say that the most “rational” course of action is to pay upon reaching the city. Certainly if these agents had the opportunity to make binding pre-commitments to pay upon arriving, they would achieve better outcomes.
The FDT agent reasons that the driver models her reasoning in order to detect her lies. Therefore, she does pay up, even though she knows she is already out of the desert. While it might seem irrational to pay once one is safely out of the desert, it is advantageous to be the kind of agent that pays up in these kinds of scenarios: it means that you, while still in the desert, can honestly claim that you will pay once you are in the city, and therefore that the driver will take you. [1]
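The comparison between the two policies can be made concrete with a small expected-loss calculation. The sketch below is illustrative only: the function name and constants are hypothetical, and it assumes that the driver's 99% accuracy applies equally to honest and dishonest promises, which the problem statement does not spell out.

```python
# Illustrative sketch of Parfit's Hitchhiker, comparing two fixed policies.
# Assumption (not stated in the problem): the driver's 99% accuracy applies
# symmetrically, to honest promises and to lies alike.

ACCURACY = 0.99            # driver's lie-detection accuracy
COST_OF_PAYING = 1_000     # dollars handed over at the ATM
COST_OF_DYING = 1_000_000  # dollar-equivalent loss of being left in the desert


def expected_loss(pays_in_city: bool) -> float:
    """Expected loss for an agent whose fixed policy is to pay (or not) in the city."""
    if pays_in_city:
        # The promise is honest, so it is believed with probability ACCURACY.
        p_ride = ACCURACY
        loss_if_ride = COST_OF_PAYING
    else:
        # The promise is a lie, so it is believed only when the driver errs.
        p_ride = 1 - ACCURACY
        loss_if_ride = 0
    return p_ride * loss_if_ride + (1 - p_ride) * COST_OF_DYING


if __name__ == "__main__":
    print(f"policy 'pay':    expected loss ${expected_loss(True):,.0f}")
    print(f"policy 'refuse': expected loss ${expected_loss(False):,.0f}")
```

Under these assumptions the paying policy loses about $10,990 in expectation, against about $990,000 for the refusing policy, which is the sense in which being "the kind of agent that pays" is advantageous.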
Consider the following blackmail dilemma, as stated by Yudkowsky:
A blackmailer has a nasty piece of information which incriminates both the blackmailer and the agent. She has written a computer program which, if run, will publish it on the internet, costing $1,000,000 in damages to both of them. If the program is run, the only way it can be stopped is for the agent to wire the blackmailer $1,000 within 24 hours—the blackmailer will not be able to stop the program once it is running. The blackmailer would like the $1,000, but doesn’t want to risk incriminating herself, so she only runs the program if she is quite sure that the agent will pay up. She is also a perfect predictor of the agent, and she runs the program (which, when run, automatically notifies her via a blackmail letter) if she predicts that she would pay upon receiving the blackmail. Imagine that the agent receives the blackmail letter. Should she wire $1,000 to the blackmailer?
While CDT and EDT would both pay the blackmailer, the FDT agent reasons, “Paying corresponds to a world where I lose $1,000; refusing corresponds to a world where I never get blackmailed (as the blackmailer would have predicted this). The latter looks better, so I refuse.” As such, she never gets blackmailed — her counterfactual reasoning is proven correct, according to Yudkowsky. [1]
In Newcomb's Paradox, an agent finds herself standing in front of a transparent box labeled “A” that contains $1,000, and an opaque box labeled “B” that contains either $1,000,000 or $0. A reliable predictor, who has made similar predictions in the past and has been correct 99% of the time, claims to have placed $1,000,000 in box B if she predicted that the agent would leave box A behind. The predictor has already made her prediction and left. Box B is now empty or full. Should the agent take both boxes (“two-boxing”), or only box B, leaving the transparent box containing $1,000 behind (“one-boxing”)?
| Agent's choice | Predicted One-Boxing | Predicted Two-Boxing |
|---|---|---|
| One-Boxing | $1,000,000 | $0 |
| Two-Boxing | $1,001,000 | $1,000 |
An agent using CDT argues that at the moment she is making the decision to one-box or two-box, the predictor has already either put a million dollars or nothing in box B. Her own decision now can't change the predictor's earlier decision; she can't cause the past to be different. Furthermore, no matter what the content of box B actually is, two-boxing gives an extra thousand dollars. The CDT agent therefore two-boxes. In contrast, an EDT agent argues as follows: “If I two-box, the predictor will almost certainly have predicted this. Future-me two-boxing would therefore be strong evidence that box B is empty. If I one-box, the predictor will almost certainly have predicted that too, which is why future-me one-boxing would be strong evidence of box B containing a million dollars.” Following this line of reasoning, the EDT agent, unlike the CDT agent, one-boxes. [14]
The FDT agent reasons that the predictor must have a model of her decision procedure. It is therefore best for her decision procedure to output one-boxing, because then the predictor's model of that procedure also outputs one-boxing, leading her to predict that the FDT agent will one-box and to put a million dollars in box B. Since FDT and EDT agents both one-box, they receive a million dollars, outperforming CDT agents, who obtain only $1,000 by two-boxing. [15]
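The two lines of reasoning can be checked numerically against the payoff table. The following sketch is illustrative: the names are hypothetical, and it assumes that the predictor's 99% accuracy holds whichever action the agent's procedure outputs.

```python
# Illustrative sketch of Newcomb's problem using the payoff table.
# Keys are (agent's action, predictor's prediction).
PAYOFF = {
    ("one-box", "one-box"): 1_000_000,
    ("one-box", "two-box"): 0,
    ("two-box", "one-box"): 1_001_000,
    ("two-box", "two-box"): 1_000,
}
ACTIONS = ("one-box", "two-box")


def dominance_gain(prediction: str) -> int:
    """Extra money from two-boxing when the prediction is held fixed (the CDT view)."""
    return PAYOFF[("two-box", prediction)] - PAYOFF[("one-box", prediction)]


def expected_payoff(action: str, accuracy: float = 0.99) -> float:
    """Expected payoff when the prediction tracks the action with the given accuracy.

    Assumption: the accuracy is the same for one-boxers and two-boxers.
    """
    wrong_prediction = "two-box" if action == "one-box" else "one-box"
    return (accuracy * PAYOFF[(action, action)]
            + (1 - accuracy) * PAYOFF[(action, wrong_prediction)])


if __name__ == "__main__":
    for prediction in ACTIONS:
        print(f"prediction fixed at {prediction}: two-boxing gains ${dominance_gain(prediction):,}")
    for action in ACTIONS:
        print(f"{action}: expected payoff ${expected_payoff(action):,.0f}")
```

Holding the prediction fixed, two-boxing is always worth $1,000 more, which is the CDT argument. Letting the prediction track the action, one-boxing is worth about $990,000 in expectation against about $11,000 for two-boxing, which is the figure that EDT and FDT agents respond to.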
In general, a Newcomb problem illustrates choice situations in which:
In the psychological twin variant of the Prisoner's Dilemma, an agent and her twin must both choose to either “cooperate” or “defect.” If both cooperate, they each receive $1,000,000. If both defect, they each receive $1,000. If one cooperates and the other defects, the defector gets $1,001,000 and the cooperator gets nothing. The agent and the twin know that they reason the same way, using the same considerations to come to their conclusions. However, their decisions are causally independent, made in separate rooms without communication. Should the agent cooperate with her twin? [1]
A CDT agent would defect, arguing that no matter what action her twin takes, she wins an extra thousand dollars by defecting. She and her twin both reason in this way, and thus they both walk away with $1,000. EDT would prescribe cooperation, on the grounds that it would be good news to learn that one had cooperated, as it would provide evidence that the twin also cooperated. [17]
An FDT agent would cooperate, reasoning that she and her twin follow the same course of reasoning: if it concludes that cooperation is better, then both cooperate and obtain $1,000,000; if it concludes that defection is better, then both defect and obtain a mere $1,000. Since the former is preferable, and since both twins share the same decision procedure, that reasoning concludes in favor of cooperation. [2] [1]
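The difference between the two arguments is a difference in which outcomes are treated as reachable. The sketch below is illustrative, with hypothetical function names: the CDT-style function looks for an action that is best against every fixed action of the twin, while the FDT-style function only compares outcomes in which both twins take the same action.

```python
# Illustrative sketch of the twin Prisoner's Dilemma.
# Keys are (agent's action, twin's action); values are the agent's payoff.
PAYOFF = {
    ("cooperate", "cooperate"): 1_000_000,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 1_001_000,
    ("defect", "defect"): 1_000,
}
ACTIONS = ("cooperate", "defect")


def dominant_action():
    """CDT-style reasoning: hold the twin's action fixed and pick the best reply.

    Returns the action that is the best reply to every twin action, or None
    if no single action is.
    """
    best_replies = {
        max(ACTIONS, key=lambda a: PAYOFF[(a, twin)]) for twin in ACTIONS
    }
    return best_replies.pop() if len(best_replies) == 1 else None


def shared_procedure_action() -> str:
    """FDT-style reasoning: both twins run one procedure, so only outcomes
    where they act alike are compared."""
    return max(ACTIONS, key=lambda a: PAYOFF[(a, a)])


if __name__ == "__main__":
    print("dominant action (twin held fixed):", dominant_action())
    print("best shared output:", shared_procedure_action())
```

With the payoffs above, defection is the best reply to either fixed action of the twin, yet cooperation is the better of the two outcomes in which the twins act alike.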
In the Death in Damascus problem, an agent encounters Death in Damascus and is told that Death is coming for her tomorrow, as it is written in his appointment book, including the location of the event. This agent knows that deciding to flee to Aleppo (at a cost of $1,000) means that Death will be in Aleppo tomorrow, whereas staying in Damascus means that Death will be in Damascus tomorrow. Should she stay, or flee? FDT suggests staying: Death will be waiting for her wherever she goes, because his prediction reflects her decision procedure, so it is better to save the $1,000. CDT, however, is unstable here, as it bases the agent's decision on the hypothetical that Death's action is independent of her own action (since it was already written in the book). [18] [1]
While in Newcomb's Paradox, FDT and EDT outperform CDT, in the Smoking Lesion Problem it is claimed that FDT and CDT outperform EDT.
Consider a hypothetical world where smoking is strongly correlated with lung cancer, but only because there is a common cause – a condition that tends to cause both smoking and cancer. Once we fix the presence or absence of this condition, there is no additional correlation between smoking and cancer. If Susan prefers smoking without cancer to not smoking without cancer, and prefers smoking with cancer to not smoking with cancer, should Susan smoke? [19]
| Susan's choice | Sick | Healthy |
|---|---|---|
| Smoke | 50 | 100 |
| Abstain | 25 | 75 |
EDT tells Susan not to smoke[citation needed], because it treats the fact that her smoking is evidence that she has the lesion, and therefore evidence that she is likely to get cancer, as a reason not to smoke. Causal decision theory tells her to smoke: it does not treat a merely evidential connection between an action and a bad outcome as a reason not to perform the action, and it holds that smoking has no causal effect on whether or not one gets cancer. In the case of FDT, whether or not the cancer metastasizes does not depend upon the output of the FDT procedure, since in this scenario cancer does not depend on smoking; FDT therefore recommends smoking. Since smoking provides more utility to Susan regardless of whether she has cancer, something she cannot control, smoking is generally viewed as the "correct" answer. [19] [2] [1] Nonetheless, there has been discussion of whether this truly is the correct approach. [20]
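The disagreement can be illustrated with the utilities from the table. The problem statement gives no probabilities, so every number in `P_SICK` and `P_SICK_GIVEN` below is a hypothetical value chosen only to show how the two calculations come apart; the function names are likewise illustrative.

```python
# Illustrative sketch of the Smoking Lesion problem.
# Utilities come from the table; ALL probabilities are hypothetical.
UTILITY = {
    ("smoke", "sick"): 50,
    ("smoke", "healthy"): 100,
    ("abstain", "sick"): 25,
    ("abstain", "healthy"): 75,
}
ACTIONS = ("smoke", "abstain")

P_SICK = 0.5                                   # hypothetical prior probability of cancer
P_SICK_GIVEN = {"smoke": 0.8, "abstain": 0.2}  # hypothetical correlation via the lesion


def value(action: str, p_sick: float) -> float:
    """Expected utility of an action for a given probability of being sick."""
    return (p_sick * UTILITY[(action, "sick")]
            + (1 - p_sick) * UTILITY[(action, "healthy")])


def edt_value(action: str) -> float:
    """EDT conditions the probability of sickness on the action taken."""
    return value(action, P_SICK_GIVEN[action])


def cdt_value(action: str) -> float:
    """CDT (and, in this problem, FDT) holds the probability of sickness fixed,
    because neither the act nor the decision procedure influences the lesion."""
    return value(action, P_SICK)


if __name__ == "__main__":
    for action in ACTIONS:
        print(f"{action}: EDT value {edt_value(action):.1f}, CDT/FDT value {cdt_value(action):.1f}")
```

With these made-up probabilities, conditioning on the action makes abstaining look better (about 65 against 60), while holding the probability of sickness fixed makes smoking better (75 against 50). The second result does not depend on the chosen prior, since smoking is worth 25 more in both columns of the table.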
Yudkowsky and Soares formalize Functional Decision Theory per the following formula: [1]
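In the notation of the cited paper, the decision rule is usually written along the following lines. This rendering is a reconstruction and should be checked against [1]; the glosses of the symbols are a reading of that paper rather than definitions given in this article.

$$
\operatorname{FDT}(P, G) := \underset{a \in \mathcal{A}}{\operatorname{argmax}} \; \mathbb{E}\Big(\mathcal{V} \;\Big|\; \operatorname{do}\big(\operatorname{FDT}(\underline{P}, \underline{G}) = a\big)\Big)
$$

Here $P$ is taken to be the agent's probabilistic model of the situation, $G$ a graph describing what depends on the output of the decision function, $\mathcal{A}$ the set of available actions, and $\mathcal{V}$ the variable representing utility. The underlined $\underline{P}$ and $\underline{G}$ indicate that the intervention is on the output of the decision function itself, wherever it is computed, rather than on a single physical act.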
Yudkowsky and Soares assume that an FDT agent is certain that she follows FDT, and this knowledge is held fixed under all counterfactual suppositions. Moreover, decision theorists do not agree on a "correct" or "rational" solution to all of the problems that Yudkowsky and Soares claim that FDT solves. In fact, many would suggest that FDT provides insane recommendations in certain cases, as detailed by Wolfgang Schwarz: [6]
Suppose you have committed an indiscretion that would ruin you if it should become public. You can escape the ruin by paying $1 once to a blackmailer. Of course you should pay! FDT says you should not pay because, if you were the kind of person who doesn't pay, you likely wouldn't have been blackmailed. How is that even relevant? You are being blackmailed. Not being blackmailed isn't on the table. It's not something you can choose.
Similarly, in the variant of Newcomb's Problem where you already know the contents of the million dollar box:
Suppose you see $1000 in the left box and a million in the right box. If you were to take both boxes, you would get a million and a thousand. If you were to take just the right box, you would get a million. So Causal Decision Theory says you should take [both] boxes. However, you follow FDT, and you are certain that you do. If FDT recommended two-boxing, then any FDT agent throughout history would two-box. And, crucially, the predictor would (probably) have foreseen that you would two-box, so she would have put nothing into the box on the right. As a result, if FDT recommended two-boxing, you would probably end up with $1000. To be sure, you know that there's a million in the box on the right. You can see it. But according to FDT, this is irrelevant. What matters is what would be in the box relative to different assumptions about what FDT recommends. Therefore, FDT recommends one-boxing despite the fact that you gain $1000 less. [6]
In general, criticism of Functional Decision Theory can be summarized in the following points of argument. [21]