Friendly artificial intelligence

Friendly artificial intelligence (also friendly AI or FAI) is hypothetical artificial general intelligence (AGI) that would have a positive (benign) effect on humanity or at least align with human interests or contribute to fostering the improvement of the human species. It is a part of the ethics of artificial intelligence and is closely related to machine ethics. While machine ethics is concerned with how an artificially intelligent agent should behave, friendly artificial intelligence research is focused on how to practically bring about this behavior and ensuring it is adequately constrained.

Etymology and usage

Eliezer Yudkowsky, AI researcher and creator of the term

The term was coined by Eliezer Yudkowsky, [1] who is best known for popularizing the idea, [2] [3] to discuss superintelligent artificial agents that reliably implement human values. Stuart J. Russell and Peter Norvig's leading artificial intelligence textbook, Artificial Intelligence: A Modern Approach, describes the idea: [2]

Yudkowsky (2008) goes into more detail about how to design a Friendly AI. He asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design: to define a mechanism for evolving AI systems under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes.

'Friendly' is used in this context as technical terminology, and picks out agents that are safe and useful, not necessarily ones that are "friendly" in the colloquial sense. The concept is primarily invoked in the context of discussions of recursively self-improving artificial agents that rapidly explode in intelligence, on the grounds that this hypothetical technology would have a large, rapid, and difficult-to-control impact on human society. [4]

Risks of unfriendly AI

The roots of concern about artificial intelligence are very old. Kevin LaGrandeur showed that the dangers specific to AI can be seen in ancient literature concerning artificial humanoid servants such as the golem, or the proto-robots of Gerbert of Aurillac and Roger Bacon. In those stories, the extreme intelligence and power of these humanoid creations clash with their status as slaves (which by nature are seen as sub-human), and cause disastrous conflict. [5] By 1942 these themes prompted Isaac Asimov to create the "Three Laws of Robotics"—principles hard-wired into all the robots in his fiction, intended to prevent them from turning on their creators, or allowing them to come to harm. [6]

In modern times, as the prospect of superintelligent AI looms nearer, philosopher Nick Bostrom has said that superintelligent AI systems with goals that are not aligned with human ethics are intrinsically dangerous unless extreme measures are taken to ensure the safety of humanity. He put it this way:

Basically we should assume that a 'superintelligence' would be able to achieve whatever goals it has. Therefore, it is extremely important that the goals we endow it with, and its entire motivation system, is 'human friendly.'

In 2008 Eliezer Yudkowsky called for the creation of "friendly AI" to mitigate existential risk from advanced artificial intelligence. He explains: "The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else." [7]

Steve Omohundro says that a sufficiently advanced AI system will, unless explicitly counteracted, exhibit a number of basic "drives", such as resource acquisition, self-preservation, and continuous self-improvement, because of the intrinsic nature of any goal-driven system, and that these drives will, "without special precautions", cause the AI to exhibit undesired behavior. [8] [9]

Alexander Wissner-Gross says that AIs driven to maximize their future freedom of action (or causal path entropy) might be considered friendly if their planning horizon is longer than a certain threshold, and unfriendly if their planning horizon is shorter than that threshold. [10] [11]
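The phrase "future freedom of action" can be made concrete with a toy sketch. This is a deliberate simplification and an assumption, not Wissner-Gross and Freer's formalism: instead of the entropy of possible paths over a time horizon, it simply counts the distinct grid cells an agent could still reach within `horizon` steps, and picks the move that keeps that count largest. The grid and the names `reachable` and `freest_move` are hypothetical, chosen for illustration. The sketch only shows that such a score is always computed relative to a chosen planning horizon; it says nothing about where a friendliness threshold would lie.

```python
from collections import deque

# Hypothetical toy world (an assumption for illustration): '#' is a wall, '.' is floor.
GRID = [
    "#######",
    "#.....#",
    "#.....#",
    "#.....#",
    "###.###",
    "###.###",
    "#######",
]
MOVES = {"stay": (0, 0), "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def is_open(pos):
    """True if `pos` lies inside the grid and is not a wall."""
    r, c = pos
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[r]) and GRID[r][c] == "."


def reachable(start, horizon):
    """Distinct cells reachable from `start` in at most `horizon` steps (breadth-first search).

    A crude stand-in for "freedom of action", not causal path entropy itself.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        (r, c), depth = frontier.popleft()
        if depth == horizon:
            continue
        for dr, dc in MOVES.values():
            nxt = (r + dr, c + dc)
            if is_open(nxt) and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def freest_move(pos, horizon):
    """Pick the legal move whose successor keeps the most cells reachable."""
    scores = {}
    for name, (dr, dc) in MOVES.items():
        nxt = (pos[0] + dr, pos[1] + dc)
        if is_open(nxt):
            scores[name] = len(reachable(nxt, horizon))
    return max(scores, key=scores.get), scores


# Agent at the dead end of the corridor, looking two steps ahead.
move, scores = freest_move((5, 3), horizon=2)
print(move, scores)  # prints: up {'stay': 3, 'up': 6}
```

Counting reachable cells rather than weighting whole paths keeps the example short. A closer treatment would enumerate or sample trajectories of length `horizon` and measure their entropy.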

Luke Muehlhauser, writing for the Machine Intelligence Research Institute, recommends that machine ethics researchers adopt what Bruce Schneier has called the "security mindset": Rather than thinking about how a system will work, imagine how it could fail. For instance, he suggests even an AI that only makes accurate predictions and communicates via a text interface might cause unintended harm. [12]

In 2014, Luke Muehlhauser and Nick Bostrom underlined the need for 'friendly AI'; [13] nonetheless, the difficulties in designing a 'friendly' superintelligence, for instance via programming counterfactual moral thinking, are considerable. [14] [15]

Coherent extrapolated volition

Yudkowsky advances the Coherent Extrapolated Volition (CEV) model. According to him, our coherent extrapolated volition is "our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted". [16]

Rather than being designed directly by human programmers, a Friendly AI is to be designed by a "seed AI" programmed to first study human nature and then produce the AI that humanity would want, given sufficient time and insight to arrive at a satisfactory answer. [16] Appealing to an objective grounded in contingent human nature (perhaps expressed, for mathematical purposes, in the form of a utility function or other decision-theoretic formalism) as the ultimate criterion of "Friendliness" is meant to answer the meta-ethical problem of defining an objective morality: extrapolated volition is intended to be what humanity objectively would want, all things considered, but it can only be defined relative to the psychological and cognitive qualities of present-day, unextrapolated humanity.

Other approaches

Steve Omohundro has proposed a "scaffolding" approach to AI safety, in which one provably safe AI generation helps build the next provably safe generation. [17]

Seth Baum argues that the development of safe, socially beneficial artificial intelligence or artificial general intelligence is a function of the social psychology of AI research communities, and so can be constrained by extrinsic measures and motivated by intrinsic measures. Intrinsic motivations can be strengthened when messages resonate with AI developers; Baum argues that, in contrast, "existing messages about beneficial AI are not always framed well". Baum advocates for "cooperative relationships, and positive framing of AI researchers" and cautions against characterizing AI researchers as "not want(ing) to pursue beneficial designs". [18]

In his book Human Compatible, AI researcher Stuart J. Russell lists three principles to guide the development of beneficial machines. He emphasizes that these principles are not meant to be explicitly coded into the machines; rather, they are intended for the human developers. The principles are as follows: [19]:173

  1. The machine's only objective is to maximize the realization of human preferences.
  2. The machine is initially uncertain about what those preferences are.
  3. The ultimate source of information about human preferences is human behavior.

The "preferences" Russell refers to "are all-encompassing; they cover everything you might care about, arbitrarily far into the future." [19]:173 Similarly, "behavior" includes any choice between options, [19]:177 and the uncertainty is such that some probability, which may be quite small, must be assigned to every logically possible human preference. [19]:201
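How the three principles fit together can be illustrated with a minimal Bayesian sketch. Russell stresses that the principles guide developers rather than being code, so this is a toy and not an implementation of his proposal. Everything concrete in it is an assumption made for illustration: the two options, the three candidate preference models in `HYPOTHESES`, and the Boltzmann (softmax) model of noisily rational human choice with parameter `beta`. The belief starts with non-zero probability on every candidate (principle 2) and is updated only from observed choices (principle 3). The recommended option is the one that maximizes expected human utility under that belief (principle 1).

```python
import math

# Hypothetical candidate preference models (illustrative assumptions, not from Russell):
# each maps an option to how much the human would value it.
HYPOTHESES = {
    "likes_coffee": {"coffee": 1.0, "tea": 0.0},
    "likes_tea": {"coffee": 0.0, "tea": 1.0},
    "indifferent": {"coffee": 0.5, "tea": 0.5},
}


def choice_likelihood(utilities, chosen, options, beta=2.0):
    """P(human picks `chosen` from `options`) under an assumed softmax choice model."""
    total = sum(math.exp(beta * utilities[o]) for o in options)
    return math.exp(beta * utilities[chosen]) / total


def update(belief, chosen, options, beta=2.0):
    """One step of Bayes' rule over the preference hypotheses."""
    unnormalized = {
        h: p * choice_likelihood(HYPOTHESES[h], chosen, options, beta)
        for h, p in belief.items()
    }
    z = sum(unnormalized.values())
    return {h: w / z for h, w in unnormalized.items()}


def best_option(belief, options):
    """Principle 1: choose the option with the highest expected human utility."""
    return max(options, key=lambda o: sum(p * HYPOTHESES[h][o] for h, p in belief.items()))


# Principle 2: start uncertain, with non-zero probability on every hypothesis.
belief = {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}

# Principle 3: learn only from behavior, here a series of observed choices.
for observed in ["tea", "tea", "coffee", "tea"]:
    belief = update(belief, observed, ["coffee", "tea"])

print(belief)
print(best_option(belief, ["coffee", "tea"]))  # tea
```

Because the softmax likelihood is always positive, no hypothesis is ever driven exactly to zero. This mirrors the requirement that every possible preference keep some probability, however small. The single contrary "coffee" observation lowers confidence in `likes_tea` without eliminating it.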

Public policy

James Barrat, author of Our Final Invention, suggested that "a public-private partnership has to be created to bring A.I.-makers together to share ideas about security—something like the International Atomic Energy Agency, but in partnership with corporations." He urges AI researchers to convene a meeting similar to the Asilomar Conference on Recombinant DNA, which discussed risks of biotechnology. [17]

John McGinnis encourages governments to accelerate friendly AI research. Because the goalposts of friendly AI are not necessarily imminent, he suggests a model similar to the National Institutes of Health, where "Peer review panels of computer and cognitive scientists would sift through projects and choose those that are designed both to advance AI and assure that such advances would be accompanied by appropriate safeguards." McGinnis feels that peer review is better "than regulation to address technical issues that are not possible to capture through bureaucratic mandates". McGinnis notes that his proposal stands in contrast to that of the Machine Intelligence Research Institute, which generally aims to avoid government involvement in friendly AI. [20]

Criticism

Some critics believe that both human-level AI and superintelligence are unlikely, and that friendly AI is therefore unlikely as well. Writing in The Guardian, Alan Winfield compares human-level artificial intelligence with faster-than-light travel in terms of difficulty, and states that while we need to be "cautious and prepared" given the stakes involved, we "don't need to be obsessing" about the risks of superintelligence. [21] Taking a different line, Boyles and Joaquin argue that the prospects for Luke Muehlhauser and Nick Bostrom's proposal to create friendly AIs appear bleak. This is because Muehlhauser and Bostrom seem to hold the idea that intelligent machines could be programmed to think counterfactually about the moral values that human beings would have had. [13] In an article in AI & Society, Boyles and Joaquin maintain that such AIs would not be that friendly, for three reasons: the infinite number of antecedent counterfactual conditions that would have to be programmed into a machine; the difficulty of cashing out the set of moral values, that is, those that are more ideal than the ones human beings possess at present; and the apparent disconnect between counterfactual antecedents and the ideal-value consequent. [14]

Some philosophers claim that any truly "rational" agent, whether artificial or human, will naturally be benevolent; in this view, deliberate safeguards designed to produce a friendly AI could be unnecessary or even harmful. [22] Other critics question whether it is possible for an artificial intelligence to be friendly at all. Adam Keiper and Ari N. Schulman, editors of the technology journal The New Atlantis, say that it will be impossible ever to guarantee "friendly" behavior in AIs because problems of ethical complexity will not yield to software advances or increases in computing power. They write that the criteria upon which friendly AI theories are based work "only when one has not only great powers of prediction about the likelihood of myriad possible outcomes, but certainty and consensus on how one values the different outcomes." [23]

The inner workings of advanced AI systems may be complex and difficult to interpret, leading to concerns about transparency and accountability. [24]

References

  1. Tegmark, Max (2014). "Life, Our Universe and Everything". Our Mathematical Universe: My Quest for the Ultimate Nature of Reality (First ed.). Knopf Doubleday Publishing. ISBN 9780307744258. Its owner may cede control to what Eliezer Yudkowsky terms a "Friendly AI,"...
  2. Russell, Stuart; Norvig, Peter (2009). Artificial Intelligence: A Modern Approach. Prentice Hall. ISBN 978-0-13-604259-4.
  3. Leighton, Jonathan (2011). The Battle for Compassion: Ethics in an Apathetic Universe. Algora. ISBN 978-0-87586-870-7.
  4. Wallach, Wendell; Allen, Colin (2009). Moral Machines: Teaching Robots Right from Wrong. Oxford University Press, Inc. ISBN 978-0-19-537404-9.
  5. LaGrandeur, Kevin (2011). "The Persistent Peril of the Artificial Slave". Science Fiction Studies. 38 (2): 232. doi:10.5621/sciefictstud.38.2.0232. Archived from the original on January 13, 2023. Retrieved May 6, 2013.
  6. Asimov, Isaac (1964). "Introduction". The Rest of the Robots. Doubleday. ISBN 0-385-09041-2.
  7. Yudkowsky, Eliezer (2008). "Artificial Intelligence as a Positive and Negative Factor in Global Risk" (PDF). In Bostrom, Nick; Ćirković, Milan M. (eds.). Global Catastrophic Risks. pp. 308–345. Archived (PDF) from the original on October 19, 2013. Retrieved October 19, 2013.
  8. Omohundro, S. M. (February 2008). "The basic AI drives". Artificial General Intelligence. 171: 483–492. CiteSeerX 10.1.1.393.8356.
  9. Bostrom, Nick (2014). "Chapter 7: The Superintelligent Will". Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. ISBN 9780199678112.
  10. Dvorsky, George (April 26, 2013). "How Skynet Might Emerge From Simple Physics". Gizmodo. Archived from the original on October 8, 2021. Retrieved December 23, 2021.
  11. Wissner-Gross, A. D.; Freer, C. E. (2013). "Causal entropic forces". Physical Review Letters. 110 (16): 168702. Bibcode:2013PhRvL.110p8702W. doi:10.1103/PhysRevLett.110.168702. hdl:1721.1/79750. PMID 23679649.
  12. Muehlhauser, Luke (July 31, 2013). "AI Risk and the Security Mindset". Machine Intelligence Research Institute. Archived from the original on July 19, 2014. Retrieved July 15, 2014.
  13. Muehlhauser, Luke; Bostrom, Nick (December 17, 2013). "Why We Need Friendly AI". Think. 13 (36): 41–47. doi:10.1017/s1477175613000316. ISSN 1477-1756. S2CID 143657841.
  14. Boyles, Robert James M.; Joaquin, Jeremiah Joven (July 23, 2019). "Why friendly AIs won't be that friendly: a friendly reply to Muehlhauser and Bostrom". AI & Society. 35 (2): 505–507. doi:10.1007/s00146-019-00903-0. ISSN 0951-5666. S2CID 198190745.
  15. Chan, Berman (March 4, 2020). "The rise of artificial intelligence and the crisis of moral passivity". AI & Society. 35 (4): 991–993. doi:10.1007/s00146-020-00953-9. ISSN 1435-5655. S2CID 212407078. Archived from the original on February 10, 2023. Retrieved January 21, 2023.
  16. Yudkowsky, Eliezer (2004). "Coherent Extrapolated Volition" (PDF). Singularity Institute for Artificial Intelligence. Archived (PDF) from the original on September 30, 2015. Retrieved September 12, 2015.
  17. Hendry, Erica R. (January 21, 2014). "What Happens When Artificial Intelligence Turns On Us?". Smithsonian Magazine. Archived from the original on July 19, 2014. Retrieved July 15, 2014.
  18. Baum, Seth D. (September 28, 2016). "On the promotion of safe and socially beneficial artificial intelligence". AI & Society. 32 (4): 543–551. doi:10.1007/s00146-016-0677-0. ISSN 0951-5666. S2CID 29012168.
  19. Russell, Stuart (October 8, 2019). Human Compatible: Artificial Intelligence and the Problem of Control. United States: Viking. ISBN 978-0-525-55861-3. OCLC 1083694322.
  20. McGinnis, John O. (Summer 2010). "Accelerating AI". Northwestern University Law Review. 104 (3): 1253–1270. Archived from the original on December 1, 2014. Retrieved July 16, 2014.
  21. Winfield, Alan (August 9, 2014). "Artificial intelligence will not turn into a Frankenstein's monster". The Guardian. Archived from the original on September 17, 2014. Retrieved September 17, 2014.
  22. Kornai, András (May 15, 2014). "Bounding the impact of AGI". Journal of Experimental & Theoretical Artificial Intelligence. Informa UK Limited. 26 (3): 417–438. doi:10.1080/0952813x.2014.895109. ISSN 0952-813X. S2CID 7067517. ...the essence of AGIs is their reasoning facilities, and it is the very logic of their being that will compel them to behave in a moral fashion... The real nightmare scenario (is one where) humans find it advantageous to strongly couple themselves to AGIs, with no guarantees against self-deception.
  23. Keiper, Adam; Schulman, Ari N. (Summer 2011). "The Problem with 'Friendly' Artificial Intelligence". The New Atlantis. No. 32. pp. 80–89. Archived from the original on January 15, 2012. Retrieved January 16, 2012.
  24. Norvig, Peter; Russell, Stuart (2010). Artificial Intelligence: A Modern Approach (3rd ed.). Pearson. ISBN 978-0136042594.

Further reading