The Alignment Problem

The Alignment Problem: Machine Learning and Human Values
[Book cover: hardcover edition]
Author: Brian Christian
Language: English
Subject: AI alignment
Publisher: W. W. Norton & Company [1]
Publication date: October 6, 2020
Publication place: United States
Media type: Print, e-book, audiobook
Pages: 496
ISBN: 0393635821
OCLC: 1137850003
Website: brianchristian.org/the-alignment-problem/

The Alignment Problem: Machine Learning and Human Values is a 2020 non-fiction book by the American writer Brian Christian. It is based on numerous interviews with experts trying to build artificial intelligence systems, particularly machine learning systems, that are aligned with human values.

Summary

The book is divided into three sections: Prophecy, Agency, and Normativity. Each section covers researchers and engineers working on different challenges in the alignment of artificial intelligence with human values.

Prophecy

In the first section, Christian interweaves discussions of the history of artificial intelligence research, particularly the machine learning approach of artificial neural networks such as the Perceptron and AlexNet, with examples of how AI systems can behave in unintended ways. He tells the story of Julia Angwin, a journalist whose ProPublica investigation of the COMPAS algorithm, a tool for predicting recidivism among criminal defendants, led to widespread criticism of the tool's accuracy and of its bias against certain demographics. One of AI's main alignment challenges is its black-box nature: inputs and outputs are observable, but the transformation between them is opaque. This lack of transparency makes it difficult to know where a system is going right and where it is going wrong.
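
The black-box problem is easy to make concrete. Below is a minimal sketch (a hypothetical toy network with random stand-in weights, not an example from the book) in which the input-output behavior is fully observable while the learned parameters in between, though numerically visible, carry no direct human-readable meaning:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network; random weights stand in for "trained" ones.
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
w2, b2 = rng.normal(size=8), rng.normal()

def predict(x: np.ndarray) -> float:
    """The observable interface: an input goes in, a score comes out."""
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU units: what does each one mean?
    return float(hidden @ w2 + b2)

x = np.array([0.2, -1.3, 0.7, 0.0])
print(predict(x))  # the output is identifiable...
print(W1[0])       # ...but the parameters in between explain nothing by themselves
```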

Agency

In the second section, Christian similarly interweaves the history of the psychological study of reward, such as behaviorism and dopamine, with the computer science of reinforcement learning, in which AI systems must develop a policy ("what to do") in the face of a value function ("what rewards or punishments to expect"). He calls DeepMind's AlphaGo and AlphaZero systems "perhaps the single most impressive achievement in automated curriculum design." He also highlights the importance of curiosity, in which reinforcement learners are intrinsically motivated to explore their environment rather than exclusively seeking an external reward.
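
The policy/value distinction lends itself to a short worked example. The sketch below is a minimal tabular Q-learning loop on a toy five-state chain; the environment, constants, and reward are invented for illustration and are not drawn from the book:

```python
import random

N_STATES, GOAL = 5, 4          # states 0..4 on a line; reward only at state 4
ACTIONS = (-1, +1)             # move left or move right

def step(state: int, action: int) -> tuple[int, float]:
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)

# The value function: Q[s][a] estimates "what reward to expect".
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def choose(s: int) -> int:
    # Explore occasionally (and on ties); otherwise act greedily.
    if random.random() < epsilon or Q[s][0] == Q[s][1]:
        return random.randrange(2)
    return 0 if Q[s][0] > Q[s][1] else 1

for _ in range(500):
    s = 0
    while s != GOAL:
        a = choose(s)
        s2, r = step(s, ACTIONS[a])
        # Temporal-difference update toward reward plus discounted future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The policy, "what to do", read off from the learned values.
policy = ["right" if Q[s][1] >= Q[s][0] else "left" for s in range(N_STATES)]
print(policy)   # on this toy chain: "right" in every state that matters
```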

Normativity

The third section covers training AI through the imitation of human or machine behavior, as well as philosophical debates, such as that between possibilism and actualism, that imply different ideal behaviors for AI systems. Of particular importance is inverse reinforcement learning, a broad approach by which machines learn the objective function of a human or of another agent. Christian discusses the normative challenges associated with effective altruism and existential risk, including the work of the philosophers Toby Ord and William MacAskill, who are trying to devise human and machine strategies for navigating the alignment problem as effectively as possible.
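
As a rough illustration of the idea behind inverse reinforcement learning, the sketch below reverses the usual direction: instead of deriving behavior from a known reward, it scores candidate objectives by how well the behavior they would induce matches a demonstration. The one-dimensional world, the demonstration, and the goal hypotheses are all invented for illustration; practical IRL algorithms, such as maximum-entropy IRL, are considerably more involved:

```python
# Demonstrated behavior: an "expert" walking right along states 0..4.
demo = [0, 1, 2, 3, 4]

def greedy_trajectory(goal: int, start: int = 0, limit: int = 10) -> list[int]:
    """Behavior induced by the hypothesis 'the objective is reaching goal'."""
    s, traj = start, [start]
    while s != goal and len(traj) < limit:
        s += 1 if goal > s else -1   # greedy step toward the hypothesized goal
        traj.append(s)
    return traj

def agreement(goal: int) -> int:
    """How many demonstrated steps this hypothesis reproduces."""
    return sum(a == b for a, b in zip(greedy_trajectory(goal), demo))

# Score each candidate objective against the demonstration and keep the best.
inferred_goal = max(range(5), key=agreement)
print(inferred_goal)   # -> 4: the objective that best explains the expert's behavior
```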

Reception

The book received positive reviews from critics. In The Wall Street Journal, David A. Shaywitz emphasized the problems that frequently arise when algorithms are applied to the real world, describing the book as "a nuanced and captivating exploration of this white-hot topic." [2] Publishers Weekly praised the book for its writing and extensive research. [3]

Kirkus Reviews praised the book as "technically rich but accessible" and "an intriguing exploration of AI." [4] Writing for Nature, Virginia Dignum reviewed the book favorably, comparing it to Kate Crawford's Atlas of AI. [5]

In 2021, journalist Ezra Klein interviewed Christian on his podcast, The Ezra Klein Show, and wrote in The New York Times, "The Alignment Problem is the best book on the key technical and moral questions of A.I. that I’ve read." [6] Later that year, the book was included in a Fast Company feature, "5 books that inspired Microsoft CEO Satya Nadella this year". [7]

In 2022, the book won the Eric and Wendy Schmidt Award for Excellence in Science Communication, given by The National Academies of Sciences, Engineering, and Medicine in partnership with Schmidt Futures. [8]

In 2024, The New York Times named The Alignment Problem one of the "5 Best Books About Artificial Intelligence," saying: "If you're going to read one book on artificial intelligence, this is the one." [9]

Related Research Articles

Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals. Such machines may be called AIs.

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data, and thus perform tasks without explicit instructions. Within the subdiscipline of deep learning, advances have allowed neural networks, a class of statistical algorithms, to surpass many previous machine learning approaches in performance.

Friendly artificial intelligence is hypothetical artificial general intelligence (AGI) that would have a positive (benign) effect on humanity or at least align with human interests such as fostering the improvement of the human species. It is a part of the ethics of artificial intelligence and is closely related to machine ethics. While machine ethics is concerned with how an artificially intelligent agent should behave, friendly artificial intelligence research is focused on how to practically bring about this behavior and ensure it is adequately constrained.

A superintelligence is a hypothetical agent that possesses intelligence surpassing that of the brightest and most gifted human minds. "Superintelligence" may also refer to a property of problem-solving systems whether or not these high-level intellectual competencies are embodied in agents that act in the world. A superintelligence may or may not be created by an intelligence explosion and be associated with a technological singularity.

AI takeover

An AI takeover is an imagined scenario in which artificial intelligence (AI) emerges as the dominant form of intelligence on Earth and computer programs or robots effectively take control of the planet away from the human species, whose dominance has rested on human intelligence. Possible scenarios include the replacement of the entire human workforce due to automation, a takeover by an artificial superintelligence (ASI), and the notion of a robot uprising. Stories of AI takeovers have been popular throughout science fiction, but recent advances have made the threat more plausible. Some public figures, such as Stephen Hawking and Elon Musk, have advocated research into precautionary measures to ensure future superintelligent machines remain under human control.

Laws of robotics are any set of laws, rules, or principles intended as a fundamental framework to underpin the behavior of robots designed to have a degree of autonomy. Robots of this degree of complexity do not yet exist, but they have been widely anticipated in science fiction and film, and they are a topic of active research and development in the fields of robotics and artificial intelligence.

Intelligent agent

In intelligence and artificial intelligence, an intelligent agent (IA) is an agent that perceives its environment, takes actions autonomously in order to achieve goals, and may improve its performance with learning or acquiring knowledge. Agentic AI, a subset of intelligent agents, expands this concept by autonomously pursuing goals, making decisions, and taking actions over extended periods, effectively embodying a novel form of digital agency.

History of artificial intelligence

The history of artificial intelligence (AI) began in antiquity, with myths, stories, and rumors of artificial beings endowed with intelligence or consciousness by master craftsmen. The study of logic and formal reasoning from antiquity to the present led directly to the invention of the programmable digital computer in the 1940s, a machine based on abstract mathematical reasoning. This device and the ideas behind it inspired scientists to begin discussing the possibility of building an electronic brain.

Eric Horvitz

Eric Joel Horvitz is an American computer scientist, and Technical Fellow at Microsoft, where he serves as the company's first Chief Scientific Officer. He was previously the director of Microsoft Research Labs, including research centers in Redmond, WA, Cambridge, MA, New York, NY, Montreal, Canada, Cambridge, UK, and Bangalore, India.

Brian Christian

Brian Christian is an American non-fiction author, poet, programmer and researcher, best known for a bestselling series of books about the human implications of computer science, including The Most Human Human (2011), Algorithms to Live By (2016), and The Alignment Problem (2020).

Instrumental convergence is the hypothetical tendency for most sufficiently intelligent, goal-directed beings to pursue similar sub-goals, even if their ultimate goals are quite different. More precisely, agents may pursue instrumental goals—goals which are made in pursuit of some particular end, but are not the end goals themselves—without ceasing, provided that their ultimate (intrinsic) goals may never be fully satisfied.

Existential risk from artificial intelligence refers to the idea that substantial progress in artificial general intelligence (AGI) could lead to human extinction or an irreversible global catastrophe.

This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence (AI), its subdisciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.

In the field of artificial intelligence (AI), AI alignment aims to steer AI systems toward a person's or group's intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues unintended objectives.

Hit Refresh

Hit Refresh: The Quest to Rediscover Microsoft's Soul and Imagine a Better Future for Everyone is a nonfiction book by Satya Nadella and co-authors Jill Tracie Nichols and Greg Shaw, with a foreword by Bill Gates, published in 2017. Nadella announced that the profits from the book would go to Microsoft Philanthropies and through that to nonprofit organizations.

Center for Human-Compatible Artificial Intelligence

The Center for Human-Compatible Artificial Intelligence (CHAI) is a research center at the University of California, Berkeley focusing on advanced artificial intelligence (AI) safety methods. The center was founded in 2016 by a group of academics led by Berkeley computer science professor and AI expert Stuart J. Russell. Russell is known for co-authoring the widely used AI textbook Artificial Intelligence: A Modern Approach.

Human Compatible

Human Compatible: Artificial Intelligence and the Problem of Control is a 2019 non-fiction book by computer scientist Stuart J. Russell. It asserts that the risk to humanity from advanced artificial intelligence (AI) is a serious concern despite the uncertainty surrounding future progress in AI. It also proposes an approach to the AI control problem.

Atlas of AI

Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence is a book by Australian academic Kate Crawford. It is based on Crawford's research into the development and labor behind artificial intelligence, as well as AI's impact on the world.

Dan Hendrycks is an American machine learning researcher. He serves as the director of the Center for AI Safety, a nonprofit organization based in San Francisco, California.

Paul Christiano is an American researcher in the field of artificial intelligence (AI), with a specific focus on AI alignment, which is the subfield of AI safety research that aims to steer AI systems toward human interests. He serves as the Head of Safety for the U.S. Artificial Intelligence Safety Institute inside NIST. He formerly led the language model alignment team at OpenAI and became founder and head of the non-profit Alignment Research Center (ARC), which works on theoretical AI alignment and evaluations of machine learning models. In 2023, Christiano was named as one of the TIME 100 Most Influential People in AI.

References

  1. "The Alignment Problem". W. W. Norton & Company .
  2. Shaywitz, David (25 October 2020). "'The Alignment Problem' Review: When Machines Miss the Point". The Wall Street Journal . Retrieved 5 December 2021.
  3. "Nonfiction Book Review: The Alignment Problem: Machine Learning and Human Values by Brian Christian. Norton, $27.95 (356p) ISBN 978-0-393-63582-9". PublishersWeekly.com. Retrieved 20 January 2022.
  4. THE ALIGNMENT PROBLEM | Kirkus Reviews.
  5. Dignum, Virginia (26 May 2021). "AI — the people and places that make, use and manage it". Nature. 593 (7860): 499–500. Bibcode:2021Natur.593..499D. doi: 10.1038/d41586-021-01397-x . S2CID   235216649.
  6. Klein, Ezra (4 June 2021). "If 'All Models Are Wrong,' Why Do We Give Them So Much Power?". The New York Times . Retrieved 5 December 2021.
  7. Nadella, Satya (15 November 2021). "5 books that inspired Microsoft CEO Satya Nadella this year". Fast Company . Retrieved 5 December 2021.
  8. "Winners - Eric and Wendy Schmidt Awards for Excellence in Science Communication - National Academies". National Academies . 12 October 2022. Retrieved 21 October 2022.
  9. Marche, Stephen (31 January 2024). "5 Best Books About Artificial Intelligence". New York Times . Retrieved 6 February 2024.