The Alignment Problem

The Alignment Problem: Machine Learning and Human Values
Cover: Hardcover edition
Author: Brian Christian
Country: United States
Language: English
Subject: AI alignment
Publisher: W. W. Norton & Company[1]
Publication date: October 6, 2020
Media type: Print, e-book, audiobook
Pages: 496
ISBN: 0393635821
OCLC: 1137850003
Website: brianchristian.org/the-alignment-problem/

The Alignment Problem: Machine Learning and Human Values is a 2020 non-fiction book by the American writer Brian Christian. It is based on numerous interviews with experts trying to build artificial intelligence systems, particularly machine learning systems, that are aligned with human values.

Summary

The book is divided into three sections: Prophecy, Agency, and Normativity. Each section covers researchers and engineers working on different challenges in the alignment of artificial intelligence with human values.

Prophecy

In the first section, Christian interweaves the history of artificial intelligence research, particularly the machine learning approach of artificial neural networks such as the Perceptron and AlexNet, with examples of how AI systems can behave in unintended ways. He tells the story of Julia Angwin, a journalist whose ProPublica investigation of the COMPAS algorithm, a tool for predicting recidivism among criminal defendants, led to widespread criticism of the tool's accuracy and of its bias against certain demographic groups. One of the central challenges of alignment is the black-box nature of such systems: their inputs and outputs can be observed, but the transformation between them is difficult to interpret. This lack of transparency makes it hard to determine where a system is working as intended and where it is failing.
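
Of the systems named above, the Perceptron in particular is simple enough to sketch in a few lines. The following is a minimal, hypothetical illustration of a Rosenblatt-style perceptron, not code from the book: a weighted sum of inputs is thresholded to make a binary prediction, and the weights are nudged whenever the prediction is wrong.

```python
import numpy as np

# Minimal perceptron sketch (illustrative; a Rosenblatt-style learning
# rule, not code from the book).

def train_perceptron(X, y, epochs=20, lr=1.0):
    """X: (n_samples, n_features) array; y: labels in {0, 1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0
            error = yi - pred          # 0 if correct, otherwise +/-1
            w += lr * error * xi       # move the boundary toward the mistake
            b += lr * error
    return w, b

# Example: the perceptron can learn a linearly separable function
# such as logical AND.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
```

Unlike the modern deep networks the section also discusses, every weight here is directly inspectable, which is part of why the black-box problem only emerged at scale.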

Agency

In the second section, Christian similarly interweaves the history of the psychological study of reward, such as behaviorism and dopamine research, with the computer science of reinforcement learning, in which AI systems must develop a policy ("what to do") in the face of a value function ("what rewards or punishments to expect"). He calls DeepMind's AlphaGo and AlphaZero systems "perhaps the single most impressive achievement in automated curriculum design". He also highlights the importance of curiosity, whereby reinforcement learners are intrinsically motivated to explore their environment rather than exclusively seeking the external reward.
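
The distinction Christian draws between a value function and a policy can be made concrete in code. The following is a minimal, hypothetical tabular Q-learning sketch, not code from the book: the table Q plays the role of the value function, the epsilon-greedy rule is the policy, and a count-based bonus loosely stands in for the curiosity-driven exploration he describes. All names and constants here are illustrative.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON, BONUS = 0.1, 0.99, 0.1, 0.5

Q = defaultdict(lambda: defaultdict(float))  # value function: state -> action -> value
visits = defaultdict(int)                    # state-visit counts for the curiosity bonus

def choose_action(state, actions):
    """Epsilon-greedy policy ('what to do') derived from the value estimates."""
    if random.random() < EPSILON:
        return random.choice(actions)                  # explore
    return max(actions, key=lambda a: Q[state][a])     # exploit

def update(state, action, reward, next_state, next_actions):
    """One Q-learning step. A count-based 'curiosity' bonus is added to
    the external reward so rarely visited states look more valuable."""
    visits[next_state] += 1
    intrinsic = BONUS / visits[next_state] ** 0.5
    best_next = max((Q[next_state][a] for a in next_actions), default=0.0)
    target = reward + intrinsic + GAMMA * best_next
    Q[state][action] += ALPHA * (target - Q[state][action])
```

In a full agent these two functions would be called inside an environment loop; the sketch only aims to show how the value estimates ("what to expect") and the action rule ("what to do") are separate objects.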

Normativity

The third section covers training AI through the imitation of human or machine behavior, as well as philosophical debates, such as the one between possibilism and actualism, whose positions imply different ideal behaviors for AI systems. Of particular importance is inverse reinforcement learning, a broad approach by which machines learn the objective function of a human or another agent. Christian discusses the normative challenges associated with effective altruism and existential risk, including the work of the philosophers Toby Ord and William MacAskill, who are trying to devise human and machine strategies for navigating the alignment problem as effectively as possible.
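
Inverse reinforcement learning can also be illustrated with a short sketch. The code below is a hypothetical example rather than anything from the book; it uses the simple feature-matching idea from apprenticeship learning (after Abbeel and Ng): reward weights are adjusted until the expert's demonstrated behavior scores higher, under the inferred reward, than the learner's own.

```python
import numpy as np

# Minimal feature-matching IRL sketch (illustrative only; not code from
# the book). Instead of being given a reward function, the learner
# infers reward weights w from expert demonstrations.

def feature_expectations(trajectories, featurize, gamma=0.99):
    """Average discounted feature vector over a set of trajectories."""
    total = 0.0
    for traj in trajectories:
        for t, state in enumerate(traj):
            total = total + (gamma ** t) * featurize(state)
    return total / len(trajectories)

def irl_step(w, expert_trajs, learner_trajs, featurize, lr=0.1):
    """Move the reward weights toward features the expert visits more
    often than the current learner does."""
    mu_expert = feature_expectations(expert_trajs, featurize)
    mu_learner = feature_expectations(learner_trajs, featurize)
    return w + lr * (mu_expert - mu_learner)

def reward(w, state, featurize):
    """The inferred reward is a linear function of state features."""
    return float(np.dot(w, featurize(state)))
```

The design choice worth noting is the direction of inference: ordinary reinforcement learning goes from reward to behavior, while inverse reinforcement learning goes from observed behavior back to a reward, which is why the book treats it as a candidate route to learning human values.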

Reception

The book received positive reviews from critics. The Wall Street Journal's David A. Shaywitz emphasized the problems that frequently arise when algorithms are applied to real-world situations, and described the book as "a nuanced and captivating exploration of this white-hot topic".[2] Publishers Weekly praised the book for its writing and extensive research.[3]

Kirkus Reviews gave the book a positive review, calling it "technically rich but accessible" and "an intriguing exploration of AI".[4] Writing for Nature, Virginia Dignum praised the book, favorably comparing it to Kate Crawford's Atlas of AI.[5]

In 2021, journalist Ezra Klein interviewed Christian on his podcast, The Ezra Klein Show, and wrote in The New York Times that "The Alignment Problem is the best book on the key technical and moral questions of A.I. that I've read".[6] Later that year, the book was listed in a Fast Company feature, "5 books that inspired Microsoft CEO Satya Nadella this year".[7]

In 2024, The New York Times named The Alignment Problem one of the "5 Best Books About Artificial Intelligence", saying: "If you're going to read one book on artificial intelligence, this is the one."[8]

See also

AI alignment
Artificial intelligence
Machine learning
Symbolic artificial intelligence
Friendly artificial intelligence
Superintelligence
Instrumental convergence
AI takeover
Laws of robotics
Existential risk from artificial general intelligence
Glossary of artificial intelligence
Eliezer Yudkowsky
Eric Horvitz
Francesca Rossi
Brian Christian
Google DeepMind
Center for Human-Compatible Artificial Intelligence
Hit Refresh (Satya Nadella, 2017)
Human Compatible (Stuart J. Russell, 2019)
Atlas of AI (Kate Crawford, 2021)

References

  1. "The Alignment Problem". W. W. Norton & Company .
  2. Shaywitz, David (25 October 2020). "'The Alignment Problem' Review: When Machines Miss the Point". The Wall Street Journal . Retrieved 5 December 2021.
  3. "Nonfiction Book Review: The Alignment Problem: Machine Learning and Human Values by Brian Christian. Norton, $27.95 (356p) ISBN 978-0-393-63582-9". PublishersWeekly.com. Retrieved 20 January 2022.
  4. THE ALIGNMENT PROBLEM | Kirkus Reviews.
  5. Dignum, Virginia (26 May 2021). "AI — the people and places that make, use and manage it". Nature. 593 (7860): 499–500. Bibcode:2021Natur.593..499D. doi: 10.1038/d41586-021-01397-x . S2CID   235216649.
  6. Klein, Ezra (4 June 2021). "If 'All Models Are Wrong,' Why Do We Give Them So Much Power?". The New York Times . Retrieved 5 December 2021.
  7. Nadella, Satya (15 November 2021). "5 books that inspired Microsoft CEO Satya Nadella this year". Fast Company . Retrieved 5 December 2021.
  8. Marche, Stephen (31 January 2024). "5 Best Books About Artificial Intelligence". New York Times . Retrieved 6 February 2024.