| Author | Brian Christian |
| --- | --- |
| Language | English |
| Subject | AI alignment |
| Publisher | W. W. Norton & Company [1] |
| Publication date | October 6, 2020 |
| Publication place | United States |
| Media type | Print, e-book, audiobook |
| Pages | 496 |
| ISBN | 0393635821 |
| OCLC | 1137850003 |
| Website | brianchristian.org/the-alignment-problem/ |
The Alignment Problem: Machine Learning and Human Values is a 2020 non-fiction book by the American writer Brian Christian. It is based on numerous interviews with experts trying to build artificial intelligence systems, particularly machine learning systems, that are aligned with human values.
The book is divided into three sections: Prophecy, Agency, and Normativity. Each section covers researchers and engineers working on different challenges in the alignment of artificial intelligence with human values.
In the first section, Christian interweaves discussions of the history of artificial intelligence research, particularly the machine learning approach of artificial neural networks such as the Perceptron and AlexNet, with examples of how AI systems can have unintended behavior. He tells the story of Julia Angwin, a journalist whose ProPublica investigation of the COMPAS algorithm, a tool for predicting recidivism among criminal defendants, led to widespread criticism of its accuracy and of its bias against Black defendants. One of AI's main alignment challenges is its black box nature: inputs and outputs are identifiable, but the transformation process in between is not readily interpretable. This lack of transparency makes it difficult to know where the system is going right and where it is going wrong.
In the second section, Christian similarly interweaves the history of the psychological study of reward, such as behaviorism and dopamine, with the computer science of reinforcement learning, in which AI systems need to develop a policy ("what to do") in the face of a value function ("what rewards or punishment to expect"). He calls the DeepMind AlphaGo and AlphaZero systems "perhaps the single most impressive achievement in automated curriculum design." He also highlights the importance of curiosity, in which reinforcement learners are intrinsically motivated to explore their environment rather than exclusively seeking the external reward.
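To make the policy/value-function distinction concrete, the following is a minimal tabular Q-learning sketch (illustrative only, not taken from the book): the learned table of expected returns plays the role of the value function, the greedy rule derived from it is the policy, and a small count-based bonus stands in for the curiosity-style intrinsic reward described above. The chain environment and all constants are assumptions chosen for brevity.

```python
import random
from collections import defaultdict

# Toy setup: a 10-state chain with an external reward only at the rightmost state.
N_STATES = 10
ACTIONS = [-1, +1]                 # move left or right along the chain
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
CURIOSITY_WEIGHT = 0.05            # strength of the intrinsic exploration bonus

q = defaultdict(float)             # value function: (state, action) -> expected return
visit_counts = defaultdict(int)    # used for the count-based curiosity bonus

def greedy_action(state):
    """Policy ("what to do"), derived from the value estimates ("what to expect")."""
    return max(ACTIONS, key=lambda a: q[(state, a)])

def policy(state):
    """Epsilon-greedy: mostly follow the value estimates, occasionally explore."""
    return random.choice(ACTIONS) if random.random() < EPSILON else greedy_action(state)

def step(state, action):
    """Environment dynamics: external reward of 1.0 only at the rightmost state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

for episode in range(500):
    state = 0
    for _ in range(200):           # cap episode length so early random walks terminate
        action = policy(state)
        next_state, extrinsic, done = step(state, action)
        visit_counts[next_state] += 1
        # Intrinsic reward: rarely visited states look "interesting" to the agent.
        intrinsic = CURIOSITY_WEIGHT / (visit_counts[next_state] ** 0.5)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (extrinsic + intrinsic + GAMMA * best_next - q[(state, action)])
        state = next_state
        if done:
            break

# After training, the greedy policy should point right toward the rewarding state.
print({s: greedy_action(s) for s in range(N_STATES)})
```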
The third section covers training AI through the imitation of human or machine behavior, as well as philosophical debates, such as that between possibilism and actualism, that imply different ideal behavior for AI systems. Of particular importance is inverse reinforcement learning, a broad approach by which machines learn the objective function of a human or another agent. Christian discusses the normative challenges associated with effective altruism and existential risk, including the work of philosophers Toby Ord and William MacAskill, who are trying to devise human and machine strategies for navigating the alignment problem as effectively as possible.
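The core idea of inverse reinforcement learning can be sketched briefly: instead of optimizing a given reward, infer a reward under which the demonstrator's behavior would be optimal. The toy feature-matching scheme below is a simplification, not the book's or any specific paper's formulation; the chain environment, horizon, and learning rate are assumptions. It nudges a candidate reward vector until a planner acting on it visits the same states as the expert.

```python
import numpy as np

# Toy inverse reinforcement learning sketch on a 5-state chain where the
# (hidden) expert always walks right toward state 4.
N = 5                     # states 0..4; actions: 0 = left, 1 = right
GAMMA = 0.9

def next_state(s, a):
    return min(s + 1, N - 1) if a == 1 else max(s - 1, 0)

def optimal_policy(reward):
    """Value iteration under a candidate reward; returns the greedy action per state."""
    v = np.zeros(N)
    for _ in range(100):
        v = np.array([max(reward[next_state(s, a)] + GAMMA * v[next_state(s, a)]
                          for a in (0, 1)) for s in range(N)])
    return [int(np.argmax([reward[next_state(s, a)] + GAMMA * v[next_state(s, a)]
                           for a in (0, 1)])) for s in range(N)]

def visitation(policy, start=0, horizon=20):
    """Normalized state-visit counts from rolling the policy out from the start state."""
    counts = np.zeros(N)
    s = start
    for _ in range(horizon):
        counts[s] += 1
        s = next_state(s, policy[s])
    return counts / counts.sum()

expert_policy = [1] * N                  # the demonstrations: always move right
expert_visits = visitation(expert_policy)

reward = np.zeros(N)                     # the unknown objective we try to recover
for _ in range(50):
    learner_visits = visitation(optimal_policy(reward))
    reward += 0.1 * (expert_visits - learner_visits)   # match visitation frequencies

# States the expert frequents end up with higher inferred reward.
print(np.round(reward, 2))
```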
The book received positive reviews from critics. The Wall Street Journal's David A. Shaywitz emphasized the problems that frequently arise when algorithms are applied to real-world situations, describing the book as "a nuanced and captivating exploration of this white-hot topic." [2] Publishers Weekly praised the book for its writing and extensive research. [3]
Kirkus Reviews gave the book a positive review, calling it "technically rich but accessible" and "an intriguing exploration of AI." [4] Writing for Nature, Virginia Dignum also praised the book, favorably comparing it to Kate Crawford's Atlas of AI. [5]
In 2021, journalist Ezra Klein interviewed Christian on his podcast, The Ezra Klein Show, and wrote in The New York Times, "The Alignment Problem is the best book on the key technical and moral questions of A.I. that I’ve read." [6] Later that year, the book was listed in a Fast Company feature, "5 books that inspired Microsoft CEO Satya Nadella this year". [7]
In 2022, the book won the Eric and Wendy Schmidt Award for Excellence in Science Communication, given by The National Academies of Sciences, Engineering, and Medicine in partnership with Schmidt Futures. [8]
In 2024, The New York Times named The Alignment Problem one of the "5 Best Books About Artificial Intelligence," saying: "If you're going to read one book on artificial intelligence, this is the one." [9]
Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals. Such machines may be called AIs.
Eliezer S. Yudkowsky is an American artificial intelligence researcher and writer on decision theory and ethics, best known for popularizing ideas related to friendly artificial intelligence. He is the founder of and a research fellow at the Machine Intelligence Research Institute (MIRI), a private research nonprofit based in Berkeley, California. His work on the prospect of a runaway intelligence explosion influenced philosopher Nick Bostrom's 2014 book Superintelligence: Paths, Dangers, Strategies.
Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent ought to take actions in a dynamic environment in order to maximize the cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data, and thus perform tasks without explicit instructions. Recently, artificial neural networks have been able to surpass many previous approaches in performance.
Friendly artificial intelligence is hypothetical artificial general intelligence (AGI) that would have a positive (benign) effect on humanity, or at least align with human interests or contribute to fostering the improvement of the human species. It is a part of the ethics of artificial intelligence and is closely related to machine ethics. While machine ethics is concerned with how an artificially intelligent agent should behave, friendly artificial intelligence research is focused on how to practically bring about this behavior and ensure it is adequately constrained.
A superintelligence is a hypothetical agent that possesses intelligence far surpassing that of the brightest and most gifted human minds. "Superintelligence" may also refer to a property of problem-solving systems, whether or not these high-level intellectual competencies are embodied in agents that act in the world. A superintelligence may or may not be created by an intelligence explosion and be associated with a technological singularity.
An AI takeover is an imagined scenario in which artificial intelligence (AI) emerges as the dominant form of intelligence on Earth, with computer programs or robots effectively taking control of the planet away from the human species, which relies on human intelligence. Stories of AI takeovers remain popular throughout science fiction, but recent advances in AI have made such concerns more prominent. Possible scenarios include replacement of the entire human workforce due to automation, takeover by an artificial superintelligence (ASI), and the notion of a robot uprising. Some public figures, such as Stephen Hawking and Elon Musk, have advocated research into precautionary measures to ensure future superintelligent machines remain under human control.
Laws of robotics are any set of laws, rules, or principles intended as a fundamental framework to underpin the behavior of robots designed to have a degree of autonomy. Robots of this degree of complexity do not yet exist, but they have been widely anticipated in science fiction and film, and they are a topic of active research and development in the fields of robotics and artificial intelligence.
Eric Joel Horvitz is an American computer scientist, and Technical Fellow at Microsoft, where he serves as the company's first Chief Scientific Officer. He was previously the director of Microsoft Research Labs, including research centers in Redmond, WA, Cambridge, MA, New York, NY, Montreal, Canada, Cambridge, UK, and Bangalore, India.
Brian Christian is an American non-fiction author, poet, programmer and researcher, best known for a bestselling series of books about the human implications of computer science, including The Most Human Human (2011), Algorithms to Live By (2016), and The Alignment Problem (2020).
Google DeepMind Technologies Limited is a British-American artificial intelligence research laboratory which serves as a subsidiary of Google. Founded in the UK in 2010, it was acquired by Google in 2014 and merged with Google AI's Google Brain division to become Google DeepMind in April 2023. The company is based in London, with research centres in Canada, France, Germany, and the United States.
Instrumental convergence is the hypothetical tendency for most sufficiently intelligent, goal-directed beings to pursue similar sub-goals, even if their ultimate goals are quite different. More precisely, agents may pursue instrumental goals (goals made in pursuit of some particular end, but not end goals in themselves) without ceasing, provided that their ultimate (intrinsic) goals may never be fully satisfied.
Existential risk from artificial general intelligence refers to the idea that substantial progress in artificial general intelligence (AGI) could lead to human extinction or an irreversible global catastrophe.
In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems toward a person's or group's intended goals, preferences, and ethical principles. An AI system is considered aligned if it advances its intended objectives. A misaligned AI system may pursue some objectives, but not the intended ones.
Hit Refresh: The Quest to Rediscover Microsoft's Soul and Imagine a Better Future for Everyone is a nonfiction book by Satya Nadella and co-authors Jill Tracie Nichols and Greg Shaw, with a foreword by Bill Gates, published in 2017. Nadella announced that the profits from the book would go to Microsoft Philanthropies and, through it, to nonprofit organizations.
The Center for Human-Compatible Artificial Intelligence (CHAI) is a research center at the University of California, Berkeley focusing on advanced artificial intelligence (AI) safety methods. The center was founded in 2016 by a group of academics led by Berkeley computer science professor and AI expert Stuart J. Russell. Russell is known for co-authoring the widely used AI textbook Artificial Intelligence: A Modern Approach.
Human Compatible: Artificial Intelligence and the Problem of Control is a 2019 non-fiction book by computer scientist Stuart J. Russell. It asserts that the risk to humanity from advanced artificial intelligence (AI) is a serious concern despite the uncertainty surrounding future progress in AI. It also proposes an approach to the AI control problem.
Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence is a book by Australian academic Kate Crawford. It is based on Crawford's research into the development and labor behind artificial intelligence, as well as AI's impact on the world.
Dan Hendrycks is an American machine learning researcher. He serves as the director of the Center for AI Safety.
Paul Christiano is an American researcher in the field of artificial intelligence (AI), with a specific focus on AI alignment, the subfield of AI safety research that aims to steer AI systems toward human interests. He formerly led the language model alignment team at OpenAI and went on to found and head the non-profit Alignment Research Center (ARC), which works on theoretical AI alignment and evaluations of machine learning models. In 2023, Christiano was named one of the TIME 100 Most Influential People in AI.