Paul Christiano (researcher)

Paul Christiano is an American artificial intelligence (AI) researcher who focuses on AI alignment, the subfield of AI safety research that aims to steer AI systems toward human interests. [1] He formerly led the language model alignment team at OpenAI and is the founder and head of the non-profit Alignment Research Center (ARC), which works on theoretical AI alignment and evaluations of machine learning models. [2] [3] In 2023, Christiano was named one of the TIME 100 Most Influential People in AI (TIME100 AI). [3] [4]

In September 2023, Christiano was appointed to the UK government's Frontier AI Taskforce advisory board. [5] He is also one of the initial trustees of Anthropic's Long-Term Benefit Trust. [6]

Education

Christiano attended the Harker School in San Jose, California. [7] He competed on the U.S. team and won a silver medal at the 49th International Mathematical Olympiad (IMO) in 2008. [7] [8]

In 2012, Christiano graduated from the Massachusetts Institute of Technology (MIT) with a degree in mathematics. [9] [10] At MIT, he researched data structures, quantum cryptography, and combinatorial optimization. [10]

He went on to complete a PhD at the University of California, Berkeley. [11] While at Berkeley, Christiano collaborated with researcher Katja Grace on AI Impacts, co-developing a preliminary methodology for comparing supercomputers to brains using traversed edges per second (TEPS). [12] He also experimented with putting Carl Shulman's donor lottery idea into practice, raising nearly $50,000 in a pool to be donated to a single charity. [13]

Career

At OpenAI, Christiano co-authored the paper "Deep Reinforcement Learning from Human Preferences" (2017) and other works developing reinforcement learning from human feedback (RLHF). [14] [15] He is regarded as one of the principal architects of RLHF, [3] [6] which in 2017 was "considered a notable step forward in AI safety research", according to The New York Times. [16] Other works such as "AI safety via debate" (2018) focus on the problem of scalable oversight – supervising AIs in domains where humans would have difficulty judging output quality. [17] [18] [19]
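
The central technique of that 2017 paper fits a reward model to human comparisons between pairs of behavior segments, so that the segment preferred by the human receives the higher predicted reward; the fitted model then supplies the reward signal for a standard reinforcement learning algorithm. The following minimal Python sketch illustrates that preference-fitting step with a toy linear reward model; the names, dimensions, and numerical-gradient update are illustrative assumptions, not code from the paper.

    # Minimal sketch of reward modelling from pairwise human preferences,
    # in the spirit of "Deep Reinforcement Learning from Human Preferences"
    # (Christiano et al., 2017). The linear reward model, feature dimension,
    # and synthetic data are illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    dim = 4                       # toy feature dimension for a trajectory segment
    theta = rng.normal(size=dim)  # parameters of the toy reward model

    def reward(theta, segment):
        """Toy reward model: a linear score over segment features."""
        return segment @ theta

    def preference_loss(theta, seg_a, seg_b):
        """Negative log-likelihood that the human prefers segment A over B,
        modelled as a logistic (Bradley-Terry style) comparison of rewards."""
        logit = reward(theta, seg_a) - reward(theta, seg_b)
        p_a = 1.0 / (1.0 + np.exp(-logit))
        return -np.log(p_a)

    # One numerical-gradient step on a single synthetic comparison in which
    # the human preferred segment A; repeating this over many labelled
    # comparisons fits the reward model, which is then used as the reward
    # signal for reinforcement learning.
    seg_a, seg_b = rng.normal(size=dim), rng.normal(size=dim)
    eps, lr = 1e-5, 0.1
    grad = np.array([
        (preference_loss(theta + eps * e, seg_a, seg_b)
         - preference_loss(theta - eps * e, seg_a, seg_b)) / (2 * eps)
        for e in np.eye(dim)
    ])
    theta -= lr * grad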

Christiano left OpenAI in 2021 to work on more conceptual and theoretical issues in AI alignment and subsequently founded the Alignment Research Center to focus on this area. [1] One subject of study is the problem of eliciting latent knowledge from advanced machine learning models. [20] [21] ARC also develops techniques to identify and test whether an AI model is potentially dangerous. [3] In April 2023, Christiano told The Economist that ARC was considering developing an industry standard for AI safety. [22]

Views on AI risks

Christiano is known for his views on the potential risks of advanced AI. In 2017, Wired magazine stated that Christiano and his colleagues at OpenAI were not worried about the destruction of the human race by "evil robots", explaining that "[t]hey’re more concerned that, as AI progresses beyond human comprehension, the technology’s behavior may diverge from our intended goals." [23]

However, in a widely quoted interview with Business Insider in 2023, Christiano said that there is a “10–20% chance of AI takeover, [with] many [or] most humans dead.” He also conjectured a “50/50 chance of doom shortly after you have AI systems that are human level.” [24] [1]

Personal life

Christiano is married to Ajeya Cotra of Open Philanthropy. [25]

References

  1. "A.I. has a '10 or 20% chance' of conquering humanity, former OpenAI safety researcher warns". Fortune. Retrieved June 4, 2023.
  2. Piper, Kelsey (March 29, 2023). "How to test what an AI model can — and shouldn't — do". Vox. Retrieved August 4, 2023.
  3. Henshall, Will (September 7, 2023). "Paul Christiano – Founder, Alignment Research Center". Time. Retrieved November 16, 2023.
  4. Sibley, Jess (September 10, 2023). "The Future Is Now". Time. Vol. 202, no. 11/12. Retrieved November 16, 2023 – via EBSCOhost.
  5. Skelton, Sebastian Klovig (September 7, 2023). "Government AI taskforce appoints new advisory board members". ComputerWeekly.com. Retrieved November 16, 2023.
  6. Matthews, Dylan (September 25, 2023). "The $1 billion gamble to ensure AI doesn't destroy humanity". Vox. Retrieved November 16, 2023.
  7. Kehoe, Elaine (October 2008). "Mathematics People – 2008 International Mathematical Olympiad" (PDF). American Mathematical Society. Retrieved November 16, 2023.
  8. Feng, Zumin; Gelca, Razvan; Le, Ian; Dunbar, Steven R. (June 2009). "NEWS AND LETTERS: 49th International Mathematical Olympiad". Mathematics Magazine. 82 (3): 235–238. JSTOR 27765911.
  9. "Paul F. Christiano". Association for Computing Machinery Digital Library. Retrieved November 16, 2023.
  10. 1 2 "About the Authors: Theory of Computing: An Open Access Electronic Journal in Theoretical Computer Science" . Retrieved November 16, 2023.
  11. "Paul Christiano – Research Associate". The Future of Humanity Institute. Retrieved August 4, 2023.
  12. Hsu, Jeremy (August 26, 2015). "Estimate: Human Brain 30 Times Faster than Best Supercomputers". IEEE Spectrum. Retrieved November 16, 2023.
  13. Paynter, Ben (January 31, 2017). "Take A Chance With Your Charity And Try A Donor Lottery". Fast Company. Retrieved November 16, 2023.
  14. Christiano, Paul F; Leike, Jan; Brown, Tom; Martic, Miljan; Legg, Shane; Amodei, Dario (2017). "Deep Reinforcement Learning from Human Preferences". Advances in Neural Information Processing Systems. 30. Curran Associates, Inc.
  15. Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie (December 6, 2022). "Training language models to follow instructions with human feedback". Advances in Neural Information Processing Systems. 35: 27730–27744. arXiv: 2203.02155.
  16. Metz, Cade (August 13, 2017). "Teaching A.I. Systems to Behave Themselves". The New York Times. Retrieved November 16, 2023.
  17. Irving, G.; Christiano, P.; Amodei, Dario (May 2, 2018). "AI safety via debate". arXiv: 1805.00899 [stat.ML].
  18. Wu, Jeff; Ouyang, Long; Ziegler, Daniel M.; Stiennon, Nissan; Lowe, Ryan; Leike, J.; Christiano, P. (September 22, 2021). "Recursively Summarizing Books with Human Feedback". arXiv: 2109.10862 [cs.CL].
  19. Christiano, P.; Shlegeris, Buck; Amodei, Dario (October 19, 2018). "Supervising strong learners by amplifying weak experts". arXiv: 1810.08575 [cs.LG].
  20. Burns, Collin; Ye, Haotian; Klein, Dan; Steinhardt, Jacob (2022). "Discovering Latent Knowledge in Language Models Without Supervision". arXiv: 2212.03827 [cs.CL].
  21. Christiano, Paul; Cotra, Ajeya; Xu, Mark (December 2021). "Eliciting Latent Knowledge: How to tell if your eyes deceive you". Google Docs. Alignment Research Center. Retrieved April 16, 2023.
  22. "How generative models could go wrong". The Economist . April 19, 2023. Retrieved November 16, 2023.
  23. Newman, Lily Hay (September 2017). "Should We Worry? – Will AI Turn Against Me?". Wired . Retrieved November 16, 2023.
  24. Nolan, Beatrice. "Ex-OpenAI researcher says there's a 50% chance AI development could end in 'doom'". Business Insider. Retrieved June 4, 2023.
  25. Piper, Kelsey (June 2023). "A Field Guide to AI Safety". Asterisk Magazine. No. 3. Retrieved November 16, 2023.