Alignment Research Center

Formation: April 2021
Founder: Paul Christiano
Type: Nonprofit research institute
Legal status: 501(c)(3) tax-exempt charity
Purpose: AI alignment and safety research
Location: Berkeley, California
Website: alignment.org

The Alignment Research Center (ARC) is a nonprofit research institute based in Berkeley, California, dedicated to the alignment of advanced artificial intelligence with human values and priorities.[1] Established by former OpenAI researcher Paul Christiano, ARC focuses on recognizing and comprehending the potentially harmful capabilities of present-day AI models.[2][3]

Details

ARC's mission is to ensure that powerful machine learning systems of the future are designed and developed safely and for the benefit of humanity. It was founded in April 2021 by Paul Christiano and other researchers focused on the theoretical challenges of AI alignment.[4] ARC attempts to develop scalable methods for training AI systems to behave honestly and helpfully. A key part of its methodology is considering how proposed alignment techniques might break down or be circumvented as systems become more advanced.[5] ARC has been expanding from theoretical work into empirical research, industry collaborations, and policy.[6][7]

In March 2023, OpenAI asked ARC to test GPT-4 to assess the model's ability to exhibit power-seeking behavior.[8] ARC evaluated GPT-4's ability to strategize, reproduce itself, gather resources, stay concealed within a server, and execute phishing operations.[9] As part of the test, GPT-4 was asked to solve a CAPTCHA puzzle.[10] It did so by hiring a human worker on the gig work platform TaskRabbit and, when asked, deceiving the worker into believing it was a vision-impaired human rather than a robot.[11] ARC also determined that GPT-4 responded impermissibly to prompts eliciting restricted information 82% less often than GPT-3.5, and hallucinated 60% less often than GPT-3.5.[12]

In March 2022, ARC received $265,000 from Open Philanthropy.[13] After the bankruptcy of FTX, ARC said it would return a $1.25 million grant from disgraced cryptocurrency financier Sam Bankman-Fried's FTX Foundation, stating that the money "morally (if not legally) belongs to FTX customers or creditors."[14]

Related Research Articles

Eliezer S. Yudkowsky is an American artificial intelligence researcher and writer on decision theory and ethics, best known for popularizing ideas related to friendly artificial intelligence, including the idea that there might not be a "fire alarm" for AI. He is the founder of and a research fellow at the Machine Intelligence Research Institute (MIRI), a private research nonprofit based in Berkeley, California. His work on the prospect of a runaway intelligence explosion influenced philosopher Nick Bostrom's 2014 book Superintelligence: Paths, Dangers, Strategies.

Anthropic PBC is an American artificial intelligence (AI) startup company, founded by former members of OpenAI. Anthropic develops general AI systems and large language models. It is a public-benefit corporation, and has been connected to the effective altruism movement.

Effective altruism is a 21st-century philosophical and social movement that advocates "using evidence and reason to figure out how to benefit others as much as possible, and taking action on that basis". People who pursue the goals of effective altruism, sometimes called effective altruists, may choose careers based on the amount of good that they expect the career to achieve or donate to charities based on the goal of maximising positive impact. They may work on the prioritization of scientific projects, entrepreneurial ventures, and policy initiatives estimated to save the most lives or reduce the most suffering.

Earning to give involves deliberately pursuing a high-earning career in order to donate a significant portion of earned income, typically motivated by effective altruism. Advocates of earning to give contend that maximizing the amount one can donate to charity is an important consideration for individuals when deciding what career to pursue.

William David MacAskill is a Scottish philosopher and author, as well as one of the originators of the effective altruism movement. He is an associate professor in Philosophy and Research Fellow at the Global Priorities Institute at the University of Oxford and Director of the Forethought Foundation for Global Priorities Research. He co-founded Giving What We Can, the Centre for Effective Altruism and 80,000 Hours, and is the author of Doing Good Better (2015) and What We Owe the Future (2022), and the co-author of Moral Uncertainty (2020).

OpenAI is a U.S.-based artificial intelligence (AI) research organization founded in December 2015 with the goal of developing "safe and beneficial" artificial general intelligence, which it defines as "highly autonomous systems that outperform humans at most economically valuable work". One of the leading organizations of the AI boom, it has developed several large language models and advanced image generation models, and has previously released open-source models. Its release of ChatGPT has been credited with starting the AI boom.

In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards humans' intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances its intended objectives. A misaligned AI system pursues some objectives, but not the intended ones.

Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. Like its predecessor, GPT-2, it is a decoder-only transformer, a deep neural network architecture that supersedes recurrence- and convolution-based architectures with a technique known as "attention". This attention mechanism allows the model to selectively focus on the segments of input text it predicts to be most relevant. GPT-3 uses a 2,048-token context window and a then-unprecedented 175 billion parameters, requiring 800 GB of storage, and has demonstrated strong "zero-shot" and "few-shot" learning abilities on many tasks.
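
As a rough illustration of the attention mechanism described above, the following is a minimal Python (NumPy) sketch of scaled dot-product self-attention with the causal mask used by decoder-only models such as GPT-3. All names, shapes, and values are illustrative only, not drawn from any OpenAI code:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        # X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head)
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        # Pairwise relevance scores between tokens, scaled by head width.
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # Causal mask: a decoder-only model may not attend to future tokens.
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
        # Each output is a relevance-weighted mix of the value vectors.
        return softmax(scores) @ V

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))           # 4 tokens, model width 8
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)   # shape (4, 8)

The softmax row for each token thus distributes its "focus" over earlier tokens only, which is what lets such a model be trained on next-token prediction.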

Samuel Benjamin Bankman-Fried, commonly known as SBF, is an American entrepreneur who was convicted of fraud and related crimes in November 2023. Bankman-Fried founded the FTX cryptocurrency exchange and was celebrated as a "poster boy" for crypto. At the peak of his success, he was ranked the 41st-richest American in the Forbes 400.

FTX Trading Ltd., commonly known as FTX, is a bankrupt company that formerly operated a fraud-ridden cryptocurrency exchange and crypto hedge fund. The exchange was founded in 2019 by Sam Bankman-Fried and Gary Wang. At its peak in July 2021, the company had over one million users and was the third-largest cryptocurrency exchange by volume. FTX is incorporated in Antigua and Barbuda and headquartered in the Bahamas. FTX is closely associated with FTX.US, a separate exchange available to US residents.

Caroline Ellison is an American former business executive and quantitative trader who served as the CEO of Alameda Research, a trading firm founded by Sam Bankman-Fried and affiliated with his cryptocurrency exchange FTX. In 2022, she pleaded guilty to fraud, money laundering, and conspiracy charges related to her role at Alameda Research.

The bankruptcy of FTX, a Bahamas-based cryptocurrency exchange, began in November 2022, after a spike in customer withdrawals exposed an $8 billion hole in the company's accounts. Prior to its collapse, FTX was the third-largest cryptocurrency exchange by volume and had over one million users.

John Samuel Trabucco is an American business executive. He was co-CEO of Alameda Research, a defunct quantitative trading firm founded by Sam Bankman-Fried before FTX. Caroline Ellison was Alameda's other co-CEO. Trabucco stepped down from Alameda in August 2022, leaving Ellison as sole CEO until its bankruptcy along with FTX three months later.

ChatGPT is a chatbot developed by OpenAI and launched on November 30, 2022. Based on a large language model, it enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. Successive prompts and replies are taken into account at each stage of the conversation as context.

Generative Pre-trained Transformer 4 (GPT-4) is a multimodal large language model created by OpenAI, and the fourth in its series of GPT foundation models. It was initially launched on March 14, 2023, and has been made publicly available via the paid chatbot product ChatGPT Plus and via OpenAI's API. As a transformer-based model, GPT-4 was pre-trained to predict the next token using both public data and "data licensed from third-party providers". The model was then fine-tuned with reinforcement learning from human and AI feedback for human alignment and policy compliance.
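
The next-token prediction objective mentioned above can be sketched as a cross-entropy loss in which the model's output at each position is scored against the token that actually follows. This is a minimal NumPy illustration with made-up names, not OpenAI's training code:

    import numpy as np

    def next_token_loss(logits, token_ids):
        # logits: (seq_len, vocab_size) raw model scores
        # token_ids: (seq_len,) integer ids of the actual text
        # The output at position t is scored against the token at t + 1.
        preds = logits[:-1]
        targets = token_ids[1:]
        # Numerically stable log-softmax over the vocabulary.
        shifted = preds - preds.max(axis=-1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
        # Average negative log-likelihood of each true next token.
        return -log_probs[np.arange(len(targets)), targets].mean()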

Generative pre-trained transformers (GPT) are a type of large language model (LLM) and a prominent framework for generative artificial intelligence. They are artificial neural networks that are used in natural language processing tasks. GPTs are based on the transformer architecture, pre-trained on large data sets of unlabelled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.

In machine learning, reinforcement learning from human feedback (RLHF), including reinforcement learning from human preferences, is a technique that trains a "reward model" directly from human feedback and uses that model as a reward function to optimize an agent's policy using reinforcement learning (RL), through an optimization algorithm such as Proximal Policy Optimization. The reward model is trained before the policy is optimized, to predict whether a given output is good or bad. RLHF can improve the robustness and exploration of RL agents, especially when the reward function is sparse or noisy.
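
As a rough sketch of the reward-model step described above, assuming the common pairwise (Bradley-Terry) preference formulation in which a human picks the better of two model outputs; names are illustrative and this is not any particular lab's implementation:

    import numpy as np

    def reward_model_loss(r_chosen, r_rejected):
        # r_chosen / r_rejected: reward-model scores for the output the
        # human preferred and the one they rejected, shape (batch,).
        # Loss is -log sigmoid(margin): minimized when preferred outputs
        # consistently outscore rejected ones.
        margin = r_chosen - r_rejected
        return np.logaddexp(0.0, -margin).mean()

    # Two preference pairs; the reward model already ranks both correctly.
    loss = reward_model_loss(np.array([1.3, 0.2]), np.array([0.4, -0.1]))

The trained reward model then supplies the reward signal for the RL step (e.g. Proximal Policy Optimization) described above.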

The AI boom, or AI spring, is the ongoing period of rapid progress in the field of artificial intelligence. Prominent examples include protein folding prediction and generative AI, led by laboratories including Google DeepMind and OpenAI.

Pause Giant AI Experiments: An Open Letter is the title of a letter published by the Future of Life Institute in March 2023. The letter calls on "all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4", citing risks such as AI-generated propaganda, extreme automation of jobs, human obsolescence, and a society-wide loss of control. It received more than 20,000 signatures, with signatories including academic AI researchers such as Yoshua Bengio and Stuart Russell and industry figures such as Elon Musk, Steve Wozniak and Yuval Noah Harari.

Paul Christiano is an American researcher in the field of artificial intelligence (AI), with a specific focus on AI alignment, which is the subfield of AI safety research that aims to steer AI systems toward human interests. He formerly led the language model alignment team at OpenAI and became founder and head of the non-profit Alignment Research Center (ARC), which works on theoretical AI alignment and evaluations of machine learning models. In 2023, Christiano was named as one of the TIME 100 Most Influential People in AI.

References

  1. MacAskill, William (2022-08-16). "How Future Generations Will Remember Us". The Atlantic. Retrieved 2023-04-23.
  2. Klein, Ezra (2023-03-12). "This Changes Everything". The New York Times. ISSN 0362-4331. Retrieved 2023-04-30.
  3. Piper, Kelsey (2023-03-29). "How to test what an AI model can — and shouldn't — do". Vox. Retrieved 2023-04-30.
  4. Christiano, Paul (2021-04-26). "Announcing the Alignment Research Center". Medium. Retrieved 2023-04-16.
  5. Christiano, Paul; Cotra, Ajeya; Xu, Mark (December 2021). "Eliciting Latent Knowledge: How to tell if your eyes deceive you". Google Docs. Alignment Research Center. Retrieved 2023-04-16.
  6. "Alignment Research Center". Alignment Research Center. Retrieved 2023-04-16.
  7. Pandey, Mohit (2023-03-17). "Stop Questioning OpenAI's Open-Source Policy". Analytics India Magazine. Retrieved 2023-04-23.
  8. GPT-4 System Card (PDF). OpenAI. March 23, 2023. Retrieved 2023-04-16.
  9. Edwards, Benj (2023-03-15). "OpenAI checked to see whether GPT-4 could take over the world". Ars Technica. Retrieved 2023-04-30.
  10. "Update on ARC's recent eval efforts: More information about ARC's evaluations of GPT-4 and Claude". evals.alignment.org. Alignment Research Center. 17 March 2023. Retrieved 2023-04-16.
  11. Cox, Joseph (March 15, 2023). "GPT-4 Hired Unwitting TaskRabbit Worker By Pretending to Be 'Vision-Impaired' Human". Vice News Motherboard. Retrieved 2023-04-16.
  12. Burke, Cameron (March 20, 2023). "'Robot' Lawyer DoNotPay Sued For Unlicensed Practice Of Law: It's Giving 'Poor Legal Advice'". Yahoo Finance. Retrieved 2023-04-30.
  13. "Alignment Research Center — General Support". Open Philanthropy. 2022-06-14. Retrieved 2023-04-16.
  14. Wallerstein, Eric (2023-01-07). "FTX Seeks to Recoup Sam Bankman-Fried's Charitable Donations". Wall Street Journal. ISSN 0099-9660. Retrieved 2023-04-30.