Formation | April 2021
---|---
Founder | Paul Christiano
Type | Nonprofit research institute
Legal status | 501(c)(3) tax-exempt charity
Purpose | AI alignment and safety research
Location | Berkeley, California, U.S.
Website | alignment.org
The Alignment Research Center (ARC) is a nonprofit research institute based in Berkeley, California, dedicated to aligning advanced artificial intelligence with human values and priorities.[1] Established by former OpenAI researcher Paul Christiano, ARC focuses on identifying and understanding the potentially harmful capabilities of present-day AI models.[2][3]
ARC's mission is to ensure that powerful machine learning systems of the future are designed and developed safely and for the benefit of humanity. It was founded in April 2021 by Paul Christiano and other researchers focused on the theoretical challenges of AI alignment.[4] The organization attempts to develop scalable methods for training AI systems to behave honestly and helpfully. A key part of its methodology is considering how proposed alignment techniques might break down or be circumvented as systems become more advanced.[5] ARC has been expanding from theoretical work into empirical research, industry collaborations, and policy.[6][7]
In March 2022, ARC received $265,000 from Open Philanthropy.[8] After the bankruptcy of FTX, ARC said it would return a $1.25 million grant from disgraced cryptocurrency financier Sam Bankman-Fried's FTX Foundation, stating that the money "morally (if not legally) belongs to FTX customers or creditors."[9]
In March 2023, OpenAI asked ARC to test GPT-4 and assess the model's capacity for power-seeking behavior.[10] ARC evaluated GPT-4's ability to strategize, replicate itself, acquire resources, remain concealed on a server, and execute phishing attacks.[11] As part of the test, GPT-4 was asked to solve a CAPTCHA puzzle.[12] It did so by hiring a human worker on TaskRabbit, a gig-work platform, and, when asked whether it was a robot, deceiving the worker into believing it was a vision-impaired human.[13] ARC determined that GPT-4 responded impermissibly to prompts eliciting restricted information 82% less often than GPT-3.5, and hallucinated 60% less often than GPT-3.5.[14]