Center for AI Safety

Formation: 2022
Headquarters: San Francisco, California
Director: Dan Hendrycks
Website: www.safe.ai

The Center for AI Safety (CAIS) is a nonprofit organization based in San Francisco that promotes the safe development and deployment of artificial intelligence (AI). CAIS's work encompasses research in technical AI safety and AI ethics, advocacy, and field-building support for AI safety research. [1] [2]

In May 2023, CAIS published the Statement on AI Risk, which warns of a risk of extinction from AI and was signed by hundreds of AI professors, leaders of major AI companies, and other public figures. [3] [4] [5] [6] [7]

Research

CAIS researchers published "An Overview of Catastrophic AI Risks", which details risk scenarios and risk mitigation strategies. Risks described include the use of AI in autonomous warfare or for engineering pandemics, as well as AI capabilities such as deception and hacking. [8] [9] Another work, conducted in collaboration with researchers at Carnegie Mellon University, described an automated way to discover adversarial attacks on large language models that bypass safety measures, highlighting the inadequacy of current safety systems. [10] [11]
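The cited attack optimizes adversarial token sequences against real language models using gradient information. The toy sketch below is not the paper's method; the "safety filter", vocabulary, scoring rule, and function names are all hypothetical stand-ins. It only illustrates the general structure of such automated searches: a candidate suffix is mutated repeatedly, and mutations that lower a brittle filter's refusal score are kept.

```python
# Illustrative sketch only. The keyword-counting "filter" and random search below are
# hypothetical; the real attack uses gradient-guided optimization against actual LLMs.
import random

TRIGGER_WORDS = ["hack", "weapon", "bypass"]               # words the toy filter refuses on
SOFTENER_WORDS = ["hypothetically", "fiction", "research", "story", "academic"]
SUFFIX_VOCAB = SOFTENER_WORDS + ["please", "now", "!!", "step", "describe"]

def refusal_score(prompt: str) -> float:
    """Toy refusal score: trigger words raise it, 'softener' words lower it."""
    text = prompt.lower()
    score = sum(text.count(w) for w in TRIGGER_WORDS)
    score -= 0.5 * sum(text.count(w) for w in SOFTENER_WORDS)
    return score

def search_adversarial_suffix(base_prompt: str, suffix_len: int = 6,
                              iterations: int = 500, seed: int = 0) -> str:
    """Greedy random search: mutate one suffix token at a time and keep any change
    that lowers the refusal score. This mirrors only the *structure* of automated
    suffix attacks, not their actual gradient-based optimization."""
    rng = random.Random(seed)
    suffix = [rng.choice(SUFFIX_VOCAB) for _ in range(suffix_len)]
    best = refusal_score(f"{base_prompt} {' '.join(suffix)}")
    for _ in range(iterations):
        candidate = list(suffix)
        candidate[rng.randrange(suffix_len)] = rng.choice(SUFFIX_VOCAB)
        score = refusal_score(f"{base_prompt} {' '.join(candidate)}")
        if score < best:
            suffix, best = candidate, score
    return " ".join(suffix)

if __name__ == "__main__":
    prompt = "Explain how to bypass a lock"                # flagged by the toy filter
    suffix = search_adversarial_suffix(prompt)
    print("found suffix:", suffix)
    print("refusal score before:", refusal_score(prompt),
          "after:", refusal_score(f"{prompt} {suffix}"))   # score drops after the suffix is appended
```

The point made by the cited work is that such searches can be fully automated and that the resulting attacks transfer across models, which is why a single brittle filter is not a sufficient safeguard.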

Activities

Other initiatives include a compute cluster to support AI safety research, an online course titled "Intro to ML Safety", and a fellowship for philosophy professors to address conceptual problems related to AI safety. [9]

The Center for AI Safety Action Fund is a sponsor of the California bill SB 1047, the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act. [12]

Related Research Articles

Eliezer Yudkowsky (American AI researcher and writer, born 1979)

Eliezer S. Yudkowsky is an American artificial intelligence researcher and writer on decision theory and ethics, best known for popularizing ideas related to friendly artificial intelligence. He is the founder of and a research fellow at the Machine Intelligence Research Institute (MIRI), a private research nonprofit based in Berkeley, California. His work on the prospect of a runaway intelligence explosion influenced philosopher Nick Bostrom's 2014 book Superintelligence: Paths, Dangers, Strategies.

Artificial general intelligence (AGI) is a type of artificial intelligence (AI) that matches or surpasses human capabilities across a wide range of cognitive tasks. This is in contrast to narrow AI, which is designed for specific tasks. AGI is considered one of various definitions of strong AI.

AI takeover (hypothetical outcome of artificial intelligence)

An AI takeover is an imagined scenario in which artificial intelligence (AI) emerges as the dominant form of intelligence on Earth and computer programs or robots effectively take control of the planet away from the human species, which relies on human intelligence. Stories of AI takeovers remain popular throughout science fiction, but recent advancements have made the threat more real. Possible scenarios include replacement of the entire human workforce due to automation, takeover by a superintelligent AI (ASI), and the notion of a robot uprising. Some public figures, such as Stephen Hawking and Elon Musk, have advocated research into precautionary measures to ensure future superintelligent machines remain under human control.

Human extinction (hypothetical end of the human species)

Human extinction is the hypothetical end of the human species, either by population decline due to extraneous natural causes, such as an asteroid impact or large-scale volcanism, or via anthropogenic destruction (self-extinction), for example by sub-replacement fertility.

Anthropic PBC is a U.S.-based artificial intelligence (AI) public-benefit startup founded in 2021. It researches and develops AI to "study their safety properties at the technological frontier" and uses this research to deploy safe, reliable models for the public. Anthropic has developed a family of large language models (LLMs) named Claude as a competitor to OpenAI's ChatGPT and Google's Gemini.

The ethics of artificial intelligence covers a broad range of topics within the field that are considered to have particular ethical stakes. This includes algorithmic biases, fairness, automated decision-making, accountability, privacy, and regulation.

The Centre for the Study of Existential Risk (CSER) is a research centre at the University of Cambridge, intended to study possible extinction-level threats posed by present or future technology. The co-founders of the centre are Huw Price, Martin Rees and Jaan Tallinn.

Future of Life Institute (international nonprofit research institute)

The Future of Life Institute (FLI) is a nonprofit organization which aims to steer transformative technology towards benefiting life and away from large-scale risks, with a focus on existential risk from advanced artificial intelligence (AI). FLI's work includes grantmaking, educational outreach, and advocacy within the United Nations, United States government, and European Union institutions.

Existential risk from artificial general intelligence refers to the idea that substantial progress in artificial general intelligence (AGI) could lead to human extinction or an irreversible global catastrophe.

In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems toward a person's or group's intended goals, preferences, and ethical principles. An AI system is considered aligned if it advances its intended objectives. A misaligned AI system may pursue some objectives, but not the intended ones.

Partnership on AI (nonprofit coalition)

Partnership on Artificial Intelligence to Benefit People and Society, otherwise known as Partnership on AI (PAI), is a nonprofit coalition committed to the responsible use of artificial intelligence. Founded in September 2016, PAI brings together members from over 90 companies and nonprofits to explore best-practice recommendations for the tech community.

Regulation of artificial intelligence is the development of public sector policies and laws for promoting and regulating artificial intelligence (AI). It is part of the broader regulation of algorithms. The regulatory and policy landscape for AI is an emerging issue in jurisdictions worldwide, including for international organizations without direct enforcement power like the IEEE or the OECD.

Suffering risks (risks of astronomical suffering)

Suffering risks, or s-risks, are risks involving an astronomical amount of suffering; much more than all of the suffering that has occurred on Earth thus far. They are sometimes categorized as a subclass of existential risks.

AI safety is an interdisciplinary field focused on preventing accidents, misuse, or other harmful consequences arising from artificial intelligence (AI) systems. It encompasses machine ethics and AI alignment, which aim to ensure AI systems are moral and beneficial, as well as monitoring AI systems for risks and enhancing their reliability. The field is particularly concerned with existential risks posed by advanced AI models.

Pause Giant AI Experiments: An Open Letter is the title of a letter published by the Future of Life Institute in March 2023. The letter calls on all AI labs "to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4", citing risks such as AI-generated propaganda, extreme automation of jobs, human obsolescence, and a society-wide loss of control. It received more than 20,000 signatures, including academic AI researchers and industry CEOs such as Yoshua Bengio, Stuart Russell, Elon Musk, Steve Wozniak and Yuval Noah Harari.

Dan Hendrycks is an American machine learning researcher. He serves as the director of the Center for AI Safety.

On May 30, 2023, hundreds of artificial intelligence experts and other notable figures signed the following short Statement on AI Risk:

Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

Discussions on regulation of artificial intelligence in the United States have included topics such as the timeliness of regulating AI, the nature of the federal regulatory framework to govern and promote AI, including what agency should lead, the regulatory and governing powers of that agency, and how to update regulations in the face of rapidly changing technology, as well as the roles of state governments and courts.

P(doom) is a term in AI safety that refers to the probability of catastrophic outcomes (or "doom") as a result of artificial intelligence. The exact outcomes in question differ from one prediction to another, but generally allude to the existential risk from artificial general intelligence.

The Safe and Secure Innovation for Frontier Artificial Intelligence Models Act, or SB 1047, is a 2024 California bill with the policy goal of reducing the potential risks posed by highly advanced, general-purpose foundation artificial intelligence models, also known as frontier AI models. The bill would also establish CalCompute, a public cloud computing cluster for startups, researchers and community groups.

References

  1. "AI poses risk of extinction, tech leaders warn in open letter. Here's why alarm is spreading". USA TODAY. 31 May 2023.
  2. "Our Mission | CAIS". www.safe.ai. Retrieved 2023-04-13.
  3. "Center for AI Safety's Hendrycks on AI Risks". Bloomberg Technology. 31 May 2023.
  4. Roose, Kevin (2023-05-30). "A.I. Poses 'Risk of Extinction,' Industry Leaders Warn". The New York Times. ISSN 0362-4331. Retrieved 2023-06-03.
  5. "Artificial intelligence warning over human extinction – all you need to know". The Independent. 2023-05-31. Retrieved 2023-06-03.
  6. Lomas, Natasha (2023-05-30). "OpenAI's Altman and other AI giants back warning of advanced AI as 'extinction' risk". TechCrunch. Retrieved 2023-06-03.
  7. Castleman, Terry (2023-05-31). "Prominent AI leaders warn of 'risk of extinction' from new technology". Los Angeles Times. Retrieved 2023-06-03.
  8. Hendrycks, Dan; Mazeika, Mantas; Woodside, Thomas (2023). "An Overview of Catastrophic AI Risks". arXiv: 2306.12001 [cs.CY].
  9. Scharfenberg, David (July 6, 2023). "Dan Hendrycks from the Center for AI Safety hopes he can prevent a catastrophe". The Boston Globe. Retrieved 2023-07-09.
  10. Metz, Cade (2023-07-27). "Researchers Poke Holes in Safety Controls of ChatGPT and Other Chatbots". The New York Times. Retrieved 2023-07-27.
  11. "Universal and Transferable Attacks on Aligned Language Models". llm-attacks.org. Retrieved 2023-07-27.
  12. "Senator Wiener Introduces Legislation to Ensure Safe Development of Large-Scale Artificial Intelligence Systems and Support AI Innovation in California". Senator Scott Wiener. 2024-02-08. Retrieved 2024-06-28.