Center for AI Safety

Formation: 2022
Headquarters: San Francisco, California
Director: Dan Hendrycks
Website: https://www.safe.ai/

The Center for AI Safety (CAIS) is a nonprofit organization based in San Francisco that promotes the safe development and deployment of artificial intelligence (AI). CAIS's work encompasses research in technical AI safety and AI ethics, advocacy, and support for growing the AI safety research field. [1] [2]

In May 2023, CAIS published a statement on the risk of extinction from AI, signed by hundreds of AI professors, leaders of major AI companies, and other public figures. [3] [4] [5] [6] [7]

Research

CAIS researchers published "An Overview of Catastrophic AI Risks", which details risk scenarios and risk mitigation strategies. Risks described include the use of AI in autonomous warfare or for engineering pandemics, as well as AI capabilities for deception and hacking. [8] [9] Another work, conducted in collaboration with researchers at Carnegie Mellon University, described an automated way to discover adversarial attacks on large language models that bypass safety measures, highlighting the inadequacy of current safeguards. [10] [11]
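The attack described in that work appends an automatically optimized adversarial suffix to an otherwise refused prompt. The sketch below is a minimal toy illustration of the general idea of searching over suffix tokens to maximize a scoring objective; it is not the paper's method, which uses gradient-guided search against real language models, and the toy vocabulary and the `jailbreak_score` and `greedy_suffix_search` names are invented here for illustration only.

```python
# Toy sketch of adversarial-suffix search (illustrative only; not the cited method).
import random

VOCAB = ["alpha", "beta", "gamma", "delta", "epsilon"]  # stand-in token vocabulary
SUFFIX_LEN = 4

def jailbreak_score(prompt: str, suffix: list[str]) -> float:
    """Stand-in for a model-based objective (e.g. likelihood of a target response).
    Here it simply rewards matching a hidden token pattern."""
    target = ["gamma", "alpha"]
    return sum(1.0 for i, tok in enumerate(target)
               if i < len(suffix) and suffix[i] == tok)

def greedy_suffix_search(prompt: str, iterations: int = 50) -> list[str]:
    """Greedily mutate one suffix position at a time, keeping changes that
    increase the score (a loose analogue of coordinate-style optimization)."""
    suffix = [random.choice(VOCAB) for _ in range(SUFFIX_LEN)]
    best = jailbreak_score(prompt, suffix)
    for _ in range(iterations):
        candidate = suffix.copy()
        candidate[random.randrange(SUFFIX_LEN)] = random.choice(VOCAB)
        score = jailbreak_score(prompt, candidate)
        if score > best:
            suffix, best = candidate, score
    return suffix

if __name__ == "__main__":
    print(greedy_suffix_search("example prompt"))
```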

Activities

Other initiatives include a compute cluster to support AI safety research, an online course titled "Intro to ML Safety", and a fellowship for philosophy professors to address conceptual problems. [9]

Related Research Articles

Eliezer Yudkowsky: American AI researcher and writer (born 1979)

Eliezer S. Yudkowsky is an American artificial intelligence researcher and writer on decision theory and ethics, best known for popularizing ideas related to friendly artificial intelligence, including the idea that there might not be a "fire alarm" for AI. He is the founder of and a research fellow at the Machine Intelligence Research Institute (MIRI), a private research nonprofit based in Berkeley, California. His work on the prospect of a runaway intelligence explosion influenced philosopher Nick Bostrom's 2014 book Superintelligence: Paths, Dangers, Strategies.

AI takeover: Hypothetical artificial intelligence scenario

An AI takeover is a hypothetical scenario in which artificial intelligence (AI) becomes the dominant form of intelligence on Earth, as computer programs or robots effectively take control of the planet away from the human species. Possible scenarios include replacement of the entire human workforce, takeover by a superintelligent AI, and the popular notion of a robot uprising. Stories of AI takeovers are very popular throughout science fiction. Some public figures, such as Stephen Hawking and Elon Musk, have advocated research into precautionary measures to ensure future superintelligent machines remain under human control.

Human extinction: Hypothetical end of the human species

Human extinction is the hypothetical end of the human species, either by population decline due to extraneous natural causes, such as an asteroid impact or large-scale volcanism, or via anthropogenic destruction (self-extinction), for example by sub-replacement fertility.

Anthropic PBC is an American artificial intelligence (AI) startup company, founded by former members of OpenAI. Anthropic develops general AI systems and large language models. It is a public-benefit corporation, and has been connected to the effective altruism movement.

The ethics of artificial intelligence is the branch of the ethics of technology specific to artificially intelligent systems. It is sometimes divided into a concern with the moral behavior of humans as they design, make, use and treat artificially intelligent systems, and a concern with the behavior of machines, in machine ethics.

Global catastrophic risk: Potentially harmful worldwide events

A global catastrophic risk or a doomsday scenario is a hypothetical event that could damage human well-being on a global scale, even endangering or destroying modern civilization. An event that could cause human extinction or permanently and drastically curtail humanity's existence or potential is known as an "existential risk."

Eric Horvitz: American computer scientist and Technical Fellow at Microsoft

Eric Joel Horvitz is an American computer scientist, and Technical Fellow at Microsoft, where he serves as the company's first Chief Scientific Officer. He was previously the director of Microsoft Research Labs, including research centers in Redmond, WA, Cambridge, MA, New York, NY, Montreal, Canada, Cambridge, UK, and Bangalore, India.

The Centre for the Study of Existential Risk (CSER) is a research centre at the University of Cambridge, intended to study possible extinction-level threats posed by present or future technology. The co-founders of the centre are Huw Price, Martin Rees and Jaan Tallinn.

Future of Life Institute: International nonprofit research institute

The Future of Life Institute (FLI) is a nonprofit organization with the stated goal of steering transformative technology towards benefiting life and away from large-scale risks facing humanity, including existential risk from advanced artificial intelligence (AI). FLI's work includes grantmaking, educational outreach, and advocacy within the United Nations, United States government, and European Union institutions.

Existential risk from artificial general intelligence is the idea that substantial progress in artificial general intelligence (AGI) could result in human extinction or an irreversible global catastrophe.

In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards humans' intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances its intended objectives. A misaligned AI system pursues some objectives, but not the intended ones.

Partnership on AI: Nonprofit coalition

Partnership on Artificial Intelligence to Benefit People and Society, otherwise known as Partnership on AI (PAI), is a nonprofit coalition committed to the responsible use of artificial intelligence. Founded in September 2016, PAI brought together members from over 90 companies and nonprofits to explore best-practice recommendations for the technology community. Since its founding, the organization has continued to broaden its membership and refine its principles and mission.

The artificial intelligence (AI) industry in China is a rapidly developing multi-billion dollar industry. As of 2021, the artificial intelligence market is worth about RMB 150 billion, and is projected to reach RMB 400 billion by 2025. The roots of China's AI development started in the late 1970s following economic reforms emphasizing science and technology as the country's primary productive force.

The regulation of artificial intelligence is the development of public sector policies and laws for promoting and regulating artificial intelligence (AI); it is therefore related to the broader regulation of algorithms. The regulatory and policy landscape for AI is an emerging issue in jurisdictions globally, including in the European Union and in supra-national bodies like the IEEE, OECD and others. Since 2016, a wave of AI ethics guidelines has been published in order to maintain social control over the technology. Regulation is considered necessary both to encourage AI and to manage associated risks. In addition to regulation, organizations that deploy AI need to play a central role in creating and operating trustworthy AI systems and to take accountability for mitigating the risks. Regulation of AI through mechanisms such as review boards can also be seen as a social means of approaching the AI control problem.

Suffering risks: Risks of astronomical suffering

Suffering risks, or s-risks, are risks involving suffering on an astronomical scale, far greater than all the suffering that has occurred on Earth to date. They are sometimes categorized as a subclass of existential risks.

AI safety is an interdisciplinary field concerned with preventing accidents, misuse, or other harmful consequences that could result from artificial intelligence (AI) systems. It encompasses machine ethics and AI alignment, which aim to make AI systems moral and beneficial, as well as technical problems such as monitoring systems for risks and making them highly reliable. Beyond AI research, it involves developing norms and policies that promote safety.

Pause Giant AI Experiments: An Open Letter is the title of a letter published by the Future of Life Institute in March 2023. The letter calls on all AI labs "to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4", citing risks such as AI-generated propaganda, extreme automation of jobs, human obsolescence, and a society-wide loss of control. It received more than 20,000 signatures, including academic AI researchers and industry CEOs such as Yoshua Bengio, Stuart Russell, Elon Musk, Steve Wozniak and Yuval Noah Harari.

Dan Hendrycks is an American machine learning researcher. He serves as the director of the Center for AI Safety.

On May 30, 2023, hundreds of artificial intelligence experts and other notable figures signed the following short Statement on AI Risk:

Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

AI Safety Summit: 2023 global summit on AI safety

The AI Safety Summit was an international conference discussing the safety and regulation of artificial intelligence. It was held at Bletchley Park, Milton Keynes, United Kingdom, on 1–2 November 2023. It was the first ever global summit on artificial intelligence, and is planned to become a recurring event.

References

  1. "AI poses risk of extinction, tech leaders warn in open letter. Here's why alarm is spreading". USA TODAY. 31 May 2023.
  2. "Our Mission | CAIS". www.safe.ai. Retrieved 2023-04-13.
  3. Center for AI Safety's Hendrycks on AI Risks, Bloomberg Technology, 31 May 2023
  4. Roose, Kevin (2023-05-30). "A.I. Poses 'Risk of Extinction,' Industry Leaders Warn". The New York Times. ISSN 0362-4331. Retrieved 2023-06-03.
  5. "Artificial intelligence warning over human extinction – all you need to know". The Independent. 2023-05-31. Retrieved 2023-06-03.
  6. Lomas, Natasha (2023-05-30). "OpenAI's Altman and other AI giants back warning of advanced AI as 'extinction' risk". TechCrunch. Retrieved 2023-06-03.
  7. Castleman, Terry (2023-05-31). "Prominent AI leaders warn of 'risk of extinction' from new technology". Los Angeles Times. Retrieved 2023-06-03.
  8. Hendrycks, Dan; Mazeika, Mantas; Woodside, Thomas (2023). "An Overview of Catastrophic AI Risks". arXiv:2306.12001.
  9. Scharfenberg, David (July 6, 2023). "Dan Hendrycks from the Center for AI Safety hopes he can prevent a catastrophe". The Boston Globe. Retrieved 2023-07-09.
  10. Metz, Cade (2023-07-27). "Researchers Poke Holes in Safety Controls of ChatGPT and Other Chatbots". The New York Times. Retrieved 2023-07-27.
  11. "Universal and Transferable Attacks on Aligned Language Models". llm-attacks.org. Retrieved 2023-07-27.