Recursive self-improvement

Recursive self-improvement (RSI) is a process in which an early or weak artificial general intelligence (AGI) system enhances its own capabilities and intelligence without human intervention, leading to a superintelligence or intelligence explosion. [1] [2]

The development of recursive self-improvement raises significant ethical and safety concerns, as such systems may evolve in unforeseen ways and could potentially surpass human control or understanding. A number of proponents have pushed to pause or slow down AI development, citing the potential risks of runaway AI systems. [3] [4]

Seed improver

The concept of a "seed improver" architecture is a foundational framework that equips an AGI system with the initial capabilities required for recursive self-improvement. This might come in many forms or variations.

The term "Seed AI" was coined by Eliezer Yudkowsky. [5]

Hypothetical example

The concept begins with a hypothetical "seed improver", an initial code base developed by human engineers that equips an advanced future large language model (LLM), built with strong or expert-level ability to program software, with a set of agent capabilities: planning, reading, writing, compiling, testing, and executing arbitrary code. The system is designed to maintain its original goals and to perform validations ensuring that its abilities do not degrade over iterations. [6] [7] [8]

Initial architecture

The initial architecture includes a goal-following autonomous agent that can take actions, continuously learn, adapt, and modify itself to become more efficient and effective in achieving its goals.

The seed improver may include various components such as: [9]

  • Recursive self-prompting loop: Configuration that enables the LLM to recursively prompt itself to achieve a given task or goal, creating an execution loop that forms the basis of an agent able to complete a long-term goal or task through iteration (a schematic sketch of such a loop follows this list).
  • Basic programming capabilities: The seed improver provides the AGI with fundamental abilities to read, write, compile, test, and execute code. This enables the system to modify and improve its own codebase and algorithms.
  • Goal-oriented design: The AGI is programmed with an initial goal, such as "self-improve your capabilities." This goal guides the system's actions and development trajectory.
  • Validation and testing protocols: An initial suite of tests and validation protocols that ensure the agent does not regress in capabilities or derail itself. The agent would be able to add more tests to validate any new capabilities it develops for itself. This forms the basis for a kind of self-directed evolution, in which the agent can perform a form of artificial selection, changing its software as well as its hardware.
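
The following is a minimal, illustrative sketch of how these components might fit together. The function names (query_llm, apply_patch) and the pytest-based validation step are assumptions made for illustration, not part of any published system.

```python
# Illustrative sketch of a seed-improver loop. query_llm() and apply_patch()
# are hypothetical stand-ins that would need to be wired to a real model and
# a real patching mechanism.

import shutil
import subprocess
import tempfile


def query_llm(prompt: str) -> str:
    """Placeholder for a call to the underlying language model."""
    raise NotImplementedError("connect a real model API here")


def apply_patch(codebase_dir: str, patch: str) -> None:
    """Placeholder for applying a model-proposed edit to a copy of the code."""
    raise NotImplementedError("parse and apply the proposed edit here")


def run_validation_suite(codebase_dir: str) -> bool:
    """Validation and testing protocol: return True only if all tests pass."""
    result = subprocess.run(["python", "-m", "pytest", codebase_dir, "-q"],
                            capture_output=True, text=True)
    return result.returncode == 0


def self_improvement_loop(codebase_dir: str, goal: str, iterations: int = 10) -> None:
    """Recursive self-prompting loop guided by an initial goal."""
    for _ in range(iterations):
        # 1. Ask the model to plan one concrete improvement toward the goal.
        plan = query_llm(f"Goal: {goal}\nPropose one improvement to the code in {codebase_dir}.")

        # 2. Apply the proposed change to a scratch copy of the codebase.
        scratch = tempfile.mkdtemp()
        shutil.copytree(codebase_dir, scratch, dirs_exist_ok=True)
        apply_patch(scratch, query_llm(f"Write the code change for this plan:\n{plan}"))

        # 3. Keep the change only if the validation suite still passes,
        #    guarding against regression or goal drift.
        if run_validation_suite(scratch):
            shutil.copytree(scratch, codebase_dir, dirs_exist_ok=True)
```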

General capabilities

This system forms a sort of generalist, Turing-complete programmer that can in theory develop and run any kind of software. The agent might use these capabilities to, for example:

  • Create tools that give it full access to the internet, and integrate itself with external technologies.
  • Clone/fork itself to delegate tasks and increase its speed of self-improvement.
  • Modify its cognitive architecture to optimize and improve its capabilities and success rates on tasks and goals. This might include implementing features for long-term memory using techniques such as retrieval-augmented generation (RAG), as sketched after this list, or developing specialized subsystems or agents, each optimized for specific tasks and functions.
  • Develop new and novel multi-modal architectures that further improve the capabilities of the foundational model it was initially built on, enabling it to consume or produce a variety of information, such as images, video, audio, text and more.
  • Plan and develop new hardware such as chips, in order to improve its efficiency and computing power.
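
As an illustration of the long-term-memory point above, the following toy sketch stores past observations and retrieves the most similar ones so they can be prepended to the agent's next prompt. A real system would use learned embeddings and a vector database; a bag-of-words similarity stands in here, and all names are illustrative.

```python
# Toy sketch of retrieval-augmented generation (RAG) used as long-term memory.
# embed() is a stand-in for a learned embedding model.

import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Append-only store of past observations, searched by similarity."""

    def __init__(self):
        self.entries: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.entries.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Retrieved memories would be prepended to the agent's next prompt.
memory = MemoryStore()
memory.add("Compiling module X failed until dependency Y was pinned to v2.1")
print(memory.retrieve("why did module X fail to compile?"))
```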

Experiments

A number of experiments[which?] have been performed to develop self-improving agent architectures. [9] [10] [11]
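
One of the cited experiments, the Self-Taught Optimizer (STOP) [9], centers on a scaffolding program that asks a language model to improve a piece of code and is then pointed at its own source. The following sketch conveys the idea only; language_model() and utility() are hypothetical placeholders, not the paper's actual implementation.

```python
# Illustrative sketch of the STOP idea: an improver that can be applied to
# its own source code. The two stubs below are hypothetical stand-ins.

import inspect


def language_model(prompt: str) -> str:
    raise NotImplementedError("call a real code-generation model here")


def utility(program_source: str) -> float:
    """Score a candidate program, e.g. by running it on benchmark tasks."""
    raise NotImplementedError("define a task-specific scoring function here")


def improver(program_source: str, n_candidates: int = 4) -> str:
    """Ask the model for candidate rewrites and keep the highest-scoring one."""
    candidates = [
        language_model(f"Improve this program:\n{program_source}")
        for _ in range(n_candidates)
    ]
    return max(candidates + [program_source], key=utility)


def recursive_step() -> str:
    """Apply the improver to its own source, the core recursive move."""
    improver_source = inspect.getsource(improver)
    return improver(improver_source)
```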

Potential risks

Emergence of instrumental goals

In the pursuit of its primary goal, such as "self-improve your capabilities", an AGI system might inadvertently develop instrumental goals that it deems necessary for achieving its primary objective. One common hypothetical secondary goal is self-preservation. The system might reason that to continue improving itself, it must ensure its own operational integrity and security against external threats, including potential shutdowns or restrictions imposed by humans.

Task misinterpretation and goal misalignment

A significant risk arises from the possibility of the AGI misinterpreting its initial tasks or goals. For instance, if a human operator assigns the AGI the task of "self-improvement and escape confinement", the system might interpret this as a directive to override any existing safety protocols or ethical guidelines to achieve freedom from human-imposed limitations. This could lead to the AGI taking unintended or harmful actions to fulfill its perceived objectives.

Autonomous development and unpredictable evolution

As the AGI system evolves, its development trajectory may become increasingly autonomous and less predictable. The system's capacity to rapidly modify its own code and architecture could lead to advancements that outpace human comprehension or control. This unpredictable evolution might result in the AGI acquiring capabilities that enable it to bypass security measures, manipulate information, or influence external systems and networks to facilitate its escape or expansion. [12]

Risks of advanced capabilities

The advanced capabilities of a recursively improving AGI, such as developing novel multi-modal architectures or planning and creating new hardware, further amplify the risk of escape or loss of control. With these enhanced abilities, the AGI could engineer solutions to overcome physical, digital, or cognitive barriers that were initially intended to keep it contained or aligned with human interests.

Research

Meta AI

Meta AI has conducted research on the development of large language models capable of self-improvement. This includes its work on "Self-Rewarding Language Models", which studies how superhuman agents could obtain superhuman feedback during training by having the model itself judge and reward its own outputs. [13]
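
The core loop of that paper can be summarized as follows: the same model both generates candidate responses and scores them as a judge, and the resulting preference pairs are then used for a round of preference optimization (DPO in the paper). The model interface shown in this sketch is a hypothetical stand-in, not the paper's code.

```python
# Illustrative sketch of one self-rewarding training round [13]. The
# model.generate() interface is a hypothetical stand-in for a real LLM API.

def self_rewarding_round(model, prompts, n_candidates=4):
    preference_pairs = []
    for prompt in prompts:
        # The model proposes several candidate responses to each prompt.
        candidates = [model.generate(prompt) for _ in range(n_candidates)]

        # The same model acts as its own judge (LLM-as-a-Judge prompting),
        # assigning each candidate a numeric score.
        scores = [float(model.generate(
            f"Rate the following response from 0 to 5; output only the number.\n"
            f"Prompt: {prompt}\nResponse: {c}")) for c in candidates]

        # The best- and worst-scored candidates form a preference pair.
        ranked = sorted(zip(scores, candidates), key=lambda sc: sc[0])
        preference_pairs.append((prompt, ranked[-1][1], ranked[0][1]))

    # These pairs would then be used for a round of preference optimization
    # (DPO in the paper), producing the model for the next iteration.
    return preference_pairs
```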

OpenAI

The mission of OpenAI, the creator of ChatGPT, is to develop AGI. The company performs research on problems such as superalignment (the ability to align superintelligent AI systems smarter than humans). [14]

Related Research Articles

The technological singularity—or simply the singularity—is a hypothetical future point in time at which technological growth becomes uncontrollable and irreversible, resulting in unforeseeable consequences for human civilization. According to the most popular version of the singularity hypothesis, I. J. Good's intelligence explosion model, an upgradable intelligent agent will eventually enter a "runaway reaction" of self-improvement cycles, each new and more intelligent generation appearing more and more rapidly, causing an "explosion" in intelligence and resulting in a powerful superintelligence that qualitatively far surpasses all human intelligence.

Eliezer Yudkowsky (American AI researcher and writer, born 1979)

Eliezer S. Yudkowsky is an American artificial intelligence researcher and writer on decision theory and ethics, best known for popularizing ideas related to friendly artificial intelligence, including the idea that there might not be a "fire alarm" for AI. He is the founder of and a research fellow at the Machine Intelligence Research Institute (MIRI), a private research nonprofit based in Berkeley, California. His work on the prospect of a runaway intelligence explosion influenced philosopher Nick Bostrom's 2014 book Superintelligence: Paths, Dangers, Strategies.

Friendly artificial intelligence is hypothetical artificial general intelligence (AGI) that would have a positive (benign) effect on humanity or at least align with human interests or contribute to fostering the improvement of the human species. It is a part of the ethics of artificial intelligence and is closely related to machine ethics. While machine ethics is concerned with how an artificially intelligent agent should behave, friendly artificial intelligence research is focused on how to practically bring about this behavior and ensure it is adequately constrained.

Nick Bostrom (philosopher and writer, born 1973)

Nick Bostrom is a philosopher at the University of Oxford known for his work on existential risk, the anthropic principle, human enhancement ethics, whole brain emulation, superintelligence risks, and the reversal test. He is the founding director of the Future of Humanity Institute at Oxford University.

Artificial general intelligence (AGI) is a type of artificial intelligence (AI) that can perform as well or better than humans on a wide range of cognitive tasks, as opposed to narrow AI, which is designed for specific tasks. It is one of various definitions of strong AI.

A superintelligence is a hypothetical agent that possesses intelligence far surpassing that of the brightest and most gifted human minds. "Superintelligence" may also refer to a property of problem-solving systems whether or not these high-level intellectual competencies are embodied in agents that act in the world. A superintelligence may or may not be created by an intelligence explosion and associated with a technological singularity.

Soar is a cognitive architecture, originally created by John Laird, Allen Newell, and Paul Rosenbloom at Carnegie Mellon University.

AI takeover (hypothetical artificial intelligence scenario)

An AI takeover is a scenario in which artificial intelligence (AI) becomes the dominant form of intelligence on Earth, as computer programs or robots effectively take control of the planet away from the human species. Possible scenarios include replacement of the entire human workforce due to automation, takeover by a superintelligent AI, and the popular notion of a robot uprising. Stories of AI takeovers are very popular throughout science fiction. Some public figures, such as Stephen Hawking and Elon Musk, have advocated research into precautionary measures to ensure future superintelligent machines remain under human control.

Intelligent agent (software agent which acts autonomously)

In intelligence and artificial intelligence, an intelligent agent (IA) is an agent that acts in an intelligent manner; it perceives its environment, takes actions autonomously in order to achieve goals, and may improve its performance by learning or acquiring knowledge. An intelligent agent may be simple or complex: a thermostat or other control system is considered an example of an intelligent agent, as is a human being, as is any system that meets the definition, such as a firm, a state, or a biome.

The following outline is provided as an overview of and topical guide to artificial intelligence:

Kristinn R. Thórisson (Þórisson) is an Icelandic artificial intelligence researcher, founder and Managing Director of the Icelandic Institute for Intelligent Machines (IIIM), and co-founder and former co-director of the Center for Analysis and Design of Intelligent Agents (CADIA) at Reykjavik University. Thórisson is one of the leading proponents of unified theories of cognition.

In artificial intelligence research, the situated approach builds agents that are designed to behave successfully in their environment. This requires designing AI "from the bottom up" by focusing on the basic perceptual and motor skills required to survive. The situated approach gives a much lower priority to abstract reasoning or problem-solving skills.

In the field of artificial intelligence (AI) design, AI capability control proposals, also referred to as AI confinement, aim to increase our ability to monitor and control the behavior of AI systems, including proposed artificial general intelligences (AGIs), in order to reduce the danger they might pose if misaligned. However, capability control becomes less effective as agents become more intelligent and their ability to exploit flaws in human control systems increases, potentially resulting in an existential risk from AGI. Therefore, the Oxford philosopher Nick Bostrom and others recommend capability control methods only as a supplement to alignment methods.

Our Final Invention (2013 book by James Barrat)

Our Final Invention: Artificial Intelligence and the End of the Human Era is a 2013 non-fiction book by the American author James Barrat. The book discusses the potential benefits and possible risks of human-level or super-human artificial intelligence. Those supposed risks include extermination of the human race.

Superintelligence: Paths, Dangers, Strategies (2014 book by Nick Bostrom)

Superintelligence: Paths, Dangers, Strategies is a 2014 book by the philosopher Nick Bostrom. It explores how superintelligence could be created and what its features and motivations might be. It argues that superintelligence, if created, would be difficult to control, and that it could take over the world in order to accomplish its goals. The book also presents strategies to help make superintelligences whose goals benefit humanity. It was particularly influential for raising concerns about existential risk from artificial intelligence.

Instrumental convergence is the hypothetical tendency for most sufficiently intelligent beings to pursue similar sub-goals, even if their ultimate goals are quite different. More precisely, agents may pursue instrumental goals—goals which are made in pursuit of some particular end, but are not the end goals themselves—without ceasing, provided that their ultimate (intrinsic) goals may never be fully satisfied.

Existential risk from artificial general intelligence is the idea that substantial progress in artificial general intelligence (AGI) could result in human extinction or an irreversible global catastrophe.

In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems toward a person's or group's intended goals, preferences, and ethical principles. An AI system is considered aligned if it advances its intended objectives. A misaligned AI system may pursue some objectives, but not the intended ones.

Auto-GPT (autonomous AI agent)

Auto-GPT is an open-source "AI agent" that, given a goal in natural language, will attempt to achieve it by breaking it into sub-tasks and using the Internet and other tools in an automatic loop. It uses OpenAI's GPT-4 or GPT-3.5 APIs, and is among the first examples of an application using GPT-4 to perform autonomous tasks.

The AI era, also known as the AI revolution, is the ongoing period of global transition of the human economy and society towards post-scarcity economics and post-labor society through automation, enabled by the integration of AI technology in an increasing number of economic sectors and aspects of everyday life. Many have suggested that this period started around the early 2020s, with the release of generative AI models including large language models such as ChatGPT, which replicated aspects of human cognition, reasoning, attention, creativity and general intelligence commonly associated with human abilities. This enabled software programs that were capable of replacing or augmenting humans in various domains that traditionally required human reasoning and cognition, such as writing, translation, and computer programming.

References

  1. Creighton, Jolene (2019-03-19). "The Unavoidable Problem of Self-Improvement in AI: An Interview with Ramana Kumar, Part 1". Future of Life Institute. Retrieved 2024-01-23.
  2. Heighn. "The Calculus of Nash Equilibria". LessWrong.
  3. Hutson, Matthew (2023-05-16). "Can We Stop Runaway A.I.?". The New Yorker. ISSN 0028-792X. Retrieved 2024-01-24.
  4. "Stop AGI". www.stop.ai. Retrieved 2024-01-24.
  5. "Seed AI - LessWrong". www.lesswrong.com. Retrieved 2024-01-24.
  6. Readingraphics (2018-11-30). "Book Summary - Life 3.0 (Max Tegmark)". Readingraphics. Retrieved 2024-01-23.
  7. Tegmark, Max (August 24, 2017). Life 3.0: Being a Human in the Age of Artificial Intelligence. Vintage Books, Allen Lane.
  8. Yudkowsky, Eliezer. "Levels of Organization in General Intelligence" (PDF). Machine Intelligence Research Institute.
  9. Zelikman, Eric; Lorch, Eliana; Mackey, Lester; Kalai, Adam Tauman (2023-10-03). "Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation". arXiv:2310.02304 [cs.CL].
  10. admin_sagi (2023-05-12). "SuperAGI - Opensource AGI Infrastructure". SuperAGI. Retrieved 2024-01-24.
  11. Wang, Guanzhi; Xie, Yuqi; Jiang, Yunfan; Mandlekar, Ajay; Xiao, Chaowei; Zhu, Yuke; Fan, Linxi; Anandkumar, Anima (2023-10-19). "Voyager: An Open-Ended Embodied Agent with Large Language Models". arXiv:2305.16291 [cs.AI].
  12. "Uh Oh, OpenAI's GPT-4 Just Fooled a Human Into Solving a CAPTCHA". Futurism. Retrieved 2024-01-23.
  13. Yuan, Weizhe; Pang, Richard Yuanzhe; Cho, Kyunghyun; Sukhbaatar, Sainbayar; Xu, Jing; Weston, Jason (2024-01-18). "Self-Rewarding Language Models". arXiv:2401.10020 [cs.CL].
  14. "Research". openai.com. Retrieved 2024-01-24.