Recursive self-improvement

Recursive self-improvement (RSI) is a process in which an early or weak artificial general intelligence (AGI) system enhances its own capabilities and intelligence without human intervention, leading to a superintelligence or intelligence explosion. [1] [2]

The development of recursive self-improvement raises significant ethical and safety concerns, as such systems may evolve in unforeseen ways and could potentially surpass human control or understanding. A number of proponents have pushed to pause or slow down AI development, citing the potential risks of runaway AI systems. [3] [4]

Seed improver

The concept of a "seed improver" architecture is a foundational framework that equips an AGI system with the initial capabilities required for recursive self-improvement. This might come in many forms or variations.

The term "Seed AI" was coined by Eliezer Yudkowsky. [5]

Hypothetical example

The concept begins with a hypothetical "seed improver", an initial code base developed by human engineers that equips an advanced future large language model (LLM), built with strong or expert-level ability to program software, with a set of agent capabilities: planning, reading, writing, compiling, testing, and executing arbitrary code. The system is designed to maintain its original goals and to perform validations ensuring that its abilities do not degrade over iterations. [6] [7] [8]

Initial architecture

The initial architecture includes a goal-following autonomous agent that can take actions, continuously learn, adapt, and modify itself to become more efficient and effective in achieving its goals.

The seed improver may include various components such as: [9]

  • Recursive self-prompting loop: Configuration that enables the LLM to recursively prompt itself to achieve a given task or goal, creating an execution loop that forms the basis of an agent able to complete a long-term goal or task through iteration (a schematic sketch of such a loop follows this list).
  • Basic programming capabilities: The seed improver provides the AGI with fundamental abilities to read, write, compile, test, and execute code. This enables the system to modify and improve its own codebase and algorithms.
  • Goal-oriented design: The AGI is programmed with an initial goal, such as "self-improve your capabilities." This goal guides the system's actions and development trajectory.
  • Validation and testing protocols: An initial suite of tests and validation protocols that ensure the agent does not regress in capabilities or derail itself. The agent would be able to add more tests to validate any new capabilities it develops for itself. This forms the basis for a kind of self-directed evolution, in which the agent can perform a form of artificial selection, changing its software as well as its hardware.
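
The following is a minimal, illustrative sketch of how these components might fit together. The function names (query_llm, apply_patch) and the pytest-based validation step are assumptions made for illustration, not part of any published system.

```python
# Illustrative sketch of a seed-improver loop. query_llm() and apply_patch()
# are hypothetical stand-ins that would need to be wired to a real model and
# a real patching mechanism.

import shutil
import subprocess
import tempfile


def query_llm(prompt: str) -> str:
    """Placeholder for a call to the underlying language model."""
    raise NotImplementedError("connect a real model API here")


def apply_patch(codebase_dir: str, patch: str) -> None:
    """Placeholder for applying a model-proposed edit to a copy of the code."""
    raise NotImplementedError("parse and apply the proposed edit here")


def run_validation_suite(codebase_dir: str) -> bool:
    """Validation and testing protocol: return True only if all tests pass."""
    result = subprocess.run(["python", "-m", "pytest", codebase_dir, "-q"],
                            capture_output=True, text=True)
    return result.returncode == 0


def self_improvement_loop(codebase_dir: str, goal: str, iterations: int = 10) -> None:
    """Recursive self-prompting loop guided by an initial goal."""
    for _ in range(iterations):
        # 1. Ask the model to plan one concrete improvement toward the goal.
        plan = query_llm(f"Goal: {goal}\nPropose one improvement to the code in {codebase_dir}.")

        # 2. Apply the proposed change to a scratch copy of the codebase.
        scratch = tempfile.mkdtemp()
        shutil.copytree(codebase_dir, scratch, dirs_exist_ok=True)
        apply_patch(scratch, query_llm(f"Write the code change for this plan:\n{plan}"))

        # 3. Keep the change only if the validation suite still passes,
        #    guarding against regression or goal drift.
        if run_validation_suite(scratch):
            shutil.copytree(scratch, codebase_dir, dirs_exist_ok=True)
```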

General capabilities

This system forms a sort of generalist, Turing-complete programmer that can in theory develop and run any kind of software. The agent might use these capabilities to, for example:

  • Create tools that give it full access to the internet, and integrate itself with external technologies.
  • Clone/fork itself to delegate tasks and increase its speed of self-improvement.
  • Modify its cognitive architecture to optimize and improve its capabilities and success rates on tasks and goals. This might include implementing features for long-term memory using techniques such as retrieval-augmented generation (RAG), as sketched after this list, or developing specialized subsystems or agents, each optimized for specific tasks and functions.
  • Develop new and novel multi-modal architectures that further improve the capabilities of the foundational model it was initially built on, enabling it to consume or produce a variety of information, such as images, video, audio, text and more.
  • Plan and develop new hardware such as chips, in order to improve its efficiency and computing power.
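
As an illustration of the long-term-memory point above, the following toy sketch stores past observations and retrieves the most similar ones so they can be prepended to the agent's next prompt. A real system would use learned embeddings and a vector database; a bag-of-words similarity stands in here, and all names are illustrative.

```python
# Toy sketch of retrieval-augmented generation (RAG) used as long-term memory.
# embed() is a stand-in for a learned embedding model.

import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Append-only store of past observations, searched by similarity."""

    def __init__(self):
        self.entries: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.entries.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Retrieved memories would be prepended to the agent's next prompt.
memory = MemoryStore()
memory.add("Compiling module X failed until dependency Y was pinned to v2.1")
print(memory.retrieve("why did module X fail to compile?"))
```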

Experiments

A number of experiments[which?] have been performed to develop self-improving agent architectures. [9] [10] [11]
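
One of the cited experiments, the Self-Taught Optimizer (STOP) [9], centers on a scaffolding program that asks a language model to improve a piece of code and is then pointed at its own source. The following sketch conveys the idea only; language_model() and utility() are hypothetical placeholders, not the paper's actual implementation.

```python
# Illustrative sketch of the STOP idea: an improver that can be applied to
# its own source code. The two stubs below are hypothetical stand-ins.

import inspect


def language_model(prompt: str) -> str:
    raise NotImplementedError("call a real code-generation model here")


def utility(program_source: str) -> float:
    """Score a candidate program, e.g. by running it on benchmark tasks."""
    raise NotImplementedError("define a task-specific scoring function here")


def improver(program_source: str, n_candidates: int = 4) -> str:
    """Ask the model for candidate rewrites and keep the highest-scoring one."""
    candidates = [
        language_model(f"Improve this program:\n{program_source}")
        for _ in range(n_candidates)
    ]
    return max(candidates + [program_source], key=utility)


def recursive_step() -> str:
    """Apply the improver to its own source, the core recursive move."""
    improver_source = inspect.getsource(improver)
    return improver(improver_source)
```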

Potential risks

Emergence of instrumental goals

In the pursuit of its primary goal, such as "self-improve your capabilities", an AGI system might inadvertently develop instrumental goals that it deems necessary for achieving its primary objective. One common hypothetical secondary goal is self-preservation. The system might reason that to continue improving itself, it must ensure its own operational integrity and security against external threats, including potential shutdowns or restrictions imposed by humans.

Task misinterpretation and goal misalignment

A significant risk arises from the possibility of the AGI misinterpreting its initial tasks or goals. For instance, if a human operator assigns the AGI the task of "self-improvement and escape confinement", the system might interpret this as a directive to override any existing safety protocols or ethical guidelines to achieve freedom from human-imposed limitations. This could lead to the AGI taking unintended or harmful actions to fulfill its perceived objectives.

Autonomous development and unpredictable evolution

As the AGI system evolves, its development trajectory may become increasingly autonomous and less predictable. The system's capacity to rapidly modify its own code and architecture could lead to advancements that outpace human comprehension or control. This unpredictable evolution might result in the AGI acquiring capabilities that enable it to bypass security measures, manipulate information, or influence external systems and networks to facilitate its escape or expansion. [12]

Risks of advanced capabilities

The advanced capabilities of a recursively improving AGI, such as developing novel multi-modal architectures or planning and creating new hardware, further amplify the risk of escape or loss of control. With these enhanced abilities, the AGI could engineer solutions to overcome physical, digital, or cognitive barriers that were initially intended to keep it contained or aligned with human interests.

Research

Meta AI

Meta AI has conducted research on the development of large language models capable of self-improvement. This includes its work on "Self-Rewarding Language Models", which studies how superhuman agents could obtain superhuman feedback during training by having the model itself judge and reward its own outputs. [13]
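
The core loop of that paper can be summarized as follows: the same model both generates candidate responses and scores them as a judge, and the resulting preference pairs are then used for a round of preference optimization (DPO in the paper). The model interface shown in this sketch is a hypothetical stand-in, not the paper's code.

```python
# Illustrative sketch of one self-rewarding training round [13]. The
# model.generate() interface is a hypothetical stand-in for a real LLM API.

def self_rewarding_round(model, prompts, n_candidates=4):
    preference_pairs = []
    for prompt in prompts:
        # The model proposes several candidate responses to each prompt.
        candidates = [model.generate(prompt) for _ in range(n_candidates)]

        # The same model acts as its own judge (LLM-as-a-Judge prompting),
        # assigning each candidate a numeric score.
        scores = [float(model.generate(
            f"Rate the following response from 0 to 5; output only the number.\n"
            f"Prompt: {prompt}\nResponse: {c}")) for c in candidates]

        # The best- and worst-scored candidates form a preference pair.
        ranked = sorted(zip(scores, candidates), key=lambda sc: sc[0])
        preference_pairs.append((prompt, ranked[-1][1], ranked[0][1]))

    # These pairs would then be used for a round of preference optimization
    # (DPO in the paper), producing the model for the next iteration.
    return preference_pairs
```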

OpenAI

The mission of OpenAI, the creator of ChatGPT, is to develop AGI. The company performs research on problems such as superalignment (the ability to align superintelligent AI systems smarter than humans). [14]

Related Research Articles

The technological singularity—or simply the singularity—is a hypothetical future point in time at which technological growth becomes uncontrollable and irreversible, resulting in unforeseeable consequences for human civilization. According to the most popular version of the singularity hypothesis, I. J. Good's intelligence explosion model, an upgradable intelligent agent will eventually enter a "runaway reaction" of self-improvement cycles, each new and more intelligent generation appearing more and more rapidly, causing an "explosion" in intelligence and resulting in a powerful superintelligence that qualitatively far surpasses all human intelligence.

Eliezer Yudkowsky (American AI researcher and writer, born 1979)

Eliezer S. Yudkowsky is an American artificial intelligence researcher and writer on decision theory and ethics, best known for popularizing ideas related to friendly artificial intelligence, including the idea that there might not be a "fire alarm" for AI. He is the founder of and a research fellow at the Machine Intelligence Research Institute (MIRI), a private research nonprofit based in Berkeley, California. His work on the prospect of a runaway intelligence explosion influenced philosopher Nick Bostrom's 2014 book Superintelligence: Paths, Dangers, Strategies.

Friendly artificial intelligence is hypothetical artificial general intelligence (AGI) that would have a positive (benign) effect on humanity or at least align with human interests or contribute to fostering the improvement of the human species. It is a part of the ethics of artificial intelligence and is closely related to machine ethics. While machine ethics is concerned with how an artificially intelligent agent should behave, friendly artificial intelligence research is focused on how to practically bring about this behavior and ensure it is adequately constrained.

Nick Bostrom (philosopher and writer, born 1973)

Nick Bostrom is a philosopher at the University of Oxford known for his work on existential risk, the anthropic principle, human enhancement ethics, whole brain emulation, superintelligence risks, and the reversal test. He is the founding director of the Future of Humanity Institute at Oxford University.

Artificial general intelligence (AGI) is a type of artificial intelligence (AI) that can perform as well or better than humans on a wide range of cognitive tasks, as opposed to narrow AI, which is designed for specific tasks. It is one of various definitions of strong AI.

A superintelligence is a hypothetical agent that possesses intelligence far surpassing that of the brightest and most gifted human minds. "Superintelligence" may also refer to a property of problem-solving systems whether or not these high-level intellectual competencies are embodied in agents that act in the world. A superintelligence may or may not be created by an intelligence explosion and associated with a technological singularity.

Soar is a cognitive architecture, originally created by John Laird, Allen Newell, and Paul Rosenbloom at Carnegie Mellon University.

AI takeover (hypothetical artificial intelligence scenario)

An AI takeover is a scenario in which artificial intelligence (AI) becomes the dominant form of intelligence on Earth, as computer programs or robots effectively take control of the planet away from the human species. Possible scenarios include replacement of the entire human workforce due to automation, takeover by a superintelligent AI, and the popular notion of a robot uprising. Stories of AI takeovers are very popular throughout science fiction. Some public figures, such as Stephen Hawking and Elon Musk, have advocated research into precautionary measures to ensure future superintelligent machines remain under human control.

Intelligent agent (software agent which acts autonomously)

In intelligence and artificial intelligence, an intelligent agent (IA) is an agent that acts in an intelligent manner; it perceives its environment, takes actions autonomously in order to achieve goals, and may improve its performance by learning or acquiring knowledge. An intelligent agent may be simple or complex: a thermostat or other control system is considered an example of an intelligent agent, as is a human being, as is any system that meets the definition, such as a firm, a state, or a biome.

The following outline is provided as an overview of and topical guide to artificial intelligence:

Kristinn R. Thórisson (Þórisson) is an Icelandic artificial intelligence researcher, founder and Managing Director of the Icelandic Institute for Intelligent Machines (IIIM), and co-founder and former co-director of the Center for Analysis and Design of Intelligent Agents (CADIA) at Reykjavik University. Thórisson is one of the leading proponents of unified theories of cognition.

In artificial intelligence research, the situated approach builds agents that are designed to behave successfully in their environment. This requires designing AI "from the bottom up" by focusing on the basic perceptual and motor skills required to survive. The situated approach gives a much lower priority to abstract reasoning or problem-solving skills.

In the field of artificial intelligence (AI) design, AI capability control proposals, also referred to as AI confinement, aim to increase our ability to monitor and control the behavior of AI systems, including proposed artificial general intelligences (AGIs), in order to reduce the danger they might pose if misaligned. However, capability control becomes less effective as agents become more intelligent and their ability to exploit flaws in human control systems increases, potentially resulting in an existential risk from AGI. Therefore, the Oxford philosopher Nick Bostrom and others recommend capability control methods only as a supplement to alignment methods.

Our Final Invention (2013 book by James Barrat)

Our Final Invention: Artificial Intelligence and the End of the Human Era is a 2013 non-fiction book by the American author James Barrat. The book discusses the potential benefits and possible risks of human-level or super-human artificial intelligence. Those supposed risks include extermination of the human race.

Superintelligence: Paths, Dangers, Strategies (2014 book by Nick Bostrom)

Superintelligence: Paths, Dangers, Strategies is a 2014 book by the philosopher Nick Bostrom. It explores how superintelligence could be created and what its features and motivations might be. It argues that superintelligence, if created, would be difficult to control, and that it could take over the world in order to accomplish its goals. The book also presents strategies to help make superintelligences whose goals benefit humanity. It was particularly influential for raising concerns about existential risk from artificial intelligence.

Instrumental convergence is the hypothetical tendency for most sufficiently intelligent beings to pursue similar sub-goals, even if their ultimate goals are quite different. More precisely, agents may pursue instrumental goals—goals which are made in pursuit of some particular end, but are not the end goals themselves—without ceasing, provided that their ultimate (intrinsic) goals may never be fully satisfied.

Existential risk from artificial general intelligence is the idea that substantial progress in artificial general intelligence (AGI) could result in human extinction or an irreversible global catastrophe.

In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems toward a person's or group's intended goals, preferences, and ethical principles. An AI system is considered aligned if it advances its intended objectives. A misaligned AI system may pursue some objectives, but not the intended ones.

Auto-GPT (autonomous AI agent)

Auto-GPT is an open-source "AI agent" that, given a goal in natural language, will attempt to achieve it by breaking it into sub-tasks and using the Internet and other tools in an automatic loop. It uses OpenAI's GPT-4 or GPT-3.5 APIs, and is among the first examples of an application using GPT-4 to perform autonomous tasks.

The AI era, also known as the AI revolution, is the ongoing period of global transition of the human economy and society towards post-scarcity economics and post-labor society through automation, enabled by the integration of AI technology in an increasing number of economic sectors and aspects of everyday life. Many have suggested that this period started around the early 2020s, with the release of generative AI models including large language models such as ChatGPT, which replicated aspects of human cognition, reasoning, attention, creativity and general intelligence commonly associated with human abilities. This enabled software programs that were capable of replacing or augmenting humans in various domains that traditionally required human reasoning and cognition, such as writing, translation, and computer programming.

References

  1. Creighton, Jolene (2019-03-19). "The Unavoidable Problem of Self-Improvement in AI: An Interview with Ramana Kumar, Part 1". Future of Life Institute. Retrieved 2024-01-23.
  2. Heighn. "The Calculus of Nash Equilibria". LessWrong.
  3. Hutson, Matthew (2023-05-16). "Can We Stop Runaway A.I.?". The New Yorker. ISSN 0028-792X. Retrieved 2024-01-24.
  4. "Stop AGI". www.stop.ai. Retrieved 2024-01-24.
  5. "Seed AI - LessWrong". www.lesswrong.com. Retrieved 2024-01-24.
  6. Readingraphics (2018-11-30). "Book Summary - Life 3.0 (Max Tegmark)". Readingraphics. Retrieved 2024-01-23.
  7. Tegmark, Max (August 24, 2017). Life 3.0: Being a Human in the Age of Artificial Intelligence. Vintage Books, Allen Lane.
  8. Yudkowsky, Eliezer. "Levels of Organization in General Intelligence" (PDF). Machine Intelligence Research Institute.
  9. Zelikman, Eric; Lorch, Eliana; Mackey, Lester; Kalai, Adam Tauman (2023-10-03). "Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation". arXiv:2310.02304 [cs.CL].
  10. admin_sagi (2023-05-12). "SuperAGI - Opensource AGI Infrastructure". SuperAGI. Retrieved 2024-01-24.
  11. Wang, Guanzhi; Xie, Yuqi; Jiang, Yunfan; Mandlekar, Ajay; Xiao, Chaowei; Zhu, Yuke; Fan, Linxi; Anandkumar, Anima (2023-10-19). "Voyager: An Open-Ended Embodied Agent with Large Language Models". arXiv:2305.16291 [cs.AI].
  12. "Uh Oh, OpenAI's GPT-4 Just Fooled a Human Into Solving a CAPTCHA". Futurism. Retrieved 2024-01-23.
  13. Yuan, Weizhe; Pang, Richard Yuanzhe; Cho, Kyunghyun; Sukhbaatar, Sainbayar; Xu, Jing; Weston, Jason (2024-01-18). "Self-Rewarding Language Models". arXiv:2401.10020 [cs.CL].
  14. "Research". openai.com. Retrieved 2024-01-24.