OpenAI o3

o3
Developer(s): OpenAI
Predecessor: OpenAI o1
Type: Generative pre-trained transformer

OpenAI o3 is a generative pre-trained transformer (GPT) model developed by OpenAI as a successor to OpenAI o1. It is designed to spend additional time deliberating on questions that require step-by-step logical reasoning. [1] [2]

History

The OpenAI o3 model was announced on December 20, 2024; the designation "o3" was chosen to avoid a trademark conflict with the mobile carrier brand O2. The model comes in two versions: o3 and o3-mini. OpenAI invited safety and security researchers to apply for early access to these models until January 10, 2025. [1] [3] OpenAI plans to release o3-mini to the public in January 2025. [4] o3-mini offers three compute levels: low, medium and high. [5] OpenAI has not yet announced pricing for users.

Capabilities

OpenAI used reinforcement learning to train o3 to "think" before answering, using what it calls a "private chain of thought". This approach lets the model plan ahead and work through a series of intermediate reasoning steps before producing an answer, at the cost of additional computing power and longer response times. [6]

o3 demonstrates significantly better performance than o1 on complex tasks, including coding, mathematics, and science. [1] OpenAI reported that o3 achieved a score of 87.7% on the GPQA Diamond benchmark, which contains expert-level science questions not publicly available online. [7]

On SWE-bench Verified, a software engineering benchmark that measures the ability to resolve real GitHub issues, o3 scored 71.7%, compared to 48.9% for o1. On the competitive programming platform Codeforces, o3 reached an Elo rating of 2727, whereas o1 reached 1891. [7]

On the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) benchmark, which evaluates an AI system's ability to solve novel logic and skill-acquisition problems, o3 achieved three times the accuracy of o1. [1] [8]

References

  1. Knight, Will (December 20, 2024). "OpenAI Upgrades Its Smartest AI Model With Improved Reasoning Skills". Wired.
  2. Metz, Cade (December 20, 2024). "OpenAI Unveils New A.I. That Can 'Reason' Through Math and Science Problems". The New York Times.
  3. "Early access for safety testing". OpenAI. December 20, 2024.
  4. Edwards, Benj (December 20, 2024). "OpenAI announces o3 and o3-mini, its next simulated reasoning models". Ars Technica.
  5. Mauran, Cecily (December 20, 2024). "OpenAI announces o3 and o3 mini reasoning models". Mashable. Retrieved January 7, 2025.
  6. Zeff, Maxwell; Wiggers, Kyle (December 20, 2024). "OpenAI announces new o3 models". TechCrunch. Retrieved December 22, 2024.
  7. Franzen, Carl; David, Emilia (December 20, 2024). "OpenAI confirms new frontier models o3 and o3-mini". VentureBeat. Retrieved December 26, 2024.
  8. Hsu, Jeremy (December 20, 2024). "OpenAI's o3 model aced a test of AI reasoning – but it's still not AGI". New Scientist. Retrieved December 22, 2024.