Developer(s) | OpenAI |
---|---|
Stable release | o3-mini / January 31, 2025 |
Predecessor | OpenAI o1 |
Type | Generative pre-trained transformer |
OpenAI o3 is a reflective generative pre-trained transformer (GPT) model developed by OpenAI as a successor to OpenAI o1. It is designed to devote additional deliberation time to questions that require step-by-step logical reasoning. [1] [2] OpenAI released the smaller model, o3-mini, on January 31, 2025. [3]
The OpenAI o3 model was announced on December 20, 2024; the designation "o3" was chosen to avoid a trademark conflict with the mobile carrier brand O2. [1] OpenAI invited safety and security researchers to apply for early access to these models until January 10, 2025. [4] As with o1, there are two models: o3 and o3-mini. [3]
On January 31, 2025, OpenAI released o3-mini to all ChatGPT users (including the free tier) and some API users. OpenAI describes o3-mini as a "specialized alternative" to o1 for "technical domains requiring precision and speed". [5] o3-mini offers three reasoning effort levels: low, medium, and high. The free version uses medium; the variant using more compute, called o3-mini-high, is available to paid subscribers. [3] [6] Subscribers to ChatGPT's Pro tier have unlimited access to both o3-mini and o3-mini-high. [5]
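In the API, the effort level is selected per request. A minimal sketch of how such a request payload could be assembled is below; the `reasoning_effort` parameter name matches OpenAI's API at the time of the o3-mini release, but the `build_request` helper is hypothetical, and current documentation should be checked before relying on the parameter.

```python
# Sketch of selecting o3-mini's reasoning effort level in an API request.
# `build_request` is a hypothetical helper; "reasoning_effort" reflects the
# OpenAI chat-completions API as of the o3-mini release.

def build_request(question: str, effort: str = "medium") -> dict:
    """Build a chat-completion payload for o3-mini at a given effort level."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    return {
        "model": "o3-mini",
        "reasoning_effort": effort,  # low / medium / high
        "messages": [{"role": "user", "content": question}],
    }

# ChatGPT's "o3-mini-high" corresponds to effort="high" here.
payload = build_request("Prove that sqrt(2) is irrational.", effort="high")
```

Higher effort levels spend more compute on internal reasoning before answering, trading latency for accuracy.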
On February 2, 2025, OpenAI launched OpenAI Deep Research, a ChatGPT service using a version of o3 that makes comprehensive reports within 5 to 30 minutes, based on web searches. [7]
On February 6, 2025, in response to pressure from rivals like DeepSeek, OpenAI announced an update aimed at making the thought process of its o3-mini model more transparent. [8] On February 12, 2025, OpenAI further increased rate limits for o3-mini-high to 50 requests per day (from 50 requests per week) for ChatGPT Plus subscribers, and added support for file and image uploads. [9]
Reinforcement learning was used to teach o3 to "think" before generating answers, using what OpenAI refers to as a "private chain of thought". [10] This approach enables the model to plan ahead and reason through tasks, performing a series of intermediate reasoning steps to assist in solving the problem, at the cost of additional computing power and increased latency of responses. [11]
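OpenAI has published few details of this private chain of thought, but the general idea — spending extra tokens on intermediate steps that the user never sees before committing to a final answer — can be illustrated with a toy solver. This is a conceptual sketch only, not OpenAI's actual mechanism:

```python
# Toy illustration of "thinking before answering": the solver records
# intermediate reasoning steps (kept private) before returning its final
# answer. A conceptual sketch, not OpenAI's actual training or inference.

def solve_with_chain_of_thought(a: int, b: int, c: int) -> tuple[list[str], int]:
    """Compute a*b + c while recording intermediate reasoning steps."""
    steps = []
    product = a * b
    steps.append(f"First multiply {a} by {b} to get {product}.")
    total = product + c
    steps.append(f"Then add {c} to get {total}.")
    return steps, total  # steps stay private; only the total is shown

hidden_steps, answer = solve_with_chain_of_thought(7, 6, 5)
```

Generating the hidden steps is where the extra compute and latency go: each intermediate step costs additional tokens, but lets the model break a problem into checkable pieces.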
o3 demonstrates significantly better performance than o1 on complex tasks, including coding, mathematics, and science. [1] OpenAI reported that o3 achieved a score of 87.7% on the GPQA Diamond benchmark, which contains expert-level science questions not publicly available online. [12]
On SWE-bench Verified, a software engineering benchmark assessing the ability to solve real GitHub issues, o3 scored 71.7%, compared to 48.9% for o1. On Codeforces, o3 reached an Elo score of 2727, whereas o1 scored 1891. [12]
On the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) benchmark, which evaluates an AI's ability to handle novel logic and skill-acquisition problems, o3 attained three times the accuracy of o1. [1] [13]