Developer(s) | OpenAI |
---|---|
Initial release | May 13, 2024 |
Predecessor | GPT-4 Turbo |
Successor | OpenAI o1 |
Type | |
License | Proprietary |
Website | openai |
GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. [1] GPT-4o is free, but with a usage limit that is five times higher for ChatGPT Plus subscribers. [2] It can process and generate text, images and audio. [3] Its application programming interface (API) is twice as fast and half the price of its predecessor, GPT-4 Turbo. [1]
Multiple versions of GPT-4o were originally secretly launched under different names on Large Model Systems Organization's (LMSYS) Chatbot Arena as three different models. These three models were called gpt2-chatbot, im-a-good-gpt2-chatbot, and im-also-a-good-gpt2-chatbot. [4] On 7 May 2024, Sam Altman tweeted "im-a-good-gpt2-chatbot", which was commonly interpreted as a confirmation that these were new OpenAI models being A/B tested. [5]
GPT-4o achieved state-of-the-art results in voice, multilingual, and vision benchmarks, setting new records in audio speech recognition and translation. [6] [7] GPT-4o scored 88.7 on the Massive Multitask Language Understanding (MMLU) benchmark compared to 86.5 by GPT-4. [8] Unlike GPT-3.5 and GPT-4, which rely on other models to process sound, GPT-4o natively supports voice-to-voice. [8] Sam Altman noted on 15 May 2024 that GPT-4o's voice-to-voice capabilities were not yet integrated into ChatGPT, and that the old version was still being used. [9] This new mode, called Advanced Voice Mode, is currently in limited alpha release [10] and is based on the 4o-audio-preview. [11] On 1 October 2024, the Realtime API was introduced. [12]
The model supports over 50 languages, [1] which OpenAI claims cover over 97% of speakers. [13] Mira Murati demonstrated the model's multilingual capability by speaking Italian to the model and having it translate between English and Italian during the live-streamed OpenAI demonstration event on 13 May 2024. In addition, the new tokenizer [14] uses fewer tokens for certain languages, especially languages that are not based on the Latin alphabet, making it cheaper for those languages. [8]
GPT-4o has knowledge up to October 2023, [15] [16] but can access the Internet if up-to-date information is needed. It has a context length of 128k tokens [15] with an output token limit capped to 4,096, [16] and after a later update (gpt-4o-2024-08-06) to 16,384. [17]
As of May 2024, it is the leading model in the LMSYS Elo Arena Benchmarks by the University of California, Berkeley. [18]
In August 2024, OpenAI introduced a new feature allowing corporate customers to customize GPT-4o using proprietary company data. This customization, known as fine-tuning, enables businesses to adapt GPT-4o to specific tasks or industries, enhancing its utility in areas like customer service and specialized knowledge domains. Previously, fine-tuning was available only on the less powerful model GPT-4o mini. [19] [20]
The fine-tuning process requires customers to upload their data to OpenAI's servers, with the training typically taking one to two hours. Initially, the customization will be limited to text-based data. OpenAI's focus with this rollout is to reduce the complexity and effort required for businesses to tailor AI solutions to their needs, potentially increasing the adoption and effectiveness of AI in corporate environments. [21] [19]
On July 18, 2024, OpenAI released a smaller and cheaper version, GPT-4o mini. [22]
According to OpenAI, its low cost is expected to be particularly useful for companies, startups, and developers that seek to integrate it into their services, which often make a high number of API calls. Its API costs $0.15 per million input tokens and $0.6 per million output tokens, compared to $5 and $15, respectively, for GPT-4o. It is also significantly more capable and 60% cheaper than GPT-3.5 Turbo, which it replaced on the ChatGPT interface. [22] The price after fine-tuning doubles: $0.3 per million input tokens and $1.2 per million output tokens. [23]
GPT-4o mini is the default model for users not logged in who use ChatGPT as guests and those who have hit the limit for GPT-4o.
GPT-4o mini will become available in fall 2024 on Apple's mobile devices and Mac desktops, through the Apple Intelligence feature. [22]
As released, GPT-4o offered five voices: Breeze, Cove, Ember, Juniper, and Sky. A similarity between the voice of American actress Scarlett Johansson and Sky was quickly noticed. On May 14, Entertainment Weekly asked themselves whether this likeness was on purpose. [24] On May 18, Johansson's husband, Colin Jost, joked about the similarity in a segment on Saturday Night Live . [25] On May 20, 2024, OpenAI disabled the Sky voice, issuing a statement saying "We've heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them." [26]
Scarlett Johansson starred in the 2013 sci-fi movie Her, playing Samantha, an artificially intelligent virtual assistant personified by a female voice. As part of the promotion leading up to the release of GPT-4o, Sam Altman on May 13 tweeted a single word: "her". [27] [28]
OpenAI stated that each voice was based on the voice work of a hired actor. According to OpenAI, "Sky's voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice." [26] CTO Mira Murati stated "I don't know about the voice. I actually had to go and listen to Scarlett Johansson's voice." OpenAI further stated the voice talent was recruited before reaching out to Johansson. [28] [29]
On May 21, Johansson issued a statement explaining that OpenAI had repeatedly offered to make her a deal to gain permission to use her voice as early as nine months prior to release, a deal she rejected. She said she was "shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." In the statement, Johansson also used the incident to draw attention to the lack of legal safeguards around the use of creative work to power leading AI tools, as her legal counsel demanded OpenAI detail the specifics of how the Sky voice was created. [28] [30]
Observers noted similarities to how Johansson had previously sued and settled with The Walt Disney Company for breach of contract over the direct-to-streaming rollout of her Marvel film Black Widow , [31] a settlement widely speculated to have netted her around $40M. [32]
Also on May 21, Shira Ovide at The Washington Post shared her list of "most bone-headed self-owns" by technology companies, with the decision to go ahead with a Johansson sound-alike voice despite her opposition and then denying the similarities ranking 6th. [33] On May 24, Derek Robertson at Politico wrote about the "massive backlash", concluding that "appropriating the voice of one of the world's most famous movie stars — in reference [...] to a film that serves as a cautionary tale about over-reliance on AI — is unlikely to help shift the public back into [Sam Altman's] corner anytime soon." [34]
A chatbot is a software application or web interface designed to have textual or spoken conversations. Modern chatbots are typically online and use generative artificial intelligence systems that are capable of maintaining a conversation with a user in natural language and simulating the way a human would behave as a conversational partner. Such chatbots often use deep learning and natural language processing, but simpler chatbots have existed for decades.
Samuel Harris Altman is an American entrepreneur and investor best known as the chief executive officer of OpenAI since 2019. He is also the Chairman of clean energy companies Oklo Inc. and Helion Energy. Altman is considered to be one of the leading figures of the AI boom. He dropped out of Stanford University after two years and founded Loopt, a mobile social networking service, raising more than $30 million in venture capital. In 2011, Altman joined Y Combinator, a startup accelerator, and was its president from 2014 to 2019.
Her is a 2013 American science-fiction romantic comedy-drama film written, directed, and co-produced by Spike Jonze. Her follows Theodore Twombly, a man who develops a relationship with Samantha, an artificially intelligent virtual assistant personified through a female voice. The film also stars Amy Adams, Rooney Mara, Olivia Wilde, and Chris Pratt. Her was dedicated to James Gandolfini, Harris Savides, Maurice Sendak and Adam Yauch, who all died before the film's release.
OpenAI is an American artificial intelligence (AI) research organization founded in December 2015 and headquartered in San Francisco, California. Its stated mission is to develop "safe and beneficial" artificial general intelligence (AGI), which it defines as "highly autonomous systems that outperform humans at most economically valuable work". As a leading organization in the ongoing AI boom, OpenAI is known for the GPT family of large language models, the DALL-E series of text-to-image models, and a text-to-video model named Sora. Its release of ChatGPT in November 2022 has been credited with catalyzing widespread interest in generative AI.
Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020.
DALL-E, DALL-E 2, and DALL-E 3 are text-to-image models developed by OpenAI using deep learning methodologies to generate digital images from natural language descriptions known as "prompts".
LaMDA is a family of conversational large language models developed by Google. Originally developed and introduced as Meena in 2020, the first-generation LaMDA was announced during the 2021 Google I/O keynote, while the second generation was announced the following year.
ChatGPT is a generative artificial intelligence chatbot developed by OpenAI and launched in 2022. It is based on the GPT-4o large language model (LLM). ChatGPT can generate human-like conversational responses, and enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. It is credited with accelerating the AI boom, which has led to ongoing rapid investment in and public attention to the field of artificial intelligence (AI). Some observers have raised concern about the potential of ChatGPT and similar programs to displace human intelligence, enable plagiarism, or fuel misinformation.
Generative Pre-trained Transformer 4 (GPT-4) is a multimodal large language model created by OpenAI, and the fourth in its series of GPT foundation models. It was launched on March 14, 2023, and made publicly available via the paid chatbot product ChatGPT Plus, via OpenAI's API, and via the free chatbot Microsoft Copilot. As a transformer-based model, GPT-4 uses a paradigm where pre-training using both public data and "data licensed from third-party providers" is used to predict the next token. After this step, the model was then fine-tuned with reinforcement learning feedback from humans and AI for human alignment and policy compliance.
A generative pre-trained transformer (GPT) is a type of large language model (LLM) and a prominent framework for generative artificial intelligence. It is an artificial neural network that is used in natural language processing by machines. It is based on the transformer deep learning architecture, pre-trained on large data sets of unlabeled text, and able to generate novel human-like content. As of 2023, most LLMs had these characteristics and are sometimes referred to broadly as GPTs.
Ermira "Mira" Murati is an engineer, researcher, and tech executive. She served as chief technology officer of OpenAI from May 2022 to September 2024.
A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.
The AI boom, or AI spring, is an ongoing period of rapid progress in the field of artificial intelligence (AI) that started in the late 2010s before gaining international prominence in the early 2020s. Examples include protein folding prediction led by Google DeepMind as well as large language models and generative AI applications developed by OpenAI.
Microsoft Copilot is a generative artificial intelligence chatbot developed by Microsoft. Based on the GPT-4 series of large language models, it was launched in 2023 as Microsoft's primary replacement for the discontinued Cortana.
AutoGPT is an open-source "AI agent" that, given a goal in natural language, will attempt to achieve it by breaking it into sub-tasks and using the Internet and other tools in an automatic loop. It uses OpenAI's GPT-4 or GPT-3.5 APIs, and is among the first examples of an application using GPT-4 to perform autonomous tasks.
Gemini is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, Gemini Pro, Gemini Flash, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI's GPT-4. It powers the chatbot of the same name.
Since the public release of ChatGPT by OpenAI in November 2022, the integration of chatbots in education has sparked considerable debate and exploration. Educators' opinions vary widely; while some are skeptical about the benefits of large language models, many see them as valuable tools.
Grok is a generative artificial intelligence chatbot developed by xAI. Based on the large language model (LLM) of the same name, it was launched in 2023 as an initiative by Elon Musk. The chatbot is advertised as having a "sense of humor" and direct access to X. It is currently under beta testing.
Claude is a family of large language models developed by Anthropic. The first model was released in March 2023.
OpenAI o1 is a generative pre-trained transformer. A preview of o1 was released by OpenAI on September 12, 2024. o1 spends time "thinking" before it answers, making it better at complex reasoning tasks, science and programming than GPT-4o. The full version was released on December 5, 2024.