Original author(s) | OpenAI [1] |
---|---|
Initial release | May 28, 2020 (publication); June 11, 2020 (OpenAI API beta) |
Predecessor | GPT-2 |
Successor | GPT-3.5, GPT-4 |
License | Proprietary |
Website | openai.com |
Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020.
Like its predecessor, GPT-2, it is a decoder-only [2] transformer-based deep neural network, which replaces recurrence- and convolution-based architectures with a technique known as "attention". [3] This attention mechanism allows the model to focus selectively on the segments of input text it predicts to be most relevant. [4] GPT-3 has 175 billion parameters, each stored with 16-bit precision; at 2 bytes per parameter, the weights alone require 350 GB of storage. It has a context window of 2,048 tokens and has demonstrated strong "zero-shot" and "few-shot" learning abilities on many tasks. [2]
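The storage figure follows directly from the parameter count; a minimal Python sketch of the arithmetic (all values from the text above):

```python
# 175 billion parameters, each stored at 16-bit (2-byte) precision.
params = 175e9
bytes_per_param = 2

total_bytes = params * bytes_per_param
print(f"{total_bytes / 1e9:.0f} GB")  # -> 350 GB (decimal gigabytes)
```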
On September 22, 2020, Microsoft announced that it had licensed GPT-3 exclusively. Others can still receive output from its public API, but only Microsoft has access to the underlying model. [5]
According to The Economist, improved algorithms, more powerful computers, and a recent increase in the amount of digitized material have fueled a revolution in machine learning. New techniques in the 2010s resulted in "rapid improvements in tasks", including manipulating language. [6]
Software models are trained to learn by using thousands or millions of examples in a "structure ... loosely based on the neural architecture of the brain". [6] One architecture used in natural language processing (NLP) is a neural network based on a deep learning model that was introduced in 2017—the transformer architecture. [7] There are a number of NLP systems capable of processing, mining, organizing, connecting and contrasting textual input, as well as correctly answering questions. [8]
On June 11, 2018, OpenAI researchers and engineers published a paper introducing the first generative pre-trained transformer (GPT)—a type of generative large language model that is pre-trained on an enormous and diverse corpus of text, followed by discriminative fine-tuning to focus on a specific task. GPT models are transformer-based deep-learning neural network architectures. Previously, the best-performing neural NLP models commonly employed supervised learning from large amounts of manually labeled data, which made it prohibitively expensive and time-consuming to train extremely large language models. [2] The first GPT model was known as "GPT-1", and it was followed by "GPT-2" in February 2019. Created as a direct scale-up of its predecessor, GPT-2 had both its parameter count and dataset size increased by a factor of 10. It had 1.5 billion parameters and was trained on a dataset of 8 million web pages. [9]
In February 2020, Microsoft introduced its Turing Natural Language Generation (T-NLG), which it claimed was the "largest language model ever published at 17 billion parameters." [10] It performed better than any other language model at a variety of tasks, including summarizing texts and answering questions.
The construct of "learning styles" is problematic because it fails to account for the processes through which learning styles are shaped. Some students might develop a particular learning style because they have had particular experiences. Others might develop a particular learning style by trying to accommodate to a learning environment that was not well suited to their learning needs. Ultimately, we need to understand the interactions among learning styles and environmental and personal factors, and how these shape how we learn and the kinds of learning we experience.
– Text generated by GPT-3, prompted by Mike Sharples [11]
On May 28, 2020, an arXiv preprint by a group of 31 engineers and researchers at OpenAI described the achievement and development of GPT-3, a third-generation "state-of-the-art language model". [1] [12] The team increased the capacity of GPT-3 by over two orders of magnitude from that of its predecessor, GPT-2, [13] making GPT-3 the largest non-sparse language model to date. [1] : 14 [14] Because GPT-3 is structurally similar to its predecessors, [1] its greater accuracy is attributed to its increased capacity and greater number of parameters. [15] GPT-3's capacity is ten times larger than that of Microsoft's Turing NLG, the next largest NLP model known at the time. [12]
Lambdalabs estimated a hypothetical cost of around US$4.6 million and 355 years to train GPT-3 on a single GPU in 2020, [16] with lower actual training time achieved by using more GPUs in parallel.
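The two figures in that estimate imply an hourly GPU price; a rough back-calculation in Python, assuming simple linear scaling across GPUs and ignoring parallelization overhead:

```python
# Values from the Lambdalabs estimate cited above.
cost_usd = 4.6e6   # estimated training cost in US dollars
gpu_years = 355    # hypothetical single-GPU training time

gpu_hours = gpu_years * 365 * 24                     # ~3.1 million GPU-hours
print(f"${cost_usd / gpu_hours:.2f} per GPU-hour")   # -> ~$1.48
```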
Sixty percent of the weighted pre-training dataset for GPT-3 comes from a filtered version of Common Crawl consisting of 410 billion byte-pair-encoded tokens. Fuzzy deduplication used Apache Spark's MinHashLSH. [1] : 9 Other sources are 19 billion tokens from WebText2 representing 22% of the weighted total, 12 billion tokens from Books1 representing 8%, 55 billion tokens from Books2 representing 8%, and 3 billion tokens from Wikipedia representing 3%. [1] : 9 GPT-3 was trained on hundreds of billions of words and is also capable of coding in CSS, JSX, and Python, among others.[ citation needed ]
Dataset | # tokens | Proportion within training |
---|---|---|
Common Crawl | 410 billion | 60% |
WebText2 | 19 billion | 22% |
Books1 | 12 billion | 8% |
Books2 | 55 billion | 8% |
Wikipedia | 3 billion | 3% |
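Because the sampling weights in the table are not proportional to dataset size, smaller high-quality datasets are repeated (seen for more than one epoch) while Common Crawl is sampled less than once. A Python sketch of the implied epoch counts, assuming the roughly 300 billion total training tokens reported in the GPT-3 paper [1] and the rounded figures above:

```python
# (tokens, sampling weight) per dataset, from the table above.
datasets = {
    "Common Crawl": (410e9, 0.60),
    "WebText2":     (19e9,  0.22),
    "Books1":       (12e9,  0.08),
    "Books2":       (55e9,  0.08),
    "Wikipedia":    (3e9,   0.03),
}
total_tokens = 300e9  # approximate total training tokens [1]

for name, (size, weight) in datasets.items():
    epochs = total_tokens * weight / size
    print(f"{name}: ~{epochs:.2f} epochs")
# Common Crawl is seen ~0.44 times; Wikipedia roughly 3 times.
```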
Since GPT-3's training data was all-encompassing, it does not require further training for distinct language tasks.[citation needed] The training data contains occasional toxic language, and GPT-3 occasionally generates toxic language as a result of mimicking its training data. A study from the University of Washington found that GPT-3 produced toxic language at a toxicity level comparable to the similar natural language processing models GPT-2 and CTRL. OpenAI has implemented several strategies to limit the amount of toxic language generated by GPT-3. As a result, GPT-3 produced less toxic language than its predecessor model, GPT-1, although it produced more toxic generations, of higher toxicity, than CTRL Wiki, a language model trained entirely on Wikipedia data. [17]
On June 11, 2020, OpenAI announced that users could request access to its user-friendly GPT-3 API—a "machine learning toolset"—to help OpenAI "explore the strengths and limits" of this new technology. [18] [19] The invitation described how this API had a general-purpose "text in, text out" interface that can complete almost "any English language task", instead of the usual single use case. [18] According to one user, who had access to a private early release of the OpenAI GPT-3 API, GPT-3 was "eerily good" at writing "amazingly coherent text" with only a few simple prompts. [20] In an initial experiment, 80 US subjects were asked to judge whether short (~200-word) articles were written by humans or by GPT-3. The participants judged correctly 52% of the time, only slightly better than random guessing. [1]
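A minimal sketch of that "text in, text out" interface as the beta-era OpenAI Python bindings exposed it (the legacy Completion endpoint; current library versions differ, and the key shown is a placeholder):

```python
import openai

openai.api_key = "sk-..."  # placeholder; beta access keys were granted on request

response = openai.Completion.create(
    engine="davinci",  # the 175B base GPT-3 model
    prompt="Translate English to French: cheese =>",
    max_tokens=16,
    temperature=0.0,
)
print(response["choices"][0]["text"])
```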
On November 18, 2021, OpenAI announced that enough safeguards had been implemented that access to its API would be unrestricted. [21] OpenAI provided developers with a content moderation tool that helps them abide by OpenAI's content policy. [22] On January 27, 2022, OpenAI announced that its newest GPT-3 language models (collectively referred to as InstructGPT) were now the default language model used on their API. According to OpenAI, InstructGPT produced content that was better aligned to user intentions by following instructions better, generating fewer made-up facts, and producing somewhat less toxic content. [23]
Because GPT-3 can "generate news articles which human evaluators have difficulty distinguishing from articles written by humans," [12] GPT-3 has the "potential to advance both the beneficial and harmful applications of language models." [1] : 34 In their May 28, 2020 paper, the researchers described in detail the potential "harmful effects of GPT-3" [12] which include "misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing and social engineering pretexting". [1] The authors draw attention to these dangers to call for research on risk mitigation. [1] : 34
GPT-3 is capable of performing zero-shot and few-shot learning (including one-shot). [1]
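In the paper's framing, the "shots" are worked examples placed directly in the prompt; no weights are updated. A sketch of the three settings, using the paper's English-to-French example: [1]

```python
task = "Translate English to French:"

# Zero-shot: task description only.
zero_shot = f"{task}\ncheese =>"

# One-shot: a single worked example precedes the query.
one_shot = f"{task}\nsea otter => loutre de mer\ncheese =>"

# Few-shot: several worked examples precede the query.
few_shot = (
    f"{task}\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "plush giraffe => girafe peluche\n"
    "cheese =>"
)
```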
In June 2022, Almira Osmanovic Thunström wrote that GPT-3 was the primary author on an article on itself, that they had submitted it for publication, [24] and that it had been pre-published while waiting for completion of its review. [25]
There are many models in the GPT-3 family, some serving different purposes than others. In its initial research paper, OpenAI described eight different sizes of the main GPT-3 model:
Model name | Parameters | API name |
---|---|---|
GPT-3 Small | 125M | n/a |
GPT-3 Medium | 350M | ada |
GPT-3 Large | 760M | n/a |
GPT-3 XL | 1.3B | babbage |
GPT-3 2.7B | 2.7B | n/a |
GPT-3 6.7B | 6.7B | curie |
GPT-3 13B | 13B | n/a |
GPT-3 175B | 175B | davinci |
Half of the models are accessible through the API, namely GPT-3-medium, GPT-3-xl, GPT-3-6.7B and GPT-3-175B, which are referred to as ada, babbage, curie and davinci respectively. While the size of the API models was not originally disclosed by OpenAI, EleutherAI announced the mapping between model sizes and API names in May 2021. [26] These model sizes were later confirmed by OpenAI, [27] but the sizes of subsequent models have not been disclosed.
Model | Parameters | Description | Series |
---|---|---|---|
ada | 350M | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | Base GPT-3 |
babbage, babbage-002 | 1.3B | Capable of straightforward tasks, very fast, and lower cost. | Base GPT-3 |
curie | 6.7B | Very capable, but faster and lower cost than davinci. | Base GPT-3 |
davinci, davinci-002 | 175B | Most capable GPT-3 model. Can do any task the other models can do, often with higher quality. | Base GPT-3 |
text-ada-001 | 350M | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | InstructGPT |
text-babbage-001 | 1.3B | Capable of straightforward tasks, very fast, and lower cost. | InstructGPT |
text-curie-001 | 6.7B | Very capable, but faster and lower cost than davinci. | InstructGPT |
text-davinci-001 | 175B | Older version of the most capable model in the GPT-3 series. Can perform any task the other GPT-3 models can, often with less context. | InstructGPT |
text-davinci-002, code-davinci-002 | Undisclosed | Similar capabilities to text-davinci-003, but trained with supervised fine-tuning instead of reinforcement learning. | GPT-3.5 |
text-davinci-003 | Undisclosed | Can do any language task with better quality, longer output, and more consistent instruction-following than the curie, babbage, or ada models. Also supports inserting completions within text. | GPT-3.5 |
gpt-3.5-turbo, gpt-3.5-turbo-instruct, gpt-3.5-turbo-16k | Undisclosed | Most capable and cost-effective (fastest) GPT-3.5 model, optimized for chat, at 1/10th the cost of text-davinci-003. | GPT-3.5 |
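The confirmed name-to-size mapping can be expressed as a small lookup table; a sketch that also derives each base model's approximate 16-bit memory footprint (sizes per [26][27]; later models are undisclosed):

```python
# Parameter counts for the base GPT-3 API models; None marks undisclosed sizes.
MODEL_SIZES = {
    "ada":      350e6,
    "babbage":  1.3e9,
    "curie":    6.7e9,
    "davinci":  175e9,
    "text-davinci-003": None,
    "gpt-3.5-turbo":    None,
}

for name, params in MODEL_SIZES.items():
    if params is None:
        print(f"{name}: size undisclosed")
    else:
        print(f"{name}: ~{params * 2 / 1e9:.1f} GB at 16-bit precision")
```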
Original author(s) | OpenAI [1] |
---|---|
Initial release | March 15, 2022 |
Predecessor | GPT-3 |
Successor | GPT-4 |
License | Proprietary |
Generative Pre-trained Transformer 3.5 (GPT-3.5) is a subclass of GPT-3 models created by OpenAI in 2022.
On March 15, 2022, OpenAI made available new versions of GPT-3 and Codex in its API with edit and insert capabilities, under the names "text-davinci-002" and "code-davinci-002". [28] These models were described as more capable than previous versions and were trained on data up to June 2021. [29] On November 28, 2022, OpenAI introduced text-davinci-003. [30] On November 30, 2022, OpenAI began referring to these models as belonging to the "GPT-3.5" series, [29] and released ChatGPT, which was fine-tuned from a model in the GPT-3.5 series. [31] OpenAI does not consider GPT-3.5 to be part of GPT-3. [32]
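A sketch of the insert capability through the legacy Completion endpoint, where the `suffix` parameter supplies the text that should follow the generated span (the exact request shape varied across library versions):

```python
import openai

openai.api_key = "sk-..."  # placeholder

# Ask text-davinci-002 to fill in the middle of a program: the prompt is the
# text before the gap, and `suffix` is the text after it.
response = openai.Completion.create(
    model="text-davinci-002",
    prompt="def fibonacci(n):\n",
    suffix="\n\nprint(fibonacci(10))",
    max_tokens=64,
    temperature=0.0,
)
print(response["choices"][0]["text"])
```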
There are three models in the series: code-davinci-002, text-davinci-002, and text-davinci-003 (see the table above). [33]
On April 10, 2023, OpenAI introduced a new variant of its GPT-3.5 series model, known as GPT-3.5 with Browsing (ALPHA). [34] This updated model was described as building upon the capabilities of its predecessors "text-davinci-002" and "code-davinci-002". [35] The GPT-3.5 with Browsing (ALPHA) model incorporated the ability to access and browse online information, leading to more accurate and up-to-date responses to user queries. [34]
The GPT-3.5 with Browsing (ALPHA) model was trained on data up to September 2021, giving it more information than previous GPT-3.5 models, which were trained on data up to June 2021. The model aimed to provide developers and users with an advanced natural language processing tool that can effectively retrieve and synthesize online information. [34]
To enable browsing capabilities, OpenAI implemented a new API that allows the GPT-3.5 with Browsing (ALPHA) model to access selected online resources during operation. [36] This feature allows users to ask questions or request information with the expectation that the model will deliver updated, accurate, and relevant answers based on the latest online sources available to it.
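OpenAI has not published how the browsing mechanism works; the following is a purely hypothetical Python sketch of a retrieve-then-answer pattern consistent with that description, in which `fetch_page` is an invented helper:

```python
import requests

def fetch_page(url: str) -> str:
    """Hypothetical helper: fetch raw page text to ground the answer."""
    return requests.get(url, timeout=10).text[:4000]  # truncate to fit the context window

def build_browsing_prompt(question: str, url: str) -> str:
    """Combine retrieved page text with the user's question into one prompt."""
    context = fetch_page(url)
    return (
        "Answer the question using only the web page excerpt below.\n\n"
        f"Web page excerpt:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# The resulting prompt would then be sent to the model as in the earlier sketches.
```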
On April 27, 2023, OpenAI made the GPT-3.5 with Browsing (ALPHA) model publicly available to ChatGPT Plus users, allowing more people to access its new features. [36]
InstructGPT is a fine-tuned version of GPT-3 trained on a dataset of human-written instructions. [37]
GPT-3's builder, OpenAI, was initially founded as a non-profit in 2015. [62] In 2019, OpenAI broke from its usual open-source standards by not publicly releasing GPT-3's predecessor model, citing concerns that the model could facilitate the propagation of fake news. OpenAI eventually released a version of GPT-2 that was 8% of the original model's size. [63] In the same year, OpenAI restructured to be a for-profit company. [64] In 2020, Microsoft announced the company had exclusive licensing of GPT-3 for Microsoft's products and services following a multi-billion dollar investment in OpenAI. The agreement permits OpenAI to offer a public-facing API such that users can send text to GPT-3 to receive the model's output, but only Microsoft will have access to GPT-3's source code. [5]
Large language models, such as GPT-3, have come under criticism from a few of Google's AI ethics researchers for the environmental impact of training and storing the models, detailed in a paper co-authored by Timnit Gebru and Emily M. Bender in 2021. [65]
The growing[when?] use of automated writing technologies based on GPT-3 and other language generators has raised concerns regarding academic integrity [66] and raised the stakes of how universities and schools will gauge what constitutes academic misconduct, such as plagiarism. [67]
OpenAI's GPT series was built with data from the Common Crawl dataset, [68] a conglomerate of copyrighted articles, internet posts, web pages, and books scraped from 60 million domains over a period of 12 years. TechCrunch reports this training data includes copyrighted material from the BBC, The New York Times, Reddit, the full text of online books, and more. [69] In its response to a 2019 Request for Comments on Intellectual Property Protection for Artificial Intelligence Innovation from the United States Patent and Trademark Office (USPTO), OpenAI argued that "Under current law, training AI systems [such as its GPT models] constitutes fair use," but that "given the lack of case law on point, OpenAI and other AI developers like us face substantial legal uncertainty and compliance costs." [70]
A chatbot is a software application or web interface that is designed to mimic human conversation through text or voice interactions. Modern chatbots are typically online and use generative artificial intelligence systems that are capable of maintaining a conversation with a user in natural language and simulating the way a human would behave as a conversational partner. Such chatbots often use deep learning and natural language processing, but simpler chatbots have existed for decades.
Microsoft Azure, or just Azure, is the cloud computing platform developed by Microsoft. It provides management, access, and development of applications and services to individuals, companies, and governments through its global infrastructure. It also provides capabilities that are usually not included within other cloud platforms, including software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). Microsoft Azure supports many programming languages, tools, and frameworks, including Microsoft-specific and third-party software and systems.
OpenAI is an American artificial intelligence (AI) research organization founded in December 2015 and headquartered in San Francisco, California. Its mission is to develop "safe and beneficial" artificial general intelligence (AGI), which it defines as "highly autonomous systems that outperform humans at most economically valuable work". As a leading organization in the ongoing AI boom, OpenAI is known for the GPT family of large language models, the DALL-E series of text-to-image models, and a text-to-video model named Sora. Its release of ChatGPT in November 2022 has been credited with catalyzing widespread interest in generative AI.
Clarifai is an artificial intelligence company that provides a platform for building, deploying, and operating AI models, including large language models (LLMs), large vision models (LVMs), and retrieval-augmented generation (RAG), along with data labeling and inference tools, available in cloud, on-premises, or hybrid environments. Founded in 2013, Clarifai's platform has been used to build more than 1.5 million AI models by more than 400,000 users in 170 countries.
AI Dungeon is a single-player/multiplayer text adventure game which uses artificial intelligence (AI) to generate content and allows players to create and share adventures and custom prompts. The game's first version was made available in May 2019, and its second version was released on Google Colaboratory in December 2019. It was later ported that same month to its current cross-platform web application. The AI model was then reformed in July 2020.
Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by full release of the 1.5-billion-parameter model on November 5, 2019.
DALL·E, DALL·E 2, and DALL·E 3 are text-to-image models developed by OpenAI using deep learning methodologies to generate digital images from natural language descriptions known as "prompts".
GitHub Copilot is a code completion and automatic programming tool developed by GitHub and OpenAI that assists users of Visual Studio Code, Visual Studio, Neovim, and JetBrains integrated development environments (IDEs) by autocompleting code. Currently available by subscription to individual developers and to businesses, the generative artificial intelligence software was first announced by GitHub on 29 June 2021, and works best for users coding in Python, JavaScript, TypeScript, Ruby, and Go. In March 2023 GitHub announced plans for "Copilot X", which will incorporate a chatbot based on GPT-4, as well as support for voice commands, into Copilot.
OpenAI Codex is an artificial intelligence model developed by OpenAI. It parses natural language and generates code in response. It powers GitHub Copilot, a programming autocompletion tool for select IDEs, like Visual Studio Code and Neovim. Codex is a descendant of OpenAI's GPT-3 model, fine-tuned for use in programming applications.
Prompt engineering is the process of structuring an instruction that can be interpreted and understood by a generative artificial intelligence (AI) model. A prompt is natural language text describing the task that an AI should perform. A prompt for a text-to-text language model can be a query such as "what is Fermat's little theorem?", a command such as "write a poem in the style of Edgar Allan Poe about leaves falling", or a longer statement including context, instructions, and conversation history.
LaMDA is a family of conversational large language models developed by Google. Originally developed and introduced as Meena in 2020, the first-generation LaMDA was announced during the 2021 Google I/O keynote, while the second generation was announced the following year.
Hugging Face, Inc. is an American company incorporated under the Delaware General Corporation Law and based in New York City that develops computation tools for building applications using machine learning. It is most notable for its transformers library built for natural language processing applications and its platform that allows users to share machine learning models and datasets and showcase their work.
ChatGPT is a generative artificial intelligence (AI) chatbot developed by OpenAI and launched in 2022. It is based on the GPT-4o large language model (LLM). ChatGPT can generate human-like conversational responses, and enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. It is credited with accelerating the AI boom, which has led to ongoing rapid investment in and public attention to the field of artificial intelligence. Some observers have raised concern about the potential of ChatGPT and similar programs to displace human intelligence, enable plagiarism, or fuel misinformation.
Generative Pre-trained Transformer 4 (GPT-4) is a multimodal large language model created by OpenAI, and the fourth in its series of GPT foundation models. It was launched on March 14, 2023, and made publicly available via the paid chatbot product ChatGPT Plus, via OpenAI's API, and via the free chatbot Microsoft Copilot. As a transformer-based model, GPT-4 uses a paradigm where pre-training using both public data and "data licensed from third-party providers" is used to predict the next token. After this step, the model was then fine-tuned with reinforcement learning feedback from humans and AI for human alignment and policy compliance.
A generative pre-trained transformer (GPT) is a type of large language model (LLM) and a prominent framework for generative artificial intelligence. It is an artificial neural network that is used in natural language processing by machines. It is based on the transformer deep learning architecture, pre-trained on large data sets of unlabeled text, and able to generate novel human-like content. As of 2023, most LLMs had these characteristics and are sometimes referred to broadly as GPTs.
GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters.
A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.
Microsoft Copilot is a generative artificial intelligence chatbot developed by Microsoft. Based on the GPT-4 series of large language models, it was launched in 2023 as Microsoft's primary replacement for the discontinued Cortana.
Generative Pre-trained Transformer 4Chan (GPT-4chan) is a controversial AI model that was developed and deployed by YouTuber and AI researcher Yannic Kilcher in June 2022. The model is a large language model, which means it can generate text based on some input, by fine-tuning GPT-J with a dataset of millions of posts from the /pol/ board of 4chan, an anonymous online forum known for hosting hateful and extremist content.
Claude is a family of large language models developed by Anthropic. The first model was released in March 2023.