GPT-2

Generative Pre-trained Transformer 2 (GPT-2)
Original author(s): OpenAI
Initial release: 14 February 2019
Repository: https://github.com/openai/gpt-2
Predecessor: GPT-1
Successor: GPT-3
License: MIT [1]
Website: openai.com/blog/gpt-2-1-5b-release/

Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. [2] It was partially released in February 2019, followed by the full release of the 1.5-billion-parameter model on November 5, 2019. [3] [4] [5] [6] [7]

GPT-2 was created as a "direct scale-up" of GPT-1 [8] with a ten-fold increase in both its parameter count and the size of its training dataset. [7] It is a general-purpose learner: its ability to perform various tasks was a consequence of its general ability to accurately predict the next item in a sequence. [2] [9] This enabled it to translate texts, answer questions about a topic from a text, summarize passages from a larger text, [9] and generate text output on a level sometimes indistinguishable from that of humans, [10] although it could become repetitive or nonsensical when generating long passages. [11] It was superseded by the GPT-3 and GPT-4 models, which are no longer open source.

Like its predecessor GPT-1 and its successors GPT-3 and GPT-4, GPT-2 has a generative pre-trained transformer architecture: a deep neural network, specifically a transformer model, [8] which uses attention instead of older recurrence- and convolution-based architectures. [12] [13] Attention mechanisms allow the model to selectively focus on the segments of input text it predicts to be the most relevant. [14] [15] This architecture allows for greatly increased parallelization and outperforms earlier RNN-, CNN-, and LSTM-based models on benchmarks. [8]
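
As an illustration of the attention mechanism described above, the following is a minimal sketch of scaled dot-product attention in the form given by Vaswani et al. [12], written in plain Python. The causal restriction (each position attends only to itself and earlier positions) reflects GPT-2's decoder-only design. The function names and the toy vectors are illustrative assumptions, and learned projections and multiple heads are omitted, so this should not be read as OpenAI's implementation.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Scaled dot-product attention where position i sees only positions 0..i."""
    d_k = len(keys[0])
    outputs = []
    for i, query in enumerate(queries):
        # Similarity of this query to every key at or before position i.
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the visible value vectors.
        outputs.append([
            sum(w * values[j][c] for j, w in enumerate(weights))
            for c in range(len(values[0]))
        ])
    return outputs


# Toy input: three positions with two-dimensional vectors (hypothetical data).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = causal_attention(tokens, tokens, tokens)
```

Because every position's output depends only on matrix-style products over the visible positions, all positions can be computed independently of one another, which is the source of the parallelization mentioned above.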

Training

Since the transformer architecture enabled massive parallelization, GPT models could be trained on larger corpora than previous NLP (natural language processing) models. While the GPT-1 model demonstrated that the approach was viable, GPT-2 would further explore the emergent properties of networks trained on extremely large corpora. CommonCrawl, a large corpus produced by web crawling and previously used in training NLP systems, [16] was considered due to its large size, but was rejected after further review revealed large amounts of unintelligible content. [2] [16] Instead, OpenAI developed a new corpus, known as WebText; rather than scraping content indiscriminately from the World Wide Web, WebText was generated by scraping only pages linked to by Reddit posts that had received at least three upvotes prior to December 2017. The corpus was subsequently cleaned; HTML documents were parsed into plain text, duplicate pages were eliminated, and Wikipedia pages were removed (since their presence in many other datasets could have induced overfitting). [2]
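
The selection rules described above can be sketched as a simple filter. This is an illustration only: the record fields (`url`, `upvotes`, `text`), the function name, and the exact-match de-duplication are assumptions rather than OpenAI's actual pipeline, and the HTML-to-plain-text step is assumed to have already been done.

```python
from urllib.parse import urlparse

MIN_UPVOTES = 3  # threshold stated in the text above


def select_pages(submissions):
    """Keep upvoted, non-Wikipedia, non-duplicate pages (hypothetical schema)."""
    seen_urls = set()
    seen_texts = set()
    kept = []
    for sub in submissions:
        if sub["upvotes"] < MIN_UPVOTES:
            continue  # not enough upvotes
        host = urlparse(sub["url"]).netloc.lower()
        if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            continue  # Wikipedia pages were removed from the corpus
        text = " ".join(sub["text"].split())  # normalize whitespace
        if sub["url"] in seen_urls or text in seen_texts:
            continue  # duplicate page
        seen_urls.add(sub["url"])
        seen_texts.add(text)
        kept.append({"url": sub["url"], "text": text})
    return kept


# Hypothetical sample records.
sample = [
    {"url": "https://example.com/a", "upvotes": 5, "text": "Hello   world"},
    {"url": "https://example.com/b", "upvotes": 2, "text": "Too few votes"},
    {"url": "https://en.wikipedia.org/wiki/GPT-2", "upvotes": 9, "text": "Encyclopedia"},
    {"url": "https://example.org/copy", "upvotes": 4, "text": "Hello world"},
]
corpus = select_pages(sample)
```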

While the cost of training GPT-2 is known to have been $256 per hour, [17] [18] the number of hours it took to complete training is unknown; therefore, the overall training cost cannot be estimated accurately. [19] However, comparable large language models using transformer architectures have had their costs documented in more detail; the training processes for BERT and XLNet consumed, respectively, $6,912 and $245,000 of resources. [18]

Release

GPT-2 was first announced on 14 February 2019. A February 2019 article in The Verge by James Vincent said that, while "[the] writing it produces is usually easily identifiable as non-human", it remained "one of the most exciting examples yet" of language generation programs: [20]

Give it a fake headline, and it’ll write the rest of the article, complete with fake quotations and statistics. Feed it the first line of a short story, and it’ll tell you what happens to your character next. It can even write fan fiction, given the right prompt. [20]

The Guardian described this output as "plausible newspaper prose"; [11] Kelsey Piper of Vox said "one of the coolest AI systems I’ve ever seen may also be the one that will kick me out of my job". [21] GPT-2's flexibility was described as "impressive" by The Verge; specifically, its ability to translate text between languages, summarize long articles, and answer trivia questions was noted. [20]

A study by the University of Amsterdam employing a modified Turing test found that at least in some scenarios, participants were unable to distinguish poems generated by GPT-2 from those written by humans. [22]

Restrictions and partial release

While "Skub" is not a real product, even the reduced-size model used in DistilGPT2 is capable of creating plausible arguments both for and against it.

While previous OpenAI models had been made immediately available to the public, OpenAI initially refused to make a public release of GPT-2's source code when announcing it in February, citing the risk of malicious use; [11] on announcement, selected press outlets were given limited access to the model (i.e. an interface that allowed input and provided output, not the source code itself). [11] One commonly cited justification was that, since generated text was usually completely novel, it could be used by spammers to evade automated filters; OpenAI demonstrated a version of GPT-2 fine-tuned to "generate infinite positive – or negative – reviews of products". [11]

Another justification was that GPT-2 could be used to generate text that was obscene or racist. Researchers such as Jeremy Howard warned of "the technology to totally fill Twitter, email, and the web up with reasonable-sounding, context-appropriate prose, which would drown out all other speech and be impossible to filter". [20] The Allen Institute for Artificial Intelligence, in response to GPT-2, announced a tool to detect "neural fake news". [23]

However, opinion was divided. A February 2019 article in The Verge argued that the threat posed by GPT-2 had been exaggerated; [24] Anima Anandkumar, a professor at Caltech and director of machine learning research at Nvidia, said that there was no evidence that GPT-2 had the capabilities to pose the threats described by OpenAI, and that what they did was the "opposite of open", characterizing their refusal to release the full model as "malicious BS". [24] The Gradient published an open letter to OpenAI requesting that they release the model publicly, comparing the threat posed by text-generation AI to the threat posed by the printing press, and giving Photoshop as an example of "a technology that has (thankfully) not destroyed modern society despite its potential for chaos": [25]

Thirty years later, society has emerged relatively unscathed despite Photoshop being simple enough for high school students to use and ubiquitous enough to commandeer its own verb. Why? Precisely because everyone knows about Photoshop. [25]

774M release

While OpenAI did not release the fully-trained model or the corpora it was trained on, description of their methods in prior publications (and the free availability of underlying technology) made it possible for GPT-2 to be replicated by others as free software; one such replication, OpenGPT-2, was released in August 2019, in conjunction with a freely licensed version of WebText called OpenWebText. The cloud compute costs for OpenGPT-2 were given as approximately $50,000. [26]

On August 20, 2019, OpenAI released a partial version of GPT-2, with 774 million parameters (roughly half the size of the full 1.5 billion parameter model). [6]

Full 1.5B release

Initial concerns that GPT-2 would lend itself to widespread misuse did not come to pass; The Verge said that "there are reasons to be skeptical about claims that AI technology will usher in some sort of ‘infopocalypse.’ For a start, we already have programs that can generate plausible text at high volume for little cost: humans." [27] By November 2019, OpenAI said that they had "seen no strong evidence of misuse so far", and the full version, with 1.5 billion parameters trained with forty gigabytes of data, "about eight thousand times larger than the collected works of Shakespeare", [28] was released on November 5, 2019. [3] [4]

Small and medium releases

Two smaller releases of GPT-2 are also available: the small version, with 117M parameters, and the medium version, with 355M parameters. Both can be downloaded from Hugging Face. [29] [30]

Limitations

GPT-2 can generate thematically-appropriate text for a range of scenarios, even surreal ones like a CNN article about Donald Trump giving a speech praising the anime character Asuka Langley Soryu. Here, the tendency to generate nonsensical and repetitive text with increasing output length (even in the full 1.5B model) can be seen; in the second paragraph, grammar begins to deteriorate, and the output eventually becomes one incoherent sentence repeated over and over.

While GPT-2's ability to generate plausible passages of natural language text was generally remarked on positively, its shortcomings were noted as well, especially when generating texts longer than a couple of paragraphs; Vox said "the prose is pretty rough, there’s the occasional non-sequitur, and the articles get less coherent the longer they get". [21] The Verge similarly noted that longer samples of GPT-2 writing tended to "stray off topic" and lack overall coherence; [20] The Register opined that "a human reading it should, after a short while, realize something's up", and noted that "GPT-2 doesn't answer questions as well as other systems that rely on algorithms to extract and retrieve information." [17]

GPT-2 deployment is resource-intensive; the full version of the model is larger than five gigabytes, making it difficult to embed locally into applications, and consumes large amounts of RAM. In addition, performing a single prediction "can occupy a CPU at 100% utilization for several minutes", and even with GPU processing, "a single prediction can take seconds". [10] To alleviate these issues, the company Hugging Face created DistilGPT2, using knowledge distillation to produce a smaller model that "scores a few points lower on some quality benchmarks", but is "33% smaller and twice as fast". [10]
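
Knowledge distillation trains a small "student" model to match the output distribution of a larger "teacher". The sketch below shows one common formulation, a Kullback-Leibler divergence between temperature-softened distributions. The source does not describe DistilGPT2's exact training objective, so the function names, the temperature value, and the choice of loss are assumptions made for illustration.

```python
import math


def softened(logits, temperature):
    """Softmax over logits divided by a temperature (higher = flatter)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the student's distribution to the teacher's."""
    teacher = softened(teacher_logits, temperature)
    student = softened(student_logits, temperature)
    return sum(
        t * math.log(t / s)
        for t, s in zip(teacher, student)
        if t > 0.0
    )


# Hypothetical next-token logits over a four-word vocabulary.
teacher_logits = [4.0, 1.0, 0.5, -2.0]
matching_student = [4.0, 1.0, 0.5, -2.0]
diverging_student = [0.0, 3.0, 0.5, 1.0]

loss_match = distillation_loss(teacher_logits, matching_student)
loss_diverge = distillation_loss(teacher_logits, diverging_student)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so minimizing it pushes the smaller model toward the larger model's predictions.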

Application and subsequent research

Even before the release of the full version, GPT-2 was used for a variety of applications and services, as well as for entertainment. In June 2019, a subreddit named r/SubSimulatorGPT2 was created in which a variety of GPT-2 instances trained on different subreddits made posts and replied to each other's comments, creating a situation where one could observe "an AI personification of r/Bitcoin argue with the machine learning-derived spirit of r/ShittyFoodPorn"; [27] by July of that year, a GPT-2-based software program released to autocomplete lines of code in a variety of programming languages was described by users as a "game-changer". [31]

In 2019, AI Dungeon was launched, which used GPT-2 to generate dynamic text adventures based on user input. [32] AI Dungeon now offers access to the largest release of the GPT-3 API as an optional paid upgrade, while the free version of the site uses the second-largest release of GPT-3. [33] Latitude, the company formed around AI Dungeon, raised $3.3 million in seed funding in 2021. [34] Several websites host interactive demonstrations of different instances of GPT-2 and other transformer models. [35] [36] [37]

In February 2021, a crisis center for troubled teens announced that they would begin using a GPT-2-derived chatbot to help train counselors by allowing them to have conversations with simulated teens (this use was purely for internal purposes, and did not involve having GPT-2 communicate with the teens themselves). [38]

On May 9, 2023, OpenAI released a mapped version of GPT-2. OpenAI used a successor model, GPT-4, to map each neuron of GPT-2 and determine its function. [39]

Performance and evaluation

GPT-2 writing a fictional news article about Edward Snowden's actions after winning the 2020 United States presidential election (all highlighted text is machine-generated). While Snowden had (at the time of generation) never been elected to public office, the generated sample is grammatically and stylistically valid.

GPT-2 became capable of performing a variety of tasks beyond simple text production due to the breadth of its dataset and technique: answering questions, summarizing, and even translating between languages in a variety of specific domains, without being instructed in anything beyond how to predict the next word in a sequence. [20] [21]
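
The idea of producing text purely by repeated next-word prediction can be illustrated with a toy model. The sketch below is not GPT-2: it substitutes a bigram frequency table for the neural network and always picks the most frequent follower (greedy decoding), and all names are illustrative. Only the generation loop, which appends one predicted token at a time and feeds the result back in, mirrors how such models are used.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each other word."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def generate(table, prompt, steps):
    """Extend the prompt one token at a time using the most likely follower."""
    out = list(prompt)
    for _ in range(steps):
        followers = table.get(out[-1])
        if not followers:
            break  # nothing ever followed this word in the training text
        # Greedy choice; sorting first makes ties resolve alphabetically.
        nxt = max(sorted(followers), key=lambda tok: followers[tok])
        out.append(nxt)
    return out


# Hypothetical training text.
corpus = "the cat sat on the mat and the cat slept".split()
table = train_bigram(corpus)
continuation = generate(table, ["the"], steps=5)
# continuation == ["the", "cat", "sat", "on", "the", "cat"]
```

With greedy decoding the toy model quickly falls into a cycle, a crude analogue of the repetitiveness reported for long GPT-2 outputs.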

One example of generalized learning is GPT-2's ability to perform machine translation between French and English, which was assessed using WMT-14 translation tasks. GPT-2's training corpus included virtually no French text; non-English text was deliberately removed while cleaning the dataset prior to training, and as a consequence, only about 10 MB of French remained in the 40,000 MB corpus for the model to learn from (mostly from foreign-language quotations in English posts and articles). [2]

Despite this, GPT-2 achieved 5 BLEU on the WMT-14 English-to-French test set (slightly below the score of a translation via word-for-word substitution). It was also able to outperform several contemporary (2017) unsupervised machine translation baselines on the French-to-English test set, where GPT-2 achieved 11.5 BLEU. This remained below the highest-performing contemporary unsupervised approach (2019), which had achieved 33.5 BLEU. [2] However, other models used large amounts of French text to achieve these results; GPT-2 was estimated to have used a monolingual French corpus approximately 1/500 the size of comparable approaches. [2]
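
BLEU scores such as those quoted above measure n-gram overlap between a system translation and a reference translation. The following is a simplified, single-sentence sketch that assumes whitespace tokenization, a single reference, and no smoothing. WMT evaluations compute BLEU over a whole test corpus with standardized tokenization, so values from this function are not comparable to the published figures.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU on a 0-100 scale for one candidate and one reference."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each n-gram's credit to its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # unsmoothed: any empty precision gives a zero score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty discourages candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1.0 - len(ref) / len(cand))
    return 100.0 * brevity * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
exact = sentence_bleu("the cat sat on the mat", reference)
partial = sentence_bleu("the cat sat on a mat", reference)
unrelated = sentence_bleu("dogs bark loudly", reference)
```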

| Model | Architecture | Parameter count | Training data |
|---|---|---|---|
| GPT-1 | 12-level, 12-headed Transformer decoder (no encoder), followed by linear-softmax. | 0.12 billion | BookCorpus: [40] 4.5 GB of text, from 7000 unpublished books of various genres. |
| GPT-2 | GPT-1, but with modified normalization. | 1.5 billion | WebText: 40 GB [41] of text, 8 million documents, from 45 million webpages upvoted on Reddit. |
| GPT-3 | GPT-2, but with modification to allow larger scaling. | 175 billion | 570 GB plaintext, 300 billion tokens of CommonCrawl, WebText, English Wikipedia, and two books corpora (Books1 and Books2). |
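
The scale-up between generations can be checked with simple arithmetic on the figures in the table. The snippet below only restates those figures; the dictionary layout and function name are illustrative. From GPT-1 to GPT-2 the parameter count grows by a factor of about 12.5 and the training data by a factor of about 8.9, which is roughly consistent with the "ten-fold" description given earlier.

```python
# Figures copied from the comparison table (parameters in billions, data in GB).
MODELS = {
    "GPT-1": {"params_billion": 0.12, "data_gb": 4.5},
    "GPT-2": {"params_billion": 1.5, "data_gb": 40},
    "GPT-3": {"params_billion": 175, "data_gb": 570},
}


def scale_up(older, newer):
    """Return how many times larger the newer model is than the older one."""
    a, b = MODELS[older], MODELS[newer]
    return {
        "params": b["params_billion"] / a["params_billion"],
        "data": b["data_gb"] / a["data_gb"],
    }


gpt1_to_gpt2 = scale_up("GPT-1", "GPT-2")
gpt2_to_gpt3 = scale_up("GPT-2", "GPT-3")
```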

GPT-2 was followed by the 175-billion-parameter GPT-3, [42] which was revealed to the public in 2020 [43] and whose source code has never been made available. Access to GPT-3 is provided exclusively through APIs offered by OpenAI and Microsoft. [44] GPT-3 was in turn followed by GPT-4.

References

  1. "gpt-2". GitHub. Archived from the original on 11 March 2023. Retrieved 13 March 2023.
  2. Radford, Alec; Wu, Jeffrey; Child, Rewon; Luan, David; Amodei, Dario; Sutskever, Ilya (14 February 2019). "Language models are unsupervised multitask learners" (PDF). OpenAI. 1 (8). Archived (PDF) from the original on 6 February 2021. Retrieved 19 December 2020.
  3. Vincent, James (7 November 2019). "OpenAI has published the text-generating AI it said was too dangerous to share". The Verge. Archived from the original on 11 June 2020. Retrieved 19 December 2020.
  4. "GPT-2: 1.5B Release". OpenAI. 5 November 2019. Archived from the original on 14 November 2019. Retrieved 14 November 2019.
  5. Piper, Kelsey (15 May 2019). "A poetry-writing AI has just been unveiled. It's ... pretty good". Vox. Archived from the original on 7 November 2020. Retrieved 19 December 2020.
  6. Johnson, Khari (20 August 2019). "OpenAI releases curtailed version of GPT-2 language model". VentureBeat. Archived from the original on 18 December 2020. Retrieved 19 December 2020.
  7. "Better Language Models and Their Implications". OpenAI. 14 February 2019. Archived from the original on 19 December 2020. Retrieved 19 December 2020.
  8. Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.
  9. Hegde, Chaitra; Patil, Shrikumar (9 June 2020). "Unsupervised Paraphrase Generation using Pre-trained Language Models". arXiv: 2006.05477 [cs.CL].
  10. Kaiser, Caleb (31 January 2020). "Too big to deploy: How GPT-2 is breaking servers". Towards Data Science. Archived from the original on 15 February 2020. Retrieved 27 February 2021.
  11. Hern, Alex (14 February 2019). "New AI fake text generator may be too dangerous to release, say creators". The Guardian. Archived from the original on 14 February 2019. Retrieved 19 December 2020.
  12. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention is All you Need" (PDF). Advances in Neural Information Processing Systems. Curran Associates, Inc. 30.
  13. Olah, Chris; Carter, Shan (8 September 2016). "Attention and Augmented Recurrent Neural Networks". Distill. 1 (9). doi: 10.23915/distill.00001. Archived from the original on 22 December 2020. Retrieved 22 January 2021.
  14. Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua (1 September 2014). "Neural Machine Translation by Jointly Learning to Align and Translate". arXiv: 1409.0473 [cs.CL].
  15. Luong, Minh-Thang; Pham, Hieu; Manning, Christopher D. (17 August 2015). "Effective Approaches to Attention-based Neural Machine Translation". arXiv: 1508.04025 [cs.CL].
  16. Trinh, Trieu H.; Le, Quoc V. (7 June 2018). "A Simple Method for Commonsense Reasoning". arXiv: 1806.02847 [cs.CL].
  17. Quach, Katyanna (14 February 2019). "Roses are red, this is sublime: We fed OpenAI's latest chat bot a classic Reg headline". The Register. Archived from the original on 9 March 2021. Retrieved 27 February 2021.
  18. "The Staggering Cost of Training SOTA AI Models". Synced. 27 June 2019. Archived from the original on 24 November 2020. Retrieved 27 February 2021.
  19. Wiggers, Kyle (23 March 2020). "Google open-sources framework that reduces AI training costs by up to 80%". VentureBeat. Archived from the original on 26 November 2020. Retrieved 27 February 2021.
  20. Vincent, James (14 February 2019). "OpenAI's new multitalented AI writes, translates, and slanders". The Verge. Archived from the original on 18 December 2020. Retrieved 19 December 2020.
  21. Piper, Kelsey (14 February 2019). "An AI helped us write this article". Vox. Archived from the original on 8 November 2020. Retrieved 19 December 2020.
  22. Köbis, Nils; Mossink, Luca D. (1 January 2021). "Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry". Computers in Human Behavior. 114: 106553. doi: 10.1016/j.chb.2020.106553. hdl: 21.11116/0000-0007-13E5-1.
  23. Schwartz, Oscar (4 July 2019). "Could 'fake text' be the next global political threat?". The Guardian. Archived from the original on 16 July 2019. Retrieved 16 July 2019.
  24. Vincent, James (21 February 2019). "AI researchers debate the ethics of sharing potentially harmful programs". The Verge. Archived from the original on 9 February 2021. Retrieved 27 February 2021.
  25. Zhang, Hugh (19 February 2019). "OpenAI: Please Open Source Your Language Model". The Gradient. Archived from the original on 28 January 2021. Retrieved 28 February 2021.
  26. Gokaslan, Aaron; Cohen, Vanya; Pavlick, Ellie; Tellex, Stefanie (22 August 2019). "OpenGPT-2: We Replicated GPT-2 Because You Can Too". Noteworthy. Archived from the original on 29 April 2023. Retrieved 27 February 2021.
  27. Vincent, James (6 June 2019). "There's a subreddit populated entirely by AI personifications of other subreddits". The Verge. Archived from the original on 21 February 2021. Retrieved 27 February 2021.
  28. Murati, Ermira (13 April 2022). "Language & Coding Creativity | American Academy of Arts and Sciences". www.amacad.org. Retrieved 18 March 2024.
  29. "GPT-2 Small".
  30. GPT-2 Medium. "Openai-community/Gpt2-medium · Hugging Face".
  31. Vincent, James (24 July 2019). "This AI-powered autocompletion software is Gmail's Smart Compose for coders". The Verge. Archived from the original on 9 March 2021. Retrieved 27 February 2021.
  32. Olson, Mathew (17 December 2019). "AI Dungeon 2, the Text Adventure Where You Can do Nearly Anything, Is Now on Mobile". Archived from the original on 20 September 2020. Retrieved 27 February 2021.
  33. Nelius, Joanna (3 August 2020). "This AI-Powered Choose-Your-Own-Adventure Text Game Is Super Fun and Makes No Sense". Gizmodo. Archived from the original on 28 February 2021. Retrieved 27 February 2021.
  34. Ha, Anthony (4 February 2021). "AI Dungeon-maker Latitude raises $3.3M to build games with 'infinite' story possibilities". TechCrunch. Archived from the original on 21 February 2021. Retrieved 27 February 2021.
  35. "Write With Transformer". Archived from the original on 4 December 2019. Retrieved 4 December 2019.
  36. "Talk to Transformer". Archived from the original on 4 December 2019. Retrieved 4 December 2019.
  37. "CreativeEngines". Archived from the original on 3 February 2023. Retrieved 25 June 2021.
  38. Ohlheiser, Abby; Hao, Karen (26 February 2021). "An AI is training counselors to deal with teens in crisis". MIT Technology Review. Archived from the original on 27 February 2021. Retrieved 27 February 2021.
  39. "Language models can explain neurons in language models". OpenAI. Retrieved 13 May 2023.
  40. Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books". International Conference on Computer Vision 2015: 19–27. arXiv: 1506.06724. Archived from the original on 5 February 2023. Retrieved 5 February 2023.
  41. Murati, Ermira (13 April 2022). "Language & Coding Creativity | American Academy of Arts and Sciences". www.amacad.org. Retrieved 18 March 2024.
  42. Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (22 July 2020). "Language Models are Few-Shot Learners". arXiv: 2005.14165 [cs.CL].
  43. Arram (9 July 2020). "GPT-3: An AI that's eerily good at writing almost anything". Arram Sabeti. Archived from the original on 20 July 2020. Retrieved 31 July 2020.
  44. Hao, Karen (23 September 2020). "OpenAI is giving Microsoft exclusive access to its GPT-3 language model". MIT Technology Review. Archived from the original on 5 February 2021. Retrieved 25 September 2020. The companies say OpenAI will continue to offer its public-facing API, which allows chosen users to send text to GPT-3 or OpenAI's other models and receive its output. Only Microsoft, however, will have access to GPT-3's underlying code, allowing it to embed, repurpose, and modify the model as it pleases.