GPT-J

GPT-J
Developer(s): EleutherAI
Initial release: June 9, 2021
Type: Large language model
License: Open-source
Website: 6b.eleuther.ai

GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. [1] As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters. [2]
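
The following minimal sketch (an illustration, not taken from the cited sources) shows this prompt-continuation behaviour using the publicly released checkpoint; it assumes the Hugging Face transformers and PyTorch packages and enough memory for a 6-billion-parameter model.

```python
# Minimal sketch: load the released GPT-J-6B weights and let the model continue a prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # the checkpoint published on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "In a shocking finding, scientists discovered"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```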

Architecture

GPT-J is a GPT-3-like model with 6 billion parameters. [3] Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue. [1]

Its architecture differs from GPT-3 in three main ways. [1]

  - The attention and feedforward computations of each layer are performed in parallel and their results added together, rather than run one after the other, which makes training more efficient.
  - Positional information is injected with rotary position embeddings (RoPE), which EleutherAI found to match or surpass the other available methods. [4] [5]
  - Dense attention is used in every layer, instead of the alternating dense and sparse attention layers used in GPT-3.

Beyond that, the model has 28 transformer layers and 16 attention heads. Its vocabulary size is 50257 tokens, the same size as GPT-2's. [2] It has a context window size of 2048 tokens. [6]
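
For illustration, the same figures can be expressed with the configuration class from the Hugging Face transformers implementation of GPT-J (a sketch only; the attribute names follow that library rather than the original training code):

```python
# The architecture figures quoted above, expressed as a Hugging Face GPTJConfig.
from transformers import GPTJConfig

config = GPTJConfig(
    n_layer=28,        # 28 transformer layers
    n_head=16,         # 16 attention heads per layer
    n_positions=2048,  # context window of 2048 tokens
    vocab_size=50257,  # GPT-2-sized vocabulary (the released checkpoint pads its embedding matrix slightly beyond this)
)
print(config.n_layer, config.n_head, config.n_positions, config.vocab_size)
```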

It was trained on the Pile dataset, [2] [3] with the JAX-based Mesh Transformer JAX library handling the parallelization scheme. [2] [7]
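
The actual parallelization scheme is specific to that codebase, but the general flavour of JAX device parallelism can be shown with a toy sketch (a hypothetical illustration, not the Mesh Transformer JAX scheme itself), in which a batch is split across devices and gradients are averaged with a collective operation:

```python
# Toy illustration of JAX device parallelism (not the actual Mesh Transformer JAX scheme):
# each device computes gradients on its shard of the batch, then the gradients are averaged.
import functools
import jax
import jax.numpy as jnp

def loss_fn(w, x, y):
    # A deliberately tiny "model": a linear map scored with mean squared error.
    pred = x @ w
    return jnp.mean((pred - y) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def parallel_grads(w, x, y):
    grads = jax.grad(loss_fn)(w, x, y)
    return jax.lax.pmean(grads, axis_name="devices")  # average gradients across devices

n_dev = jax.local_device_count()
w = jnp.broadcast_to(jnp.zeros((4,)), (n_dev, 4))  # replicate parameters on every device
x = jnp.ones((n_dev, 8, 4))                        # one batch shard per device
y = jnp.ones((n_dev, 8))
print(parallel_grads(w, x, y).shape)               # (n_dev, 4), identical on each device
```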

Performance

GPT-J was designed to generate English text from a prompt. It was not designed for translation, for generating text in other languages, or for use without first fine-tuning it for a specific task. [2] Nonetheless, GPT-J performs reasonably well even without fine-tuning, including on translation (at least from English to French). [8]
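
Translation without fine-tuning is typically attempted with a few-shot prompt, in which the model is shown a few example pairs and left to continue the pattern. The sketch below is illustrative only; the prompt wording and generation settings are assumptions rather than the setup used in the cited evaluation.

```python
# Illustrative few-shot prompt for English-to-French translation with the untuned model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

prompt = (
    "English: Good morning.\nFrench: Bonjour.\n"
    "English: Where is the train station?\nFrench: Où est la gare ?\n"
    "English: I would like a cup of coffee.\nFrench:"
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
# Print only the newly generated tokens, i.e. the model's attempted translation.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```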

When neither model is fine-tuned, GPT-J-6B performs almost as well as the 6.7-billion-parameter GPT-3 (Curie) on a variety of tasks. [3] It even outperforms the 175-billion-parameter GPT-3 (Davinci) on code-generation tasks. [9] With fine-tuning, it outperforms an untuned GPT-3 (Davinci) on a number of tasks. [1]

Like all LLMs, it is not programmed to give factually accurate information, only to generate text based on probability. [2]
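
This can be seen directly in how the model is queried: for a given context its raw output is simply a probability distribution over possible next tokens, with no notion of factual correctness. The following minimal sketch (assuming the transformers and torch packages) inspects that distribution.

```python
# Sketch: inspect the probability distribution GPT-J assigns to the next token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores over the whole vocabulary
probs = torch.softmax(logits, dim=-1)        # convert scores to probabilities
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokenizer.decode([token_id])!r}: {p:.3f}")
```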

Applications

The untuned GPT-J is available on EleutherAI's website, [10] NVIDIA's Triton Inference Server, [11] and NLP Cloud's website. [12] Cerebras [1] and Amazon Web Services [13] [14] offer services to fine-tune the GPT-J model for company-specific tasks. Graphcore offers fine-tuning and hosting services for the untuned GPT-J, and also hosts fine-tuned models once they are produced. [15] CoreWeave offers hosting services for both the untuned GPT-J and fine-tuned variants. [16] [17]

In March 2023, Databricks released Dolly, an Apache-licensed, instruction-following model created by fine-tuning GPT-J on the Stanford Alpaca dataset. [18] NovelAI's Sigurd [19] and Genji-JP 6B [20] models are both fine-tuned versions of GPT-J. They also offer further fine-tuning services to produce and host custom models. [21]
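
Fine-tuning in these applications means continuing to train some or all of GPT-J's weights on new, task-specific data. The sketch below illustrates one generic approach, freezing most of the network and updating only its final block and output head; the layer names follow the Hugging Face transformers implementation, and the example is not the procedure used by any particular provider above.

```python
# Generic partial fine-tuning sketch: freeze everything except the last transformer
# block and the language-modelling head, then optimise only the unfrozen weights.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

for param in model.parameters():
    param.requires_grad = False                    # freeze all weights by default
for param in model.transformer.h[-1].parameters():
    param.requires_grad = True                     # unfreeze the final transformer block
for param in model.lm_head.parameters():
    param.requires_grad = True                     # unfreeze the output head

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)  # a typical small fine-tuning learning rate
```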

EleutherAI has received praise from Cerebras, [1] GPT-3 Demo, [3] NLP Cloud, [12] and Databricks [18] for making the model open-source, and its open-source status is often cited as a major advantage when choosing which model to use. [9] [15] [22]

Related Research Articles

<span class="mw-page-title-main">Databricks</span> American software company

Databricks, Inc. is a global data, analytics, and artificial intelligence (AI) company founded by the original creators of Apache Spark.

Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model.

<span class="mw-page-title-main">Transformer (deep learning architecture)</span> Deep learning architecture for modelling sequential data

A transformer is a deep learning architecture developed by researchers at Google and based on the multi-head attention mechanism, proposed in the 2017 paper "Attention Is All You Need". Text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished.

Bidirectional encoder representations from transformers (BERT) is a language model introduced in October 2018 by researchers at Google. It learns to represent text as a sequence of vectors using self-supervised learning. It uses the encoder-only transformer architecture. It is notable for its dramatic improvement over previous state-of-the-art models, and as an early example of a large language model. As of 2020, BERT is a ubiquitous baseline in natural language processing (NLP) experiments.

Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020.

<span class="mw-page-title-main">Cerebras</span> American semiconductor company

Cerebras Systems Inc. is an American artificial intelligence (AI) company with offices in Sunnyvale, San Diego, Toronto, and Bangalore, India. Cerebras builds computer systems for complex AI deep learning applications.

<span class="mw-page-title-main">GPT-2</span> 2019 text-generating language model

Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by full release of the 1.5-billion-parameter model on November 5, 2019.

<span class="mw-page-title-main">GPT-1</span> 2018 text-generating language model

Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017. In June 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training", in which they introduced that initial model along with the general concept of a generative pre-trained transformer.

Prompt engineering is the process of structuring an instruction that can be interpreted and understood by a generative artificial intelligence (AI) model. A prompt is natural language text describing the task that an AI should perform. A prompt for a text-to-text language model can be a query such as "what is Fermat's little theorem?", a command such as "write a poem in the style of Edgar Allan Poe about leaves falling", or a longer statement including context, instructions, and conversation history.

Hugging Face, Inc. is an American company incorporated under the Delaware General Corporation Law and based in New York City that develops computation tools for building applications using machine learning. It is most notable for its transformers library built for natural language processing applications and its platform that allows users to share machine learning models and datasets and showcase their work.

<span class="mw-page-title-main">Stable Diffusion</span> Image-generating machine learning model

Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of the ongoing artificial intelligence boom.

<span class="mw-page-title-main">Generative pre-trained transformer</span> Type of large language model

A generative pre-trained transformer (GPT) is a type of large language model (LLM) and a prominent framework for generative artificial intelligence. It is an artificial neural network that is used in natural language processing by machines. It is based on the transformer deep learning architecture, pre-trained on large data sets of unlabeled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.

<span class="mw-page-title-main">EleutherAI</span> Artificial intelligence research collective

EleutherAI is a grass-roots non-profit artificial intelligence (AI) research group. The group, considered an open-source version of OpenAI, was formed in a Discord server in July 2020 by Connor Leahy, Sid Black, and Leo Gao to organize a replication of GPT-3. In early 2023, it formally incorporated as the EleutherAI Institute, a non-profit research institute.

A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.

In deep learning, fine-tuning is an approach to transfer learning in which the parameters of a pre-trained neural network model are trained on new data. Fine-tuning can be done on the entire neural network, or on only a subset of its layers, in which case the layers that are not being fine-tuned are "frozen". A model may also be augmented with "adapters" that consist of far fewer parameters than the original model, and fine-tuned in a parameter-efficient way by tuning the weights of the adapters and leaving the rest of the model's weights frozen.

The Pile is an 886.03 GB diverse, open-source dataset of English text created as a training dataset for large language models (LLMs). It was constructed by EleutherAI in 2020 and publicly released on December 31 of that year. It is composed of 22 smaller datasets, including 14 new ones.

Llama is a family of autoregressive large language models (LLMs) released by Meta AI starting in February 2023. The latest version is Llama 3.2, released in September 2024.

Ashish Vaswani is a computer scientist working in deep learning, who is known for his significant contributions to the field of artificial intelligence (AI) and natural language processing (NLP). He is one of the co-authors of the seminal paper "Attention Is All You Need" which introduced the Transformer model, a novel architecture that uses a self-attention mechanism and has since become foundational to many state-of-the-art models in NLP. Transformer architecture is the core of language models that power applications such as ChatGPT. He was a co-founder of Adept AI Labs and a former staff research scientist at Google Brain.

Generative Pre-trained Transformer 4Chan (GPT-4chan) is a controversial AI model that was developed and deployed by YouTuber and AI researcher Yannic Kilcher in June 2022. It is a large language model, meaning it can generate text from an input, and was created by fine-tuning GPT-J on a dataset of millions of posts from the /pol/ board of 4chan, an anonymous online forum known for hosting hateful and extremist content.

DBRX is an open-source large language model (LLM) developed by the Mosaic ML team at Databricks and released on March 27, 2024. It is a mixture-of-experts transformer model with 132 billion parameters in total, of which 36 billion are active for each token. The released model comes in either a base foundation model version or an instruct-tuned variant.

References

  1. Vassilieva, Natalia (22 June 2022). "Cerebras Makes It Easy to Harness the Predictive Power of GPT-J". Cerebras. Retrieved 14 June 2023. In general we have found that across a large suite of setups including regular, linear, and local self-attention, it either matches or surpasses all other methods currently available for injecting positional information into transformers.
  2. "GPT-J 6B". Hugging Face. 3 May 2023. Retrieved 13 June 2023.
  3. "GPT-J". GPT-3 Demo. Retrieved 13 June 2023.
  4. Biderman, Stella; Black, Sid; Foster, Charles; Gao, Leo; Hallahan, Eric; He, Horace; Wang, Ben; Wang, Phil (20 April 2021). "Rotary Embeddings: A Relative Revolution". EleutherAI. Retrieved 14 June 2023.
  5. Su, Jianlin; Lu, Yu; Pan, Shengfeng; Murtadha, Ahmed; Wen, Bo; Liu, Yunfeng (9 August 2022). "RoFormer: Enhanced Transformer with Rotary Position Embedding". arXiv:2104.09864 [cs.CL].
  6. "GPT-J". GitHub. Hugging Face. Retrieved 23 June 2023.
  7. Wang, Ben; Komatsuzaki, Aran (May 2021). "Mesh Transformer JAX". GitHub. Retrieved 13 June 2023.
  8. Forefront (14 October 2021). "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". Medium. Forefront. Retrieved 13 June 2023.
  9. "GPT-J Reviews". Slashdot. Retrieved 23 June 2023.
  10. "Test the EAI models". EleutherAI. 2021. Retrieved 30 June 2023.
  11. Timonin, Denis; Hsueh, Bo Yang; Singal, Dhruv; Nguyen, Vinh (3 August 2022). "Deploying GPT-J and T5 with NVIDIA Triton Inference Server". NVIDIA. Retrieved 30 June 2023.
  12. Vettier, Pauline (16 September 2021). "NLP Cloud now supports GPT-J, the open-source GPT-3 alternative" (Press release). Grenoble, France: NLP Cloud. Retrieved 30 June 2023.
  13. Awrahman, Zmnako; Tsitiridou, Anastasia Pachni; Patel, Dhawalkumar; Huilgol, Rahul; Bains, Roop; Stobieniecka, Wioletta (12 June 2023). "Fine-tune GPT-J using an Amazon SageMaker Hugging Face estimator and the model parallel library". Amazon Web Services. Retrieved 30 June 2023.
  14. Schmid, Philipp (11 January 2022). "Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker". Hugging Face. Retrieved 30 June 2023.
  15. Liguori, Sofia (9 June 2023). "Fine-Tune GPT-J: A Cost-Effective GPT-4 Alternative for Many NLP Tasks". Graphcore. Retrieved 23 June 2023.
  16. "GPT-J-6B". CoreWeave. 23 June 2023. Retrieved 30 June 2023.
  17. Hjelm, Max. "CoreWeave Powers a World of Possibility with GPT-J". CoreWeave. Retrieved 30 June 2023.
  18. Conover, Mike; Hayes, Matt; Mathur, Ankit; Meng, Xiangrui; Xie, Jianwei; Wan, Jun; Ghodsi, Ali; Wendell, Patrick; Zaharia, Matei (24 March 2023). "Hello Dolly: Democratizing the magic of ChatGPT with open models". Databricks. Retrieved 18 June 2023.
  19. NovelAI (9 May 2022). "The faces of NovelAI's AI Models: Part 1". Medium. Retrieved 1 July 2023.
  20. NovelAI (3 November 2021). "Data Efficient Language Transfer with GPT-J". Medium. Retrieved 1 July 2023.
  21. NovelAI (29 July 2021). "Introducing Custom AI Modules". Medium. Retrieved 1 July 2023.
  22. Shiraly, Karthik (26 February 2023). "See GPT-J vs. GPT-3 Go Head-to-Head on Popular Language Tasks". Width.ai. Retrieved 23 June 2023.