GPT-J

Developer(s): EleutherAI
Initial release: June 9, 2021
Type: Large language model
License: Open-source
Website: 6b.eleuther.ai

GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. [1] As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters. [2]

Architecture

GPT-J is a GPT-3-like model with 6 billion parameters. [3] Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue. [1]

Its architecture differs from GPT-3 in three main ways. [1] First, the attention and feedforward sublayers of each transformer block are computed in parallel rather than sequentially, which improves training efficiency. Second, it uses rotary position embeddings (RoPE) rather than learned positional embeddings, a scheme found to match or surpass other methods of injecting positional information into transformers (a sketch of the computation follows this paragraph). [4] [5] Third, it uses dense attention in every layer, whereas GPT-3 alternates dense and locally banded sparse attention.
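
The following is a minimal, self-contained Python sketch of the rotary-embedding idea. It is an illustrative reimplementation of the RoFormer formulation, [5] not EleutherAI's own code; the function name and the use of NumPy are choices made here for brevity. In the model itself, this kind of rotation is applied to the query and key vectors inside each attention layer.

    import numpy as np

    def apply_rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
        """Rotate pairs of feature dimensions by position-dependent angles.

        x has shape (seq_len, dim) with dim even; the pair (x[p, 2i], x[p, 2i+1])
        at position p is rotated by the angle p * base**(-2i / dim).
        """
        seq_len, dim = x.shape
        positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
        freqs = base ** (-np.arange(0, dim, 2) / dim)       # (dim/2,)
        angles = positions * freqs                          # (seq_len, dim/2)
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:, 0::2], x[:, 1::2]
        out = np.empty_like(x)
        out[:, 0::2] = x1 * cos - x2 * sin                  # standard 2-D rotation
        out[:, 1::2] = x1 * sin + x2 * cos
        return out

Because the rotation angle depends only on a token's position, the dot product of two rotated vectors depends on their relative distance, which is the property that makes the scheme attractive for autoregressive attention.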

Beyond these differences, the model has 28 transformer layers and 16 attention heads. Its vocabulary size is 50257 tokens, the same as GPT-2's. [2] It has a context window of 2048 tokens. [6]
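
For readers working with the public checkpoint, these hyperparameters can be read directly from the model configuration. The sketch below assumes the Hugging Face transformers library and the hosted EleutherAI/gpt-j-6b checkpoint; the field names follow that library's GPTJConfig conventions, and the stored embedding table may be padded slightly beyond the tokenizer's 50257 entries.

    # Minimal sketch: inspecting GPT-J's architecture hyperparameters
    # (assumes the Hugging Face transformers library is installed).
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("EleutherAI/gpt-j-6b")

    print("transformer layers:", config.n_layer)      # expected: 28
    print("attention heads:   ", config.n_head)       # expected: 16
    print("context window:    ", config.n_positions)  # expected: 2048
    print("embedding rows:    ", config.vocab_size)   # tokenizer vocabulary is 50257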

It was trained on the Pile dataset, [2] [3] using the JAX-based Mesh Transformer JAX library to handle the parallelization scheme. [2] [7]

Performance

GPT-J was designed to generate English text from a prompt. It was not designed for translation, for generating text in other languages, or for use without first fine-tuning the model for a specific task. [2] Nonetheless, GPT-J performs reasonably well without fine-tuning, including in translation (at least from English to French). [8]
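
As an illustration of this kind of prompt continuation, here is a minimal sketch assuming PyTorch, the Hugging Face transformers library, and the hosted EleutherAI/gpt-j-6b checkpoint; the prompt string and sampling settings are arbitrary examples, and the model is best run on a GPU with enough memory for the half-precision weights (roughly 12 GB).

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "EleutherAI/gpt-j-6b"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

    prompt = "EleutherAI is a research group that"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Autoregressive continuation: the model repeatedly samples the next token.
    output_ids = model.generate(
        **inputs, max_new_tokens=40, do_sample=True, temperature=0.8
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))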

When neither model is fine-tuned, GPT-J-6B performs almost as well as the 6.7-billion-parameter GPT-3 (Curie) on a variety of tasks. [3] It even outperforms the 175-billion-parameter GPT-3 (Davinci) on code generation tasks. [9] [10] With fine-tuning, it outperforms an untuned GPT-3 (Davinci) on a number of tasks. [1]

Like all LLMs, it is not programmed to give factually accurate information, only to generate text based on probability. [2]
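
To make that probabilistic framing concrete, the sketch below (again assuming PyTorch, the transformers library, and the EleutherAI/gpt-j-6b checkpoint; the prompt is an arbitrary example) prints the distribution the model assigns to the next token. The rankings reflect likelihood under the training data, not verified fact.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "EleutherAI/gpt-j-6b"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]   # scores for the next token only

    probs = torch.softmax(logits.float(), dim=-1)
    top = torch.topk(probs, k=5)
    for p, tok_id in zip(top.values, top.indices):
        # The model ranks plausible continuations; it does not check their truth.
        print(f"{tokenizer.decode([int(tok_id)])!r}: {float(p):.3f}")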

Applications

The untuned GPT-J is available on EleutherAI's website, [11] NVIDIA's Triton Inference Server, [12] and NLP Cloud's website. [13] Cerebras [1] and Amazon Web Services [14] [15] offer services to fine-tune the GPT-J model for company-specific tasks. Graphcore offers both fine-tuning and hosting services for the untuned GPT-J, as well as offering to host the fine-tuned models after they are produced. [16] CoreWeave offers hosting services for both the untuned GPT-J and fine-tuned variants. [17] [18]

In March 2023, Databricks released Dolly, an Apache-licensed, instruction-following model created by fine-tuning GPT-J on the Stanford Alpaca dataset. [19] NovelAI's Sigurd [20] and Genji-JP 6B [21] models are both fine-tuned versions of GPT-J. They also offer further fine-tuning services to produce and host custom models. [22]

EleutherAI has received praise from Cerebras, [1] GPT-3 Demo, [3] NLP Cloud, [13] and Databricks [19] for making the model open-source, and its open-source status is often cited as a major advantage when choosing which model to use. [10] [16] [23]

Related Research Articles

A language model is a probabilistic model of a natural language. In 1980, the first significant statistical language model was proposed, and during the decade IBM performed ‘Shannon-style’ experiments, in which potential sources for language modeling improvement were identified by observing and analyzing the performance of human subjects in predicting or correcting text.

Databricks

Databricks, Inc. is a global data, analytics and artificial intelligence company founded by the original creators of Apache Spark.

Transformer (deep learning architecture)

A transformer is a deep learning architecture developed by Google and based on the multi-head attention mechanism, proposed in the 2017 paper "Attention Is All You Need". Text is converted to numerical representations called tokens, and each token is converted into a vector by looking it up in a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished. The transformer paper builds on the softmax-based attention mechanism proposed by Bahdanau et al. in 2014 for machine translation, and on the Fast Weight Controller, a similar architecture proposed in 1992.

Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. Like its predecessor GPT-2, it is a decoder-only transformer-based deep neural network, which supersedes recurrence- and convolution-based architectures with a technique known as "attention". This attention mechanism allows the model to focus selectively on the segments of input text it predicts to be most relevant. GPT-3 uses a 2,048-token context window, float16 (16-bit) precision, and a then-unprecedented 175 billion parameters, requiring 350 GB of storage as each parameter takes 2 bytes, and has demonstrated strong "zero-shot" and "few-shot" learning abilities on many tasks.

Cerebras

Cerebras Systems Inc. is an American artificial intelligence company with offices in Sunnyvale, San Diego, Toronto, Tokyo, and Bangalore. Cerebras builds computer systems for complex artificial intelligence deep learning applications.

GPT-2

Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by the full release of the 1.5-billion-parameter model on November 5, 2019.

GPT-1

Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017. In June 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training", in which they introduced that initial model along with the general concept of a generative pre-trained transformer.

Prompt engineering is the process of structuring text that can be interpreted and understood by a generative AI model. A prompt is natural language text describing the task that an AI should perform.

Hugging Face, Inc. is a French-American company based in New York City that develops computation tools for building applications using machine learning. It is most notable for its transformers library built for natural language processing applications and its platform that allows users to share machine learning models and datasets and showcase their work.

NovelAI

NovelAI is a cloud-based, SaaS paid subscription service for AI-assisted storywriting and text-to-image synthesis, originally launched in beta on June 15, 2021, with the image generation feature added on October 3, 2022. NovelAI is owned and operated by Anlatan, which is headquartered in Wilmington, Delaware.

Generative pre-trained transformer

Generative pre-trained transformers (GPT) are a type of large language model (LLM) and a prominent framework for generative artificial intelligence. They are artificial neural networks that are used in natural language processing tasks. GPTs are based on the transformer architecture, pre-trained on large data sets of unlabelled text, and able to generate novel human-like content. As of 2023, most LLMs have these characteristics and are sometimes referred to broadly as GPTs.

EleutherAI

EleutherAI is a grass-roots non-profit artificial intelligence (AI) research group. The group, considered an open-source version of OpenAI, was formed in a Discord server in July 2020 to organize a replication of GPT-3. In early 2023, it formally incorporated as the EleutherAI Foundation, a non-profit research institute.

A large language model (LLM) is a language model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process. LLMs can be used for text generation, a form of generative AI, by taking an input text and repeatedly predicting the next token or word.

In deep learning, fine-tuning is an approach to transfer learning in which the weights of a pre-trained model are trained on new data. Fine-tuning can be done on the entire neural network, or on only a subset of its layers, in which case the layers that are not being fine-tuned are "frozen". A model may also be augmented with "adapters" that consist of far fewer parameters than the original model, and fine-tuned in a parameter-efficient way by tuning the weights of the adapters and leaving the rest of the model's weights frozen.

The Pile is an 886.03 GB diverse, open-source dataset of English text created as a training dataset for large language models (LLMs). It was constructed by EleutherAI in 2020 and publicly released on December 31 of that year. It is composed of 22 smaller datasets, including 14 new ones.

LLaMA is a family of autoregressive large language models (LLMs), released by Meta AI starting in February 2023.

Open-source artificial intelligence is the application of open-source practices to the development of artificial intelligence resources.

Generative Pre-trained Transformer 4chan (GPT-4chan) is a controversial AI model developed and deployed by YouTuber and AI researcher Yannic Kilcher in June 2022. It is a large language model, meaning it generates text from an input, created by fine-tuning GPT-J on a dataset of millions of posts from the /pol/ board of 4chan, an anonymous online forum known for its hateful and extremist content.

Mistral AI is a French company selling artificial intelligence (AI) products. It was founded in April 2023 by previous employees of Meta Platforms and Google DeepMind. The company raised €385 million in October 2023, and in December 2023, it was valued at more than $2 billion.

DBRX is an open-source large language model (LLM) developed by the MosaicML team at Databricks and released on March 27, 2024. It is a mixture-of-experts transformer model with 132 billion parameters in total, of which 36 billion are active for each token. The released model comes in either a base foundation version or an instruct-tuned variant.

References

  1. Vassilieva, Natalia (22 June 2022). "Cerebras Makes It Easy to Harness the Predictive Power of GPT-J". Cerebras. Retrieved 14 June 2023.
  2. "GPT-J 6B". Hugging Face. Retrieved 13 June 2023.
  3. "GPT-J". GPT-3 Demo. Retrieved 13 June 2023.
  4. Biderman, Stella; Black, Sid; Foster, Charles; Gao, Leo; Hallahan, Eric; He, Horace; Wang, Ben; Wang, Phil (20 April 2021). "Rotary Embeddings: A Relative Revolution". EleutherAI. Retrieved 14 June 2023. "In general we have found that across a large suite of setups including regular, linear, and local self-attention, it either matches or surpasses all other methods currently available for injecting positional information into transformers."
  5. Su, Jianlin; Lu, Yu; Pan, Shengfeng; Murtadha, Ahmed; Wen, Bo; Liu, Yunfeng (9 August 2022). "RoFormer: Enhanced Transformer with Rotary Position Embedding". arXiv:2104.09864 [cs.CL].
  6. "GPT-J". GitHub. Hugging Face. Retrieved 23 June 2023.
  7. Wang, Ben; Komatsuzaki, Aran (May 2021). "Mesh Transformer JAX". GitHub. Retrieved 13 June 2023.
  8. Forefront (14 October 2021). "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". Medium. Forefront. Retrieved 13 June 2023.
  9. Mueller, Vincent (26 August 2021). "How you can use GPT-J". Medium. Retrieved 23 June 2023.
  10. "GPT-J Reviews". Slashdot. Retrieved 23 June 2023.
  11. "Test the EAI models". EleutherAI. 2021. Retrieved 30 June 2023.
  12. Timonin, Denis; Hsueh, Bo Yang; Singal, Dhruv; Nguyen, Vinh (3 August 2022). "Deploying GPT-J and T5 with NVIDIA Triton Inference Server". NVIDIA. Retrieved 30 June 2023.
  13. Vettier, Pauline (16 September 2021). "NLP Cloud now supports GPT-J, the open-source GPT-3 alternative" (Press release). Grenoble, France: NLP Cloud. Retrieved 30 June 2023.
  14. Awrahman, Zmnako; Tsitiridou, Anastasia Pachni; Patel, Dhawalkumar; Huilgol, Rahul; Bains, Roop; Stobieniecka, Wioletta (12 June 2023). "Fine-tune GPT-J using an Amazon SageMaker Hugging Face estimator and the model parallel library". Amazon Web Services. Retrieved 30 June 2023.
  15. Schmid, Philipp (11 January 2022). "Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker". Hugging Face. Retrieved 30 June 2023.
  16. Liguori, Sofia (9 June 2023). "Fine-Tune GPT-J: A Cost-Effective GPT-4 Alternative for Many NLP Tasks". Graphcore. Retrieved 23 June 2023.
  17. "GPT-J-6B". CoreWeave. 23 June 2023. Retrieved 30 June 2023.
  18. Hjelm, Max. "CoreWeave Powers a World of Possibility with GPT-J". CoreWeave. Retrieved 30 June 2023.
  19. Conover, Mike; Hayes, Matt; Mathur, Ankit; Meng, Xiangrui; Xie, Jianwei; Wan, Jun; Ghodsi, Ali; Wendell, Patrick; Zaharia, Matei (24 March 2023). "Hello Dolly: Democratizing the magic of ChatGPT with open models". Databricks. Retrieved 18 June 2023.
  20. NovelAI (9 May 2022). "The faces of NovelAI's AI Models: Part 1". Medium. Retrieved 1 July 2023.
  21. NovelAI (3 November 2021). "Data Efficient Language Transfer with GPT-J". Medium. Retrieved 1 July 2023.
  22. NovelAI (29 July 2021). "Introducing Custom AI Modules". Medium. Retrieved 1 July 2023.
  23. Shiraly, Karthik (26 February 2023). "See GPT-J vs. GPT-3 Go Head-to-Head on Popular Language Tasks". Width.ai. Retrieved 23 June 2023.