GPT-J

Developer(s): EleutherAI
Initial release: June 9, 2021
Type: Large language model
License: Apache License 2.0
Website: 6b.eleuther.ai

GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. [1] As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to its 6 billion parameters. [2] The model remains available on GitHub, but the web interface no longer communicates with it, and development stopped in 2021. [3]

Architecture

GPT-J is a GPT-3-like model with 6 billion parameters. [4] Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue. [1]
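
As a concrete illustration of this autoregressive use, the model can be loaded through the Hugging Face transformers library and asked to continue a prompt. The following is a minimal sketch rather than an official usage recipe; the checkpoint name "EleutherAI/gpt-j-6B" is the one hosted on Hugging Face, and the sampling settings are illustrative.

    # Minimal sketch: load GPT-J and continue a prompt.
    # Assumes transformers and torch are installed and that enough memory
    # is available for the ~24 GB float32 checkpoint.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

    prompt = "The theory of relativity states that"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40,
                             do_sample=True, temperature=0.8)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Because the model only ever predicts the next token, generate simply repeats that prediction step until the requested number of new tokens has been produced.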

Its architecture differs from GPT-3 in three main ways. [1]

  - Rotary position embeddings replace GPT-3's learned positional embeddings; EleutherAI found that this approach "either matches or surpasses all other methods currently available for injecting positional information into transformers" (see the sketch after this list). [5] [6]
  - The attention and feedforward sublayers of each transformer block are computed in parallel and their outputs summed, rather than being run one after the other.
  - Dense attention is used in every layer, whereas GPT-3 alternates dense and sparse attention layers.
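
To make the first point concrete, the following is a minimal NumPy sketch of the rotary-embedding idea. It is illustrative only: GPT-J's actual implementation rotates only a subset of each attention head's dimensions, and practical implementations precompute the sine and cosine tables.

    import numpy as np

    def rotary_embedding(x, base=10000):
        """Rotate channel pairs of x (shape: seq_len x dim, float).

        Each pair of channels (2i, 2i+1) is rotated by an angle
        proportional to the token's position, so the dot product between
        a rotated query and key depends only on their relative offset.
        """
        seq_len, dim = x.shape
        positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
        freqs = base ** (-np.arange(0, dim, 2) / dim)    # (dim/2,)
        angles = positions * freqs                       # (seq_len, dim/2)
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:, 0::2], x[:, 1::2]
        out = np.empty_like(x)
        out[:, 0::2] = x1 * cos - x2 * sin
        out[:, 1::2] = x1 * sin + x2 * cos
        return out

    q = np.random.randn(16, 64)   # e.g. one attention head's queries
    q_rot = rotary_embedding(q)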

Beyond these differences, the model has 28 transformer layers and 16 attention heads. Its vocabulary size is 50257 tokens, the same as GPT-2's. [2] It has a context window of 2048 tokens. [7]
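
These hyperparameters are consistent with the roughly 6 billion parameters in the model's name, as the back-of-the-envelope count below shows. The hidden size of 4096 is not stated above but matches the released checkpoint; biases and layer norms are ignored here.

    # Rough parameter count for GPT-J (assumed hidden size: 4096).
    d_model, n_layers, vocab = 4096, 28, 50257
    d_ff = 4 * d_model                       # feedforward inner size

    attention = 4 * d_model * d_model        # Q, K, V and output projections
    feedforward = 2 * d_model * d_ff         # up- and down-projections
    per_layer = attention + feedforward      # ~201M per transformer block

    embeddings = vocab * d_model             # input embedding matrix
    lm_head = vocab * d_model                # output projection (untied)

    total = n_layers * per_layer + embeddings + lm_head
    print(f"{total / 1e9:.2f}B parameters")  # ~6.05B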

It was trained on the Pile dataset, [2] [4] using the Mesh Transformer JAX library in JAX to handle the parallelization scheme. [2] [8]
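
Mesh Transformer JAX shards the model's weights across many accelerator devices. The toy example below is not that library's API, only a sketch of the SPMD style that JAX encourages, in which one function is replicated across every available device with jax.pmap.

    # Illustrative SPMD sketch: run the same function on every device.
    import jax
    import jax.numpy as jnp

    @jax.pmap
    def scaled_matmul(x, w):
        return jnp.dot(x, w) * 0.5

    n = jax.device_count()
    x = jnp.ones((n, 4, 8))           # one (4, 8) input shard per device
    w = jnp.ones((n, 8, 2))           # one weight shard per device
    print(scaled_matmul(x, w).shape)  # (n, 4, 2)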

Performance

GPT-J was designed to generate English text from a prompt. It was not designed to translate, to generate text in other languages, or to perform well on a specific task without first being fine-tuned for it. [2] Nonetheless, GPT-J performs reasonably well even without fine-tuning, including on translation (at least from English to French). [9]

When neither is fine-tuned, GPT-J-6B performs almost as well as the 6.7 billion parameter GPT-3 (Curie) on a variety of tasks. [4] It even outperforms the 175 billion parameter GPT-3 (Davinci) on code generation tasks. [10] With fine-tuning, it outperforms an untuned GPT-3 (Davinci) on a number of tasks. [1]

Like all LLMs, GPT-J is not designed to return factually accurate information; it only generates the text its training makes most probable. [2]

Applications

The untuned GPT-J is available on EleutherAI's website, [11] NVIDIA's Triton Inference Server, [12] and NLP Cloud's website. [13] Cerebras [1] and Amazon Web Services [14] [15] offer services to fine-tune the GPT-J model for company-specific tasks. Graphcore offers both fine-tuning and hosting services for the untuned GPT-J, as well as offering to host the fine-tuned models after they are produced. [16] CoreWeave offers hosting services for both the untuned GPT-J and fine-tuned variants. [17] [18]
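
For context on what such services automate, fine-tuning GPT-J is ordinarily a standard causal-language-modeling run. The sketch below uses the Hugging Face Trainer; the training file name and hyperparameters are placeholders rather than any provider's actual recipe, and a real run needs substantial GPU memory or a parameter-efficient method.

    # Minimal causal-LM fine-tuning sketch (placeholder data and settings).
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-J defines no pad token
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

    # "corpus.txt" is a hypothetical file of task-specific text.
    data = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
    data = data.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
        remove_columns=["text"],
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gptj-finetuned",
                               per_device_train_batch_size=1,
                               num_train_epochs=1),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()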

In March 2023, Databricks released Dolly, an Apache-licensed, instruction-following model created by fine-tuning GPT-J on the Stanford Alpaca dataset. [19] NovelAI's Sigurd [20] and Genji-JP 6B [21] models are both fine-tuned versions of GPT-J. NovelAI also offers further fine-tuning services to produce and host custom models. [22]

EleutherAI has received praise from Cerebras, [1] GPT-3 Demo, [4] NLP Cloud, [13] and Databricks [19] for making the model open-source, and its open-source status is often cited as a major advantage when choosing which model to use. [10] [16] [23]

References

  1. Vassilieva, Natalia (22 June 2022). "Cerebras Makes It Easy to Harness the Predictive Power of GPT-J". Cerebras. Retrieved 14 June 2023.
  2. "GPT-J 6B". Hugging Face. 3 May 2023. Retrieved 13 June 2023.
  3. Wang, Ben (25 January 2025). kingoflolz/mesh-transformer-jax. Retrieved 27 January 2025.
  4. "GPT-J". GPT-3 Demo. Retrieved 13 June 2023.
  5. Biderman, Stella; Black, Sid; Foster, Charles; Gao, Leo; Hallahan, Eric; He, Horace; Wang, Ben; Wang, Phil (20 April 2021). "Rotary Embeddings: A Relative Revolution". EleutherAI. Retrieved 14 June 2023. "In general we have found that across a large suite of setups including regular, linear, and local self-attention, it either matches or surpasses all other methods currently available for injecting positional information into transformers."
  6. Su, Jianlin; Lu, Yu; Pan, Shengfeng; Murtadha, Ahmed; Wen, Bo; Liu, Yunfeng (9 August 2022). "RoFormer: Enhanced Transformer with Rotary Position Embedding". arXiv:2104.09864 [cs.CL].
  7. "GPT-J". GitHub. Hugging Face. Retrieved 23 June 2023.
  8. Wang, Ben; Komatsuzaki, Aran (May 2021). "Mesh Transformer JAX". GitHub. Retrieved 13 June 2023.
  9. Forefront (14 October 2021). "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". Medium. Forefront. Retrieved 13 June 2023.
  10. "GPT-J Reviews". Slashdot. Retrieved 23 June 2023.
  11. "Test the EAI models". EleutherAI. 2021. Retrieved 30 June 2023.
  12. Timonin, Denis; Hsueh, Bo Yang; Singal, Dhruv; Nguyen, Vinh (3 August 2022). "Deploying GPT-J and T5 with NVIDIA Triton Inference Server". NVIDIA. Retrieved 30 June 2023.
  13. Vettier, Pauline (16 September 2021). "NLP Cloud now supports GPT-J, the open-source GPT-3 alternative" (Press release). Grenoble, France: NLP Cloud. Retrieved 30 June 2023.
  14. Awrahman, Zmnako; Tsitiridou, Anastasia Pachni; Patel, Dhawalkumar; Huilgol, Rahul; Bains, Roop; Stobieniecka, Wioletta (12 June 2023). "Fine-tune GPT-J using an Amazon SageMaker Hugging Face estimator and the model parallel library". Amazon Web Services. Retrieved 30 June 2023.
  15. Schmid, Philipp (11 January 2022). "Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker". Hugging Face. Retrieved 30 June 2023.
  16. Liguori, Sofia (9 June 2023). "Fine-Tune GPT-J: A Cost-Effective GPT-4 Alternative for Many NLP Tasks". Graphcore. Retrieved 23 June 2023.
  17. "GPT-J-6B". CoreWeave. 23 June 2023. Retrieved 30 June 2023.
  18. Hjelm, Max. "CoreWeave Powers a World of Possibility with GPT-J". CoreWeave. Retrieved 30 June 2023.
  19. Conover, Mike; Hayes, Matt; Mathur, Ankit; Meng, Xiangrui; Xie, Jianwei; Wan, Jun; Ghodsi, Ali; Wendell, Patrick; Zaharia, Matei (24 March 2023). "Hello Dolly: Democratizing the magic of ChatGPT with open models". Databricks. Retrieved 18 June 2023.
  20. NovelAI (9 May 2022). "The faces of NovelAI's AI Models: Part 1". Medium. Retrieved 1 July 2023.
  21. NovelAI (3 November 2021). "Data Efficient Language Transfer with GPT-J". Medium. Retrieved 1 July 2023.
  22. NovelAI (29 July 2021). "Introducing Custom AI Modules". Medium. Retrieved 1 July 2023.
  23. Shiraly, Karthik (26 February 2023). "See GPT-J vs. GPT-3 Go Head-to-Head on Popular Language Tasks". Width.ai. Retrieved 23 June 2023.