| GPT-J | |
|---|---|
| Developer | EleutherAI |
| Initial release | June 9, 2021 |
| Type | |
| License | Apache License 2.0 |
| Website | 6b |
GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. [1] As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" suffix refers to its 6 billion parameters. [2] The model is available on GitHub, but the web interface no longer communicates with the model. Development stopped in 2021. [3]
GPT-J is a GPT-3-like model with 6 billion parameters. [4] Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue. [1]
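Autoregressive generation can be illustrated with a toy loop. The sketch below is not GPT-J itself: `NEXT_TOKEN_SCORES` is an invented lookup table standing in for the transformer, and `generate` is a hypothetical helper. It shows only the control flow: score the candidate next tokens given the text so far, append the best one, and repeat.

```python
# Toy stand-in for a language model: maps the last token to scores for
# possible next tokens. A real model such as GPT-J conditions on the whole
# context window, not just the last token. All values here are invented.
NEXT_TOKEN_SCORES = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 0.9, "<eos>": 0.1},
    "ran": {"away": 0.9, "<eos>": 0.1},
}


def generate(prompt, max_new_tokens=5):
    """Greedy autoregressive decoding: repeatedly append the top-scoring token."""
    tokens = list(prompt)
    if not tokens:  # nothing to continue from
        return tokens
    for _ in range(max_new_tokens):
        scores = NEXT_TOKEN_SCORES.get(tokens[-1])
        if scores is None:  # no known continuation: stop
            break
        next_token = max(scores, key=scores.get)
        if next_token == "<eos>":  # end-of-sequence marker: stop
            break
        tokens.append(next_token)
    return tokens


print(" ".join(generate(["the"])))
```

Each generated token becomes part of the input for the next step, which is what "autoregressive" means here. Greedy selection is only one decoding strategy; it is used in this sketch because it keeps the example deterministic.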
Its architecture differs from GPT-3 in three main ways. [1]
Beyond that, the model has 28 transformer layers and 16 attention heads. Its vocabulary size is 50,257 tokens, the same size as GPT-2's. [2] It has a context window of 2,048 tokens. [7]
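As a rough sanity check, these figures can be related to the "6 billion" in the model's name. The sketch below takes the layer count and vocabulary size from the text. The hidden width of 4096, the 4x feed-forward expansion, and the untied output head are assumptions not stated in this article. `approx_param_count` is a hypothetical helper that ignores biases and normalization weights, so the result is an estimate, not an exact count.

```python
# Figures stated in the article.
N_LAYERS = 28
VOCAB_SIZE = 50257

# Assumptions not stated in the article: hidden width and a 4x feed-forward expansion.
D_MODEL = 4096
D_FF = 4 * D_MODEL


def approx_param_count(n_layers, d_model, d_ff, vocab_size):
    """Rough weight count for a decoder-only transformer (biases and norms ignored)."""
    attention = 4 * d_model * d_model      # query, key, value, output projections
    feed_forward = 2 * d_model * d_ff      # up- and down-projection
    embeddings = 2 * vocab_size * d_model  # input embedding plus an assumed untied output head
    return n_layers * (attention + feed_forward) + embeddings


total = approx_param_count(N_LAYERS, D_MODEL, D_FF, VOCAB_SIZE)
print(f"{total / 1e9:.2f} billion parameters")
```

The number of attention heads does not appear in the estimate because splitting the hidden width across heads does not change the size of the projection matrices.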
It was trained on the Pile dataset, [2] [4] using Mesh Transformer JAX, a library built on JAX, to handle the parallelization scheme. [2] [8]
GPT-J was designed to generate English text from a prompt. It was not designed for translation, for generating text in other languages, or for use on a specific task without first being fine-tuned for that task. [2]
When neither is fine-tuned, GPT-J-6B performs almost as well as the 6.7 billion parameter GPT-3 (Curie) on a variety of tasks. [4] It even outperforms the 175 billion parameter GPT-3 (Davinci) on code generation tasks. [9] With fine-tuning, it outperforms an untuned GPT-3 (Davinci) on a number of tasks. [1]
Like all LLMs, it is not programmed to give factually accurate information, only to generate text based on probability. [2]
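The following sketch illustrates what generating text "based on probability" can look like. The candidate words and their scores are invented for illustration and do not come from GPT-J. The point is that a plausible but wrong continuation keeps a non-zero probability and is therefore sampled some of the time.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Invented scores for continuations of "The capital of Australia is".
candidates = ["Canberra", "Sydney", "Melbourne"]
logits = [2.0, 1.5, 0.5]

probs = softmax(logits)
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = rng.choices(candidates, weights=probs, k=1000)

for word, p in zip(candidates, probs):
    print(f"{word}: p={p:.2f}, sampled {samples.count(word)} times")
```

Nothing in this procedure checks the sampled word against a source of truth. The distribution reflects only how the scores compare with one another.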
The untuned GPT-J is available on EleutherAI's website, [10] NVIDIA's Triton Inference Server, [11] and NLP Cloud's website. [12] Cerebras [1] and Amazon Web Services [13] [14] offer services to fine-tune the GPT-J model for company-specific tasks. Graphcore offers both fine-tuning and hosting services for the untuned GPT-J, as well as offering to host the fine-tuned models after they are produced. [15] CoreWeave offers hosting services for both the untuned GPT-J and fine-tuned variants. [16] [17]
In March 2023, Databricks released Dolly, an Apache-licensed, instruction-following model created by fine-tuning GPT-J on the Stanford Alpaca dataset. [18] NovelAI's Sigurd [19] and Genji-JP 6B [20] models are both fine-tuned versions of GPT-J. NovelAI also offers further fine-tuning services to produce and host custom models. [21]
EleutherAI has received praise from Cerebras, [1] GPT-3 Demo, [4] NLP Cloud, [12] and Databricks [18] for making the model open-source, and its open-source status is often cited as a major advantage when choosing which model to use. [9] [15] [22]
Of the rotary position embeddings used by GPT-J, EleutherAI wrote: "In general we have found that across a large suite of setups including regular, linear, and local self-attention, it either matches or surpasses all other methods currently available for injecting positional information into transformers."
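The method this quotation describes is rotary position embedding (RoPE). A minimal pure-Python sketch of the idea follows. It is not GPT-J's implementation: `rotary_embed` and `dot` are hypothetical helpers, the base of 10000 is the commonly used default, and the vectors are invented. Each consecutive pair of dimensions is rotated by an angle proportional to the token's position, so the dot product between a rotated query and a rotated key depends only on the distance between their positions.

```python
import math


def rotary_embed(vec, position, base=10000.0):
    """Rotate each consecutive pair of dimensions by a position-dependent angle."""
    dim = len(vec)
    assert dim % 2 == 0, "rotary embeddings need an even number of dimensions"
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # later pairs rotate more slowly
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


query = [0.3, -1.2, 0.8, 0.5]
key = [1.0, 0.4, -0.7, 0.9]

# Both pairs of positions are 2 apart, so the attention scores should agree.
score_near = dot(rotary_embed(query, 3), rotary_embed(key, 1))
score_far = dot(rotary_embed(query, 103), rotary_embed(key, 101))
print(math.isclose(score_near, score_far, abs_tol=1e-9))
```

Because the rotation preserves vector length and encodes only relative offsets in the query-key product, no separate position vector has to be added to the token embeddings.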