Jamba (language model)

Jamba
Developer(s): AI21 Labs
Initial release: 29 March 2024
Type: Large language model
License: Apache 2.0

Jamba is an open-weights large language model (LLM) developed by AI21 Labs.[1][2][3] It uses a novel hybrid architecture that combines the Mamba state space model (SSM) with the transformer architecture.[4][1][5] It is a 52-billion-parameter model trained using a mixture-of-experts (MoE) technique, with 12 billion parameters active for any given token.[2][1] Jamba can fit up to 256K tokens in its context window and is the largest Mamba-variant LLM created.[2][4]
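The released weights can be loaded with standard open-source tooling. The following is a minimal sketch, assuming the checkpoint is published on Hugging Face under a repository id such as ai21labs/Jamba-v0.1; the id, prompt, and generation settings are illustrative and not taken from AI21's documentation:

```python
# Minimal sketch of loading Jamba's open weights with Hugging Face Transformers.
# The repository id below is an assumption for illustration; consult AI21's
# official release for the exact identifier and hardware requirements.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"  # assumed repo id for the open-weights checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Jamba is a hybrid SSM-transformer model that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because only about 12 of the 52 billion parameters are active per token, inference cost scales with the active-parameter count rather than the full model size.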


Jamba performs well on key measures such as throughput and efficiency, matching or outperforming other state-of-the-art models in its class on a wide range of performance benchmarks, while offering significantly greater context limits that enable use cases requiring longer context.[1][2] The model is released with open weights under the Apache 2.0 license.[6][5]

The company plans to release an instruction-tuned version in beta on the AI21 Platform in the near future.[7]

Characteristics

See also

Related Research Articles

A language model is a probabilistic model of a natural language. In 1980, the first significant statistical language model was proposed, and during the decade IBM performed ‘Shannon-style’ experiments, in which potential sources for language modeling improvement were identified by observing and analyzing the performance of human subjects in predicting or correcting text.

Multimodal learning, in the context of machine learning, is a type of deep learning that uses a combination of various modalities of data, often arising in real-world applications. An example of multimodal data is data that combines text with imaging data consisting of pixel intensities and annotation tags. Because these modalities have fundamentally different statistical properties, combining them is non-trivial, which is why specialized modelling strategies and algorithms are required. The model is then trained to be able to understand and work with multiple forms of data.

Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model.

Mixture of experts (MoE) is a machine learning technique where multiple expert networks (learners) are used to divide a problem space into homogeneous regions. It differs from ensemble techniques in that for MoE, typically only one or a few expert models are run for each input, whereas in ensemble techniques, all models are run on every input.
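The routing idea can be illustrated with a short sketch, assuming a learned softmax gate that sends each input to its top-k experts; the layer sizes, expert count, and top-k value here are arbitrary illustrative choices, not taken from any particular model:

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: a gate picks the top-k experts per token."""
    def __init__(self, d_model=16, n_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.gate(x)                      # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)          # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                # only the selected experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask][:, k:k + 1] * expert(x[mask])
        return out

print(TinyMoE()(torch.randn(8, 16)).shape)         # torch.Size([8, 16])
```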

Transformer (deep learning architecture) – machine learning architecture used for natural-language processing

A transformer is a deep learning architecture developed by Google and based on the multi-head attention mechanism, proposed in the 2017 paper "Attention Is All You Need". It has no recurrent units and thus requires less training time than earlier recurrent neural architectures such as long short-term memory (LSTM), and its later variants have been widely adopted for training large language models (LLMs) on large (language) datasets, such as the Wikipedia corpus and Common Crawl. Text is converted into numerical representations called tokens, and each token is converted into a vector by looking it up in a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished. The transformer paper, published in 2017, builds on the softmax-based attention mechanism proposed by Bahdanau et al. in 2014 for machine translation, and on the Fast Weight Controller, similar to a transformer, proposed in 1992.
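The attention step described above can be sketched as scaled dot-product attention over a set of token vectors; this single-head NumPy version is a simplified illustration, not the full multi-head mechanism:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: weight each token's value by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V                                   # weighted sum of value vectors

tokens = np.random.rand(5, 8)                            # 5 tokens, 8-dimensional embeddings
print(scaled_dot_product_attention(tokens, tokens, tokens).shape)  # (5, 8)
```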

Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. Like its predecessor, GPT-2, it is a decoder-only transformer-based deep neural network that replaces recurrence- and convolution-based architectures with a technique known as "attention". This attention mechanism allows the model to selectively focus on the segments of input text it predicts to be most relevant. It uses a 2,048-token context, float16 (16-bit) precision, and a then-unprecedented 175 billion parameters, requiring 350 GB of storage space as each parameter takes 2 bytes, and has demonstrated strong "zero-shot" and "few-shot" learning abilities on many tasks.
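The 350 GB figure follows directly from the parameter count and the 2 bytes used per float16 parameter, as the short calculation below shows:

```python
# Reproducing the storage estimate cited for GPT-3: 175 billion parameters
# at 2 bytes each (float16) is roughly 350 GB of raw weight storage.
params = 175_000_000_000
bytes_per_param = 2                              # float16
print(params * bytes_per_param / 1e9, "GB")      # 350.0 GB
```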

Wu Dao is a multimodal artificial intelligence developed by the Beijing Academy of Artificial Intelligence (BAAI). Wu Dao 1.0 was first announced on January 11, 2021; an improved version, Wu Dao 2.0, was announced on May 31. It has been compared to GPT-3 and is built on a similar architecture; in comparison, GPT-3 has 175 billion parameters (variables and inputs within the machine learning model) while Wu Dao has 1.75 trillion parameters. Wu Dao was trained on 4.9 terabytes of images and texts, while GPT-3 was trained on 45 terabytes of text data. Still, a growing body of work highlights the importance of increasing both data and parameters. The chairman of BAAI said that Wu Dao was an attempt to "create the biggest, most powerful AI model possible", although parameter count alone does not directly correlate with quality. Wu Dao 2.0 was called "the biggest language A.I. system yet" and was interpreted by commenters as an attempt to "compete with the United States". Notably, the architecture used for Wu Dao 2.0 is a mixture-of-experts (MoE) model, unlike GPT-3, which is a "dense" model: while MoE models require much less computational power to train than dense models with the same number of parameters, trillion-parameter MoE models have shown performance comparable to models that are hundreds of times smaller.

Prompt engineering is the process of structuring text that can be interpreted and understood by a generative AI model. A prompt is natural language text describing the task that an AI should perform.

AI21 Labs – Tel Aviv-based company

AI21 Labs is an Israeli company specializing in Natural Language Processing (NLP), which develops AI systems that can understand and generate natural language.

GPT-J – open-source artificial intelligence text-generating language model developed by EleutherAI

GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters.

A large language model (LLM) is a language model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process. LLMs can be used for text generation, a form of generative AI, by taking an input text and repeatedly predicting the next token or word.
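The generation loop described above can be sketched as repeated next-token prediction; here next_token_logits is a hypothetical stand-in for a real model's forward pass, used only to show the loop structure:

```python
import numpy as np

def next_token_logits(token_ids, vocab_size=50):
    """Hypothetical stand-in for an LLM forward pass: one score per vocabulary item."""
    rng = np.random.default_rng(sum(token_ids))
    return rng.normal(size=vocab_size)

def generate(prompt_ids, max_new_tokens=10):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        ids.append(int(np.argmax(logits)))   # greedy decoding: pick the most likely next token
    return ids

print(generate([3, 14, 15], max_new_tokens=5))
```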

The Pile is an 886.03 GB diverse, open-source dataset of English text created as a training dataset for large language models (LLMs). It was constructed by EleutherAI in 2020 and publicly released on December 31 of that year. It is composed of 22 smaller datasets, including 14 new ones.

LLaMA is a family of autoregressive large language models (LLMs), released by Meta AI starting in February 2023.

PaLM – large language model developed by Google

PaLM is a 540 billion parameter transformer-based large language model developed by Google AI. Researchers also trained smaller versions of PaLM, 8 and 62 billion parameter models, to test the effects of model scale.

Wordtune is an AI-powered reading and writing companion capable of fixing grammatical errors, understanding context and meaning, suggesting paraphrases or alternative writing tones, and generating written text based on context. It is developed by the Israeli AI company AI21 Labs.

Gemini (language model) – large language model developed by Google

Gemini is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, Gemini Pro, and Gemini Nano, it was announced on December 6, 2023, positioned as a competitor to OpenAI's GPT-4. It powers the generative artificial intelligence chatbot of the same name.

Mistral AI is a French company selling artificial intelligence (AI) products. It was founded in April 2023 by previous employees of Meta Platforms and Google DeepMind. The company raised €385 million in October 2023 and in December 2023 it was valued at more than $2 billion.

Mamba is a deep learning architecture focused on sequence modeling. It was developed by researchers from Carnegie Mellon University and Princeton University to address some limitations of transformer models, especially in processing long sequences. It is based on the Structured State Space sequence (S4) model.
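At the core of S4-style models is a linear state space recurrence, h_t = A h_{t-1} + B x_t with output y_t = C h_t. The sketch below uses small fixed matrices for illustration; in Mamba the corresponding parameters are input-dependent (selective), which this simplified version does not model:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear state space recurrence: h_t = A h_{t-1} + B x_t,  y_t = C h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                     # sequential scan over the input sequence
        h = A @ h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

A = np.array([[0.9, 0.0], [0.1, 0.8]])   # state transition (illustrative values)
B = np.array([1.0, 0.5])                 # input projection
C = np.array([0.3, 0.7])                 # output projection
print(ssm_scan(np.sin(np.linspace(0, 3, 10)), A, B, C))
```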

Claude is a family of large language models developed by Anthropic. The first model was released in March 2023. Claude 3, released in March 2024, can also analyze images.

DBRX is an open-source large language model (LLM) developed by the Mosaic ML team at Databricks. It was trained on 3,072 Nvidia H100s connected by 3.2 terabytes per second of bandwidth (InfiniBand). The LLM has 132 billion parameters and was trained using a mixture-of-experts approach with 36 billion "active" parameters, meaning only a subset of the network is active for each token.

References

  1. 1 2 3 4 "Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model". www.ai21.com. Retrieved 2024-03-29.
  2. 1 2 3 4 Kerner, Sean Michael (2024-03-28). "AI21 Labs juices up gen AI transformers with Jamba". VentureBeat. Retrieved 2024-03-29.
  3. Mawira, Benson (March 28, 2024). "Next-Generation AI System Promises Unprecedented Scalability". Cryptopolitan.
  4. 1 2 "AI21 Labs' Jamba infuses Mamba to bring more context to transformer-based LLMs". SiliconANGLE. 2024-03-28. Retrieved 2024-03-29.
  5. 1 2 "MLTimes - Time To Learn AI". mltimes.se. Retrieved 2024-03-29.
  6. AI21. "Unveiling Jamba: AI21's Groundbreaking Hybrid SSM-Transformer Open-Source Model". www.prnewswire.com. Retrieved 2024-03-29.{{cite web}}: CS1 maint: numeric names: authors list (link)
  7. 1 2 3 4 "AI21 Labs enhances the capabilities of gen AI transformers through Jamba integration". Global Village Space | Technology. 2024-03-28. Retrieved 2024-03-29.