Gemma | |
---|---|
Developer(s) | Google DeepMind |
Initial release | February 21, 2024 [1] |
Stable release | Gemma 3 / March 12, 2025 [2] |
Type | Large language model |
License | Gemma License |
Website | deepmind |
Gemma is a series of open-source large language models developed by Google DeepMind. It is based on technology similar to that of Gemini. The first version was released in February 2024, followed by Gemma 2 in June 2024 and Gemma 3 in March 2025. Variants of Gemma have also been developed, such as the vision-language model PaliGemma and the model DolphinGemma for understanding dolphin communication.
In February 2024, Google debuted Gemma, a family of free and open-source LLMs that serve as a lightweight version of Gemini. The initial models came in two sizes, with two billion and seven billion parameters, respectively. Multiple publications viewed this as a response to Meta and others open-sourcing their AI models, and as a stark reversal from Google's longstanding practice of keeping its AI proprietary. [3] [4] [5]
Gemma 2 was released on June 27, 2024, [6] and Gemma 3 was released on March 12, 2025. [2] [7]
Based on technology similar to the Gemini series of models, Gemma is described by Google as helping support its mission of "making AI helpful for everyone." [8] Google offers official Gemma variants optimized for specific use cases, such as MedGemma for medical analysis and DolphinGemma for studying dolphin communication. [9]
Since its release, Gemma models have had over 150 million downloads, with 70,000 variants available on Hugging Face. [10]
The latest generation of models is Gemma 3, offered in 1, 4, 12, and 27 billion parameter sizes with support for over 140 languages. As multimodal models, they support both text and image input. [11] Google also offers Gemma 3n, smaller models optimized for execution on consumer devices like phones, laptops, and tablets. [12]
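The smaller checkpoints are practical to run locally. The following is a minimal sketch of text generation with one of them through the Hugging Face transformers library; it assumes the library is installed and that access to the gated google/gemma-3-1b-it checkpoint has been granted under the Gemma license, and the prompt and generation settings are illustrative only.

```python
# Minimal sketch, assuming the transformers library and access to the
# gated google/gemma-3-1b-it checkpoint on Hugging Face.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-1b-it")

# Instruction-tuned Gemma checkpoints expect a chat-style prompt.
messages = [{"role": "user", "content": "Explain grouped-query attention in one sentence."}]
print(generator(messages, max_new_tokens=64)[0]["generated_text"])
```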
Gemma 3 is based on a decoder-only transformer architecture with grouped-query attention (GQA), paired with the SigLIP vision encoder for image input. Every model has a context length of 128K tokens, with the exception of Gemma 3 1B, which has a context length of 32K. [13]
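In grouped-query attention, several query heads share each key/value head, shrinking the key-value (KV) cache relative to standard multi-head attention; multi-query attention (used by Gemma 1 2B) and full multi-head attention (Gemma 1 7B) are the two extremes. The NumPy sketch below illustrates the mechanism only: the head counts and dimensions are illustrative rather than Gemma's published configuration, and details such as rotary embeddings and causal masking are omitted.

```python
import numpy as np

def grouped_query_attention(x, wq, wk, wv, num_q_heads, num_kv_heads):
    """Minimal grouped-query attention (GQA) over one sequence.

    num_kv_heads == 1            -> multi-query attention (as in Gemma 1 2B)
    num_kv_heads == num_q_heads  -> multi-head attention (as in Gemma 1 7B)
    Illustrative only: real Gemma layers add RoPE, causal masking,
    normalization, and an output projection.
    """
    seq_len, d_model = x.shape
    head_dim = d_model // num_q_heads
    group = num_q_heads // num_kv_heads   # query heads per shared KV head

    q = (x @ wq).reshape(seq_len, num_q_heads, head_dim)
    k = (x @ wk).reshape(seq_len, num_kv_heads, head_dim)
    v = (x @ wv).reshape(seq_len, num_kv_heads, head_dim)

    out = np.empty_like(q)
    for h in range(num_q_heads):
        kv = h // group                   # the KV head this query head shares
        scores = (q[:, h] @ k[:, kv].T) / np.sqrt(head_dim)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, h] = weights @ v[:, kv]
    return out.reshape(seq_len, d_model)

# Toy configuration: 8 query heads sharing 2 KV heads (4-way grouping),
# so the KV cache holds 2 heads' worth of keys/values instead of 8.
rng = np.random.default_rng(0)
seq_len, d_model, n_q, n_kv = 10, 64, 8, 2
head_dim = d_model // n_q
x = rng.standard_normal((seq_len, d_model))
wq = rng.standard_normal((d_model, n_q * head_dim))
wk = rng.standard_normal((d_model, n_kv * head_dim))
wv = rng.standard_normal((d_model, n_kv * head_dim))
print(grouped_query_attention(x, wq, wk, wv, n_q, n_kv).shape)  # (10, 64)
```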
Quantized versions fine-tuned using quantization-aware training (QAT) are also available, [13] offering sizable reductions in memory usage at some cost in accuracy and numerical precision. [14]
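The trade-off is easy to see with a toy post-training quantization example: storing a weight in 4 bits instead of 16 quarters its memory footprint at the cost of rounding error. QAT differs in that this rounding is simulated during fine-tuning so the model learns to compensate, but the arithmetic is the same. The scheme and numbers below are illustrative, not Gemma's actual quantization recipe.

```python
import numpy as np

def quantize_int4_symmetric(w):
    """Symmetric per-tensor quantization to the int4 range [-7, 7].

    Toy illustration of the rounding that QAT simulates during training;
    production schemes (per-channel scales, int4 bit-packing) are more involved.
    """
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)  # 2 values/byte when packed
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1_000_000).astype(np.float32)  # stand-in weight tensor

q, scale = quantize_int4_symmetric(w)
w_hat = dequantize(q, scale)

print("mean abs rounding error:", np.abs(w - w_hat).mean())
print("bf16 size: %.1f MB, packed int4 size: %.1f MB"
      % (w.size * 2 / 1e6, w.size * 0.5 / 1e6))
```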
Google develops official variants of Gemma models designed for specific purposes, such as medical analysis (MedGemma) or programming (CodeGemma). These, along with the main Gemma generations, are summarized in the table below:
Generation | Release date | Parameters | Context length | Multimodal | Notes |
---|---|---|---|---|---|
Gemma 1 | 21 February 2024 | 2B, 7B | 8,192 | No | 2B distilled from 7B. 2B uses multi-query attention while 7B uses multi-head attention. |
CodeGemma | | 2B, 7B | 8,192 | No | Gemma 1 fine-tuned for code generation. |
RecurrentGemma | 11 April 2024 | 2B, 9B | Unlimited (trained on 8,192) | No | Griffin-based, instead of Transformer-based. [21] |
Gemma 2 | 27 June 2024 | 2B, 9B, 27B | 8,192 | No | 27B trained from web documents, code, science articles. Gemma 2 9B was distilled from 27B. Gemma 2 2B was distilled from a 7B model that remained unreleased. Uses Grouped-Query Attention. [22] |
PaliGemma | 10 July 2024 | 3B | 8,192 | Image | A vision-language model that takes text and image inputs, and outputs text. It is made by connecting a SigLIP-So400m image encoder with Gemma v1.0 2B. [23] [24] |
PaliGemma 2 | 4 December 2024 | 3B, 10B, 28B | 8,192 | Image | Made by connecting the SigLIP-So400m image encoder with Gemma v2.0 2B, 9B, and 27B. Capable of more vision-language tasks. [25] [26] |
Gemma 3 | 12 March 2025 | 1B, 4B, 12B, 27B | 131,072 (32,768 for 1B) | Image | All models trained with distillation. Post-training focuses on math, coding, chat, instruction following, and multilingual tasks (supports 140 languages). Capable of function calling. 1B is not capable of vision. [27] |
Note: open-weight models can have their context length rescaled at inference time. With Gemma 1, Gemma 2, PaliGemma, and PaliGemma 2, the cost is a linear increase in KV-cache size relative to the context window size. With Gemma 3 the growth curve is flatter because most layers use local sliding-window attention, whose cache is capped at the window size, so only the interleaved global-attention layers grow with the full context. With RecurrentGemma, memory use is constant beyond 2,048 tokens.
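As a back-of-the-envelope illustration of that linear growth: the KV cache stores one key vector and one value vector per layer and per KV head for every token in the window, so its size is proportional to the context length. The sketch below computes cache sizes for hypothetical hyperparameters; the layer and head counts are assumptions for illustration, not Gemma's published configuration.

```python
def kv_cache_bytes(context_len, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """KV-cache size for a transformer whose every layer attends globally.

    The factor of 2 covers keys and values; bytes_per_value=2 assumes
    bf16/fp16 storage. All hyperparameters here are illustrative.
    """
    return 2 * context_len * num_layers * num_kv_heads * head_dim * bytes_per_value

# Cache grows linearly with the context window:
for ctx in (8_192, 32_768, 131_072):
    gb = kv_cache_bytes(ctx, num_layers=32, num_kv_heads=8, head_dim=128) / 1e9
    print(f"{ctx:>7} tokens -> {gb:5.1f} GB")
```

Under these assumptions the cache grows from roughly 1 GB at an 8K window to about 17 GB at 128K, which is why capping most layers at a local window, as Gemma 3 does, matters for long-context inference.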