An image generated with Imagen 3. Partial prompt: Softly illuminated afternoon valley with meandering river

Developer(s) | Google DeepMind |
---|---|
Stable release | Imagen 3 / 13 August 2024 |
Type | Text-to-image model |
Website | deepmind |
Imagen, Imagen 2, and Imagen 3 are text-to-image models developed by Google DeepMind. They were developed by Google Brain until that division merged with DeepMind in April 2023. [1] Imagen is primarily used to generate images from text prompts, similar to Stability AI's Stable Diffusion, OpenAI's DALL-E, or Midjourney.
The original version of the model was first discussed in a paper from May 2022. [2] The tool produces high-quality images and is available to all users with a Google account through services including Gemini, ImageFX, and Vertex AI. [3]
Imagen's original version was first presented in a paper published in May 2022. It featured the ability to generate high-fidelity images from natural language. [2] The second version, Imagen 2, was released in December 2023. [4] Its standout feature was text and logo generation. [5] Imagen 3 was released in August 2024. [6] Google claims that the newest version renders detail and lighting better in generated images. [7]
Imagen relies on two key technologies. The first is a transformer-based large language model, notably T5, which interprets the text prompt and encodes it for image synthesis. The second is a set of diffusion models that generate high-fidelity images from that encoding. [2]
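The division of labour between the text encoder and the diffusion model can be illustrated with a toy sketch. This is not Imagen's implementation: `encode_text` is a hypothetical hash-based stand-in for a frozen encoder such as T5, `predict_noise` is an oracle that replaces a trained neural denoiser, the "image" is an 8-number vector, and the noise schedule and deterministic DDIM-style update are assumptions chosen for simplicity.

```python
import hashlib
import math
import random

ASSUMED_DIM = 8  # size of the toy "image"; purely illustrative


def encode_text(prompt, dim=ASSUMED_DIM):
    """Hypothetical stand-in for a frozen text encoder such as T5.

    Hashes the prompt into a deterministic vector in [-1, 1]. A real
    encoder would return learned contextual embeddings instead.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [digest[i] / 127.5 - 1.0 for i in range(dim)]


def predict_noise(x_t, alpha_bar_t, cond):
    """Toy oracle denoiser conditioned on the text embedding.

    Assumes the clean image for a prompt equals its embedding, and
    solves x_t = sqrt(a) * x0 + sqrt(1 - a) * eps for eps. A real
    model would learn this prediction with a neural network.
    """
    scale = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [(x - scale * c) / sigma for x, c in zip(x_t, cond)]


def sample(prompt, steps=50, seed=0):
    """Start from Gaussian noise and denoise step by step."""
    rng = random.Random(seed)
    cond = encode_text(prompt)
    # Assumed linear schedule: alpha_bars[0] == 1 (clean image),
    # alpha_bars[steps] close to 0 (almost pure noise).
    alpha_bars = [1.0 - t / (steps + 1) for t in range(steps + 1)]
    x = [rng.gauss(0.0, 1.0) for _ in cond]
    for t in range(steps, 0, -1):
        a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]
        eps = predict_noise(x, a_t, cond)
        # Estimate the clean image implied by the predicted noise.
        x0_pred = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
                   for xi, e in zip(x, eps)]
        # Deterministic DDIM-style move to the next, less noisy level.
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e
             for x0, e in zip(x0_pred, eps)]
    return x
```

Calling `sample("a valley with a meandering river")` returns a vector that matches the prompt's toy embedding up to floating-point error, which shows only that the text conditioning steers where the denoising loop ends up. In a real system, the denoiser is a trained network and the output is an image rather than a short vector.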
Imagen can generate photorealistic images from text prompts. [3] It can also render images in various styles, such as cinematic, 35mm film, illustration, and surreal. The model can generate images in five aspect ratios: 9:16, 3:4, 1:1, 4:3, and 16:9. Imagen can also refine already generated images by editing existing text prompts. [7]