Imagen (text-to-image model)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Unis Tankos (talk | contribs) at 16:47, 21 May 2025 (History). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
Imagen
Developer(s): Google DeepMind
Stable release: Imagen 4 / 20 May 2025
Type: Text-to-image model
Website: Imagen website

Imagen is a series of text-to-image models developed by Google DeepMind, comprising Imagen, Imagen 2, Imagen 3, and Imagen 4. The models were developed by Google Brain until that team merged with DeepMind in April 2023.[1] Imagen is primarily used to generate images from text prompts, similar to Stability AI's Stable Diffusion, OpenAI's DALL-E, or Midjourney.

The original version of the model was first discussed in a paper from May 2022.[2] The tool produces high-quality images and is available to all users with a Google account through services including Gemini, ImageFX, and Vertex AI.[3]

History

Imagen's original version was first presented in a paper published in May 2022. It featured the ability to generate high-fidelity images from natural language.[2] The second version, Imagen 2, was released in December 2023.[4] Its standout feature was text and logo generation.[5] Imagen 3 was released in August 2024.[6] Google claims that Imagen 3 provides better detail and lighting in generated images.[7]

Imagen 4

On 20 May 2025, Google DeepMind began rolling out Imagen 4, a new image-generating model that the company claims delivers higher-quality results than its predecessor, Imagen 3.

Unveiled at Google I/O 2025, Imagen 4 is capable of rendering “fine details” such as fabrics, water droplets, and animal fur, according to Google. The model can handle both photorealistic and abstract styles and can create images in a range of aspect ratios at up to 2K resolution.

“Imagen 4 is a huge step forward in quality,” Josh Woodward, who leads Google’s Labs group, said during a press briefing. “We’ve also paid a lot of attention and fixes around how it generates text and topography, so it’s wonderful for creating slides or invitations, or any other thing where you might need to blend imagery and text.”

According to Google, Imagen 4 is faster than Imagen 3, and the company plans to release a variant that is up to 10 times faster than Imagen 3. Imagen 4 is available in the Gemini app, on Google's Whisk and Vertex AI platforms, and in Google Workspace applications including Slides, Vids, and Docs.[3]

Technology

Imagen uses two key technologies. The first is a transformer-based large language model, notably T5, which interprets the text prompt and encodes it for image synthesis. The second is a cascade of diffusion models that provides high-fidelity image generation. An image is generated in three stages: a base image is produced at 64×64 pixels, then upsampled to 256×256 and finally to 1024×1024.[2]
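
The three-stage cascade can be illustrated with a small Python sketch. It is a toy illustration of the data flow only, not Google's implementation: the function names (encode_text, base_model, super_resolve, generate) are hypothetical, the text encoder is a stand-in for T5, and nearest-neighbour upsampling stands in for the super-resolution diffusion models.

```python
from typing import List

Image = List[List[float]]

# Output resolution of each stage, as given in the text: 64 -> 256 -> 1024.
STAGE_RESOLUTIONS = [64, 256, 1024]


def encode_text(prompt: str) -> List[float]:
    """Hypothetical stand-in for the T5 text encoder: a toy numeric vector."""
    return [float(ord(ch) % 7) for ch in prompt]


def base_model(embedding: List[float], size: int) -> Image:
    """Hypothetical stand-in for the base diffusion model.

    Fills a size x size grid with one value derived from the embedding.
    """
    value = sum(embedding) / max(len(embedding), 1)
    return [[value] * size for _ in range(size)]


def super_resolve(image: Image, embedding: List[float], target: int) -> Image:
    """Hypothetical stand-in for a super-resolution diffusion model.

    Uses nearest-neighbour upsampling; the embedding is accepted only to
    mirror the text conditioning and is not used by this toy version.
    """
    source = len(image)
    if target % source != 0:
        raise ValueError("target resolution must be a multiple of the input")
    factor = target // source
    return [
        [image[y // factor][x // factor] for x in range(target)]
        for y in range(target)
    ]


def generate(prompt: str) -> Image:
    """Run the cascade: encode the text, build the base image, upsample twice."""
    embedding = encode_text(prompt)
    image = base_model(embedding, STAGE_RESOLUTIONS[0])
    for target in STAGE_RESOLUTIONS[1:]:
        image = super_resolve(image, embedding, target)
    return image
```

Calling generate("a corgi riding a bicycle") returns a 1024×1024 grid. Each upsampling stage increases the side length by a factor of four.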

Capabilities

Imagen can generate photorealistic images from text prompts.[3] It can also produce images in various styles, such as cinematic, 35mm film, illustration, and surreal. The model can generate images in five aspect ratios: 9:16, 3:4, 1:1, 4:3, and 16:9. Imagen can also refine previously generated images when the user edits the existing text prompt.[7]
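
As a rough illustration of what the five aspect ratios mean in pixel terms, the following Python sketch converts a ratio into a width and height. The dimensions helper and the 1024-pixel long edge are assumptions made for the example; the sizes Imagen actually outputs are not stated here and may differ.

```python
from typing import Tuple

# The five aspect ratios listed in the text (width:height).
ASPECT_RATIOS = ["9:16", "3:4", "1:1", "4:3", "16:9"]


def dimensions(ratio: str, long_edge: int = 1024) -> Tuple[int, int]:
    """Return (width, height) for a supported ratio.

    long_edge is an assumed value for illustration, not a documented
    Imagen output size.
    """
    if ratio not in ASPECT_RATIOS:
        raise ValueError("unsupported aspect ratio: " + ratio)
    width_part, height_part = (int(part) for part in ratio.split(":"))
    if width_part >= height_part:
        # Landscape or square: the width is the long edge.
        return long_edge, round(long_edge * height_part / width_part)
    # Portrait: the height is the long edge.
    return round(long_edge * width_part / height_part), long_edge
```

Under that assumption, dimensions("16:9") gives 1024×576 and dimensions("3:4") gives 768×1024.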

References

  1. ^ Roth, Emma; Peters, Jay (April 20, 2023). "Google's big AI push will combine Brain and DeepMind into one team". The Verge. Archived from the original on April 20, 2023. Retrieved March 18, 2025.
  2. ^ a b c Saharia, Chitwan; Chan, William; Saxena, Saurabh; Li, Lala; Whang, Jay; Denton, Emily; Ghasemipour, Seyed Kamyar Seyed; Ayan, Burcu Karagol; Mahdavi, S. Sara; Lopes, Rapha Gontijo; Salimans, Tim; Ho, Jonathan; Fleet, David J.; Norouzi, Mohammad (2022). "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding". arXiv:2205.11487 [cs.CV].
  3. ^ a b c Wiggers, Kyle (2025-05-20). "Imagen 4 is Google's newest AI image generator". TechCrunch. Retrieved 2025-03-18.
  4. ^ "Imagen 2 - our most advanced text-to-image technology". Google DeepMind. 2025-03-12. Retrieved 2025-03-18.
  5. ^ Wiggers, Kyle (2023-12-13). "Google debuts Imagen 2 with text and logo generation". TechCrunch. Retrieved 2025-03-18.
  6. ^ Schoon, Ben (2024-08-16). "Google opens access to Imagen 3, its latest model for AI image generation". 9to5Google. Archived from the original on 2024-08-18. Retrieved 2025-03-18.
  7. ^ a b c Rowlands, Christian (2025-02-26). "Some of the most realistic AI images you'll see were created with this free tool". TechRadar. Retrieved 2025-03-18.