Diffusion model
In machine learning, diffusion models, also known as diffusion probabilistic models, are a class of latent variable models. These models are Markov chains trained using variational inference.[1] The goal of diffusion models is to learn the latent structure of a dataset by modeling the way in which data points diffuse through the latent space. In computer vision, this means that a neural network is trained to denoise images corrupted with Gaussian noise by learning to reverse the diffusion process.[2]
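The forward diffusion process described above gradually adds Gaussian noise to a data point over many steps. A minimal sketch of that forward process, assuming the linear variance schedule used in denoising diffusion probabilistic models (the function names and schedule parameters are illustrative, not from the cited papers' code):

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).

    Because Gaussians compose, the noisy sample at any step t can be drawn
    in closed form instead of iterating t single noising steps.
    """
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((8, 8))  # stand-in for an image
xt, eps = forward_diffuse(x0, 999, alpha_bar, rng)
```

By the final step, `alpha_bar` is close to zero, so `x_T` is nearly pure Gaussian noise; a denoising network is then trained to predict `eps` from `xt` and `t`.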
Diffusion models can be applied to a variety of tasks, including image denoising, inpainting, super-resolution, and image generation. For example, an image generation model would start from random noise and, having been trained to reverse the diffusion process on natural images, would be able to generate new natural images. A recent example is OpenAI's text-to-image model DALL-E 2, which uses diffusion models both for the model's prior (which produces an image embedding given a text caption) and for the decoder that generates the final image.[3]
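The generation procedure described above can be sketched as an ancestral sampling loop: start from pure noise and repeatedly apply the learned reverse step. This is a minimal sketch assuming the DDPM parameterization; `predict_noise` is a hypothetical placeholder for a trained noise-prediction network, so the output here is not a natural image:

```python
import numpy as np

def ddpm_sample(predict_noise, shape, num_steps=1000,
                beta_start=1e-4, beta_end=0.02, seed=0):
    """Reverse (generative) process: denoise step by step from x_T to x_0.

    `predict_noise(x_t, t)` stands in for a trained network that predicts
    the noise component of x_t (a hypothetical placeholder here).
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for t in range(num_steps - 1, -1, -1):
        eps = predict_noise(x, t)
        # Mean of the reverse transition, written in terms of predicted noise.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added on the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Dummy predictor: the loop runs, but only a trained network yields real images.
sample = ddpm_sample(lambda x, t: np.zeros_like(x), (8, 8))
```

In practice the placeholder predictor is a U-Net-style network trained on the denoising objective; the loop structure stays the same.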
References
- ^ Ho, Jonathan; Jain, Ajay; Abbeel, Pieter (19 June 2020). "Denoising Diffusion Probabilistic Models". doi:10.48550/arXiv.2006.11239.
- ^ Song, Yang; Ermon, Stefano (2020). "Improved Techniques for Training Score-Based Generative Models". doi:10.48550/arXiv.2006.09011.
- ^ Ramesh, Aditya; Dhariwal, Prafulla; Nichol, Alex; Chu, Casey; Chen, Mark (2022). "Hierarchical Text-Conditional Image Generation with CLIP Latents". doi:10.48550/arXiv.2204.06125.