Diffusion model
In machine learning, diffusion models, also known as diffusion probabilistic models, are a class of latent variable models. These models are Markov chains trained using variational inference.[1] The goal of diffusion models is to learn the latent structure of a dataset by modeling the way in which data points diffuse through the latent space. In computer vision, this means that a neural network is trained to denoise images corrupted with Gaussian noise by learning to reverse the diffusion process.[2]
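The forward diffusion process described above gradually adds Gaussian noise to a data point over many steps. A minimal sketch of that forward process, assuming the linear variance schedule used in denoising diffusion probabilistic models (the function names and schedule parameters are illustrative, not from the cited papers' code):

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).

    Because Gaussians compose, the noisy sample at any step t can be drawn
    in closed form instead of iterating t single noising steps.
    """
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((8, 8))  # stand-in for an image
xt, eps = forward_diffuse(x0, 999, alpha_bar, rng)
```

By the final step, `alpha_bar` is close to zero, so `x_T` is nearly pure Gaussian noise; a denoising network is then trained to predict `eps` from `xt` and `t`.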
Diffusion models can be applied to a variety of tasks, including image denoising, inpainting, super-resolution, and image generation. For example, an image generation model would start from random noise and, having been trained to reverse the diffusion process on natural images, would be able to generate new natural images. A recent example is OpenAI's text-to-image model DALL-E 2, which uses diffusion models both for the model's prior (which produces an image embedding given a text caption) and for the decoder that generates the final image.[3]
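The generation procedure described above can be sketched as an ancestral sampling loop: start from pure noise and repeatedly apply the learned reverse step. This is a minimal sketch assuming the DDPM parameterization; `predict_noise` is a hypothetical placeholder for a trained noise-prediction network, so the output here is not a natural image:

```python
import numpy as np

def ddpm_sample(predict_noise, shape, num_steps=1000,
                beta_start=1e-4, beta_end=0.02, seed=0):
    """Reverse (generative) process: denoise step by step from x_T to x_0.

    `predict_noise(x_t, t)` stands in for a trained network that predicts
    the noise component of x_t (a hypothetical placeholder here).
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for t in range(num_steps - 1, -1, -1):
        eps = predict_noise(x, t)
        # Mean of the reverse transition, written in terms of predicted noise.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added on the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Dummy predictor: the loop runs, but only a trained network yields real images.
sample = ddpm_sample(lambda x, t: np.zeros_like(x), (8, 8))
```

In practice the placeholder predictor is a U-Net-style network trained on the denoising objective; the loop structure stays the same.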
References
- ^ Ho, Jonathan; Jain, Ajay; Abbeel, Pieter (19 June 2020). "Denoising Diffusion Probabilistic Models". doi:10.48550/arXiv.2006.11239.
- ^ Song, Yang; Ermon, Stefano (2020). "Improved Techniques for Training Score-Based Generative Models". doi:10.48550/arXiv.2006.09011.
- ^ Ramesh, Aditya; Dhariwal, Prafulla; Nichol, Alex; Chu, Casey; Chen, Mark (2022). "Hierarchical Text-Conditional Image Generation with CLIP Latents". doi:10.48550/arXiv.2204.06125.