How-Diffusion-Models-Work

Notes from the course "How Diffusion Models Work" by DeepLearning.AI

Notes

Taught by Sharon Zhou

Noted by Atul

Prerequisite not covered in these notes: backpropagation

Example used throughout the course: generating 16x16 sprites for video games.

Intuition

Goal: given a lot of sprite images, generate even more sprite images

What does the network learn?
- Fine details
- General outline
- Everything in between
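
Noising process (ink drop analogy): just as a drop of ink diffuses in water until it is spread evenly, noise is added to Bob the sprite step by step until only noise remains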

Denoising process (what should the NN do at each noise level?)
- If it's clearly Bob the sprite, keep it as it is
- If it's likely to be Bob, suggest more details to fill in
- If it's just an outline of a sprite, suggest general details for likely sprites (Bob, Fred, ...)
- If it's nothing but noise, suggest the outline of a sprite

Give the NN input noise whose pixels are sampled from a normal distribution, and get a completely new sprite!
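
The noising process described above can be sketched in a few lines of NumPy. This is a minimal sketch rather than the course's code: the linear schedule, its end values (`1e-4` to `0.02`), the 500 timesteps, and the names `make_schedule` and `add_noise` are all assumptions made for illustration.

```python
import numpy as np

def make_schedule(timesteps=500, beta1=1e-4, beta2=0.02):
    """Assumed linear noise schedule. Index 0 means 'no noise added yet'."""
    b_t = np.concatenate([[0.0], np.linspace(beta1, beta2, timesteps)])
    ab_t = np.cumprod(1.0 - b_t)  # fraction of the original signal kept at step t
    return b_t, ab_t

def add_noise(x0, t, ab_t, rng):
    """Jump straight to noise level t: mix the clean image with Gaussian noise."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(ab_t[t]) * x0 + np.sqrt(1.0 - ab_t[t]) * noise
    return x_t, noise

rng = np.random.default_rng(0)
sprite = rng.uniform(-1, 1, size=(3, 16, 16))  # random stand-in for a 16x16 RGB sprite
b_t, ab_t = make_schedule()
slightly_noisy, _ = add_noise(sprite, 10, ab_t, rng)   # still mostly the sprite
mostly_noise, _ = add_noise(sprite, 500, ab_t, rng)    # almost pure noise
```

At small `t` the result is still close to the original sprite, and at the final timestep it is nearly indistinguishable from pure normal noise, which is the input the sampler starts from.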

Sampling

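Assume you have a trained NN
At each denoising step, it predicts the noise in the current image, and that prediction is subtracted to get a slightly cleaner image
NOTE: At each denoising step, some fresh random noise is added back. This keeps the input close to the noisy distribution the NN was trained on and prevents "mode collapse" (samples degrading into near-identical blobs)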
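
The sampling loop above can be sketched as follows, using the standard DDPM update. This is a sketch under stated assumptions, not the course notebook: the linear schedule and its values are assumed, and `oracle_noise` is a hypothetical stand-in for the trained NN that computes the exact noise relative to a known target image, so the loop can be run without a trained model.

```python
import numpy as np

T = 500
b_t = np.concatenate([[0.0], np.linspace(1e-4, 0.02, T)])  # assumed linear schedule
a_t = 1.0 - b_t
ab_t = np.cumprod(a_t)

def ddpm_step(x, t, pred_noise, z):
    """Subtract the predicted noise, then add some fresh noise z back in."""
    mean = (x - pred_noise * b_t[t] / np.sqrt(1.0 - ab_t[t])) / np.sqrt(a_t[t])
    return mean + np.sqrt(b_t[t]) * z

def sample(predict_noise, shape, rng):
    x = rng.standard_normal(shape)  # start from pure normal noise
    for t in range(T, 0, -1):
        # No extra noise on the very last step, so the final image is clean.
        z = rng.standard_normal(shape) if t > 1 else np.zeros(shape)
        x = ddpm_step(x, t, predict_noise(x, t), z)
    return x

# Hypothetical stand-in for the trained NN: a perfect noise predictor for one target.
target = np.random.default_rng(1).uniform(-1, 1, size=(3, 16, 16))

def oracle_noise(x, t):
    return (x - np.sqrt(ab_t[t]) * target) / np.sqrt(1.0 - ab_t[t])

sprite = sample(oracle_noise, target.shape, np.random.default_rng(0))
```

With a perfect predictor the loop lands on the target image. A real network only approximates the noise, which is why many small steps are used instead of one large one.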

Neural Network

UNet Architecture
- Input and output of same size
- First used for image segmentation

Takes a noisy image, embeds it into a smaller space by downsampling, then upsamples to predict the noise
Can take more information in the form of embeddings
- Time: encodes the timestep, and therefore the noise level
- Context: guides the generation process
Check out forward() in the sampling notebook
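
To make the data flow concrete, here is a shape-only sketch of the down/up path. It is not the network from the notebook: there are no learned weights, average pooling and pixel repetition stand in for the convolutional blocks, and the choice to multiply by the context embedding and add the time embedding at the bottleneck is an assumption made for illustration. `toy_unet`, `downsample`, and `upsample` are hypothetical names.

```python
import numpy as np

def downsample(x):
    """(C, H, W) -> (C, H/2, W/2) by 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):
    """(C, H, W) -> (C, 2H, 2W) by repeating each pixel."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def toy_unet(x, t_emb, c_emb):
    d1 = downsample(x)    # 16x16 -> 8x8
    d2 = downsample(d1)   # 8x8  -> 4x4
    # Inject the embeddings per channel: scale by context, shift by time (assumed).
    h = c_emb[:, None, None] * d2 + t_emb[:, None, None]
    u1 = upsample(h) + d1   # skip connection from the matching down level
    u2 = upsample(u1) + x   # back to the input resolution
    return u2

x = np.ones((3, 16, 16))
out = toy_unet(x, t_emb=np.zeros(3), c_emb=np.ones(3))
```

The output has the same shape as the input, which is what lets the network's output be read as a per-pixel noise prediction.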

Training

The NN learns to predict noise, which amounts to learning the distribution of what is "not noise"

Randomly sample a training image, a timestep t, and noise
- The timestep controls the noise level
- Randomising the timestep (rather than stepping through it in order) keeps training stable
Add the noise to the image
Input the noisy image into the NN, which predicts the noise
Compute the loss between the actual and the predicted noise
Backprop and update the weights
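
The training steps above can be sketched end to end in NumPy. This is a toy under loud assumptions: a single linear layer `W` stands in for the UNet, the images are random flattened 16x16 grayscale arrays rather than real sprites, the timestep is fed in as one extra feature `t / T`, and the schedule values are assumed. The gradient is written out by hand in place of an autograd library.

```python
import numpy as np

T = 500
b_t = np.concatenate([[0.0], np.linspace(1e-4, 0.02, T)])  # assumed linear schedule
ab_t = np.cumprod(1.0 - b_t)

def make_batch(images, rng):
    """Sample a timestep and noise per image, then noise each image to its level."""
    n = images.shape[0]
    t = rng.integers(1, T + 1, size=n)          # random timestep per image
    noise = rng.standard_normal(images.shape)   # the target the model must predict
    x_t = np.sqrt(ab_t[t])[:, None] * images + np.sqrt(1.0 - ab_t[t])[:, None] * noise
    features = np.concatenate([x_t, (t / T)[:, None]], axis=1)  # noisy image + time
    return features, noise

def sgd_step(W, features, noise, lr=0.1):
    """One step: predict noise, compute the MSE loss, apply its gradient."""
    err = features @ W - noise
    loss = np.mean(err ** 2)
    grad = 2.0 * features.T @ err / err.size    # d(loss)/dW for the linear model
    return loss, W - lr * grad

rng = np.random.default_rng(0)
images = rng.uniform(-1, 1, size=(32, 256))     # 32 flattened toy "sprites"
W = np.zeros((257, 256))                        # 256 pixels + 1 time feature
for _ in range(100):
    features, noise = make_batch(images, rng)
    loss, W = sgd_step(W, features, noise)
```

A linear model cannot denoise well, so the loss here only drops modestly. The point is the order of operations, which is the same when `W` is replaced by a UNet and the hand-written gradient by backprop.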

Control

Embeddings are vectors, for instance strings represented as vectors of numbers
Given as input to the NN along with the training image
Each embedding is associated with a training example and its properties
Use: generate funky mixtures by combining embeddings
Context formats
- Text
- Categories, one-hot encoded (e.g. hero, non-hero, spells, ...)
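
As a small illustration of the categorical format and of mixing, the sketch below builds one-hot context vectors and blends two of them. The category list contains only the three names given in these notes (the full list is elided there), and `one_hot` is a hypothetical helper.

```python
import numpy as np

# Only the categories named in the notes; the real list is longer.
CATEGORIES = ["hero", "non-hero", "spells"]

def one_hot(name):
    """Context vector with a 1 in the slot of the given category."""
    vec = np.zeros(len(CATEGORIES))
    vec[CATEGORIES.index(name)] = 1.0
    return vec

hero = one_hot("hero")      # [1, 0, 0]
spells = one_hot("spells")  # [0, 0, 1]

# A "funky mixture": half hero, half spells, passed to the NN as its context.
mixture = 0.5 * hero + 0.5 * spells
```

Because the context is just a vector, nothing forces it to be a pure one-hot at sampling time, which is what makes mixtures possible.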

Fast Sampling : DDIM

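DDPM is slow!
- Many timesteps, and each step depends on the previous one (Markov chain)
DDIM skips timesteps and uses a deterministic update (no random noise is added back at each step)
Quality is typically somewhat lower than full-length DDPM sampling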
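
A DDIM-style sampler that skips timesteps can be sketched as below. As before, this is a sketch under assumptions rather than the notebook's code: the schedule values are assumed, the update is the standard deterministic DDIM step, and `oracle_noise` is a hypothetical perfect predictor standing in for the trained NN.

```python
import numpy as np

T = 500
b_t = np.concatenate([[0.0], np.linspace(1e-4, 0.02, T)])  # assumed linear schedule
ab_t = np.cumprod(1.0 - b_t)

def ddim_step(x, t, t_prev, pred_noise):
    """Estimate the clean image, then re-noise it to the earlier level t_prev."""
    x0_pred = (x - np.sqrt(1.0 - ab_t[t]) * pred_noise) / np.sqrt(ab_t[t])
    return np.sqrt(ab_t[t_prev]) * x0_pred + np.sqrt(1.0 - ab_t[t_prev]) * pred_noise

def sample_ddim(predict_noise, shape, rng, n_steps=20):
    x = rng.standard_normal(shape)   # the only randomness is the starting noise
    step = T // n_steps
    for t in range(T, 0, -step):     # e.g. 500, 475, ..., 25: 20 calls, not 500
        t_prev = max(t - step, 0)
        x = ddim_step(x, t, t_prev, predict_noise(x, t))
    return x

# Hypothetical stand-in for the trained NN: a perfect noise predictor for one target.
target = np.random.default_rng(1).uniform(-1, 1, size=(3, 16, 16))

def oracle_noise(x, t):
    return (x - np.sqrt(ab_t[t]) * target) / np.sqrt(1.0 - ab_t[t])

fast_sprite = sample_ddim(oracle_noise, target.shape, np.random.default_rng(0))
```

Since no noise is injected inside the loop, the same starting noise always yields the same image, and the number of network calls is set by `n_steps` rather than by `T`.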

Summary

Other applications: music, inpainting, textual inversion
