exploration-based project for gaining a deeper understanding of diffusion models

henryhmko/diffusion_deepdive

Diffusion Deepdive

Exploration-based project aimed at understanding how diffusion models work by implementing various diffusion models.

Approach:

  1. Single-Image Generation (to verify that diffusion does work)
  2. Single-Class Generation (Landscape data from Kaggle)
  3. Multi-Class Conditional Generation (CIFAR 10)
  4. Latent Diffusion (CIFAR 10)

Single-Image Generation

This verifies that diffusion works at all by making the entire dataset a single image. It is a sanity check before moving on to more complex tasks. All training was done on 1 x RTX 2080.
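A single-image dataset can be sketched as an object that returns the same image for every index. This is only an illustration of the idea, not the code used in this repo: the class name `SingleImageDataset` and the `virtual_length` parameter are hypothetical, and the toy image stands in for a real one such as Oski.

```python
class SingleImageDataset:
    """Dataset whose every item is the same image.

    Implements __len__ and __getitem__, the protocol that map-style data
    loaders (for example torch.utils.data.DataLoader) work with, without
    importing torch here.
    """

    def __init__(self, image, virtual_length=1024):
        # `image` can be any array-like (nested lists, NumPy array, tensor).
        # `virtual_length` (hypothetical) sets how many items one epoch contains.
        if virtual_length <= 0:
            raise ValueError("virtual_length must be positive")
        self.image = image
        self.virtual_length = virtual_length

    def __len__(self):
        return self.virtual_length

    def __getitem__(self, index):
        if not 0 <= index < self.virtual_length:
            raise IndexError(index)
        return self.image  # the same image regardless of index


# Toy 32 x 32 grayscale "image" with values in [-1, 1]
toy_image = [[(x + y) / 31.0 - 1.0 for x in range(32)] for y in range(32)]
dataset = SingleImageDataset(toy_image, virtual_length=16)
```

If the model cannot reproduce this one image after training, the problem is likely in the training or sampling code rather than in the data.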

Oski (32 x 32)

VLSB Santa Dino (64 x 64)

A rare creature, spotted only during the Christmas season in the VLSB building.

Single-Class Generation

Introduction: Extend Single-Image Generation to Single-Class Generation, testing on the landscape data first.

Landscape Data

1. Training Run #1

epochs = 500
lr = 1e-3
bs = 16 #gpu is 2080...
diffusion_timesteps = 1000

Training Time: ~9 hours
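To make `diffusion_timesteps = 1000` concrete, here is a minimal sketch of a noise schedule and the forward (noising) process. The README does not say which schedule was used, so the linear schedule and its endpoints (`beta_start=1e-4`, `beta_end=0.02`, the commonly cited DDPM defaults) are assumptions, and all function names are illustrative.

```python
import math
import random


def linear_beta_schedule(timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced betas (assumed DDPM-style defaults, not confirmed by this repo)."""
    if timesteps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) for a flat list of pixel values x0."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_beta_schedule(timesteps=1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [0.5] * 8                                # toy "image" of 8 pixels
x_early = q_sample(x0, 0, alpha_bars, rng)    # almost unchanged
x_late = q_sample(x0, 999, alpha_bars, rng)   # almost pure noise
```

Under these assumptions `alpha_bars` decays from just under 1 toward 0, so the last timestep carries almost none of the original image.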

Summary Result

Detailed Sampling

Question:

One question came up when I was looking at the outputs:

  • Why does the model seem to generate the complicated parts of a scene (e.g. mountains, grass) first?
    • For example, the model seemed to generate the mountains before the sky, and the grass before the sky.
    • Is textured content easier for the model to start from (i.e. is constructing prominent features first the easiest way to lower the loss)?
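One way to start probing this question is to look at the model's implied estimate of the clean image at each sampling step, rather than at the noisy sample itself. The sketch below assumes a noise-predicting (DDPM-style) model and a linear beta schedule, neither of which the README confirms. The function names are illustrative, and the true noise is used as a stand-in for the model's prediction.

```python
import math
import random


def alpha_bar_at(t, timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 0..t under an assumed linear schedule."""
    step = (beta_end - beta_start) / (timesteps - 1)
    product = 1.0
    for s in range(t + 1):
        product *= 1.0 - (beta_start + s * step)
    return product


def predict_x0(x_t, eps_hat, alpha_bar):
    """Invert x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps for x0."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [(xt - sigma * e) / signal for xt, e in zip(x_t, eps_hat)]


rng = random.Random(0)
x0 = [-1.0, -0.5, 0.0, 0.5, 1.0]              # toy clean pixels
t = 500
ab = alpha_bar_at(t)
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * e for v, e in zip(x0, eps)]

# With a trained model, eps_hat would be model(x_t, t); the true noise is used here.
x0_hat = predict_x0(x_t, eps, ab)
```

Saving `x0_hat` at a handful of timesteps during sampling would show which regions (mountains, grass, sky) the model commits to first.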

2. Training Run #2

epochs = 500
lr = 1e-4
bs = 16
diffusion_timesteps = 1000

Training Time: ~9 hours

Summary Result

Detailed Sampling

3. Training Run #3

epochs = 500 #should have increased epochs for this lr
lr = 1e-5 
bs = 16
diffusion_timesteps = 1000

Training Time: ~9 hours
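The three landscape runs differ only in learning rate, so they can be written as a small config sweep. This is a sketch rather than the repo's actual training setup: `RunConfig` and `optimizer_steps` are hypothetical names, and `dataset_size=1000` is a placeholder because the README does not state the size of the landscape dataset.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    epochs: int = 500
    lr: float = 1e-3
    bs: int = 16
    diffusion_timesteps: int = 1000

    def optimizer_steps(self, dataset_size):
        """Total gradient updates, assuming one update per batch and
        that the last partial batch of each epoch is kept."""
        batches_per_epoch = -(-dataset_size // self.bs)  # ceiling division
        return self.epochs * batches_per_epoch


base = RunConfig()
landscape_runs = [replace(base, lr=lr) for lr in (1e-3, 1e-4, 1e-5)]

# Placeholder dataset size; the real value is not given in this README.
steps_per_run = [run.optimizer_steps(dataset_size=1000) for run in landscape_runs]
```

Every run takes the same number of optimizer steps, so Run #3 takes the same number of updates with a learning rate 100x smaller than Run #1. That is consistent with the note that epochs should have been increased for this lr.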

Summary Result

Detailed Sampling

Italian Greyhound Data

Just an extension, since I like them.

1. Training Run #1

epochs = 3000
lr = 1e-4
bs = 12
diffusion_timesteps = 1000

Training Time: not recorded here (see the tf log)

Summary Result

Detailed Sampling

Interesting Papers
