PyTorch implementation of the InfoGAN paper
- numpy
- python
- pytorch
- torchvision
- pickle
- CUDA (highly recommended)
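CUDA is optional, but training on the CPU is much slower. A common PyTorch idiom, shown here as a sketch and not necessarily what this repo's `train.py` does, is to use the GPU when PyTorch can see one and fall back to the CPU otherwise:

```python
import torch

# Use the first visible CUDA GPU if there is one; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")

# Models and tensors must live on the same device, e.g. a batch of 74-d generator inputs:
x = torch.randn(4, 74, device=device)
```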
- Clone the repo to your local system:
git clone link-to-repo
NOTE: For Colab, use the command below.
!git clone link-to-repo
- Launch TensorBoard (optional):
tensorboard --logdir=runs
NOTE: For Colab, use the commands below.
%load_ext tensorboard
%tensorboard --logdir=runs
- Run the following command:
python train.py
NOTE: Use python3 instead on Linux/Mac. For Colab, use %run train.py.
usage: train.py [-h] [--num_epoch NUM_EPOCH] [--batch_size BATCH_SIZE]
                [--num_workers NUM_WORKERS] [--lrD LRD] [--lrG LRG]
                [--beta1 BETA1] [--beta2 BETA2] [--recog_weight RECOG_WEIGHT]
                [--model_path MODEL_PATH] [--save_epoch SAVE_EPOCH]
optional arguments:
  -h, --help            show this help message and exit
  --num_epoch NUM_EPOCH
                        Number of epochs, default: 50
  --batch_size BATCH_SIZE
                        Size of each batch, default: 100
  --num_workers NUM_WORKERS
                        Number of processes that generate batches in parallel,
                        default: 2
  --lrD LRD             Adam learning rate for the discriminator, default:
                        2e-4 (0.0002)
  --lrG LRG             Adam learning rate for the generator, default: 1e-3
                        (0.001)
  --beta1 BETA1         Adam beta1 (decay rate of the first-moment estimate),
                        default: 0.5
  --beta2 BETA2         Adam beta2 (decay rate of the second-moment
                        estimate), default: 0.999
  --recog_weight RECOG_WEIGHT
                        Weight given to the continuous latent codes in the
                        loss calculation, default: 0.5
  --model_path MODEL_PATH
                        Default: 'trained_model' + current datetime (the
                        datetime is appended automatically)
  --save_epoch SAVE_EPOCH
                        Epoch at which a model checkpoint is saved, default: 5
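To show how these options map onto code, here is a hedged sketch of an `argparse` parser that reproduces the documented flags and defaults. It is reconstructed from the help text, not copied from the repo's `train.py`; the argument types and the timestamp format appended to `trained_model` are assumptions.

```python
import argparse
from datetime import datetime


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of train.py's options from the help text.
    p = argparse.ArgumentParser(description="Train InfoGAN on MNIST")
    p.add_argument("--num_epoch", type=int, default=50, help="Number of epochs")
    p.add_argument("--batch_size", type=int, default=100, help="Size of each batch")
    p.add_argument("--num_workers", type=int, default=2,
                   help="Number of processes that generate batches in parallel")
    p.add_argument("--lrD", type=float, default=2e-4, help="Adam learning rate for D")
    p.add_argument("--lrG", type=float, default=1e-3, help="Adam learning rate for G")
    p.add_argument("--beta1", type=float, default=0.5, help="Adam beta1")
    p.add_argument("--beta2", type=float, default=0.999, help="Adam beta2")
    p.add_argument("--recog_weight", type=float, default=0.5,
                   help="Weight of the continuous latent codes in the loss")
    # Assumed timestamp format; the real script may format the datetime differently.
    p.add_argument("--model_path", type=str,
                   default="trained_model" + datetime.now().strftime("%Y%m%d-%H%M%S"))
    p.add_argument("--save_epoch", type=int, default=5,
                   help="Epoch at which a model checkpoint is saved")
    return p


# Example: equivalent to `python train.py --num_epoch 20 --lrG 5e-4`
args = build_parser().parse_args(["--num_epoch", "20", "--lrG", "5e-4"])
print(vars(args))
```

Note that `argparse` keeps the flag's capitalisation in the attribute name, so `--lrD` is read back as `args.lrD`.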
- Title: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
- Dated: 12.06.2016
- Authors: Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel
- University: UC Berkeley
- Field: Deep Learning, Generative Adversarial Networks
Since their introduction, generative adversarial networks (GANs) have transformed how generative models are understood, and deep learning more broadly. A GAN is essentially a contest between police and a counterfeiter, played by the discriminator and the generator: the discriminator tries to classify samples as real or fake, and every time it succeeds, the generator is pushed to come back with more convincing fakes. Over the course of training, the discriminator finds it harder and harder to tell fake samples from real ones, because the generator imitates the data so well, all without any labels. DCGAN improved on this by using deep convolutional layers and a specific set of architectural guidelines.
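The police-versus-counterfeiter game boils down to two binary cross-entropy losses. The sketch below is a generic illustration with tiny placeholder networks (`G` and `D` here are made-up linear models on toy data, not the repo's models): the discriminator learns to output 1 for real samples and 0 for fakes, and the generator learns to make the discriminator output 1 on its fakes, using the non-saturating generator loss that is common in practice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder networks: G maps 16-d noise to 8-d "data", D maps data to one logit.
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
D = nn.Sequential(nn.Linear(8, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 8) + 2.0  # stand-in for a batch of real data
z = torch.randn(64, 16)
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

# Discriminator step: real -> 1, fake -> 0. detach() keeps G out of this update.
fake = G(z)
loss_D = bce(D(real), ones) + bce(D(fake.detach()), zeros)
opt_D.zero_grad()
loss_D.backward()
opt_D.step()

# Generator step: push D (re-evaluated after its update) to label the fakes as real.
loss_G = bce(D(fake), ones)
opt_G.zero_grad()
loss_G.backward()
opt_G.step()

print(f"Loss_D: {loss_D.item():.4f} Loss_G: {loss_G.item():.4f}")
```

Real training repeats these two steps over every batch; the `Loss_D`/`Loss_G` values in the training log further down are per-epoch numbers of this kind for the full InfoGAN objective.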
Still, learning seemed to happen inside a black box: the model did not explicitly capture individual features of an image, such as sunglasses, curves or viewing angle. The learned representation appeared entangled, and indeed it was. InfoGAN proposes learning a disentangled representation of the data, recovering such features through latent codes and information maximisation.
Latent: something that exists but is hidden or not manifest, e.g. the latent heat of fusion.
The main change is adding a latent code c to the traditional noise vector z fed into the generator, so G(z) becomes G(z, c). To encourage the network to make use of these codes in an unsupervised way, the paper proposes an information-theoretic regularisation: there should be high mutual information between the latent code c and the generator distribution G(z, c). Formally, I(c; G(z, c)) should be high, so an extra regularisation term is added to the original GAN objective.
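Written out in the paper's notation, where V(D, G) is the standard GAN value function and the hyperparameter λ weights the information term, the regularised minimax game is:

```latex
\begin{align*}
V(D, G) &= \mathbb{E}_{x \sim P_{\text{data}}}\left[\log D(x)\right]
         + \mathbb{E}_{z \sim P_{\text{noise}}}\left[\log\left(1 - D(G(z))\right)\right] \\
\min_{G}\max_{D}\; V_I(D, G) &= V(D, G) - \lambda\, I\left(c;\, G(z, c)\right)
\end{align*}
```

Because G minimises V_I, the -λI term rewards the generator for producing samples from which c can be recovered.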
As appealing as this sounds, maximising I(c; G(z, c)) directly is hard in practice, because it requires the posterior P(c|x). The paper instead maximises a variational lower bound. It introduces an auxiliary distribution Q(c|x), parameterised by a neural network, that approximates the true P(c|x). A simple lemma then lets the bound be estimated by sampling c from its user-specified prior (e.g. a uniform distribution) instead of the unknown posterior, and the bound can be maximised with respect to Q directly and with respect to G via the re-parameterization trick.
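For a categorical code the bound is easy to compute. In the paper, L_I(G, Q) = E[log Q(c|x)] + H(c), with c drawn from its prior P(c) and x = G(z, c), and L_I(G, Q) ≤ I(c; G(z, c)). The sketch below is an illustrative Monte Carlo estimate for a uniform 10-way code; the logits are made up and stand in for Q's output on generated images, so no generator is involved. It shows that the estimate never exceeds H(c) = log 10 and approaches it as Q recovers c more confidently.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
K = 10   # 10-way categorical code with a uniform prior
n = 256  # Monte Carlo sample size

c = torch.randint(0, K, (n,))  # c ~ P(c): sampled from the prior, not the posterior
H_c = math.log(K)              # entropy of the uniform prior, a constant


def info_lower_bound(q_logits: torch.Tensor, c: torch.Tensor) -> float:
    """Monte Carlo estimate of L_I = E[log Q(c|x)] + H(c) for a categorical code."""
    log_q = F.log_softmax(q_logits, dim=1).gather(1, c.unsqueeze(1)).squeeze(1)
    return log_q.mean().item() + H_c


# Stand-ins for Q(c|x): an uninformative Q versus one that recovers c almost perfectly.
random_q = torch.randn(n, K)
sharp_q = F.one_hot(c, K).float() * 20.0

print(f"H(c)         = {H_c:.4f}")
print(f"L_I random Q = {info_lower_bound(random_q, c):.4f}")
print(f"L_I sharp Q  = {info_lower_bound(sharp_q, c):.4f}")
```

Since H(c) is a constant, maximising L_I amounts to minimising the cross-entropy between the sampled code and Q's prediction, which is how it is usually implemented.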
The auxiliary network Q is a neural network that shares almost all of its layers with the discriminator, differing only in the last layer, since the two have different purposes. For MNIST, the generator takes a linear input of 74 variables: 62 random noise variables, 10 for a categorical code that we hope will match the ten digits, and 2 continuous latent codes with random values between -1 and +1, one intended for digit width and the other for rotation. Although InfoGAN introduces an extra hyperparameter λ, it is easy to tune, and simply setting it to 1 is sufficient for discrete latent codes. Because GANs are notoriously hard to train, the paper reuses the layers of an existing architecture, DCGAN. In short, InfoGAN adds three components to DCGAN: the latent code c, the auxiliary network Q, and the training objective that estimates c without supervision.
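The 74-dimensional input and the shared D/Q structure can be sketched as follows. This is a simplified, hypothetical illustration rather than the repo's code: the networks are small MLPs instead of the DCGAN convolutional layers, Q models each continuous code as a Gaussian with a predicted mean and log-variance, and `cont_weight=0.5` is assumed to play the role of the `--recog_weight` option, while the categorical term keeps λ = 1 as the paper suggests.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
NOISE_DIM, N_CAT, N_CONT = 62, 10, 2  # 62 + 10 + 2 = 74 generator inputs for MNIST


def sample_latent(batch: int):
    """Sample noise z, a one-hot categorical code and continuous codes in [-1, 1]."""
    z = torch.randn(batch, NOISE_DIM)
    cat_idx = torch.randint(0, N_CAT, (batch,))  # the "digit" each sample should show
    c_cat = F.one_hot(cat_idx, N_CAT).float()
    c_cont = torch.rand(batch, N_CONT) * 2 - 1   # Uniform(-1, 1), hoped to be width/rotation
    return torch.cat([z, c_cat, c_cont], dim=1), cat_idx, c_cont


class DiscriminatorQ(nn.Module):
    """Hypothetical D/Q pair: one shared trunk, two heads (MLP for brevity)."""

    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.LeakyReLU(0.1))
        self.d_head = nn.Linear(256, 1)                   # real/fake logit
        self.q_head = nn.Linear(256, N_CAT + 2 * N_CONT)  # category logits, means, log-variances

    def forward(self, x):
        h = self.trunk(x)
        q = self.q_head(h)
        cat_logits = q[:, :N_CAT]
        mu = q[:, N_CAT:N_CAT + N_CONT]
        log_var = q[:, N_CAT + N_CONT:]
        return self.d_head(h), cat_logits, mu, log_var


def info_loss(cat_logits, mu, log_var, cat_idx, c_cont, cont_weight=0.5):
    """Weighted negative of L_I, up to constants: cross-entropy + Gaussian NLL."""
    loss_cat = F.cross_entropy(cat_logits, cat_idx)
    # Gaussian negative log-likelihood per code, dropping the constant 0.5 * log(2 * pi).
    nll = 0.5 * (log_var + (c_cont - mu) ** 2 / log_var.exp())
    return loss_cat + cont_weight * nll.sum(dim=1).mean()


# Placeholder generator: 74 inputs -> one 28x28 image in [-1, 1].
G = nn.Sequential(
    nn.Linear(NOISE_DIM + N_CAT + N_CONT, 28 * 28),
    nn.Tanh(),
    nn.Unflatten(1, (1, 28, 28)),
)
net = DiscriminatorQ()

latent, cat_idx, c_cont = sample_latent(16)
d_logit, cat_logits, mu, log_var = net(G(latent))
loss_info = info_loss(cat_logits, mu, log_var, cat_idx, c_cont)
print(latent.shape, d_logit.shape, loss_info.item())
```

In a full training loop, `loss_info` would be added to the generator's adversarial loss and minimised with respect to both G and Q's parameters, which maximises the lower bound on I(c; G(z, c)).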
Training Started!
Epoch[1/50] Loss_D: 0.1175 Loss_G: 7.1921 Time:0.27
Epoch[2/50] Loss_D: 0.1191 Loss_G: 7.2145 Time:0.27
Epoch[3/50] Loss_D: 0.1186 Loss_G: 6.6265 Time:0.27
Epoch[4/50] Loss_D: 0.1201 Loss_G: 6.5022 Time:0.27
Epoch[5/50] Loss_D: 0.1237 Loss_G: 6.7592 Time:0.27
Epoch[6/50] Loss_D: 0.1197 Loss_G: 7.2505 Time:0.29
Epoch[7/50] Loss_D: 0.1234 Loss_G: 6.3390 Time:0.27
Epoch[8/50] Loss_D: 0.1640 Loss_G: 4.8824 Time:0.27
Epoch[9/50] Loss_D: 0.1219 Loss_G: 6.9811 Time:0.28
Epoch[10/50] Loss_D: 0.1160 Loss_G: 4.9793 Time:0.27
Epoch[11/50] Loss_D: 0.1142 Loss_G: 4.7543 Time:0.29
Epoch[12/50] Loss_D: 0.1172 Loss_G: 4.4431 Time:0.28
Epoch[13/50] Loss_D: 0.1162 Loss_G: 4.3410 Time:0.27
Epoch[14/50] Loss_D: 0.1350 Loss_G: 5.6601 Time:0.27
Epoch[15/50] Loss_D: 0.1222 Loss_G: 4.4171 Time:0.28
Epoch[16/50] Loss_D: 0.2083 Loss_G: 4.1196 Time:0.29
Epoch[17/50] Loss_D: 0.2229 Loss_G: 3.6437 Time:0.28
Epoch[18/50] Loss_D: 0.4226 Loss_G: 2.6176 Time:0.28
Epoch[19/50] Loss_D: 0.4480 Loss_G: 2.5218 Time:0.27
Epoch[20/50] Loss_D: 0.4998 Loss_G: 2.5010 Time:0.27
Epoch[21/50] Loss_D: 0.6554 Loss_G: 2.2479 Time:0.28
Epoch[22/50] Loss_D: 0.7237 Loss_G: 2.1017 Time:0.28
Epoch[23/50] Loss_D: 0.7636 Loss_G: 2.5741 Time:0.27
Epoch[24/50] Loss_D: 0.7692 Loss_G: 2.0070 Time:0.27
Epoch[25/50] Loss_D: 0.7076 Loss_G: 1.7769 Time:0.28
Epoch[26/50] Loss_D: 0.7990 Loss_G: 1.8808 Time:0.29
Epoch[27/50] Loss_D: 0.1518 Loss_G: 4.5639 Time:0.29
Epoch[28/50] Loss_D: 0.4372 Loss_G: 4.0119 Time:0.27
Epoch[29/50] Loss_D: 0.3616 Loss_G: 3.2473 Time:0.30
Epoch[30/50] Loss_D: 0.2219 Loss_G: 4.4581 Time:0.29
Epoch[31/50] Loss_D: 0.4572 Loss_G: 3.4499 Time:0.30
Epoch[32/50] Loss_D: 0.5197 Loss_G: 2.5418 Time:0.29
Epoch[33/50] Loss_D: 0.3907 Loss_G: 2.6922 Time:0.28
Epoch[34/50] Loss_D: 0.6162 Loss_G: 1.9162 Time:0.28
Epoch[35/50] Loss_D: 0.4139 Loss_G: 3.0718 Time:0.29
Epoch[36/50] Loss_D: 0.4458 Loss_G: 3.2686 Time:0.29
Epoch[37/50] Loss_D: 0.5357 Loss_G: 2.7983 Time:0.28
Epoch[38/50] Loss_D: 0.5137 Loss_G: 3.2597 Time:0.28
Epoch[39/50] Loss_D: 0.3200 Loss_G: 3.5945 Time:0.29
Epoch[40/50] Loss_D: 0.1403 Loss_G: 6.7252 Time:0.30
Epoch[41/50] Loss_D: 0.1399 Loss_G: 6.5192 Time:0.29
Epoch[42/50] Loss_D: 0.3973 Loss_G: 5.3002 Time:0.28
Epoch[43/50] Loss_D: 0.2442 Loss_G: 4.6178 Time:0.28
Epoch[44/50] Loss_D: 0.2592 Loss_G: 3.9474 Time:0.28
Epoch[45/50] Loss_D: 0.2502 Loss_G: 4.5859 Time:0.28
Epoch[46/50] Loss_D: 0.1453 Loss_G: 6.8319 Time:0.29
Epoch[47/50] Loss_D: 0.2114 Loss_G: 4.1357 Time:0.28
Epoch[48/50] Loss_D: 0.2629 Loss_G: 3.7725 Time:0.28
Epoch[49/50] Loss_D: 0.4998 Loss_G: 4.0496 Time:0.28
Epoch[50/50] Loss_D: 0.2323 Loss_G: 4.0389 Time:0.29
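The printout above is easy to inspect programmatically. The following sketch is a hypothetical helper, not part of the repo: it parses lines in exactly this format and flags epochs where Loss_G jumps sharply, such as epoch 40, where Loss_D falls to 0.1403 and Loss_G rises to 6.7252. The 50% threshold is an arbitrary choice.

```python
import re

# Matches lines such as "Epoch[40/50] Loss_D: 0.1403 Loss_G: 6.7252 Time:0.30"
LINE_RE = re.compile(
    r"Epoch\[(\d+)/(\d+)\]\s+Loss_D:\s*([\d.]+)\s+Loss_G:\s*([\d.]+)\s+Time:\s*([\d.]+)"
)


def parse_log(text: str):
    """Return a list of (epoch, loss_d, loss_g) tuples from the training printout."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(3)), float(m.group(4))))
    return rows


sample = """Epoch[39/50] Loss_D: 0.3200 Loss_G: 3.5945 Time:0.29
Epoch[40/50] Loss_D: 0.1403 Loss_G: 6.7252 Time:0.30
Epoch[41/50] Loss_D: 0.1399 Loss_G: 6.5192 Time:0.29"""

rows = parse_log(sample)
# Flag epochs where Loss_G is more than 50% higher than in the previous epoch.
jumps = [e for (e, _, g), (_, _, g_prev) in zip(rows[1:], rows) if g > 1.5 * g_prev]
print(rows)
print("Loss_G jumps at epochs:", jumps)
```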
NOTE: Generated images are saved in PNG format in the trained_model directory.
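- From top to bottom, the generated images are from epochs 10, 20, 30, 40 and 50. I'm still wondering what happened at epoch 40: the log shows Loss_D dropping from 0.3200 to 0.1403 and Loss_G jumping from 3.5945 to 6.7252 at that epoch.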
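InfoGAN-style result grids are typically produced by holding the noise fixed, assigning one categorical value per row and sweeping one continuous code across the columns. The sketch below builds such a 10 × 10 batch of inputs using the 62 + 10 + 2 MNIST input layout described earlier; it is an assumption about how such a grid can be made, not necessarily how this repo produces its images. Feeding the batch to a trained generator and calling `torchvision.utils.save_image(images, "grid.png", nrow=10)` would write it as a single PNG with one category per row.

```python
import torch
import torch.nn.functional as F

NOISE_DIM, N_CAT, N_CONT = 62, 10, 2


def traversal_batch(cont_index: int = 0, steps: int = 10, seed: int = 0) -> torch.Tensor:
    """Rows: categorical code 0..9. Columns: continuous code `cont_index` swept over [-1, 1]."""
    g = torch.Generator().manual_seed(seed)
    # One noise vector shared by every image, so only the codes change across the grid.
    z = torch.randn(1, NOISE_DIM, generator=g).expand(N_CAT * steps, -1)
    # Category r fills `steps` consecutive entries, i.e. one row of the grid.
    cat = F.one_hot(torch.arange(N_CAT).repeat_interleave(steps), N_CAT).float()
    cont = torch.zeros(N_CAT * steps, N_CONT)  # the other continuous code is held at 0
    cont[:, cont_index] = torch.linspace(-1, 1, steps).repeat(N_CAT)
    return torch.cat([z, cat, cont], dim=1)


batch = traversal_batch(cont_index=0)
print(batch.shape)
```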