Assignment 1 submission for the course CS6910 Fundamentals of Deep Learning.
Team members: N Sowmya Manojna (BE17B007), Shubham Kashyapi (MM16B027)
The code for Question 1 can be accessed here. The program reads the data from `keras.datasets`, picks one example from each class, and logs the images to wandb.
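A minimal sketch of this flow, assuming Fashion-MNIST from `keras.datasets` and an illustrative wandb project name (the actual script may organize this differently):

```python
import numpy as np
import wandb
from keras.datasets import fashion_mnist

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Load the training split and pick the first example of each class.
(X_train, y_train), _ = fashion_mnist.load_data()

wandb.init(project="cs6910-assignment-1")  # project name is illustrative
examples = [wandb.Image(X_train[np.argmax(y_train == c)], caption=class_names[c])
            for c in range(10)]
wandb.log({"examples": examples})
```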
The neural network is implemented by the class `NeuralNetwork`, present in the `network.py` file. An instance of `NeuralNetwork` is created as follows:
```python
model = NeuralNetwork(layers=layers, batch_size=2000, optimizer="Normal",
                      initialization="RandomNormal", loss="CrossEntropy",
                      epochs=100, t=t, X_val=X_val_scaled, t_val=t_val,
                      use_wandb=False)
```
The following parameters can be passed:
- `layers`: An example of `layers` is as follows:

  ```python
  layers = [Input(data=X_scaled),
            Dense(size=64, activation="Sigmoid", name="HL1"),
            Dense(size=10, activation="Sigmoid", name="OL")]
  ```

  Here, `Input` and `Dense` are layer classes that can be accessed in the `layers.py` file.
  - An instance of the class `Input` can be created by passing the input data to the constructor, as shown above.
  - An instance of the class `Dense` can be created by passing the size of the layer, the activation, and an optional name to identify the layer.
- `batch_size`: The batch size is passed as an integer that determines the size of each mini-batch.
- `optimizer`: The optimizer is passed as a string, which is internally converted into an instance of the specified optimizer class. The optimizer classes are present in the `optimizers.py` file. An instance of each class can be created by passing the corresponding parameters:
  - Normal: `eta` (default: `eta=0.01`)
  - Momentum: `eta`, `gamma` (default: `eta=1e-3, gamma=0.9`)
  - Nesterov: `eta`, `gamma` (default: `eta=1e-3, gamma=0.9`)
  - AdaGrad: `eta`, `eps` (default: `eta=1e-2, eps=1e-7`)
  - RMSProp: `beta`, `eta`, `eps` (default: `beta=0.9, eta=1e-3, eps=1e-7`)
  - Adam: `beta1`, `beta2`, `eta`, `eps` (default: `beta1=0.9, beta2=0.999, eta=1e-2, eps=1e-8`)
  - Nadam: `beta1`, `beta2`, `eta`, `eps` (default: `beta1=0.9, beta2=0.999, eta=1e-3, eps=1e-7`)
- `initialization`: A string, either `"RandomNormal"` or `"XavierUniform"`, that sets the initialization of the weights in the model.
- `epochs`: The number of epochs is passed as an integer to the neural network.
- `t`: The `OneHotEncoded` matrix of the vector `y_train`, of size (10, n), where n is the number of samples.
- `loss`: The loss type is passed as a string, which is internally converted into an instance of the specified loss class. The loss classes are present inside the `loss.py` file.
- `X_val`: The validation dataset, used to validate the model.
- `t_val`: The `OneHotEncoded` matrix of the vector `y_val`, of size (10, n), where n is the number of samples.
- `use_wandb`: A flag that lets the user choose whether or not to use wandb for the run.
- `optim_params`: Optimization parameters to be passed to the optimizer (see the sketch after this list).
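For illustration, here is a hedged sketch of preparing `t` as a one-hot matrix and instantiating the network with a named optimizer. The `optim_params` keys shown follow the parameter names listed above and are assumptions about the exact dictionary format:

```python
import numpy as np

# One-hot encode y_train into t of shape (10, n), where n is the number of samples.
n = y_train.shape[0]
t = np.zeros((10, n))
t[y_train, np.arange(n)] = 1

# Illustrative instantiation; optim_params is assumed to be a dict keyed by the
# parameter names listed above (eta, beta1, beta2, eps, ...).
model = NeuralNetwork(layers=layers, batch_size=128, optimizer="Adam",
                      initialization="XavierUniform", loss="CrossEntropy",
                      epochs=50, t=t, X_val=X_val_scaled, t_val=t_val,
                      use_wandb=False,
                      optim_params={"eta": 1e-3, "beta1": 0.9, "beta2": 0.999})
```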
The model can be trained by calling the member function `forward_propogation`, followed by `backward_propogation`:
```python
model.forward_propogation()
model.backward_propogation()
```
The model can be tested by calling the `check_test` member function with the test dataset and the expected `y_test`. The `y_test` values are used only for calculating the test accuracy:
```python
acc_test, loss_test, y_test_pred = model.check_test(X_test_scaled, t_test)
```
The confusion matrix is logged using the following code:
```python
wandb.log({"conf_mat": wandb.plot.confusion_matrix(
    probs=None,
    y_true=y_test[:9000],
    preds=y_test_pred,
    class_names=["T-shirt/top", "Trouser", "Pullover",
                 "Dress", "Coat", "Sandal", "Shirt", "Sneaker",
                 "Bag", "Ankle boot"])})
```
Three hyperparameter sets were selected and run on the MNIST dataset. The configurations chosen are listed below (a sketch of running them follows the list):
- Configuration 1: `optimizer` = Adam, `init` = XavierUniform, `activation` = tanh, `hidden_layer_size` = 64, `batch_size` = 1024, `num_hidden_layers` = 1
- Configuration 2: `optimizer` = Adam, `init` = XavierUniform, `activation` = tanh, `hidden_layer_size` = 32, `batch_size` = 128, `num_hidden_layers` = 1
- Configuration 3: `optimizer` = Adam, `init` = XavierUniform, `activation` = relu, `hidden_layer_size` = 32, `batch_size` = 1024, `num_hidden_layers` = 1
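A hedged sketch of how these configurations could be run with the API described above. The activation strings, output-layer activation, and epoch count are assumptions; the actual Question-10 script may be organized differently:

```python
configs = [
    {"activation": "tanh", "hidden_layer_size": 64, "batch_size": 1024},
    {"activation": "tanh", "hidden_layer_size": 32, "batch_size": 128},
    {"activation": "relu", "hidden_layer_size": 32, "batch_size": 1024},
]

for cfg in configs:
    # One hidden layer in every configuration (num_hidden_layers = 1).
    layers = [Input(data=X_scaled),
              Dense(size=cfg["hidden_layer_size"], activation=cfg["activation"],
                    name="HL1"),
              Dense(size=10, activation="Sigmoid", name="OL")]
    model = NeuralNetwork(layers=layers, batch_size=cfg["batch_size"],
                          optimizer="Adam", initialization="XavierUniform",
                          loss="CrossEntropy", epochs=100, t=t,
                          X_val=X_val_scaled, t_val=t_val, use_wandb=True)
    model.forward_propogation()
    model.backward_propogation()
```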
The code is organized as follows:
| Question | Location | Function |
|---|---|---|
| Question 1 | Question-1 | Logging Representative Images |
| Question 2 | Question-2 | Feedforward Architecture |
| Question 3 | Question-3 | Complete Neural Network |
| Question 4 | Question-4 | Hyperparameter sweeps using wandb |
| Question 7 | Question-7 | Confusion Matrix logging for the best Run |
| Question 10 | Question-10 | Hyperparameter configurations for MNIST data |