This project is a comparative study of four state-of-the-art (SOTA) models, in which we perform an exploratory comparison on human facial images across Age, Gender and Ethnicity. The models used are ResNet50, VGG19, MobileNetV2 and AlexNet, compared in terms of classification performance and heatmap generation. Beyond measuring efficiency on the classification task, the study provides explainability for the model predictions.
The following label columns were present, with their corresponding variations:

- **Age**: a discretized integer quantity, ranging from 1 to over 100.
- **Gender**: string-based classes, either Male or Female.
- **Ethnicity**: string-based classes: White, Black, Indian, Asian or Hispanic.
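The string labels above would typically be encoded as integer class indices before training. A minimal sketch with pandas, assuming hypothetical column names (`age`, `gender`, `ethnicity`) that may differ from the actual label file:

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is loaded from the label file.
df = pd.DataFrame({
    "age": [25, 34, 61],
    "gender": ["Male", "Female", "Male"],
    "ethnicity": ["White", "Indian", "Asian"],
})

# Map the string-based classes to integer indices usable as targets.
gender_classes = ["Male", "Female"]
ethnicity_classes = ["White", "Black", "Indian", "Asian", "Hispanic"]
df["gender_id"] = df["gender"].map({c: i for i, c in enumerate(gender_classes)})
df["ethnicity_id"] = df["ethnicity"].map({c: i for i, c in enumerate(ethnicity_classes)})
```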
This repository focuses on the internal workings of pre-trained Convolutional Neural Networks (CNNs) with the following architectures:

- **AlexNet**: 8 layers with learnable parameters: 5 convolutional layers (interleaved with max pooling) followed by 3 fully connected layers, with ReLU activation in every layer except the output layer.
- **VGG-19**: proposed by Karen Simonyan and Andrew Zisserman in 2014 in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition".
- **MobileNetV2**: a lightweight architecture that factorizes standard convolutions into depthwise and pointwise convolutions, reducing computation while preserving the expressiveness of the extracted features.
- **ResNet50**: a convolutional neural network that is 50 layers deep. ResNet is built from residual blocks consisting of convolutional layers, batch normalization layers and ReLU activation functions. We used the pretrained ResNet50 model to extract features from the human face images.
As for the fine-tuning configuration, the following settings were used:

- The original image size was 48x48. Images were resized to 224x224, since most of the neural network architectures follow that input convention.
- The input data was **normalized** to the mean, variance and standard deviation expected by the pipeline (i.e., mapped towards a centered, single-peak Gaussian distribution).
- The **random seed** was set to 129, **training epochs** were set to 10, and the **train-test split** was kept at 80-20.
- If a **GPU is enabled**, a batch size of 64 was used, otherwise 32.
One renowned method for witnessing the internal workings of pre-trained CNNs is Grad-CAM. It backpropagates a stimulus signal for the target class label and combines the resulting gradients with the activations of the network's convolutional layers. This exposes the influence of the network's components (activation functions, pooling layers, convolution kernels, etc.) on the input data in the form of probabilistic heatmaps, rendered in RGB in decreasing order of attention.