This repository is an attempt to visually represent the inner workings of convolutional neural networks. This work is by no means revolutionary; the goal is to illustrate various methods for representing how a CNN makes decisions. Along the way I hope to understand the fine details of CNNs. Deep neural networks do not have to be black boxes. It may seem like some miracle that a model can identify a cat in an image, but believe me, it's not. It's just really complicated math under the hood. I believe that every ML engineer should understand how their model makes decisions, which ultimately should answer questions related to bias. I'm new at this, so bear with me...
- Running Notebook Locally
- Filter Visualization
- Activation Map Visualization
- Activation Maximization
- References
If you would like to tinker, feel free to install locally and make it your own.
- Install dependencies. I generally use Conda for my environment and package management.

```bash
conda install -c conda-forge jupyterlab
pip install -r requirements.txt
```
- The following Jupyter notebooks outline the various visualization methods:
  - `cnn_filter_vis.ipynb`
  - `max_activations_vis.ipynb`
Generally speaking, filters in a CNN are used to extract information from an image that is then passed through the network to make predictions. These filters are called kernels. Mathematically, they perform operations on pixels that reduce an image to basic features. Each CNN layer can have hundreds of kernels, and these kernels make up the depth of the layer. The following gif[1] illustrates how a filter is applied to an image:
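As a minimal sketch of that arithmetic (not taken from the notebooks), the following applies a single hand-made 3x3 edge kernel to a random single-channel image tensor; the kernel values and image are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

# Apply one hand-made 3x3 kernel to a random single-channel "image".
image = torch.rand(1, 1, 8, 8)                 # batch, channels, height, width

# A Sobel-like vertical-edge kernel; shape: (out_ch, in_ch, kH, kW).
kernel = torch.tensor([[[[-1., 0., 1.],
                         [-2., 0., 2.],
                         [-1., 0., 1.]]]])

feature_map = F.conv2d(image, kernel, padding=1)
print(feature_map.shape)                       # torch.Size([1, 1, 8, 8])
```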
In order to visualize the various filters and feature maps of a neural network we first need to load a pre-trained network from PyTorch. We will use the VGG16[2] network and extract each of its convolutional layers. We will not be performing backpropagation; instead, we will use each layer's weights to help visualize the filters used and the resulting image processing.
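A sketch of roughly how this can be done with torchvision (the exact notebook code may differ; the `weights` argument assumes torchvision >= 0.13, older releases use `pretrained=True`):

```python
import torch.nn as nn
from torchvision import models

# Load the pre-trained VGG16 model.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.eval()  # inference only; we won't be training the weights

# Extract every convolutional layer from the feature extractor.
conv_layers = [m for m in model.features if isinstance(m, nn.Conv2d)]
print(len(conv_layers))  # 13
```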
Taking a look at 3 of the 13 convolutional layers in the VGG16 model, we see that the depth increases as we move through the model. The following images illustrate each filter in the respective layers. Note: the filters are displayed in grayscale for readability.
Layer 1: 3x3 Kernel, Depth 64 | Layer 5: 3x3 Kernel, Depth 256 | Layer 10: 3x3 Kernel, Depth 512 |
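As a sketch, the grayscale filter images above can be produced along these lines, assuming the `conv_layers` list from the previous snippet; averaging each kernel over its input channels is purely a display choice, not the only convention:

```python
import matplotlib.pyplot as plt

# Show the first 8 kernels of the first conv layer in grayscale.
weights = conv_layers[0].weight.detach()       # shape: (64, 3, 3, 3)
fig, axes = plt.subplots(1, 8, figsize=(12, 2))
for ax, w in zip(axes, weights[:8]):
    ax.imshow(w.mean(dim=0).numpy(), cmap="gray")  # average over input channels
    ax.axis("off")
plt.show()
```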
When we pass an image into the pre-trained network we process it at each layer and save the respective image representation. This is essentially what the image looks like after each filter is applied. First we will pass in an adorable picture of a black lab. Yea, I know.
When we pass the image through the first convolutional layer we get 64 corresponding activation maps. Let's take a look at what happens when kernel 17 of layer 1 is applied to the image. Note: some preprocessing was done, which is why the image looks squished.
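For reference, a typical VGG-style preprocessing pipeline looks like the sketch below; the exact transforms in the notebook may differ, and `dog.jpg` is a placeholder filename. Resizing every image to a fixed square is what distorts it:

```python
from PIL import Image
from torchvision import transforms

# Typical VGG-style preprocessing (the notebook's exact values may differ).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),             # fixed square -> "squished" look
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])
image = preprocess(Image.open("dog.jpg").convert("RGB")).unsqueeze(0)  # add batch dim
```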
After some preprocessing, the block of code below takes an image and passes it through each `torch.nn.Conv2d` layer. The output of one layer is the input to the next.
```python
# Pass the image through the first convolutional layer
# and save the output
conv_out = [conv_layers[0](image)]

# Iteratively pass the output through all remaining convolutional layers
for i in range(1, len(conv_layers)):
    conv_out.append(conv_layers[i](conv_out[-1]))
```
The depth of layer 1 is 64. You can see how each filter extracts different details from the image. Layer 1's feature maps are fairly clear, but as we move deeper into the model the detail in the image starts to degrade. Can you pick out what the feature maps are representing? Sometimes the outline of the image is clear, sometimes dark colors are emphasized, and sometimes it is hard to tell what the original image was.
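A sketch of how the per-filter feature maps can be displayed, assuming `conv_out` from the block above:

```python
import matplotlib.pyplot as plt

# Display the first 8 of the 64 activation maps produced by layer 1.
layer1_maps = conv_out[0].squeeze(0).detach()  # shape: (64, H, W)
fig, axes = plt.subplots(1, 8, figsize=(12, 2))
for ax, fmap in zip(axes, layer1_maps[:8]):
    ax.imshow(fmap.numpy(), cmap="gray")
    ax.axis("off")
plt.show()
```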
Layer 1: 3x3 Kernel | Layer 1: Filtered Images |
Layer 2 | Layer 4 | Layer 6 |
Layer 8 | Layer 10 | Layer 12 |
Activation maximization was first proposed by Erhan et al.[3] in 2009 as a way to communicate CNN behavior, specifically as a way to interpret or visualize learned feature maps. A learned feature map can be represented by an active state of particular neurons. By looking at the maximum activation of particular neurons we can visualize what patterns are learned in particular filters.
We start with a pre-trained VGG16 model and a noisy image, as seen below. This image is passed through the network. At a particular layer, the gradient with respect to the noisy image is calculated at each neuron.[4] This is calculated using backpropagation, while keeping the parameters of the model fixed. The `hook_fn` in the `ActivationMaximizationVis()` class captures the calculated gradients. Each pixel in the noisy image is then iteratively changed to maximize the activation of the neuron. In other words, each pixel is iteratively adjusted to push that neuron's activation toward its maximum. The pixel values are updated until a desired image is found.
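Below is a condensed sketch of the idea; the notebook's `ActivationMaximizationVis()` class is more elaborate, and the layer/filter choice, learning rate, and step count here are illustrative assumptions. This version uses a forward hook on the activations and lets autograd supply the gradients with respect to the input:

```python
import torch
from torchvision import models

# Gradient ascent on a noisy input to maximize one filter's mean activation.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
for p in model.parameters():
    p.requires_grad_(False)                  # keep the model weights fixed

activations = {}
def hook_fn(module, inputs, output):         # capture the layer's output
    activations["out"] = output

layer, filter_idx = model.features[0], 5     # illustrative layer/filter choice
handle = layer.register_forward_hook(hook_fn)

noise = torch.rand(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([noise], lr=0.1)

for _ in range(30):                          # iteratively update the pixels
    optimizer.zero_grad()
    model(noise)
    # negative mean activation of the chosen filter -> gradient ascent
    loss = -activations["out"][0, filter_idx].mean()
    loss.backward()
    optimizer.step()

handle.remove()
```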
We can visualize the activation map of each layer after a noisy image is passed through the network. Using the activation maximization technique, we can see that patterns emerge at each layer/filter combination. In the earlier layers of the network, the activation maps pull out simpler patterns and colors; vertical and horizontal elements can be seen.
As we move deeper into the network, more complex patterns emerge. Some of the activation maps of later layers look like trees, eyes, and feathers. Well, at least that's what it looks like to me. We all may see something different.
Layer 1 - Filter 1 | Layer 1 - Filter 5 | Layer 1 - Filter 6 | Layer 1 - Filter 6 |
Layer 3 - Filter 1 | Layer 3 - Filter 5 | Layer 3 - Filter 28 | Layer 3 - Filter 38 |
Layer 10 - Filter 5 | Layer 10 - Filter 10 | Layer 10 - Filter 65 | Layer 10 - Filter 165 |
Layer 12 - Filter 5 | Layer 12 - Filter 10 | Layer 12 - Filter 65 | Layer 12 - Filter 165 |
Layer 14 - Filter 28 | Layer 14 - Filter 58 | Layer 14 - Filter 158 | Layer 14 - Filter 178 |
Layer 15 - Filter 40 | Layer 15 - Filter 65 | Layer 15 - Filter 165 | Layer 15 - Filter 220 |
Layer 16 - Filter 17 | Layer 16 - Filter 128 | Layer 16 - Filter 156 | Layer 16 - Filter 157 |
[1] https://github.com/vdumoulin/conv_arithmetic
[2] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. 2015. https://arxiv.org/abs/1409.1556
[3] D. Erhan, Y. Bengio, A. Courville, and P. Vincent. Visualizing Higher-Layer Features of a Deep Network. Technical Report 1341, University of Montreal, 2009, p. 3.
[4] Z. Qin, F. Yu, C. Liu, and X. Chen. How Convolutional Neural Network See the World - A Survey of Convolutional Neural Network Visualization Methods. 2018. https://arxiv.org/abs/1804.11191