Convolutional Neural Network Visualizations

This repository is an attempt to visually represent the inner workings of convolutional neural networks. This work is by no means revolutionary; the goal is simply to illustrate various methods for representing how a CNN makes decisions. In this effort I hope to understand the fine details of CNNs. Deep neural networks do not have to be black boxes. It may seem like a miracle that a model can identify a cat in an image, but believe me, it's not. It's just really complicated math under the hood. I believe that every ML engineer should understand how their model makes decisions, which ultimately should answer questions related to bias. I'm new at this, so bear with me...


Installing Locally

If you would like to tinker, feel free to install locally and make it your own.

  1. Install dependencies. I generally use Conda for my environment and package management.

    conda install -c conda-forge jupyterlab

    pip install -r requirements.txt

  2. The following Jupyter notebooks outline various visualization methods:

    • cnn_filter_vis.ipynb
    • max_activations_vis.ipynb

Filter Visualization

Generally speaking, filters in a CNN are used to extract information from an image that is then passed through the network to make predictions. These filters are called kernels. Mathematically, they perform operations on pixels that reduce an image to basic features. Each CNN layer can have hundreds of kernels, and the number of kernels in a layer makes up the depth of that layer's output. The following gif[1] illustrates how a filter is applied to an image:
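As a concrete complement, here is a minimal sketch of that sliding-window operation using torch.nn.functional.conv2d. The kernel values are hand-picked for illustration, not taken from a trained model:

    import torch
    import torch.nn.functional as F

    # A single-channel 5x5 "image" and a 3x3 vertical-edge kernel
    image = torch.rand(1, 1, 5, 5)              # (batch, channels, height, width)
    kernel = torch.tensor([[[[-1., 0., 1.],
                             [-1., 0., 1.],
                             [-1., 0., 1.]]]])  # (out_channels, in_channels, kH, kW)

    # Slide the kernel across the image; with no padding the
    # resulting feature map shrinks from 5x5 to 3x3
    feature_map = F.conv2d(image, kernel)
    print(feature_map.shape)                    # torch.Size([1, 1, 3, 3])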

Model Architecture

In order to visualize the various filters and feature maps of a neural network, we first need to load a pre-trained network from PyTorch. We will use the VGG16[2] neural network and extract each corresponding convolutional layer. We will not be performing backpropagation; instead, we will use each layer's weights to help visualize the filters used and the resulting image processing.
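A minimal sketch of that loading step (on newer torchvision versions, pretrained=True is replaced by the weights argument):

    import torch.nn as nn
    from torchvision import models

    # Load the pre-trained VGG16 model and freeze its parameters,
    # since we only read the weights and never train
    model = models.vgg16(pretrained=True)
    for param in model.parameters():
        param.requires_grad_(False)

    # Collect every convolutional layer from the feature extractor
    conv_layers = [m for m in model.features if isinstance(m, nn.Conv2d)]
    print(len(conv_layers))  # 13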

Filter Layers

Taking a look at 3 of the 13 convolutional layers in the VGG16 model, we see that the depth increases as we move through the model. The following images illustrate each filter in the respective layers. Note: the filters are displayed in grayscale for readability.

Layer 1: 3x3 kernel, depth 64 | Layer 5: 3x3 kernel, depth 256 | Layer 10: 3x3 kernel, depth 512
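The exact plotting code lives in cnn_filter_vis.ipynb; a minimal matplotlib sketch of the idea, assuming the conv_layers list from above, might look like this:

    import matplotlib.pyplot as plt

    # Plot the first 16 of the 64 filters in layer 1, showing only the
    # first input channel of each filter, in grayscale for readability
    weights = conv_layers[0].weight.detach()  # shape: (64, 3, 3, 3)
    fig, axes = plt.subplots(4, 4, figsize=(6, 6))
    for ax, w in zip(axes.flat, weights[:16]):
        ax.imshow(w[0], cmap='gray')
        ax.axis('off')
    plt.show()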

Activation Map Visualization

When we pass an image into the pre-trained network, we process it at each layer and save the respective image representation. This is essentially what the image looks like after each filter is applied. First we will pass in an adorable picture of a black lab. Yea, I know.

When we pass the image through the first convolutional layer, we get 64 corresponding activation maps. Let's take a look at what happens when kernel 17 is applied to the image in layer 1. Note: some preprocessing was done on the image, which is why it looks squished.
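The preprocessing is roughly the standard ImageNet pipeline: a resize to a fixed square (which distorts the aspect ratio) plus normalization. A sketch, where the file name 'dog.jpg' is just a placeholder:

    from PIL import Image
    from torchvision import transforms

    # Resize to a fixed square (this is what squishes the image),
    # convert to a tensor, and normalize with the ImageNet statistics
    # that VGG16 was trained with
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    image = preprocess(Image.open('dog.jpg')).unsqueeze(0)  # add a batch dimension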

Processing Through Multiple Layers

After some pre-processing, the block of code below takes an image and passes it through each torch.nn.Conv2d layer. The output of one layer is the input to the next.

    # Pass the image through the first convolutional layer
    # and save the output
    conv_out = [conv_layers[0](image)]
    # Iteratively pass the result through the remaining convolutional
    # layers; each layer consumes the previous layer's output
    for i in range(1, len(conv_layers)):
        conv_out.append(conv_layers[i](conv_out[-1]))

The depth of layer 1 is 64. You can see how each filter extracts different details from the image, and layer 1's feature maps are fairly clear. As we move deeper into the model, the detail in the image starts to degrade. Can you pick out what the feature maps are representing? Sometimes the outline of the image is clear, sometimes dark colors are emphasized, and sometimes it is hard to tell what the original image was.

Layer 1: 3x3 kernel | Layer 1: filtered images
Layer 2 | Layer 4 | Layer 6
Layer 8 | Layer 10 | Layer 12
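Grids like the ones above can be reproduced with a few lines of matplotlib. A minimal sketch, assuming the conv_out list built by the loop above (map index 17 is an illustrative choice):

    import matplotlib.pyplot as plt

    # Show activation map 17 at every second layer to watch the
    # detail degrade with depth
    fig, axes = plt.subplots(1, 6, figsize=(12, 2))
    for ax, layer_idx in zip(axes, range(1, 12, 2)):
        fmap = conv_out[layer_idx].squeeze(0)[17].detach()
        ax.imshow(fmap, cmap='gray')
        ax.set_title(f'Layer {layer_idx + 1}')
        ax.axis('off')
    plt.show()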

Activation Maximization

Activation maximization was first proposed by Erhan et al.[3] in 2009 as a way to communicate CNN behavior, specifically as a way to interpret or visualize learned feature maps. A learned feature map can be represented by the active state of particular neurons. By looking at the maximum activation of particular neurons, we can visualize what patterns are learned in particular filters.

The Algorithm

We start with a pretrained VGG16 model and a noisy image, as seen below. This image is passed through the network. At a particular layer, the gradient with respect to the noisy image is calculated at each neuron.[4] This is calculated using backpropagation, while keeping the parameters of the model fixed. The hook_fn in the ActivationMaximizationVis() class captures the calculated gradients. Each pixel in the original noisy image is then iteratively adjusted to maximize the activation of the neuron; in other words, the pixels are nudged in the direction of the gradient until the neuron's activation is as large as possible. The pixel values are updated until a desired image is found.
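The full implementation lives in the ActivationMaximizationVis() class in max_activations_vis.ipynb; below is a condensed sketch of the same gradient-ascent idea using a forward hook. The layer/filter indices, learning rate, and step count are illustrative:

    import torch

    # Capture the activations of the chosen layer during the forward pass
    activations = {}
    def hook_fn(module, inputs, output):
        activations['out'] = output

    layer_idx, filter_idx = 10, 5  # illustrative choices
    handle = conv_layers[layer_idx].register_forward_hook(hook_fn)

    # Start from random noise and iteratively adjust the pixels to
    # maximize the chosen filter's activation; the model's parameters
    # stay fixed because the optimizer only updates the image
    noise = torch.rand(1, 3, 224, 224, requires_grad=True)
    optimizer = torch.optim.Adam([noise], lr=0.1)
    for _ in range(30):
        optimizer.zero_grad()
        model(noise)
        loss = -activations['out'][0, filter_idx].mean()  # negate to ascend
        loss.backward()
        optimizer.step()
    handle.remove()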

Layer Vis

We can visualize the activation map of each layer after a noisy image is passed through the network. Using the activation maximization technique, we can see that patterns emerge at each layer/filter combination. If you look at the earlier layers in the network, the activation maps pull out simpler patterns and colors; vertical and horizontal elements can be seen.

As we move deeper into the network, more complex patterns emerge. Some of the activation maps of later layers look like trees, eyes, and feathers. Well, at least that's what it looks like to me. We all may see something different.

Layer 1 - Filter 1 | Layer 1 - Filter 5 | Layer 1 - Filter 6 | Layer 1 - Filter 6

Now if we take a look at deeper layers, you can see...

Layer 3 - Filter 1 | Layer 3 - Filter 5 | Layer 3 - Filter 28 | Layer 3 - Filter 38
Layer 10 - Filter 5 | Layer 10 - Filter 10 | Layer 10 - Filter 65 | Layer 10 - Filter 165
Layer 12 - Filter 5 | Layer 12 - Filter 10 | Layer 12 - Filter 65 | Layer 12 - Filter 165
Layer 14 - Filter 28 | Layer 14 - Filter 58 | Layer 14 - Filter 158 | Layer 14 - Filter 178
Layer 15 - Filter 40 | Layer 15 - Filter 65 | Layer 15 - Filter 165 | Layer 15 - Filter 220
Layer 16 - Filter 17 | Layer 16 - Filter 128 | Layer 16 - Filter 156 | Layer 16 - Filter 157

References

[1] https://github.com/vdumoulin/conv_arithmetic

[2] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. 2015. https://arxiv.org/abs/1409.1556

[3] D. Erhan, Y. Bengio, A. Courville, and P. Vincent. Visualizing Higher-Layer Features of a Deep Network. Technical report, University of Montreal, 1341 (2009), p. 3.

[4] Z. Qin, F. Yu, C. Liu, and X. Chen. How convolutional neural network see the world - A survey of convolutional neural network visualization methods. 2018. https://arxiv.org/abs/1804.11191