
🔥 Pytorch implementation of an image captioning model that uses attention.


MoezAbid/Image-Captioning-Attention


Image captioning with Attention

This repository contains a PyTorch 🔥 implementation of an image captioning model that uses attention. Demo.

Model Architecture

Usage 💻

To try it, run the following commands:

  1. Install the necessary Python packages:
     pip install -r requirements.txt

  2. Change the DATA_PATH, caption_file and images_directory paths in the data.py file.

  3. Train the model with:
     python train.py

  4. Then open a terminal and run it:
     python app.py

The app should be usable on localhost in the browser.
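
For the path-editing step above, a minimal sketch of what the three settings might look like is shown here. Only the names DATA_PATH, caption_file and images_directory come from this README; the example values, the use of pathlib, and the check_paths helper are assumptions made for illustration and are not taken from the repository's data.py.

```python
from pathlib import Path

# Hypothetical values: replace them with the location of your own dataset.
DATA_PATH = Path("data")
caption_file = DATA_PATH / "captions.txt"   # assumed file name
images_directory = DATA_PATH / "Images"     # assumed folder name


def check_paths(caption_path, images_path):
    """Return a list of problems found; the list is empty if both paths exist."""
    problems = []
    if not Path(caption_path).is_file():
        problems.append(f"caption file not found: {caption_path}")
    if not Path(images_path).is_dir():
        problems.append(f"images directory not found: {images_path}")
    return problems


if __name__ == "__main__":
    # Print any problem before starting a long training run.
    for problem in check_paths(caption_file, images_directory):
        print(problem)
```

Running a check like this before python train.py can surface a wrong path immediately instead of partway through data loading.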

Details

Dataset

Neural Network

Model Architecture

Model Architecture Description :

The model contains 3 main components:

  1. An encoder that extracts features with a ResNet50 model pre-trained on the ImageNet dataset.
  2. An attention mechanism, so that the neural network knows which part of the input image to focus on when decoding certain words.
  3. An LSTM decoder that generates the caption and returns the attention weights (alphas) along with it.
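
The arithmetic behind the attention step can be sketched without any framework. The example below assumes additive (tanh-based) scoring, which is a common choice for this kind of model; the README does not state which scoring function the repository uses, so the function name, parameter names and toy sizes are all illustrative assumptions. It shows how one decoder hidden state and a set of encoder region features produce the alphas and a context vector.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def additive_attention(features, hidden, W_f, W_h, v):
    """Soft attention over image regions (assumed additive scoring).

    features: list of region vectors from the encoder, each of size D
    hidden:   decoder hidden state of size H
    W_f, W_h: projection matrices of shape (A, D) and (A, H)
    v:        scoring vector of size A
    Returns (alphas, context): one weight per region, summing to 1,
    and the weighted sum of the region vectors (size D).
    """
    projected_hidden = matvec(W_h, hidden)
    scores = []
    for region in features:
        projected_region = matvec(W_f, region)
        combined = [math.tanh(a + b)
                    for a, b in zip(projected_region, projected_hidden)]
        scores.append(dot(v, combined))
    alphas = softmax(scores)
    size = len(features[0])
    context = [sum(alpha * region[i] for alpha, region in zip(alphas, features))
               for i in range(size)]
    return alphas, context


def random_vector(size):
    return [random.uniform(-1.0, 1.0) for _ in range(size)]


def random_matrix(rows, cols):
    return [random_vector(cols) for _ in range(rows)]


# Toy sizes chosen only for illustration; a real encoder yields far more
# regions and much larger feature vectors.
random.seed(0)
num_regions, feature_size, hidden_size, attention_size = 4, 3, 2, 5
features = random_matrix(num_regions, feature_size)
hidden = random_vector(hidden_size)
alphas, context = additive_attention(
    features,
    hidden,
    random_matrix(attention_size, feature_size),
    random_matrix(attention_size, hidden_size),
    random_vector(attention_size),
)
print("alphas:", alphas)
print("context:", context)
```

In a trained model the projections and the scoring vector are learned parameters, and the context vector is fed to the LSTM decoder at each step together with the previous word.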

Training

The model was trained for 25 epochs, which took 2 hours and 30 minutes. Performance could be improved with more data, a more advanced neural network, and more training iterations.

Training loss

Example

We can make a prediction on the following example image to get the corresponding caption :

Prediction

Then we can visualize the attention values on different spots of the image according to the different word tokens generated.

Attention
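
One way to turn the attention values for a single word into an overlay is to reshape the flat alpha vector into a square grid and upsample it to the image size. The sketch below is framework-free and uses nearest-neighbour upsampling; the function name, the square-grid assumption and the sizes are illustrative and are not taken from the repository, which may use a different grid or a smoother interpolation.

```python
def alphas_to_heatmap(alphas, grid_size, image_size):
    """Reshape flat attention weights into a grid and upsample them.

    alphas:     flat list of grid_size * grid_size attention weights
    grid_size:  side length of the encoder's feature grid (assumed square)
    image_size: side length of the square output heatmap in pixels
    Returns a list of image_size rows, each with image_size values.
    """
    if len(alphas) != grid_size * grid_size:
        raise ValueError("expected grid_size * grid_size attention weights")
    grid = [alphas[r * grid_size:(r + 1) * grid_size] for r in range(grid_size)]
    heatmap = []
    for y in range(image_size):
        # Each output pixel copies the value of the grid cell that covers it.
        source_row = grid[y * grid_size // image_size]
        heatmap.append([source_row[x * grid_size // image_size]
                        for x in range(image_size)])
    return heatmap


# Toy example: a 2x2 attention grid stretched to a 4x4 heatmap.
heatmap = alphas_to_heatmap([0.1, 0.2, 0.3, 0.4], grid_size=2, image_size=4)
for row in heatmap:
    print(row)
```

The resulting grid can be drawn semi-transparently over the input image, once per generated word, with any plotting library.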
