Uses a CNN encoder and an RNN decoder to generate captions for input images.
Here's a summary of the steps involved.
- The dataset used is Microsoft's COCO dataset (see the loading sketch after this list).
- Feature vectors for images are generated using a CNN based on the ResNet architecture by Microsoft Research (an encoder sketch follows the list).
- Word embeddings are learned from the captions of the training images; NLTK was used to tokenize and preprocess the captions (see the vocabulary sketch below).
- Implemented an RNN decoder using LSTM cells (see the decoder sketch below).
- Trained the network for nearly 9 hours on a GPU, reaching an average loss of about 2.8 (a minimal training-step sketch appears below).
- Generated captions for some test images to gauge the quality of the trained network (see the greedy-decoding sketch at the end).
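
Below is a minimal sketch of loading COCO captions with `pycocotools`; the annotation file path and the 2017 split are illustrative assumptions, not details from this project.

```python
from pycocotools.coco import COCO

# Hypothetical annotation path; the actual location depends on the setup.
coco = COCO("annotations/captions_train2017.json")

# Fetch all captions attached to the first image in the annotation index.
first_img_id = list(coco.imgs.keys())[0]
ann_ids = coco.getAnnIds(imgIds=first_img_id)
captions = [ann["caption"] for ann in coco.loadAnns(ann_ids)]
```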
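
A sketch of what a ResNet-based encoder might look like in PyTorch; the ResNet-50 variant, the frozen backbone, and the `embed_size` projection layer are assumptions for illustration.

```python
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """Extracts a fixed-size feature vector from an image with a pretrained ResNet."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Freeze the convolutional backbone; only the projection layer is trained.
        for param in resnet.parameters():
            param.requires_grad = False
        # Drop the final classification layer and keep the pooled features.
        self.resnet = nn.Sequential(*list(resnet.children())[:-1])
        self.embed = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        features = self.resnet(images).flatten(1)  # (batch, 2048)
        return self.embed(features)                # (batch, embed_size)
```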
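
One plausible way to tokenize captions with NLTK and build a word-to-index vocabulary; the special tokens and the `min_count` threshold are assumptions.

```python
from collections import Counter
import nltk

nltk.download("punkt", quiet=True)

def build_vocab(captions, min_count=5):
    """Map each sufficiently frequent word to an integer index."""
    counter = Counter()
    for caption in captions:
        counter.update(nltk.tokenize.word_tokenize(caption.lower()))
    # Reserve indices for special tokens; rare words map to <unk> downstream.
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for word, count in counter.items():
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab
```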
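
A minimal LSTM decoder sketch in PyTorch, assuming the image feature vector is fed as the first input step and teacher forcing is used during training; the hyperparameter names are illustrative.

```python
import torch
import torch.nn as nn

class DecoderRNN(nn.Module):
    """Generates a caption, one word per LSTM step, from an image feature vector."""
    def __init__(self, embed_size, hidden_size, vocab_size, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Teacher forcing: the image feature acts as the first "word",
        # followed by the embedded caption tokens (minus the last one).
        embeddings = self.embed(captions[:, :-1])
        inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)
        hiddens, _ = self.lstm(inputs)
        return self.fc(hiddens)  # (batch, seq_len, vocab_size)
```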
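
A minimal training-step sketch tying the pieces together; the Adam optimizer, the learning rate, and the choice to train only the decoder plus the encoder's projection layer are assumptions, and `encoder`/`decoder` refer to the sketches above.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical setup building on the EncoderCNN/DecoderRNN sketches above.
criterion = nn.CrossEntropyLoss()
params = list(decoder.parameters()) + list(encoder.embed.parameters())
optimizer = optim.Adam(params, lr=1e-3)

def train_step(images, captions):
    optimizer.zero_grad()
    features = encoder(images)
    outputs = decoder(features, captions)  # (batch, seq_len, vocab_size)
    # Flatten batch and time dimensions for the cross-entropy loss.
    loss = criterion(outputs.reshape(-1, outputs.size(-1)), captions.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()
```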
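
For inference on test images, a greedy decoding loop like the following might be used; it assumes a batch size of one and that index 2 is the `<end>` token, matching the vocabulary sketch above.

```python
import torch

@torch.no_grad()
def sample(encoder, decoder, image, end_idx=2, max_len=20):
    """Greedily pick the most probable word at each step until <end> or max_len."""
    inputs = encoder(image.unsqueeze(0)).unsqueeze(1)  # (1, 1, embed_size)
    states, output_ids = None, []
    for _ in range(max_len):
        hiddens, states = decoder.lstm(inputs, states)
        word_id = decoder.fc(hiddens.squeeze(1)).argmax(dim=1)
        output_ids.append(word_id.item())
        if word_id.item() == end_idx:
            break
        inputs = decoder.embed(word_id).unsqueeze(1)   # feed the word back in
    return output_ids
```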