ImageCaptioning

Image captioning can help visually impaired people, including those who are blind or partially sighted, when the generated captions are converted to voice output. It can also be used in virtual assistants such as Siri or Cortana to search for images of a particular type, for example: "Show me pictures of myself wearing a blue shirt." These applications make image captioning a well-motivated and useful task.

In this project, we implemented three different techniques used for Image Captioning:

  1. CNN-RNN (Google's implementation, used as our baseline)
  2. CNN-BRNN (Deep Visual-Semantic Alignments for Generating Image Descriptions, Andrej Karpathy)
  3. Attention-based model (Show, Attend and Tell)

The various evaluation metrics used are:

  1. BLEU (Bilingual Evaluation Understudy) score
  2. METEOR
  3. CIDEr
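
As an illustration of how the first metric scores a caption, here is a minimal sketch of unsmoothed sentence-level BLEU in plain Python. It is not the evaluation code used in this project; the function names `ngrams` and `sentence_bleu`, the whitespace tokenization, and the example captions are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU for one tokenized candidate against tokenized references.

    Hypothetical helper for illustration only (uniform n-gram weights).
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # no smoothing: any zero precision gives a zero score
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


candidate = "a man in a blue shirt".split()
references = [
    "a man wearing a blue shirt".split(),
    "a person in a blue shirt".split(),
]
print(round(sentence_bleu(candidate, references, max_n=2), 3))
```

With `max_n=2` this computes BLEU-2, the geometric mean of the clipped unigram and bigram precisions. Reported scores are normally computed over a whole test set with an established toolkit, so this sketch is only meant to show the idea behind the metric.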

The datasets used were:

  1. Flickr8k (8,000 images, about 1 GB)
  2. Flickr30k (31K images, about 6 GB)
  3. MSCOCO (123K images, about 18 GB)