ImageCaptioning using Token Embeddings and inception_resnet_v2

The main idea behind this research is to improve image captioning results. The change I made is to use BERT token embeddings for the text (captions) and Inception_resnet_v2 for the images, and I implemented both successfully. I also propose using BERTScore [2] to evaluate the generated captions, but this is not included in the implementation.
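Since BERTScore is proposed but not implemented here, the following is a minimal sketch of its core idea. Each token in a caption is represented by a contextual embedding (e.g. from BERT). Recall greedily matches every reference token to its most similar candidate token by cosine similarity, precision does the reverse, and F1 combines the two. This sketch assumes the token embeddings have already been computed. It omits the optional IDF weighting and baseline rescaling described in the BERTScore paper. The function name `bertscore` is hypothetical and is not the API of the `bert-score` package.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero vector cannot be normalized")
    return [x / norm for x in v]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def bertscore(reference: Sequence[Vector], candidate: Sequence[Vector]) -> Tuple[float, float, float]:
    """Hypothetical helper: greedy-matching precision, recall and F1 over
    precomputed token embeddings (no IDF weighting, no baseline rescaling)."""
    if not reference or not candidate:
        raise ValueError("both token sequences must be non-empty")
    ref = [normalize(v) for v in reference]
    cand = [normalize(v) for v in candidate]
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(dot(r, c) for c in cand) for r in ref) / len(ref)
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(dot(r, c) for r in ref) for c in cand) / len(cand)
    # F1 is the harmonic mean; guard against a non-positive denominator,
    # since cosine similarities can be negative.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Toy 3-d vectors standing in for real BERT token embeddings (hypothetical data).
reference_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
candidate_tokens = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(bertscore(reference_tokens, candidate_tokens))
```

Greedy matching lets a generated caption receive credit for tokens whose meaning is close to the reference even when the exact words differ. N-gram metrics such as BLEU cannot give that credit.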

Dataset: I downloaded it via torrent because it is quite large and the provided download source was not working.

Presentation: https://docs.google.com/presentation/d/10kZi5rQVZ-Tkrt5IQh_1J5iZoWbjTUtJMPaCpHAkMlk/edit#slide=id.g57487e59e2_0_26

Documentation: https://docs.google.com/document/d/1W1FD2-nDMSW6x4HnuopuOBJO4qPl3-lSSklyPb5dRiQ/edit
