This submission presents the work conducted for the METU MMI727 course project. The project examines the Show, Attend and Tell image captioning model. The implementation from A PyTorch Tutorial to Image Captioning is taken as the baseline, and several improvement strategies are explored on top of it. The baseline model and five modified models are trained on the MSCOCO image captioning dataset. All six models are then benchmarked and compared on the MSCOCO test set.
Please refer to the Technical Report for technical details and to the Jupyter Notebook for practical details. A mini-dataset and the model checkpoints are available at the link below:
https://drive.google.com/drive/folders/1QveOj2T6krf6p10JdRgWcxk2-OU1qyWT?usp=sharing