GitHub - qtw1998/ImageCaption-Paper-Learning: Team with Learning the ImageCaption

Show, Attend and Tell via Python 3 Version by using Tensorflow

Process

Team with Learning the ImageCaption
5.14 qtw1998: Review RNN & CNN - rich1889: Keep reviewing RNN materials
5.15 qtw1998: Review NLP & Sequence - rich1889: Studying LSTM units & bidirectional RNNs
5.16 qtw1998: show and tell model - rich1889: Studying Computer Vision & Edge Detection under CNN
5.17 qtw1998: utils model - rich1889: Studying Padding & Convolutions under CNN
5.19 qtw1998: Attention | Show and Tell Model - rich1889: Reviewing the materials discussed this afternoon
5.20 qtw1998: training model - rich1889: Keep reviewing the CNN materials discussed yesterday afternoon

5.20 qtw1998: README & code instructions - rich1889: Writing & revising the paper report

Environments:

Python 3.6
Tensorflow 1.8.0

Original readme below

Introduction

This neural system for image captioning is roughly based on the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" by Xu et al. (ICML 2015). The input is an image, and the output is a sentence describing the content of the image. It uses a convolutional neural network to extract visual features from the image, and an LSTM recurrent neural network to decode these features into a sentence. A soft attention mechanism is incorporated to improve the quality of the caption. This project is implemented using the Tensorflow library, and allows end-to-end training of both the CNN and RNN parts.
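As a rough illustration of the soft attention step mentioned above, here is a minimal pure-Python sketch. It is not the code in model.py: the function names are hypothetical, and the relevance scores are passed in directly, whereas in the paper they are produced by a small network conditioned on the LSTM hidden state.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(features, scores):
    """Return (weights, context) for one decoding step.

    features: list of L region vectors, each of length D (CNN output)
    scores:   list of L unnormalised relevance scores (assumed given here)
    """
    weights = softmax(scores)  # attention weights, non-negative, sum to 1
    dim = len(features[0])
    # The context vector is the attention-weighted average of the regions.
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return weights, context

# Three toy image regions with two-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = soft_attention(features, [0.0, 0.0, 0.0])
```

With equal scores every region gets the same weight, so the context vector is simply the mean of the region features; a larger score for one region pulls the context towards that region.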

We also deployed the model on a server and built an Android app that uses it. A video will demonstrate the available functionality.

Prerequisites

Tensorflow (instructions)
NumPy (instructions)
OpenCV (instructions)
Natural Language Toolkit (NLTK) (instructions)
Pandas (instructions)
Matplotlib (instructions)
tqdm (instructions)
./vocabulary.csv (download)
./train/captions_val2014.json (download)
utils/coco (download & unzip)

Usage

Preparation: Download the COCO train2014 and val2014 data here. Put the COCO train2014 images in the folder train/images, and put the file captions_train2014.json in the folder train. Similarly, put the COCO val2014 images in the folder val/images, and put the file captions_val2014.json in the folder val. Furthermore, download the pretrained VGG16 net here or ResNet50 net here if you want to use it to initialize the CNN part.
Training: To train a model using the COCO train2014 data, first set up the parameters in the file config.py and then run a command like this:

python main.py --phase=train \
    --load_cnn \
    --cnn_model_file='./vgg16_no_fc.npy' \
    [--train_cnn]

Turn on --train_cnn if you want to jointly train the CNN and RNN parts. Otherwise, only the RNN part is trained. The checkpoints will be saved in the folder models. If you want to resume the training from a checkpoint, run a command like this:

python main.py --phase=train \
    --load \
    --model_file='./models/xxxxxx.npy' \
    [--train_cnn]

To monitor the progress of training, run the following command:

tensorboard --logdir='./summary/'

Evaluation: To evaluate a trained model using the COCO val2014 data, run a command like this:

python main.py --phase=eval \
    --model_file='./models/xxxxxx.npy' \
    --beam_size=3

The result will be shown in stdout. Furthermore, the generated captions will be saved in the file val/results.json.
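The --beam_size option controls beam search during decoding. The following is a minimal sketch of the idea, not the repository's implementation: `beam_search`, `toy_step` and the `TOY` probability table are hypothetical names, and the toy table stands in for the LSTM's next-word distribution.

```python
import math

def beam_search(step_fn, start, end, beam_size=3, max_len=20):
    """Keep the `beam_size` most probable partial captions at each step.

    step_fn(tokens) returns a dict mapping each next token to its probability.
    """
    beams = [(0.0, [start])]  # (log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, tokens in beams:
            for tok, p in step_fn(tokens).items():
                if p > 0:
                    candidates.append((logp + math.log(p), tokens + [tok]))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for logp, tokens in candidates[:beam_size]:
            # Captions that emitted the end token leave the beam.
            if tokens[-1] == end:
                finished.append((logp, tokens))
            else:
                beams.append((logp, tokens))
        if not beams:
            break
    finished.extend(beams)  # captions cut off at max_len still compete
    return max(finished, key=lambda c: c[0])[1]

# Toy next-word model: the best first word ("a") does not start the best caption.
TOY = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}

def toy_step(tokens):
    return TOY[tokens[-1]]

caption = beam_search(toy_step, "<s>", "</s>", beam_size=3)
greedy = beam_search(toy_step, "<s>", "</s>", beam_size=1)
```

In this toy example a beam of size 3 finds "the dog" (probability 0.36), while a beam of size 1, which is greedy decoding, commits to "a" first and ends with a caption of probability 0.30.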

Inference: You can use the trained model to generate captions for any JPEG images! Put such images in the folder test/images, and run a command like this:

python main.py --phase=test \
    --model_file='./models/xxxxxx.npy' \
    --beam_size=3

The generated captions will be saved in the folder test/results.
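Before running any phase, it can help to check that the folders and files named in the Preparation and Inference steps are in place. The script below is a small sketch for that purpose; `EXPECTED` and `missing_paths` are hypothetical names and the script is not part of the repository.

```python
from pathlib import Path

# Paths taken from the Preparation and Inference steps of this README.
EXPECTED = [
    "train/images",
    "train/captions_train2014.json",
    "val/images",
    "val/captions_val2014.json",
    "test/images",
]

def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

if __name__ == "__main__":
    for path in missing_paths():
        print("missing:", path)
```

Run it from the repository root; it prints nothing when everything is present.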

Results

A pretrained model with the default configuration can be downloaded from my BaiduNetDisk. This model was trained solely on the COCO train2014 data. It achieves the following BLEU scores on the COCO val2014 data (with beam size = 3):

BLEU-1 = 70.3%
BLEU-2 = 53.6%
BLEU-3 = 39.8%
BLEU-4 = 29.5%
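For readers unfamiliar with the metric, BLEU-n combines clipped n-gram precisions up to order n with a brevity penalty. The sketch below computes a sentence-level score in pure Python to show the mechanics; it is an illustration only, since the figures above are corpus-level scores and how this repository computes them is not shown here. The names `ngrams` and `bleu` are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # The brevity penalty uses the reference length closest to the candidate.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "a dog runs on the grass".split()
candidate = "a dog sits on the grass".split()
bleu1 = bleu(candidate, [reference], max_n=1)  # 5 of 6 unigrams match
bleu4 = bleu(candidate, [reference], max_n=4)  # no 4-gram matches, so 0.0
```

Unsmoothed sentence-level BLEU drops to zero as soon as one n-gram order has no match, which is one reason scores are normally reported over a whole corpus.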

Here are some captions generated by this model:

References

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio. ICML 2015.
The original implementation in Theano
An earlier implementation in Tensorflow
Microsoft COCO dataset

