This repository hosts our final project for this term's computer animation class.
-
Group members: FeiSun and me.
-
Andrej Karpathy's code can be found here
-
Andrej Karpathy's paper, Deep Visual-Semantic Alignments for Generating Image Descriptions, can be found here
Because we need TensorFlow (which may not support Python 2.7 on Windows), we use Python 3.5.
- argparse
- tensorflow
- keras
- numpy
Most of these can be installed with pip. To install all dependencies at once, run: pip install -r requirements.txt
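
To confirm that the packages listed above are importable before running the scripts, a small check like the following may help. It is only a sketch: the helper name `missing_dependencies` is made up for this example and is not part of the repository.

```python
import importlib.util

# Package names taken from the dependency list above.
REQUIRED = ["argparse", "tensorflow", "keras", "numpy"]


def missing_dependencies(names=REQUIRED):
    """Return the names that cannot be found in the current Python environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, installing from requirements.txt as described above should resolve it.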
We use pretrained VGG16 and LSTM models, so their parameters must be loaded first. You can also train these models yourself.
- step1: put your own images into 'self_pic/img/'
- step2: download 'vgg16_weights_tf_dim_ordering_tf_kernels.h5' and put it into 'VGG16_weights/'
- step3: download 'model_checkpoint_coco_visionlab43.stanford.edu_lstm_11.14.p' and put it into 'cv/'
- step4: in a terminal, run
python get_img_features_VGG16.py
to generate the 'vgg_feats.npy' and 'self_img_dataset.json' files
- step5: in a terminal, run
python predict_on_images.py "cv/model_checkpoint_coco_visionlab43.stanford.edu_lstm_11.14.p" -r self_pic
then open 'result.html' to see the result
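
Before running the two scripts, it can be useful to verify that the inputs from step1 to step3 are in place. The sketch below assumes it is run from the repository root and that the paths are exactly those named in the steps; `missing_inputs` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path

# Paths named in step1 to step3, relative to the repository root (assumed).
REQUIRED_PATHS = [
    "self_pic/img",
    "VGG16_weights/vgg16_weights_tf_dim_ordering_tf_kernels.h5",
    "cv/model_checkpoint_coco_visionlab43.stanford.edu_lstm_11.14.p",
]


def missing_inputs(root="."):
    """Return the required paths that do not exist under root."""
    root = Path(root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    for path in missing_inputs():
        print("missing:", path)
```

An empty result means the feature-extraction and prediction commands have the files they expect.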
You can download the files above from here
Some images get pretty good results, like this one
and this one
This one is not bad
But this image gets a very strange description, perhaps because she is wearing a cat-like sweater.
- Instead of using the MATLAB code written by Andrej Karpathy, we built our own VGG16 model with Keras to extract the features of each image; see the source code in 'get_img_features_VGG16.py'.
- We can also predict the class of each image and look at the relation between image classification and image description generation. Our working hypothesis is that a very accurate classification suggests we have extracted the most important features of the image, and that these features can then be used to generate a better description.
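
The two points above can be illustrated with a single Keras model that returns both a feature vector and the class probabilities for each image. This is a minimal sketch and not the code in 'get_img_features_VGG16.py': using the 4096-dimensional 'fc2' layer as the feature, the function names, and the Keras import paths (which differ between Keras versions) are all assumptions made for this example.

```python
import os

IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def list_images(img_dir):
    """Return image file names in img_dir, sorted so feature rows keep a stable order."""
    return sorted(f for f in os.listdir(img_dir) if f.lower().endswith(IMAGE_EXTS))


def top1(probs):
    """Return (class_index, probability) of the most likely class."""
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return idx, float(probs[idx])


def extract(img_dir, weights_path):
    """Return (names, fc2 features, top-1 predictions) for every image in img_dir."""
    # Heavy imports stay local so the helpers above work without Keras installed.
    import numpy as np
    from keras.applications.vgg16 import VGG16, preprocess_input
    from keras.models import Model
    from keras.preprocessing import image

    names = list_images(img_dir)
    if not names:
        raise ValueError("no images found in " + img_dir)

    # Load the full network from the local weights file (includes the classifier).
    base = VGG16(weights=weights_path, include_top=True)
    # Assumption: expose 'fc2' as the image feature, next to the class probabilities.
    model = Model(inputs=base.input,
                  outputs=[base.get_layer("fc2").output, base.output])

    feats, labels = [], []
    for name in names:
        img = image.load_img(os.path.join(img_dir, name), target_size=(224, 224))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
        fc2, probs = model.predict(x)
        feats.append(fc2[0])
        labels.append(top1(probs[0]))
    return names, np.stack(feats), labels
```

Calling `extract("self_pic/img", "VGG16_weights/vgg16_weights_tf_dim_ordering_tf_kernels.h5")` would give, per image, a feature row and a top-1 class with its probability, which could then be set side by side with the generated descriptions to examine the hypothesis.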