Visual-Question-Answering

This repository contains an AI system for Visual Question Answering: given an image and a natural-language question about that image, the system answers the question in natural language based on the image scene. The system can be configured to use one of three underlying models:

  1. VQA: The baseline model from the paper VQA: Visual Question Answering. It encodes the image with a CNN and the question with an LSTM, then combines the two embeddings for the VQA task. It uses a pretrained VGG16 to obtain the image embedding (optionally further normalised) and a 1- or 2-layer LSTM for the question embedding.
  2. SAN: An attention-based model described in the paper Stacked Attention Networks for Image Question Answering. It incorporates attention over the input image.
  3. MUTAN: A variant of the VQA model in which, instead of a simple pointwise product, the image and question embeddings are combined using the Multimodal Tucker fusion technique described in the paper MUTAN: Multimodal Tucker Fusion for Visual Question Answering. A sketch contrasting these combination strategies is given after this list.
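For orientation, the following PyTorch sketch illustrates the three combination strategies in isolation: the baseline pointwise product, a single SAN-style attention hop, and a rank-constrained MUTAN fusion. The module names, layer sizes, and omitted details (dropout, normalisation, the answer classifier) are assumptions for illustration, not the repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PointwiseFusion(nn.Module):
    """Baseline VQA combination: project both embeddings and take an elementwise product."""
    def __init__(self, img_dim=4096, ques_dim=1024, hidden_dim=1024):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.ques_proj = nn.Linear(ques_dim, hidden_dim)

    def forward(self, img_emb, ques_emb):
        # img_emb: (batch, img_dim), ques_emb: (batch, ques_dim)
        return torch.tanh(self.img_proj(img_emb)) * torch.tanh(self.ques_proj(ques_emb))


class AttentionHop(nn.Module):
    """One SAN-style attention hop: the question attends over image-region features."""
    def __init__(self, region_dim=512, ques_dim=512, att_dim=512):
        super().__init__()
        self.region_att = nn.Linear(region_dim, att_dim)
        self.ques_att = nn.Linear(ques_dim, att_dim)
        self.score = nn.Linear(att_dim, 1)

    def forward(self, regions, ques_emb):
        # regions: (batch, num_regions, region_dim), ques_emb: (batch, ques_dim)
        h = torch.tanh(self.region_att(regions) + self.ques_att(ques_emb).unsqueeze(1))
        weights = F.softmax(self.score(h), dim=1)      # (batch, num_regions, 1)
        return (weights * regions).sum(dim=1)          # attention-weighted image summary


class MutanFusion(nn.Module):
    """Rank-constrained Tucker fusion of the image and question embeddings."""
    def __init__(self, img_dim=4096, ques_dim=1024, out_dim=1024, rank=10):
        super().__init__()
        self.rank, self.out_dim = rank, out_dim
        self.img_proj = nn.Linear(img_dim, out_dim * rank)
        self.ques_proj = nn.Linear(ques_dim, out_dim * rank)

    def forward(self, img_emb, ques_emb):
        # Elementwise product in a (rank x out_dim) projected space, summed over the rank dimension.
        v = self.img_proj(img_emb).view(-1, self.rank, self.out_dim)
        q = self.ques_proj(ques_emb).view(-1, self.rank, self.out_dim)
        return (v * q).sum(dim=1)
```

In each case, the fused vector would then feed a classifier over the answer vocabulary.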

Usage

First, download the datasets from http://visualqa.org/download.html: all items under Balanced Real Images except the Complementary Pairs List. Then run:

python main.py --config <config_file_path>

The system reads all of its arguments from the config file passed as input. Sample config files are provided in config/.

To speed up training, the images in the dataset can be preprocessed once and their embeddings stored on disk by setting emb_dir and the preprocess flag, as sketched below.
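A minimal sketch of what such a preprocessing pass could look like with torchvision, assuming fc7 features from a pretrained VGG16 are cached as one tensor file per image; the function name, directory layout, and file format here are hypothetical, not the repository's actual script.

```python
import os
import torch
from PIL import Image
from torchvision import models, transforms


def precompute_image_embeddings(image_dir, emb_dir, device="cpu"):
    """Hypothetical helper: cache a 4096-d VGG16 fc7 embedding for every image in image_dir."""
    os.makedirs(emb_dir, exist_ok=True)
    vgg = models.vgg16(pretrained=True).to(device).eval()
    # Drop the final classification layer so the network outputs 4096-d fc7 features.
    vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    with torch.no_grad():
        for fname in os.listdir(image_dir):
            img = Image.open(os.path.join(image_dir, fname)).convert("RGB")
            feat = vgg(preprocess(img).unsqueeze(0).to(device)).squeeze(0).cpu()
            torch.save(feat, os.path.join(emb_dir, os.path.splitext(fname)[0] + ".pt"))
```

At training time the data loader can then read the saved tensors instead of running the CNN forward pass for every example.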
