This repository contains the code necessary to reproduce the results in the paper:
Document Modeling with External Attention for Sentence Extraction, Shashi Narayan, Ronald Cardenas, Nikos Papasarantopoulos, Shay B. Cohen, Mirella Lapata, Jiangsheng Yu and Yi Chang, ACL 2018, Melbourne, Australia.
To train XNet+ (Title + Caption), run the following from extractive_summ/:
python document_summarizer_gpu2.py --max_title_length 1 --max_image_length 10 --train_dir --model_to_load 8 --exp_mode train
- Datasets and Resources
a) NewsQA
Download the combined dataset from: https://datasets.maluuba.com/NewsQA/dl
Download splitting scripts from NewsQA repo: https://github.com/Maluuba/newsqa
b) SQuAD: https://rajpurkar.github.io/SQuAD-explorer/
c) WikiQA: https://www.microsoft.com/en-us/download/details.aspx?id=52419
d) MS MARCO: http://www.msmarco.org/dataset.aspx
e) 1 Billion Word Benchmark: http://www.statmt.org/lm-benchmark/
- Preprocessing
First, train word embeddings on the 1 Billion Word Benchmark using word2vec and place the resulting files in answer_selection/datasets/word_emb/.
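The exact word2vec configuration is not given here; the following is a minimal sketch using gensim (>= 4.0), in which the corpus path, hyperparameters, and output filename are assumptions rather than the settings used for the paper:

from gensim.models import Word2Vec
from gensim.models.word2vec import PathLineSentences

# Stream one whitespace-tokenized sentence per line from every file in
# the benchmark directory (path assumed; adjust to your download).
sentences = PathLineSentences(
    "1-billion-word-benchmark/training-monolingual.tokenized.shuffled/")

model = Word2Vec(
    sentences,
    vector_size=200,  # embedding dimensionality (assumed)
    window=5,
    min_count=5,
    workers=8,
)

# Save in the plain-text word2vec format so downstream scripts can load it.
model.wv.save_word2vec_format(
    "answer_selection/datasets/word_emb/1bw_embeddings.txt")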
Generate the score files (IDF, ISF, word counts) for each dataset by running
python reformat_corpus.py
from the corresponding dataset folder under answer_selection/datasets/.
The preprocessed files will be placed in the folder: answer_selection/datasets/preprocessed_data/
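For reference, here is a minimal sketch of how IDF/ISF scores are conventionally computed (reformat_corpus.py generates these files itself; the function name, signature, and plain logarithmic weighting below are illustrative assumptions, not the repository's exact implementation):

import math
from collections import Counter

def inverse_frequency(units):
    # units: a list of tokenized units (lists of words). Pass documents
    # for IDF, individual sentences for ISF.
    n = len(units)
    freq = Counter(w for unit in units for w in set(unit))
    return {w: math.log(n / df) for w, df in freq.items()}

# Example with two tokenized "documents":
docs = [["the", "cat", "sat"], ["the", "dog", "ran"]]
idf = inverse_frequency(docs)                      # "the" -> log(2/2) = 0.0
word_counts = Counter(w for d in docs for w in d)  # raw word counts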
- Training
For training, run the scripts prefixed with run_ in each model folder.
- Evaluation
For evaluation, run the scripts prefixed with eval_ in each model folder.