
Keyphrase Generation with Transformer

This repo contains the source code for keyphrase generation with the Transformer architecture, based on the paper "Keyphrase Generation with Cross-Document Attention". Our implementation builds on the source code of OpenNMT-kpg-release and OpenNMT-py.

CDKGen is a Transformer-based keyphrase generator that extends the Transformer's global attention with cross-document attention networks, incorporating available documents as references so that better keyphrases are generated under the guidance of topic information. On top of this Transformer + cross-document attention architecture, we also adopt a copy mechanism, which enhances the model by selecting appropriate words directly from the source documents to handle out-of-vocabulary words in keyphrases. The structure of CDKGen is illustrated in the figure below.
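The two ideas above (attention over reference documents, plus a copy gate mixing generation and copy distributions) can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the repository's actual implementation; all function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_document_attention(query, ref_keys, ref_values):
    """Attend from a decoder state over reference-document representations.

    query:      (1, d)  current decoder state
    ref_keys:   (n, d)  key vectors of n reference documents
    ref_values: (n, d)  value vectors of the same documents
    Returns the attended context (1, d) and the attention weights (1, n).
    """
    scores = query @ ref_keys.T / np.sqrt(query.shape[-1])  # scaled dot-product
    weights = softmax(scores)
    return weights @ ref_values, weights

def copy_gate(gen_probs, copy_probs, p_gen):
    """Copy mechanism: mix the vocabulary distribution with a distribution
    over source-document words, p_gen * gen + (1 - p_gen) * copy."""
    return p_gen * gen_probs + (1.0 - p_gen) * copy_probs
```

Because both inputs to `copy_gate` are probability distributions and `p_gen` is in [0, 1], the mixed output is still a valid distribution, which is what lets the decoder emit out-of-vocabulary words that occur in the source document.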

(Figure: the overall structure of CDKGen)

Citation

If you use or extend our work, please cite the following paper:

@article{Sinovation2020CDKGen,
  title={Keyphrase Generation with Cross-Document Attention},
  author={Diao, Shizhe and Song, Yan and Zhang, Tong},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.09800}
}

Dependencies

python 3.6

pytorch 1.1

torchtext 0.4 (important: this exact version is required)

sentence-transformers 0.2.4
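Since torchtext 0.4 is flagged as version-sensitive, a quick sanity check before training can save a failed run. The snippet below is an illustrative helper (the function name is hypothetical) that compares an installed version string against a required major.minor prefix, e.g. `torch.__version__` against "1.1".

```python
import sys

def matches_version(actual, required):
    """Return True if `actual` has the same major.minor as `required`.
    Local build suffixes such as "1.1.0+cu90" are stripped first."""
    return actual.split("+")[0].split(".")[:2] == required.split(".")[:2]

# The repo requires Python 3.6 and the pinned package versions listed above.
assert sys.version_info[:2] >= (3, 6), "Python 3.6+ required"
```

A check like `matches_version(torchtext.__version__, "0.4")` distinguishes 0.4.x from, say, 0.5.x, where the dataset APIs differ.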

Datasets

All datasets used in this paper are provided by Rui Meng and can be downloaded here.

Preprocess the data

python preprocess.py -config config/preprocess/config-preprocess-keyphrase-kp20k.yml
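The YAML file passed via `-config` bundles the preprocessing options. The authoritative settings are in `config/preprocess/config-preprocess-keyphrase-kp20k.yml`; the fragment below is only an illustrative sketch using common OpenNMT-py preprocessing keys, with placeholder paths.

```yaml
# Illustrative sketch only; see the repo's config file for the real options.
data_type: text
train_src: data/keyphrase/kp20k/train_src.txt   # placeholder paths
train_tgt: data/keyphrase/kp20k/train_tgt.txt
valid_src: data/keyphrase/kp20k/valid_src.txt
valid_tgt: data/keyphrase/kp20k/valid_tgt.txt
save_data: data/keyphrase/kp20k/processed
share_vocab: true      # source and target share one vocabulary
dynamic_dict: true     # build copy-attention dictionaries per example
```

`dynamic_dict` is what makes the copy mechanism possible at training time, since it records which source tokens each example can copy.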

Training

python train.py -config config/train/config-transformer-keyphrase-memory.yml

Testing

python kp_gen_eval.py -tasks pred eval report -config config/test/config-test-keyphrase-one2seq.yml -data_dir data/keyphrase/meng17/ -ckpt_dir ./models/kp20k/ -output_dir output/cdkgen/ -testsets duc inspec semeval krapivin nus kp20k -gpu 0 --verbose --beam_size 10 --batch_size 32 --max_length 40 --onepass --beam_terminate topbeam --eval_topbeam