diff --git a/README.md b/README.md
index 704468e..68630cb 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,35 @@
 # Variational-Transformer
-This code has been written using PyTorch >= 0.4.1.
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+This is the PyTorch implementation of the paper:
+
+**Variational Transformers for Diverse Response Generation**. [**Zhaojiang Lin**](https://zlinao.github.io/), Genta Indra Winata, Peng Xu, Zihan Liu, Pascale Fung [[PDF]](https://arxiv.org/pdf/2003.12738.pdf)
+
+This code has been written using PyTorch >= 0.4.1. If you use any of the source code or datasets included in this toolkit in your work, please cite the following paper. The BibTeX is listed below:
+<pre>
+@misc{lin2020variational,
+    title={Variational Transformers for Diverse Response Generation},
+    author={Zhaojiang Lin and Genta Indra Winata and Peng Xu and Zihan Liu and Pascale Fung},
+    year={2020},
+    eprint={2003.12738},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+</pre>
+
+## Global Variational Transformer (GVT):
+The GVT is an extension of the CVAE in Zhao et al. (2017), which models discourse-level diversity with a global latent variable.
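+The following is a minimal sketch of how such a CVAE-style global latent variable is commonly realized in PyTorch: a prior network conditioned on the context, a recognition (posterior) network that also sees the response, the reparameterization trick, and a Gaussian KL term. It is illustrative only; the class, argument, and dimension names (`GlobalLatentLayer`, `hidden_dim`, `latent_dim`) are assumptions, not this repository's actual API.
+
+```python
+import torch
+import torch.nn as nn
+
+
+class GlobalLatentLayer(nn.Module):
+    """Illustrative CVAE-style global latent variable (not the repo's actual module)."""
+
+    def __init__(self, hidden_dim: int, latent_dim: int):
+        super().__init__()
+        # Each network predicts the mean and log-variance of a diagonal Gaussian.
+        self.prior_net = nn.Linear(hidden_dim, 2 * latent_dim)
+        self.recognition_net = nn.Linear(2 * hidden_dim, 2 * latent_dim)
+
+    def forward(self, context, response=None):
+        # Prior p(z | context) from the pooled context representation.
+        prior_mu, prior_logvar = self.prior_net(context).chunk(2, dim=-1)
+        if response is not None:
+            # Training: posterior q(z | context, response) via the recognition network.
+            mu, logvar = self.recognition_net(
+                torch.cat([context, response], dim=-1)
+            ).chunk(2, dim=-1)
+        else:
+            # Inference: sample z directly from the prior.
+            mu, logvar = prior_mu, prior_logvar
+        # Reparameterization trick: z = mu + sigma * eps.
+        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
+        # KL(q || p) between two diagonal Gaussians, summed over latent dimensions.
+        kl = 0.5 * torch.sum(
+            prior_logvar - logvar
+            + (logvar.exp() + (mu - prior_mu) ** 2) / prior_logvar.exp()
+            - 1.0,
+            dim=-1,
+        )
+        return z, kl
+```
+
+In a typical CVAE-style training loop, the returned KL term would be (often annealed and) added to the reconstruction cross-entropy to form the ELBO objective.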
+
+## Sequential Variational Transformer (SVT):
+SVT, inspired by variational autoregressive models (Goyal et al., 2017; Du et al., 2018), incorporates a sequence of latent variables into the decoding process through a novel variational decoder layer. Unlike previous approaches (Zhao et al., 2017; Goyal et al., 2017; Du et al., 2018), SVT uses Non-causal Multi-head Attention, which attends to future tokens to compute the posterior latent variables instead of relying on an additional encoder.
+
+## Dependency
+Check the packages needed or simply run the command