diff --git a/README.md b/README.md
index 704468e..68630cb 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,35 @@
 # Variational-Transformer
-This code has been written using PyTorch >= 0.4.1.
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+This is the PyTorch implementation of the paper:
+
+**Variational Transformers for Diverse Response Generation**. [**Zhaojiang Lin**](https://zlinao.github.io/), Genta Indra Winata, Peng Xu, Zihan Liu, Pascale Fung [[PDF]](https://arxiv.org/pdf/2003.12738.pdf)
+
+This code has been written using PyTorch >= 0.4.1. If you use any source code or datasets included in this toolkit in your work, please cite the following paper. The bibtex is listed below:
+<pre>
+@misc{lin2020variational,
+    title={Variational Transformers for Diverse Response Generation},
+    author={Zhaojiang Lin and Genta Indra Winata and Peng Xu and Zihan Liu and Pascale Fung},
+    year={2020},
+    eprint={2003.12738},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+</pre>
+
+## Global Variational Transformer (GVT):
+
+The GVT is an extension of the CVAE in Zhao et al. (2017), which models discourse-level diversity with a global latent variable.
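+
+Below is a minimal sketch of the CVAE-style global latent variable behind the GVT; the module and variable names are illustrative, not the ones used in this repository. During training a recognition (posterior) network conditioned on the context and the gold response provides the latent distribution; at inference a prior network conditioned on the context alone is used instead. The sampled latent z then conditions the Transformer decoder, and the training loss adds the KL term between the two distributions.
+
+```python
+import torch
+import torch.nn as nn
+
+class GlobalLatentVariable(nn.Module):
+    """Illustrative CVAE-style global latent variable (hypothetical names)."""
+    def __init__(self, hidden_dim: int, latent_dim: int):
+        super().__init__()
+        self.prior = nn.Linear(hidden_dim, 2 * latent_dim)          # mean/logvar from context only
+        self.posterior = nn.Linear(2 * hidden_dim, 2 * latent_dim)  # mean/logvar from context + response
+
+    def forward(self, ctx, resp=None):
+        # ctx, resp: pooled encoder representations of shape (batch, hidden_dim)
+        prior_mean, prior_logvar = self.prior(ctx).chunk(2, dim=-1)
+        if resp is not None:  # training: sample from the recognition (posterior) network
+            post_mean, post_logvar = self.posterior(torch.cat([ctx, resp], dim=-1)).chunk(2, dim=-1)
+            z = post_mean + torch.randn_like(post_mean) * torch.exp(0.5 * post_logvar)
+            # KL(posterior || prior), summed over the latent dimensions
+            kld = 0.5 * torch.sum(
+                prior_logvar - post_logvar
+                + (torch.exp(post_logvar) + (post_mean - prior_mean) ** 2) / torch.exp(prior_logvar)
+                - 1.0, dim=-1)
+        else:                 # inference: sample from the prior network only
+            z = prior_mean + torch.randn_like(prior_mean) * torch.exp(0.5 * prior_logvar)
+            kld = torch.zeros(ctx.size(0), device=ctx.device)
+        return z, kld  # z conditions the decoder inputs; total loss = reconstruction + KL
+```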
+
+## Sequential Variational Transformer (SVT):
+
+SVT, inspired by variational autoregressive models (Goyal et al., 2017; Du et al., 2018), incorporates a sequence of latent variables into the decoding process via a novel variational decoder layer. Unlike previous approaches (Zhao et al., 2017; Goyal et al., 2017; Du et al., 2018), SVT uses Non-causal Multi-head Attention, which attends to future tokens to compute the posterior latent variables instead of using an additional encoder.
+
 ## Dependency
 Check the packages needed or simply run the command