# NLP - Language Modelling

| Paper | Conference | Remarks |
| --- | --- | --- |
| Affect-LM: A Neural Language Model for Customizable Affective Text Generation | ACL 2017 | 1. The problem of integrating state-of-the-art neural language models with affective information remains an unexplored area. 2. The proposed model, Affect-LM, enables customizing the degree of emotional content in generated sentences through an additional design parameter. |
| Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | ICLR 2018 | 1. Formulate language modeling as a matrix factorization problem and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. 2. Propose a simple mixture of Softmaxes to address this issue (see the MoS sketch below the table). |
| On the State of the Art of Evaluation in Neural Language Models | ICLR 2018 | Re-evaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. |
| Regularizing and Optimizing LSTM Language Models | ICLR 2018 | 1. Consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. 2. Propose the weight-dropped LSTM, which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization (see the weight-drop sketch below the table). 3. Introduce NT-ASGD, a variant of the averaged stochastic gradient method in which the averaging trigger is determined by a non-monotonic condition rather than being tuned by the user. |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | arXiv 2019 | 1. Propose a novel neural architecture, Transformer-XL, that enables Transformers to learn dependencies beyond a fixed length without disrupting temporal coherence (see the recurrence sketch below the table). 2. Transformer-XL learns dependencies about 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation. |
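The mixture-of-Softmaxes (MoS) output layer is compact enough to sketch directly. The snippet below is a minimal illustration under stated assumptions, not the authors' code: the function name `mos_output`, the numpy setup, and the toy shapes are assumptions, but the structure (K mixture weights, K component softmaxes over a shared output embedding, convex combination) follows the paper's formulation.

```python
# Minimal sketch of a Mixture-of-Softmaxes output layer ("Breaking the Softmax
# Bottleneck"). Names, shapes, and the numpy setup are illustrative assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mos_output(h, W_prior, W_latent, W_embed, n_components):
    """h: (d,) RNN context vector -> (V,) next-word distribution."""
    d = h.shape[0]
    # Mixture weights pi_k(c): one scalar per softmax component.
    prior = softmax(W_prior @ h)                               # (K,)
    # K component-specific context vectors h_k(c).
    latent = np.tanh((W_latent @ h).reshape(n_components, d))  # (K, d)
    # Each component produces its own full softmax over the vocabulary ...
    component_probs = softmax(latent @ W_embed.T, axis=-1)     # (K, V)
    # ... and the output is their convex combination, whose log-probability
    # matrix is no longer low-rank in the context (the "bottleneck" fix).
    return prior @ component_probs                             # (V,)

# Toy usage with random parameters.
rng = np.random.default_rng(0)
d, V, K = 8, 20, 3
h = rng.normal(size=d)
p = mos_output(h, rng.normal(size=(K, d)), rng.normal(size=(K * d, d)),
               rng.normal(size=(V, d)), K)
assert np.isclose(p.sum(), 1.0)
```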
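To make the weight-dropped LSTM concrete, here is a minimal sketch of DropConnect applied to the hidden-to-hidden matrix, with the same dropped weights reused across all timesteps of a sequence. It hand-rolls a single LSTM layer purely for clarity; the paper instead wraps a standard (cuDNN) LSTM, and the class name and initialization here are assumptions. NT-ASGD is not shown.

```python
# Minimal sketch of the AWD-LSTM "weight drop": DropConnect on the
# hidden-to-hidden matrix, applied once per forward pass.
import torch

class WeightDropLSTM(torch.nn.Module):
    def __init__(self, input_size, hidden_size, weight_dropout=0.5):
        super().__init__()
        self.hidden_size = hidden_size
        self.weight_dropout = weight_dropout
        self.w_ih = torch.nn.Parameter(torch.randn(4 * hidden_size, input_size) * 0.1)
        self.w_hh = torch.nn.Parameter(torch.randn(4 * hidden_size, hidden_size) * 0.1)
        self.bias = torch.nn.Parameter(torch.zeros(4 * hidden_size))

    def forward(self, x):                      # x: (seq_len, batch, input_size)
        # DropConnect zeroes individual recurrent weights (not activations)
        # and reuses the same dropped matrix at every timestep.
        w_hh = torch.nn.functional.dropout(self.w_hh, self.weight_dropout,
                                           training=self.training)
        h = x.new_zeros(x.size(1), self.hidden_size)
        c = x.new_zeros(x.size(1), self.hidden_size)
        outputs = []
        for x_t in x:
            gates = x_t @ self.w_ih.T + h @ w_hh.T + self.bias
            i, f, g, o = gates.chunk(4, dim=-1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            outputs.append(h)
        return torch.stack(outputs), (h, c)

out, _ = WeightDropLSTM(10, 16)(torch.randn(5, 3, 10))  # out: (5, 3, 16)
```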
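The segment-level recurrence that lets Transformer-XL attend beyond a fixed segment can also be sketched briefly: keys and values range over a cached, gradient-stopped memory of the previous segment plus the current one, while queries come only from the current segment. The snippet below is a simplified single-head illustration; relative positional encodings are omitted, and the function names and toy projections are assumptions.

```python
# Minimal sketch of Transformer-XL segment-level recurrence (single head,
# no relative positional encodings).
import torch

def attend_with_memory(h, mem, w_q, w_k, w_v):
    """h: (cur_len, d) current segment; mem: (mem_len, d) cached states."""
    context = torch.cat([mem.detach(), h], dim=0)      # (mem_len + cur_len, d)
    q = h @ w_q                                        # queries: current segment only
    k, v = context @ w_k, context @ w_v                # keys/values: memory included
    attn = torch.softmax(q @ k.T / k.size(-1) ** 0.5, dim=-1)
    return attn @ v                                    # (cur_len, d)

def update_memory(mem, h, mem_len):
    # New memory = the most recent mem_len hidden states, kept out of autograd.
    return torch.cat([mem, h.detach()], dim=0)[-mem_len:]

# Toy usage: process two consecutive segments with a shared memory.
d, seg_len, mem_len = 16, 4, 4
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
mem = torch.zeros(0, d)
for _ in range(2):
    h = torch.randn(seg_len, d)
    out = attend_with_memory(h, mem, w_q, w_k, w_v)
    mem = update_memory(mem, h, mem_len)
```

Detaching the cached states is what keeps training cost bounded while still letting information propagate across segments at inference time.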

Back to index