Learning Word Vectors for Sentiment Analysis |
HLT 2011 |
1. Present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term--document information as well as rich sentiment content. 2. Instantiate the model to utilize the document-level sentiment polarity annotations present in many online documents |
Efficient Estimation of Word Representations in Vector Space |
Arxiv 2013 |
A pioneering work that proposes two novel model architectures, namely CBOW and SkipGram, for computing continuous vector representations of words from very large data sets. |
Distributed Representations of Words and Phrases and their Compositionality |
NIPS 2013 |
Present several extensions that improve both the quality of the vectors and the training speed of the SkipGram architecture. Subsampling of frequent words yields faster training and better performance. Negative sampling uses a few negative samples when computing errors to accelerate optimization and improve performance. |
Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification |
ACL 2014 |
Use a simple MLP with one hidden layer and one softmax layer to predict the sentiment of a sliding n-gram of each sentence. The loss function is the weighted sum of cross-entropy loss and hinge loss. |
GloVe: Global Vectors for Word Representation |
EMNLP 2014 |
Propose a new global log-bilinear regression model that combines the advantages of two major model families in the literature: global matrix factorization and local context window methods. |
Specializing Word Embeddings for Similarity or Relatedness |
EMNLP 2015 |
1. Demonstrate the advantage of specializing semantic word embeddings for either similarity or relatedness. 2. Find that retrofitting and joint-learning approaches yield specialized semantic spaces and perform better than unspecialized spaces |
Retrofitting Word Vectors to Semantic Lexicons |
NAACL 2015 |
Using external semantic lexicons such as WordNet, the model computes new word embeddings that stay close both to their counterparts in the original embeddings and to their neighboring words in the lexicon. |
A Simple But Tough-to-Beat Baseline for Sentence Embeddings |
ICLR 2017 |
1. Propose a completely unsupervised sentence embedding method. 2. Use word embeddings computed with one of the popular methods on an unlabeled corpus like Wikipedia, represent the sentence by a weighted average of the word vectors, and then modify them using PCA/SVD. 3. This weighting improves performance by about 10% to 30% in textual similarity tasks. 4. Give a theoretical explanation of this success using a latent variable generative model for sentences. |
Sentiment Embeddings with Applications to Sentiment Analysis |
IEEE TKDE 2016 |
1. Encode sentiment information of texts (e.g., sentences and words) together with contexts of words in sentiment embeddings. 2. Develop a number of neural networks with tailored loss functions, and automatically collect massive amounts of text with sentiment signals such as emoticons as training data. |
Towards Building Affect Sensitive Word Distributions |
N.A. |
1. Incorporate affect lexica, which capture fine-grained information about a word's psycholinguistic and emotional orientation, into the training process of Word2Vec and GloVe using a joint learning approach. 2. The proposed method outperforms previous work on standard tasks such as word similarity detection, outlier detection and sentiment detection. |
Enriching Word Vectors with Subword Information |
TACL 2017 |
Present a variant of the SkipGram model that takes subword information into account when training word embeddings. Specifically, each character n-gram is associated with one vector, and each word is represented as the sum of its constituent n-gram vectors. The similarity score between two words is calculated as the sum of the scores between one word and the n-grams of the other word. |
A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings |
EMNLP 2017 |
Present a simple yet effective method for learning word embeddings based on text from different domains. |
Can Word Embeddings Help Find Latent Emotions in Text |
AAAI 2017 |
Conclude that existing word embeddings fail to capture emotion information. For example, the arithmetic joy + fear = guilt does not hold. Also, emotionally similar words are far apart in the word embedding space. |
A Structured Self-attentive Sentence Embedding |
ICLR 2017 |
1. Propose a new model for extracting an interpretable sentence embedding by introducing self-attention. 2. Use a 2-D matrix to represent the embedding, with each row of the matrix attending to a different part of the sentence. |
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks |
ICLR 2017 |
1. Propose a framework that facilitates better understanding of the encoded representations of sentences. 2. Define prediction tasks around isolated aspects of sentence structure (namely sentence length, word content, and word order), and score representations by the ability to train a classifier to solve each prediction task when using the representation as input. |
Dict2vec : Learning Word Embeddings using Lexical Dictionaries |
EMNLP 2017 |
1. Propose a new approach, Dict2vec, based on one of the largest yet most refined data sources for describing words: natural language dictionaries. 2. The proposed approach builds new word pairs from dictionary entries so that semantically related words are moved closer, and negative sampling filters out pairs whose words are unrelated in dictionaries. |
FRAGE: Frequency-Agnostic Word Representation |
NIPS 2018 |
1. Identifies the problem that the embeddings for popular and rare words lie in different subregions of the vector space. 2. Proposes an adversarial training method that adds a loss which trains the word embeddings to fool a discriminator, namely a popular/rare word classifier. |
Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms |
ACL 2018 |
1. Conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, relative to word-embedding-based RNN/CNN models. 2. Propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences |
Exploring Semantic Properties of Sentence Embeddings |
ACL 2018 |
1. Assess to what extent prominent sentence embedding methods exhibit select semantic properties. 2. Propose a framework that generates triplets of sentences to explore how changes in the syntactic structure or semantics of a given sentence affect the similarities obtained between their sentence embeddings. |
What you can cram into a single vector: Probing sentence embeddings for linguistic properties |
ACL 2018 |
1. Introduce 10 probing tasks designed to capture simple linguistic features of sentences. 2. Use these tasks to study embeddings generated by three different encoders trained in eight distinct ways, uncovering intriguing properties of both encoders and training methods. |
Domain Adapted Word Embeddings for Improved Sentiment Classification |
ACL 2018 |
Use Canonical Correlation Analysis (CCA) to combine generic word embeddings with domain specific word embeddings. The domain specific embeddings are obtained via LSA on domain specific data. |
Learning Domain-Sensitive and Sentiment-Aware Word Embeddings |
ACL 2018 |
Create embeddings for the generic domain and for each specific domain. A latent variable is introduced for each word to indicate its probability of belonging to the common domain. The paper extends the skip-gram model to also predict the polarity of each word. All embeddings are learned via the EM algorithm. |
Dissecting Contextual Word Embeddings: Architecture and Representation |
EMNLP 2018 |
1. Present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN, or self attention) influences both end task accuracy and qualitative properties of the representations that are learned. 2. Show there is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks. 3. Show that all architectures learn representations that vary with network depth, from exclusively morphology-based at the word embedding layer, through local-syntax-based in the lower contextual layers, to longer-range semantics such as coreference at the upper layers. |
CARER: Contextualized Affect Representations for Emotion Recognition |
EMNLP 2018 |
Propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text |
Deep contextualized word representations |
NAACL 2018 |
1. Propose ELMo, word vectors learned from the internal states of a deep bidirectional language model pretrained on a large text corpus. 2. The proposed word vectors achieved SOTA in six challenging NLP tasks. |
Querying Word Embeddings for Similarity and Relatedness |
NAACL 2018 |
1. Demonstrate the usefulness of context embeddings in predicting asymmetric association between words from a recently published dataset of production norms. 2. Suggest that humans respond with words closer to the cue within the context embedding space (rather than the word embedding space), when asked to generate thematically related words |
Learning Emotion-enriched Word Representations |
COLING 2018 |
The emotion-enriched embeddings are learned by training an LSTM with cross-entropy loss to predict the emotion label of each document. Each word in the document is fed sequentially into the LSTM model. The initial embeddings are either randomly initialized or loaded from pre-trained embeddings. |
Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation |
CoNLL 2018 |
1. Argues that word embeddings have two aspects: a semantics/syntax axis and a similarity/relatedness axis. Each aspect has somewhat incompatible features. Experiments show that popular word embeddings such as Word2Vec, GloVe and FastText have all captured this information but differ in their surface forms. 2. Proposes a simple linear transformation as a post-processing technique that can adjust existing word embeddings towards specific axes (e.g., more semantic info vs more syntactic info, or more similarity info vs more relatedness info). |
All-but-the-Top: Simple and Effective Postprocessing for Word Representations |
ICLR 2018 |
Demonstrate a very simple, and yet counter-intuitive, postprocessing technique -- eliminate the common mean vector and a few top dominating directions from the word vectors -- that renders off-the-shelf representations even stronger. |
Learning Sentiment-Specific Word Embedding via Global Sentiment Representation |
AAAI 2018 |
Extend the CBoW model, which predicts the center word from its context words. The proposed model additionally considers a document vector when predicting the center word, where the document vector is a weighted sum of word vectors. The overall loss is a weighted sum of the cross-entropy losses for center word prediction and sentiment polarity prediction. |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
Arxiv 2018 |
1. Propose BERT to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. 2. BERT is trained to perform two tasks: masked language modelling and next sentence prediction to learn word-level representation and sentence-level representation. 3. BERT achieves SOTA in 11 NLP tasks. |
A Survey of Word Embeddings Evaluation Methods |
Arxiv 2018 |
1. Explores issues in existing evaluation methods for word embeddings: obscureness of the notion of semantics, lack of proper training data, absence of correlation between intrinsic and extrinsic methods, etc. 2. Reviews 16 intrinsic methods and 12 extrinsic methods for word embedding evaluation. 3. Summarizes common datasets used for each evaluation method. |
Evaluation of sentence embeddings in downstream and linguistic probing tasks |
Arxiv 2018 |
1. Perform a comprehensive evaluation of recent methods using a wide variety of downstream and linguistic feature probing tasks. 2. Show that a simple approach using bag-of-words with a recently introduced language model for deep context-dependent word embeddings proved to yield better results in many tasks when compared to sentence encoders trained on entailment datasets. |
Evaluating Compositionality in Sentence Embeddings |
Arxiv 2018 |
1. Present a new dataset for natural language inference (NLI) that cannot be solved using only word-level knowledge and requires some compositionality. 2. Find that augmenting training with the new dataset improves test performance on it without loss of performance on the original dataset. |
Universal Sentence Encoder |
Arxiv 2018 |
1. Present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. 2. Investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. |
Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations |
Arxiv 2018 |
1. Generalize the concept of average word embeddings to power mean word embeddings. 2. Show that the concatenation of different types of power mean word embeddings considerably closes the gap to state-of-the-art methods monolingually and substantially outperforms these more complex techniques cross-lingually. 3. Outperforms different recently proposed baselines such as SIF and Sent2Vec by a solid margin, thus constituting a much harder-to-beat monolingual baseline. |
Improving Language Understanding by Generative Pre-Training |
Arxiv 2018 |
1. Demonstrate that large gains on natural language understanding tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. 2. Make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. 3. Obtain SOTA in 9 out of the 12 tasks studied. |
Don’t Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors |
ICLR 2019 |
1. Proposes a novel fuzzy bag-of-words (FBoW) representation for text that contains all the words in the vocabulary simultaneously but with different degrees of membership. 2. Shows that max-pooled word vectors are only a special case of fuzzy BoW and should be compared via the fuzzy Jaccard index rather than cosine similarity. 3. Proposes DynaMax, a completely unsupervised and non-parametric similarity measure that dynamically extracts and max-pools good features depending on the sentence pair, and that outperforms strong baselines on STS tasks. |
No Training Required: Exploring Random Encoders for Sentence Classification |
ICLR 2019 |
1. Explore various methods for computing sentence representations from pre-trained word embeddings without any training, i.e., using nothing but random parameterizations. 2. Show that the gain of existing modern sentence embeddings over random methods is small. 3. Provide the field with more appropriate strong baselines going forward. |
What do you learn from context? Probing for sentence structure in contextualized word representations |
ICLR 2019 |
1. Introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. 2. Probe word-level contextual representations from four recent models and investigate how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena. 3. Find that existing models trained on language modeling and translation produce strong representations for syntactic phenomena, but only offer comparably small improvements on semantic tasks over a non-contextual baseline. |
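
Several entries above summarize parameter-free pooling over word vectors as a sentence representation: average and max pooling in the SWEM entry, and concatenated power means in the power mean entry. The following is a minimal sketch of that idea, not the authors' code. The toy two-dimensional vectors, the function names `power_mean` and `concat_power_means`, and the restriction of `p` to plus or minus infinity and odd positive integers are assumptions made for this illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def power_mean(vectors: Sequence[Vector], p: float) -> Vector:
    """Element-wise power mean of equal-length word vectors.

    p = 1 gives the average, p = +inf the max, p = -inf the min.
    Assumption for this sketch: only +/-inf and odd positive integers
    are accepted, since even or fractional p is ill-defined when
    vector components are negative.
    """
    if not vectors:
        raise ValueError("need at least one word vector")
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    if p == math.inf:
        return [max(col) for col in columns]
    if p == -math.inf:
        return [min(col) for col in columns]
    if p != int(p) or p < 1 or int(p) % 2 == 0:
        raise ValueError("p must be +/-inf or an odd positive integer")
    result = []
    for col in columns:
        m = sum(x ** int(p) for x in col) / len(col)
        # Signed p-th root keeps the result real when the mean is negative.
        result.append(math.copysign(abs(m) ** (1.0 / p), m))
    return result


def concat_power_means(
    vectors: Sequence[Vector],
    powers: Sequence[float] = (-math.inf, 1, math.inf),
) -> Vector:
    """Concatenate several power means into one sentence embedding."""
    embedding: Vector = []
    for p in powers:
        embedding.extend(power_mean(vectors, p))
    return embedding


# Hypothetical toy embeddings, used only to make the example runnable.
toy_embeddings = {
    "good": [1.0, -2.0],
    "movie": [3.0, 0.0],
    "indeed": [-1.0, 5.0],
}
sentence = ["good", "movie", "indeed"]
vectors = [toy_embeddings[word] for word in sentence]

# Min, mean, and max pooling concatenated: 3 * 2 = 6 dimensions.
sentence_embedding = concat_power_means(vectors)
print(sentence_embedding)
```

With real pre-trained embeddings the only change would be the lookup table. Each power mean adds another block of the embedding dimension to the sentence vector, so the representation grows linearly with the number of powers used.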