This repository contains additional material for the paper "A Semantic Similarity-Based Perspective of Affect Lexicons for Sentiment Analysis".
The additional material for the paper can be found here. It contains the results on all the datasets for the different WordNet-based similarity metrics considered.
The implementation of SIMON is included in the gsitk package. An example of use follows.
To use the SIMON feature extractor from gsitk, two things are needed:
- A sentiment lexicon.
- A word embeddings model that is gensim-compatible.
For example, using only the Bing Liu lexicon and an embeddings model located in the current directory:
from gsitk.features import simon
from nltk.corpus import opinion_lexicon
from gensim.models.keyedvectors import KeyedVectors
lexicon = [list(opinion_lexicon.positive()), list(opinion_lexicon.negative())]
embedding_model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
simon_transformer = simon.Simon(lexicon=lexicon, n_lexicon_words=200, embedding=embedding_model)
# simon_transformer has the fit() and transform() methods, so it can be used in a Pipeline
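To illustrate why both a lexicon and an embeddings model are needed, here is a minimal sketch of similarity-based feature extraction in the spirit of SIMON. It is not the gsitk implementation. The toy embeddings, the `similarity_features` helper, and the choice of taking the maximum cosine similarity per lexicon word are all assumptions made for illustration.

```python
import numpy as np

# Toy 3-dimensional embeddings with made-up values (assumption: a real
# setup would look these vectors up in a gensim-compatible model).
toy_embeddings = {
    "good": np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.1]),
    "bad": np.array([-0.9, 0.1, 0.0]),
    "awful": np.array([-0.8, 0.2, 0.1]),
    "movie": np.array([0.0, 0.1, 0.9]),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def similarity_features(tokens, lexicon_words, embeddings):
    """Hypothetical helper: one feature per lexicon word, equal to the
    highest cosine similarity between that word and any document token.
    Words missing from the embeddings are skipped (feature stays 0)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    features = np.zeros(len(lexicon_words))
    if not known:
        return features
    for j, word in enumerate(lexicon_words):
        if word in embeddings:
            features[j] = max(cosine(embeddings[word], vec) for vec in known)
    return features


lexicon_words = ["good", "bad"]
features = similarity_features(["great", "movie"], lexicon_words, toy_embeddings)
```

In this toy setup the document "great movie" ends up closer to the lexicon word "good" than to "bad", so the first feature is larger than the second. The output vector has one entry per lexicon word, regardless of document length.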
To improve performance, it is recommended to use a more complete scikit-learn pipeline that applies normalization and feature selection in conjunction with the SIMON feature extraction.
from gsitk.features import simon
simon_model = simon.Simon(lexicon=lexicon, n_lexicon_words=200, embedding=embedding_model)
model = simon.simon_pipeline(simon_transformer=simon_model, percentile=25)
# model also implements fit() and transform()
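As a rough picture of what such a pipeline might look like, the sketch below combines percentile-based feature selection, normalization, and a classifier using plain scikit-learn components. The synthetic matrix `X` stands in for SIMON features. The specific steps, their order, and the classifier are assumptions for illustration and are not taken from the gsitk source.

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer

rng = np.random.default_rng(0)

# Synthetic stand-in for extracted features: 40 documents x 20 features.
X = rng.random((40, 20))
y = np.array([0, 1] * 20)
# Make the first five features informative about the label.
X[y == 1, :5] += 0.5

pipeline = Pipeline([
    # Keep the top 25% of features ranked by ANOVA F-score.
    ("select", SelectPercentile(f_classif, percentile=25)),
    # Scale each sample to unit L2 norm.
    ("normalize", Normalizer()),
    # Any scikit-learn classifier could be used here (assumption).
    ("classify", LogisticRegression()),
])
pipeline.fit(X, y)

n_selected = int(pipeline.named_steps["select"].get_support().sum())
predictions = pipeline.predict(X)
```

With 20 input features and `percentile=25`, the selection step keeps 5 of them. In a real setting the feature matrix would come from the SIMON transformer rather than from random numbers.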
If you use this work, please cite the following paper:

Oscar Araque, Ganggao Zhu, Carlos A. Iglesias, A semantic similarity-based perspective of affect lexicons for sentiment analysis, Knowledge-Based Systems, Volume 165, 2019, Pages 346-359, ISSN 0950-7051, https://doi.org/10.1016/j.knosys.2018.12.005 (http://www.sciencedirect.com/science/article/pii/S0950705118305926). Keywords: Sentiment analysis; Sentiment lexicon; Semantic similarity; Word embeddings.
@article{ARAQUE2019346,
title = "A semantic similarity-based perspective of affect lexicons for sentiment analysis",
journal = "Knowledge-Based Systems",
volume = "165",
pages = "346 - 359",
year = "2019",
issn = "0950-7051",
doi = "https://doi.org/10.1016/j.knosys.2018.12.005",
url = "http://www.sciencedirect.com/science/article/pii/S0950705118305926",
author = "Oscar Araque and Ganggao Zhu and Carlos A. Iglesias",
keywords = "Sentiment analysis, Sentiment lexicon, Semantic similarity, Word embeddings",
}