Aspect Based Sentiment Analysis (ABSA) on Restaurant User Reviews
We have a file containing anonymized user reviews. For each review, we are also given the aspect term, i.e. the word of interest. Our goal is to predict, for each review, the polarity (POSITIVE, NEGATIVE or NEUTRAL) of the user's opinion towards the given aspect term.
For example, if the <user review, aspect term> pair is
"The soup at this restaurant is awful!", "soup"
our task is to predict NEGATIVE. A single review may also contain contrasting opinions about different aspects. For example, given
"The soup at this restaurant is awful, however the service is excellent!", "service"
the output for the aspect term "service" should be POSITIVE.
Two CSV files containing the user reviews, aspect term, aspect category and review polarity can be found here. We train on traindata.csv and perform validation on devdata.csv. The final test set is hidden.
Create /data at the same level as /src and place traindata.csv and devdata.csv inside /data. Then
python src/testing.py
calls the classifier, trains the model, and displays results on the dev set.
- Basic preprocessing: conversion to lowercase, punctuation and stop-word removal, lemmatization, and spell correction (see the sketches after this list).
- Some aspect terms are multi-word phrases, so all words in the phrase are combined into a single token; the same transformation is applied to the original sentence.
- Polarity lexicons are used to assign higher weight to words with repeated letters such as 'looooove' or 'gooooood' (possible indicators of strong emotion).
- Dependency parsing to find modifiers (ADJ/ADV via amod/advmod/attr relations) of the target word; the modifier's pre-trained word embedding (a 300-dimensional Google News vector) is added as a feature.
- The aspect category column is split into category and sub-category, which are one-hot encoded.
- Negations within the 3 words preceding a target modifier are detected, and a polarity score is assigned using the VADER polarity lexicon.
- A binary feature that is 1 when the sentence contains a word in CAPITAL CASE and 0 otherwise.
- TF-IDF representation of the sentence.
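
A minimal sketch of the preprocessing and phrase-merging steps, assuming NLTK for stop-word removal and lemmatization; the function names are illustrative and spell correction is omitted:

```python
import re

from nltk.corpus import stopwords          # requires nltk.download("stopwords")
from nltk.stem import WordNetLemmatizer    # requires nltk.download("wordnet")

STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()


def merge_aspect_phrase(sentence, aspect_term):
    """Join a multi-word aspect term into one token ('spring rolls' -> 'spring_rolls')
    and apply the same replacement inside the sentence."""
    merged = aspect_term.replace(" ", "_")
    return sentence.replace(aspect_term, merged), merged


def preprocess(sentence):
    """Lowercase, strip punctuation, drop stop words, and lemmatize.
    (Spell correction from the original pipeline is omitted here.)"""
    sentence = re.sub(r"[^\w\s]", " ", sentence.lower())
    return [LEMMATIZER.lemmatize(tok) for tok in sentence.split() if tok not in STOP_WORDS]


text, term = merge_aspect_phrase("The spring rolls were great!", "spring rolls")
print(preprocess(text))  # ['spring_rolls', 'great']
```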
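The modifier extraction and negation handling can be sketched with spaCy and the vaderSentiment package; the negation list, the model name (en_core_web_sm) and the function name are illustrative, and the word-embedding lookup is only noted in a comment:

```python
import spacy
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

nlp = spacy.load("en_core_web_sm")   # any spaCy English model with a dependency parser
vader = SentimentIntensityAnalyzer()

NEGATIONS = {"not", "no", "never", "n't", "hardly"}


def modifier_polarity(sentence, aspect_term):
    """Collect ADJ/ADV modifiers (amod/advmod/attr) attached to the aspect term and
    score each with VADER, flipping the sign when a negation appears within the
    3 preceding tokens. (The full pipeline additionally looks up a 300-d Google
    News word2vec embedding for each modifier, e.g. via gensim KeyedVectors.)"""
    doc = nlp(sentence)
    results = []
    for token in doc:
        if token.dep_ in {"amod", "advmod", "attr"} and token.head.lower_ == aspect_term.lower():
            score = vader.polarity_scores(token.text)["compound"]
            if any(w.lower_ in NEGATIONS for w in doc[max(0, token.i - 3):token.i]):
                score = -score
            results.append((token.text, score))
    return results


print(modifier_polarity("They served truly awful soup but excellent wine.", "soup"))
```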
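The repeated-letter and capital-case indicators reduce to simple regular-expression checks; the exact thresholds used here (three repeats, two or more capitals) are assumptions:

```python
import re


def has_elongated_word(sentence):
    """1 if any word contains the same letter repeated three or more times
    ('looooove', 'gooooood'), else 0; such words receive extra weight from
    the polarity lexicon in the full pipeline."""
    return int(bool(re.search(r"([A-Za-z])\1\1", sentence)))


def has_capital_case(sentence):
    """1 if the sentence contains a word written in CAPITAL CASE, else 0."""
    return int(bool(re.search(r"\b[A-Z]{2,}\b", sentence)))


print(has_elongated_word("I looooove this place"), has_capital_case("NEVER going back"))  # 1 1
```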
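A sketch of the aspect-category one-hot encoding and the TF-IDF sentence representation with pandas and scikit-learn; the column names and the '#' category separator (e.g. "FOOD#QUALITY") are assumptions about the CSV layout, not taken from the data description:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.read_csv("data/traindata.csv")  # assumed columns: "sentence", "aspect_category"

# Split the aspect category into category / sub-category and one-hot encode both.
cats = df["aspect_category"].str.split("#", expand=True)
cats.columns = ["category", "subcategory"]
category_features = pd.get_dummies(cats, columns=["category", "subcategory"])

# TF-IDF representation of the (preprocessed) sentences.
tfidf = TfidfVectorizer().fit_transform(df["sentence"])
```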
Both deep and non-deep models were tried; the non-deep model performed slightly better than the deep neural network model on average.
Ensemble approach: predictions from different (weak and strong) classifiers are combined with a VotingClassifier over a RandomForestClassifier and a Support Vector Classifier (see the sketch below).
Mean accuracy of 82.82% on the dev dataset.
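
A sketch of the voting ensemble with scikit-learn, shown on synthetic data; the hyperparameters and the choice of soft voting are illustrative rather than the settings used in src/testing.py:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the engineered feature matrix; in the real pipeline X comes
# from the features listed above and y from the POSITIVE/NEGATIVE/NEUTRAL labels.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.2, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("svc", SVC(kernel="rbf", probability=True, random_state=0)),
    ],
    voting="soft",  # average the predicted class probabilities of both models
)
ensemble.fit(X_train, y_train)
print("dev accuracy:", ensemble.score(X_dev, y_dev))
```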