IMDB-reviews

Data Cleaning

Removing HTML tags
Removing punctuation
Converting to lowercase
Lemmatization
Removing stop words
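
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `clean_review` is hypothetical, and the tiny `STOP_WORDS` set and `LEMMAS` lookup are made-up stand-ins for a real stop-word list and lemmatizer (for example, those shipped with NLTK or spaCy), so that the sketch runs with the standard library only.

```python
import html
import re
import string

# Hypothetical, deliberately tiny stand-ins so the sketch needs no downloads.
# A real pipeline would use a full stop-word list and a proper lemmatizer.
STOP_WORDS = {"a", "an", "the", "is", "was", "were", "it", "this", "and", "of", "to"}
LEMMAS = {"movies": "movie", "actors": "actor", "loved": "love", "scenes": "scene"}

TAG_RE = re.compile(r"<[^>]+>")
# Map every punctuation character to a space so words do not get glued together.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_review(text: str) -> str:
    text = TAG_RE.sub(" ", text)        # 1. remove HTML tags such as <br />
    text = html.unescape(text)          #    decode leftover entities like &amp;
    text = text.translate(PUNCT_TABLE)  # 2. remove punctuation
    text = text.lower()                 # 3. convert to lowercase
    tokens = text.split()
    tokens = [LEMMAS.get(t, t) for t in tokens]          # 4. lemmatize (toy lookup)
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 5. remove stop words
    return " ".join(tokens)


print(clean_review("I loved this movie!<br /><br />The actors were GREAT."))
# prints: i love movie actor great
```

Lemmatizing before stop-word removal mirrors the order of the list above; with a real lemmatizer the two steps could also be swapped without much effect.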

TF-IDF approach accuracy

SVM - 90.44%
Logistic Regression - 88.9%
Naive Bayes - 88.51%
Decision Tree - 69.33%
Random Forest - 85.35%
Gradient Boosting Classifier - 81.01%
XGBoost Classifier - 84.58%
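
As an illustration of the TF-IDF approach, the sketch below wires a TF-IDF vectorizer to a linear SVM with scikit-learn. It is an assumption-laden example rather than the code that produced the numbers above: the toy `train_texts` and `train_labels` are made up, default vectorizer settings are used, and `LinearSVC` is assumed as the SVM variant since the kernel is not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy data standing in for cleaned IMDB reviews (1 = positive, 0 = negative).
train_texts = [
    "great movie wonderful acting",
    "love film brilliant story",
    "terrible movie boring plot",
    "awful film bad acting",
]
train_labels = [1, 1, 0, 0]

# The pipeline turns raw text into TF-IDF vectors, then fits a linear SVM on them.
model = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))
model.fit(train_texts, train_labels)

# Predict labels for unseen text; each prediction is either 0 or 1.
predictions = model.predict(["wonderful story", "boring bad plot"])
print(predictions)
```

Swapping `LinearSVC` for `LogisticRegression`, `MultinomialNB`, or a tree-based classifier in the same pipeline is one way to compare the models listed above on identical features.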

Neural network with Embedding layer

Accuracy with 10 epochs - 87%-88%
Adding a convolution layer - 87%-89% (training time decreases significantly in this case)

Neural network with pre-trained 100-dimensional GloVe vectors

Accuracy with 10 epochs - 86%-88%
Adding a convolution layer - 85%-87% (training time decreases significantly in this case)

NBSVM

NBSVM is an approach to text classification proposed by Wang and Manning (https://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf) that takes a linear model such as SVM (or logistic regression) and infuses it with Bayesian probabilities by replacing word count features with Naive Bayes log-count ratios.
Accuracy - 91.5%
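
To make the log-count ratio concrete, here is a small standard-library sketch of how it can be computed from binarized word features, following the definition in the paper: r = log((p / ||p||_1) / (q / ||q||_1)), where p and q are smoothed feature counts over positive and negative documents. The function names `log_count_ratio` and `nb_features`, the smoothing value `alpha=1.0`, and the two-document corpus are illustrative assumptions, not taken from this repository.

```python
import math


def log_count_ratio(docs, labels, alpha=1.0):
    """Return {token: r} for tokenized docs with labels 1 (positive) or 0 (negative)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    p = {tok: alpha for tok in vocab}  # smoothed counts over positive documents
    q = {tok: alpha for tok in vocab}  # smoothed counts over negative documents
    for doc, label in zip(docs, labels):
        counts = p if label == 1 else q
        for tok in set(doc):  # binarized: each token counts once per document
            counts[tok] += 1.0
    p_norm = sum(p.values())
    q_norm = sum(q.values())
    return {tok: math.log((p[tok] / p_norm) / (q[tok] / q_norm)) for tok in vocab}


def nb_features(doc, r):
    """Scale a document's binary indicators by r: present tokens take the value r[tok]."""
    return {tok: r[tok] for tok in set(doc) if tok in r}


# Made-up toy corpus: one positive and one negative review.
docs = [["great", "film"], ["bad", "film"]]
labels = [1, 0]

r = log_count_ratio(docs, labels)
print(r)                                  # "great" > 0, "bad" < 0, "film" == 0
print(nb_features(["great", "film"], r))
```

The scaled features from `nb_features` are what would then be fed to the linear model (SVM or logistic regression) in place of raw word counts.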

(Note: this code was written for practice, and I did not tune hyperparameters in most cases.)
