Skip to content

Applying various algorithms on the IMDB reviews dataset.

Notifications You must be signed in to change notification settings

ayushkothari27/IMDB-reviews

Repository files navigation

IMDB-reviews

Data Cleaning

Removing HTML tags
Removing Punctuations
Converting to lower case
Lemmatization
Removing stop words

TF-IDF approach accuracy

SVM - 90.44%
Logistic Regression - 88.9%
Naive Bayes - 88.51%
Decision Tree - 69.33%
Random Forest - 85.35%
Gradient Boosting Classifier - 81.01%
XGBoost Classifier - 84.58%

Neural network with Embedding layer

Accuracy with 10 epochs - 87%-88%
Adding a Convolution layer - 87%-89% (The time required to train decreases significantly in this case)

Neural network with Pre-Trained 100 dimension GloVe vectors

Accuracy with 10 epochs - 86%-88%
Adding a Convolution layer - 85%-87% (The time required to train decreases significantly in this case)

NBSVM

NBSVM is an approach to text classification proposed by Wang and Manning (https://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf) that takes a linear model such as SVM (or logistic regression) and infuses it with Bayesian probabilities by replacing word count features with Naive Bayes log-count ratios.
Accuracy - 91.5%

(Note - This code was just for practice and I did not consider hyperparameter tuning for most cases)

About

Applying various algorithms on the IMDB reviews dataset.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published