This repo contains sentiment analysis and exploratory data analysis (EDA) on the Movie Reviews dataset from NLTK Data.
Link to data is Here
Structure:
- Sentiment analysis
  - Movie Reviews
    - Neg
    - Pos
- Side Stuff
  - Movie Reviews
NLP has always been a challenging topic in machine learning. One of its most useful applications is detecting what the majority of people want or feel, which is the task of sentiment analysis.
- Cleaned the data
- Performed EDA
- Created encodings (self-made encodings based on word frequency, and BERT encodings)
- Implemented:
  - Random Forest Regressor
  - Logistic Regression
  - Neural Network
  - Encoding Model
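The cleaning step listed above is not spelled out in this README, so the following is only a minimal sketch of what it might look like. The function name `clean_review` and the exact rules (lowercasing, dropping digits and punctuation, collapsing whitespace) are assumptions for illustration, not the notebook's actual code.

```python
import re

def clean_review(text):
    """Hypothetical cleaner: lowercase, keep letters only, collapse whitespace."""
    text = text.lower()
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of whitespace into a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

print(clean_review("A GREAT film!!  10/10, would watch again."))
```

Keeping the cleaner as one small function makes it easy to apply the same rules to both the Neg and Pos reviews before any encoding is built.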
Various plots, such as word clouds, pie charts, donut charts, and histograms, were used for EDA on the dataset. The top common words from negative and positive reviews are displayed as donut and bar charts. For common words, a pie chart was additionally made. Several word cloud plots are included at the beginning of the notebook. Plots of the validation and training loss are also displayed.
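The word counts behind the "top common words" charts can be sketched with the standard library. This is an illustrative sketch only: the helper `top_words`, the toy reviews, and the stopword set are hypothetical, and the notebook may compute its counts differently.

```python
from collections import Counter

def top_words(reviews, n=10, stopwords=frozenset()):
    """Return the n most frequent words across a list of review strings."""
    counts = Counter()
    for review in reviews:
        # Split on whitespace and skip any word listed as a stopword.
        counts.update(w for w in review.lower().split() if w not in stopwords)
    return counts.most_common(n)

# Toy data standing in for the negative and positive reviews.
neg_reviews = ["bad plot bad acting", "boring and bad"]
pos_reviews = ["great plot great cast", "fun and great"]

print(top_words(neg_reviews, n=3, stopwords={"and"}))
print(top_words(pos_reviews, n=3, stopwords={"and"}))
```

The resulting `(word, count)` pairs can be passed directly to a plotting library as labels and values for a bar, pie, or donut chart.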
Various models were implemented:
- Logistic Regression - Using BERT embeddings, it gave 50% accuracy
- Random Forest Regressor - Using BERT embeddings, it gave 40-45% accuracy
- Embedding model - Using an encoding built from the top 1000 words and sequences of length 350, the accuracy achieved was 50%
- BERT encoding - Using BERT encodings and a class model with various Dense layers, an embedding, and the GloVe method, 79-80% accuracy was achieved
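The self-made encoding for the embedding model (top 1000 words, sequences of length 350) could be built roughly as follows. This is a sketch under assumptions: the names `build_vocab` and `encode`, the reserved ids (0 for padding, 1 for unknown words), and truncation from the right are illustrative choices, not confirmed details of the notebook.

```python
from collections import Counter

PAD_ID = 0  # assumed id for padding positions
UNK_ID = 1  # assumed id for words outside the vocabulary

def build_vocab(tokenized_reviews, vocab_size=1000):
    """Map the vocab_size most frequent words to integer ids starting at 2."""
    counts = Counter(tok for review in tokenized_reviews for tok in review)
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(vocab_size))}

def encode(tokens, vocab, max_len=350):
    """Convert tokens to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))

reviews = [["good", "movie"], ["good", "film"]]
vocab = build_vocab(reviews, vocab_size=2)
print(encode(["good", "unseen"], vocab, max_len=4))
```

Every review then has the same length, which is what an embedding layer followed by Dense layers expects as input.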
An attempt was made to generate polarities and build plots showing the distribution of negative and positive sentiments. It used the top 500 negative words and the top 500 positive words, chosen so that no word is present in both encodings.
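One way to obtain non-overlapping negative and positive word lists, and a polarity score from them, is sketched below. The overlap rule (drop any word that appears in both top-n lists, so each list may end up shorter than n) and the polarity formula are assumptions made for illustration; the notebook's exact procedure is not described here.

```python
from collections import Counter

def disjoint_top_words(neg_tokens, pos_tokens, n=500):
    """Take the top-n words per class, then drop words found in both lists."""
    neg_top = {w for w, _ in Counter(neg_tokens).most_common(n)}
    pos_top = {w for w, _ in Counter(pos_tokens).most_common(n)}
    shared = neg_top & pos_top
    return neg_top - shared, pos_top - shared

def polarity(tokens, neg_words, pos_words):
    """Assumed score in [-1, 1]: (positive hits - negative hits) / total hits."""
    pos_hits = sum(tok in pos_words for tok in tokens)
    neg_hits = sum(tok in neg_words for tok in tokens)
    total = pos_hits + neg_hits
    return 0.0 if total == 0 else (pos_hits - neg_hits) / total

neg_tokens = "bad bad dull movie movie".split()
pos_tokens = "good good fun movie".split()
neg_words, pos_words = disjoint_top_words(neg_tokens, pos_tokens, n=3)
print(polarity(["good", "bad", "fun"], neg_words, pos_words))
```

A histogram of these scores over all reviews gives the kind of sentiment distribution plot mentioned above.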
A drastic reduction in the frequency of words was observed when the threshold was set to 5.
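The sentence above does not say exactly what the threshold applies to. Assuming it is a minimum occurrence count for keeping a word, the effect can be sketched as follows; `filter_by_min_count` and the toy tokens are hypothetical.

```python
from collections import Counter

def filter_by_min_count(tokens, threshold=5):
    """Keep only words that occur at least `threshold` times (assumed meaning)."""
    counts = Counter(tokens)
    return {word: count for word, count in counts.items() if count >= threshold}

tokens = ["film"] * 6 + ["plot"] * 5 + ["rare"] * 2 + ["once"]
kept = filter_by_min_count(tokens, threshold=5)
print(len(set(tokens)), "distinct words before,", len(kept), "after")
```

Because most words in natural text occur only a handful of times, even a small threshold like this tends to remove a large share of the vocabulary.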
- Sarcasm detection was not taken into account.
- The BERT encoding was padded to the maximum length, so the inputs to the Logistic Regression and Random Forest methods contained many 0s. Choosing appropriate padding could help achieve better base models.
- SVM and many other methods can be combined to give better accuracy.
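On the padding limitation in the list above, one possible alternative (not what this repo does) is to average per-token vectors over the real tokens only, so that padded positions do not dilute the features given to Logistic Regression or Random Forest. The sketch below uses plain lists; `masked_mean_pool` and the toy vectors are hypothetical, and the mask is assumed to hold 1 for real tokens and 0 for padding.

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average token vectors where the mask is 1, ignoring padded positions."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        # No real tokens: return a zero vector of the right size (or empty).
        return [0.0] * len(token_vectors[0]) if token_vectors else []
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

# Two real tokens followed by two padded positions.
vectors = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
mask = [1, 1, 0, 0]
print(masked_mean_pool(vectors, mask))
```

The result is a fixed-size vector per review regardless of its length, which suits classical models better than a long, mostly zero, padded sequence.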
- Word cloud - https://towardsdatascience.com/create-word-cloud-into-any-shape-you-want-using-python-d0b88834bc32
- Make word clouds with different fonts - https://www.dafont.com