Sentiment-Analysis

This repo contains sentiment analysis and exploratory data analysis (EDA) of the Movie Reviews corpus from NLTK Data.

Data

Link to data is Here

Structure:

Sentiment analysis
- Movie Reviews
  - Neg
  - Pos
  - Side Stuff
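A minimal sketch of reading this layout into labelled examples. It assumes each review is a separate `.txt` file inside the `Neg` and `Pos` folders, which is how the NLTK movie_reviews corpus stores them. The function name `load_reviews` and the 0/1 label convention are illustrative choices, not taken from the notebook.

```python
from pathlib import Path


def load_reviews(root):
    """Return (text, label) pairs from the neg/pos folders under root.

    Assumes one review per .txt file. Folder names are matched
    case-insensitively, so "Neg"/"Pos" work as well as "neg"/"pos".
    Label 0 = negative, 1 = positive.
    """
    labels = {"neg": 0, "pos": 1}
    examples = []
    for folder in sorted(Path(root).iterdir()):
        label = labels.get(folder.name.lower())
        # Skip anything that is not a neg/pos directory (e.g. "Side Stuff").
        if label is None or not folder.is_dir():
            continue
        for path in sorted(folder.glob("*.txt")):
            examples.append((path.read_text(encoding="utf-8", errors="replace"), label))
    return examples
```

If the corpus is loaded through NLTK instead, `nltk.corpus.movie_reviews.fileids("neg")` and `movie_reviews.words(fileid)` give the same negative/positive split.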

Motivation

Natural language processing (NLP) has long been a challenging area of machine learning. One of its most valuable uses is detecting what the majority of people think or want, which is exactly what sentiment analysis does.

Work Done

- Cleaned the data
- Performed EDA
- Created encodings (custom frequency-based encodings and BERT encodings)
- Implemented:
  - Random Forest Regressor
  - Logistic Regression
  - Neural Network
  - Encoding Model
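The README does not list the exact cleaning steps. A minimal sketch of a typical pipeline, assuming lowercasing, keeping only alphabetic tokens (with simple contractions), and dropping stopwords. The small `STOPWORDS` set here is an illustrative assumption; the notebook may use NLTK's stopword list or different rules entirely.

```python
import re

# Illustrative stopword set (an assumption, not the notebook's actual list).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "was", "this"}

# Alphabetic words, optionally followed by an apostrophe suffix such as "n't".
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def clean(text, stopwords=STOPWORDS):
    """Lowercase the text, extract alphabetic tokens, and drop stopwords."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in stopwords]
```

For example, `clean("This movie was GREAT, isn't it?")` returns `["movie", "great", "isn't"]`.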

EDA

Several kinds of plots, including word clouds, pie charts, donut charts, and histograms, were used for EDA on the dataset. The most common words in negative and positive reviews are shown as donut and bar charts, and a pie chart was also made for the common words. Several word clouds appear at the beginning of the notebook. Plots of the training and validation loss are also displayed.
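The per-class word counts behind the donut and bar charts can be sketched with `collections.Counter`. This assumes each review has already been tokenized into a list of words. The names `top_words` and `shared_top_words` are illustrative, and reading "common words" as words in both classes' top-n lists is an assumption.

```python
from collections import Counter


def top_words(docs, n):
    """Return the n most frequent (word, count) pairs across tokenized documents."""
    counts = Counter(token for doc in docs for token in doc)
    return counts.most_common(n)


def shared_top_words(neg_docs, pos_docs, n):
    """Words that appear in the top-n lists of both classes, sorted alphabetically."""
    neg = {w for w, _ in top_words(neg_docs, n)}
    pos = {w for w, _ in top_words(pos_docs, n)}
    return sorted(neg & pos)
```

The `(word, count)` pairs can be passed directly to a plotting library such as matplotlib for the bar and donut charts.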

Models

The following models were implemented:

- Logistic Regression: using BERT embeddings, it achieved 50% accuracy.
- Random Forest Regressor: using BERT embeddings, it achieved 40-45% accuracy.
- Embedding model: using an encoding built from the top 1000 words, with sequences of length 350, it achieved 50% accuracy.
- BERT encoding: using BERT encodings with a custom model class containing several Dense layers, an embedding layer, and GloVe word vectors, it achieved 79-80% accuracy.
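A minimal sketch of the frequency-based encoding used by the embedding model. It keeps the 1000 most frequent words, maps each to an integer id, and pads or truncates every review to 350 ids. The function names, the choice to drop out-of-vocabulary words, and padding at the end are assumptions. By comparison, Keras's `pad_sequences` pads and truncates at the front by default.

```python
from collections import Counter


def build_vocab(docs, num_words=1000):
    """Map the num_words most frequent tokens to ids 1..num_words (0 is reserved for padding)."""
    counts = Counter(token for doc in docs for token in doc)
    return {word: i for i, (word, _) in enumerate(counts.most_common(num_words), start=1)}


def encode(doc, vocab, maxlen=350):
    """Turn a token list into a fixed-length id sequence.

    Out-of-vocabulary tokens are dropped, long reviews are truncated
    at the end, and short ones are right-padded with 0.
    """
    ids = [vocab[t] for t in doc if t in vocab][:maxlen]
    return ids + [0] * (maxlen - len(ids))
```

An embedding layer consuming these sequences needs an input dimension one larger than the vocabulary (1001 here) to account for the padding id 0.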

Key Work

Attempted to generate polarity scores and plot the distribution of negative and positive sentiment. This used the top 500 negative words and the top 500 positive words, selected so that no word appears in both lists.
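The README does not say how overlap between the two word lists was removed. One way to guarantee disjoint lists is to assign each word to the class where it occurs more often, then take the top 500 of each class. This sketch uses that approach, which is an assumption and not necessarily the notebook's. The polarity formula (positive hits minus negative hits, divided by total hits) and the function names are also illustrative.

```python
from collections import Counter


def disjoint_top_words(neg_docs, pos_docs, k=500):
    """Return (negative words, positive words), each up to k long, with no overlap."""
    neg = Counter(t for d in neg_docs for t in d)
    pos = Counter(t for d in pos_docs for t in d)
    # Assign each word to the class where it occurs more often; ties go to neither,
    # so a word can never end up in both lists.
    neg_only = Counter({w: c for w, c in neg.items() if c > pos[w]})
    pos_only = Counter({w: c for w, c in pos.items() if c > neg[w]})
    return ([w for w, _ in neg_only.most_common(k)],
            [w for w, _ in pos_only.most_common(k)])


def polarity(doc, neg_words, pos_words):
    """Score in [-1, 1]: (positive hits - negative hits) / total hits, 0.0 if no hits."""
    neg_set, pos_set = set(neg_words), set(pos_words)
    p = sum(t in pos_set for t in doc)
    n = sum(t in neg_set for t in doc)
    return 0.0 if p + n == 0 else (p - n) / (p + n)
```

Comparing raw counts is only reasonable because the corpus has equally sized negative and positive classes. With unbalanced classes, relative frequencies would be the fairer comparison.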

Key Observation

Setting the word-frequency threshold to 5 drastically reduced the number of words retained.
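A frequency threshold of this kind can be sketched as follows. This assumes "threshold" means a minimum corpus-wide count, which the README does not define. `filter_by_frequency` is an illustrative name.

```python
from collections import Counter


def filter_by_frequency(docs, min_count=5):
    """Keep only tokens that occur at least min_count times across all documents.

    Returns the surviving vocabulary and the filtered documents.
    """
    counts = Counter(t for d in docs for t in d)
    vocab = {w for w, c in counts.items() if c >= min_count}
    return vocab, [[t for t in d if t in vocab] for d in docs]
```

Comparing the number of distinct tokens before filtering with `len(vocab)` afterwards shows the size of the reduction. Most distinct words in a natural-language corpus are rare (a consequence of Zipf's law), so a large drop is typical.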

Further Improvements that can be made

- Sarcasm detection was not taken into account.
- The BERT encodings were padded to the maximum length, so the inputs to the Logistic Regression and Random Forest models contained many zeros. Choosing a more appropriate padding length could produce better baseline models.
- SVMs and other methods could be combined, for example in an ensemble, to improve accuracy.
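One way to choose a more appropriate padding length is to use a high percentile of the tokenized review lengths instead of the maximum. This truncates only a few very long reviews and sends far fewer padding zeros to the baseline models. This is a sketch under that assumption, not what the notebook does. `choose_maxlen` and the 95% default are illustrative.

```python
import math


def choose_maxlen(lengths, coverage=0.95):
    """Smallest length L such that at least `coverage` of the sequences have length <= L."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(lengths)
    # ceil(coverage * n) sequences must fit, so take the length at that 1-based rank.
    index = math.ceil(coverage * len(ordered)) - 1
    return ordered[index]
```

For BERT-base, the chosen length should also be capped at the model's 512-token limit.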

Some links

- Word cloud in any shape: https://towardsdatascience.com/create-word-cloud-into-any-shape-you-want-using-python-d0b88834bc32
- Fonts for word clouds: https://www.dafont.com

guljain/Sentiment-Analysis

Folders and files

Latest commit

History

Repository files navigation

Sentiment-Analysis

Data

Motivation

Work Done

EDA

Models

Key Work

Key Observation

Further Improvements that can be made

Some links

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages