Merge pull request #478 from p1utoze/main
Reddit Comments Analysis during CWC Final 2023
Showing 26 changed files with 155,016 additions and 0 deletions.
79,651 changes: 79,651 additions & 0 deletions
Reddit Comments Analysis during CWC Final 2023/dataset/comments.csv
Large diffs are not rendered by default.
72,439 changes: 72,439 additions & 0 deletions
Reddit Comments Analysis during CWC Final 2023/dataset/comments_with_scores.csv
Large diffs are not rendered by default.
Binary file added
BIN
+42.5 KB
Reddit Comments Analysis during CWC Final 2023/images/plot11_asent.png
Binary file added
BIN
+41.6 KB
Reddit Comments Analysis during CWC Final 2023/images/plot11_bert.png
Binary file added
BIN
+42.1 KB
Reddit Comments Analysis during CWC Final 2023/images/plot11_textblob.png
Binary file added
BIN
+52.4 KB
Reddit Comments Analysis during CWC Final 2023/images/plot2_asent_label.png
Binary file added
BIN
+55.1 KB
Reddit Comments Analysis during CWC Final 2023/images/plot2_bert_label.png
Binary file added
BIN
+56 KB
Reddit Comments Analysis during CWC Final 2023/images/plot2_txtblob_label.png
Binary file added
BIN
+58.1 KB
Reddit Comments Analysis during CWC Final 2023/images/plot2_vader_label.png
84 changes: 84 additions & 0 deletions
Reddit Comments Analysis during CWC Final 2023/model/README.md
@@ -0,0 +1,84 @@
**PROJECT TITLE**:
Reddit Comments Analysis during CWC Final 2023

**GOAL**

The goal of this project is to analyze comments posted by Reddit users during the CWC Final 2023 and to gain insight into the audience's emotions during the match.

**DATASET**

https://www.kaggle.com/datasets/hitman69/reddit-comments-60k-ind-vs-aus-wc-final

**DESCRIPTION**

The dataset contains 60k comments posted by Reddit users during the CWC Final 2023. It has the following columns:

- UserID: The unique ID of the Reddit user
- body: The text of the comment
- author: The username of the user
- upvotes: The number of upvotes the comment has received
- timestamp: The time at which the comment was posted

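As a rough illustration of the column layout above, the sketch below parses a few rows with Python's standard `csv` module and converts `upvotes` and `timestamp` to typed values. The sample rows, the `parse_row` and `load_comments` helpers, and the ISO 8601 timestamp format are assumptions made for illustration; the actual file may encode timestamps differently, and the notebook may load the data another way.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the column layout described above; not real data.
SAMPLE_CSV = """UserID,body,author,upvotes,timestamp
u1,What a catch!,fan_one,12,2023-11-19T09:05:00
u2,,fan_two,3,2023-11-19T09:06:30
"""


def parse_row(row):
    """Convert one raw CSV row (all strings) into typed values."""
    return {
        "UserID": row["UserID"],
        "body": row["body"].strip(),
        "author": row["author"],
        "upvotes": int(row["upvotes"]),
        # Assumes ISO 8601 timestamps; adjust if the file uses another format.
        "timestamp": datetime.fromisoformat(row["timestamp"]),
    }


def load_comments(handle):
    """Read typed rows, skipping comments whose body is empty."""
    rows = (parse_row(r) for r in csv.DictReader(handle))
    return [r for r in rows if r["body"]]


comments = load_comments(io.StringIO(SAMPLE_CSV))
print(len(comments), comments[0]["upvotes"], comments[0]["timestamp"].hour)
```

Passing an open file handle instead of `io.StringIO` would apply the same conversion to a CSV on disk.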
The body column holds the comment text, which is the feature analyzed in this project. The comments were first cleaned, and sentiment analysis was then performed on them using NLP techniques, classifying each comment as Positive, Negative, or Neutral. Word clouds, a visualization that shows the most frequent words in a body of text, were generated separately for the positive, negative, and neutral comments and then compared to gain insight into the audience's emotions during the match.

**WHAT I HAVE DONE**

I performed the following steps in this project:

1. Importing the libraries.
2. Importing the dataset and changing the data types of the columns.
3. Cleaning and transforming the dataset by removing meaningless words and null values.
4. Preprocessing the comments using the NLP techniques provided by the spaCy library.
5. Performing sentiment analysis on the comments using various libraries.
6. Labeling each comment as Positive, Negative, or Neutral and recording its sentiment score.
7. Visualizing the classification of the comments using various statistical plots.
8. Comparing the sentiment scores predicted by the various models.
9. Visualizing the comments using word clouds.

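Steps 3 and 6 above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's code: the `clean_comment` and `label_from_score` helpers are hypothetical names, and the 0.05 cutoff is an assumption borrowed from the threshold commonly used with VADER's compound score, since the README does not state which thresholds were used.

```python
import re

URL_RE = re.compile(r"https?://\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_comment(text):
    """Lowercase, drop URLs and non-letter characters, collapse whitespace."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())


def label_from_score(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to one of three labels.

    The threshold is an assumed value, not one taken from the project.
    """
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"


cleaned = clean_comment("WHAT a shot!!! https://example.com #CWC23")
labels = [label_from_score(s) for s in (0.6, -0.4, 0.0)]
print(cleaned)
print(labels)
```

Keeping the threshold as a parameter makes it easy to see how the share of Neutral comments shifts as the cutoff changes.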
**MODELS USED**

1. spaCy with asent: a community project that adds rule-based sentiment analysis to spaCy pipelines.
2. TextBlob: a library whose default sentiment analyzer is lexicon-based (PatternAnalyzer); it also provides an optional Naive Bayes analyzer.
3. VADER: a library that combines a sentiment lexicon with rules to perform sentiment analysis. It is widely used for social media text.
4. RoBERTa: a pretrained BERT-style transformer, fine-tuned for the sentiment analysis task and used through the transformers pipeline.

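To make the idea of "lexicon plus rules" concrete, here is a toy scorer in plain Python. It is not VADER, asent, or TextBlob, and it does not reproduce their lexicons or rules; the word weights, the `lexicon_score` name, and the single negation rule are all invented for illustration.

```python
# Toy lexicon: hand-picked words and weights, purely illustrative.
LEXICON = {
    "great": 2.0,
    "win": 1.5,
    "well": 1.0,
    "bad": -1.5,
    "poor": -1.5,
    "lost": -1.0,
}
NEGATIONS = {"not", "never", "no"}


def lexicon_score(text):
    """Sum word weights, flipping the sign of a word that follows a negation."""
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        weight = LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            weight = -weight
        total += weight
    return total


print(lexicon_score("great win"))
print(lexicon_score("not great"))
print(lexicon_score("the toss happened"))
```

Real lexicon-based libraries go further, for example by handling intensifiers, punctuation, and capitalization, and by normalizing the final score to a fixed range.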
**LIBRARIES NEEDED**

- asent
- spacy
- textblob
- vaderSentiment
- transformers
- matplotlib
- seaborn
- wordcloud
- numpy
- pandas
- plotly

**VISUALIZATION**

The plots yielded the following key insights:

![image](../images/plot1.png)
1. There were more positive comments than negative comments.
2. Neutral comments outnumbered both positive and negative comments.

![image](../images/plot8.png)
The top 10 most frequent words in the positive comments were:
India, win, great, match, team, world, cup, final, played, well

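A top-words list like the one above can be computed with `collections.Counter`. The sketch below is an assumption about the general approach, not the notebook's code: the `top_words` helper, the small stop-word set, and the sample comments are all made up for illustration.

```python
from collections import Counter

# Small illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "is", "was", "and", "to", "of", "in", "for", "by"}


def top_words(comments, n=10):
    """Return the n most frequent non-stop-words across all comments."""
    counts = Counter(
        word
        for comment in comments
        for word in comment.lower().split()
        if word not in STOP_WORDS
    )
    return counts.most_common(n)


# Hypothetical cleaned positive comments; not taken from the dataset.
sample_positive = ["great win for the team", "the team played well", "great match"]
top_two = top_words(sample_positive, n=2)
print(top_two)
```

The same counts could feed a word cloud, since a word cloud is essentially a visual rendering of these frequencies.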
**CONCLUSION**

I drew the following conclusions from this analysis:

1. The audience's emotions changed drastically during the match. The audience was happy at the start, but became sad and angry as the match progressed.
2. The audience was clearly unhappy, and often angry, with the Indian team's performance during the match.
3. TextBlob and RoBERTa gave the best results for the sentiment analysis task; both indicated that most of the comments were neutral.
4. Travis Head was the most talked-about player during the match.
5. The comments with the most mixed reactions were about Indian players, especially Virat Kohli, Mohammed Shami and Rohit Sharma.
6. There were 10% more negative comments containing the word "India" than negative comments containing the word "Australia".

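One way to check a claim like conclusion 1 is to average the sentiment scores per time window and look at how the mean moves over the match. The sketch below assumes each comment has already been reduced to a `(timestamp, score)` pair; the `mean_score_by_hour` helper, the hourly bucket size, and the sample values are hypothetical and do not come from the dataset.

```python
from collections import defaultdict
from datetime import datetime


def mean_score_by_hour(scored_comments):
    """Average sentiment score per clock hour, in chronological order."""
    buckets = defaultdict(list)
    for timestamp, score in scored_comments:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(score)
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}


# Hypothetical (timestamp, score) pairs; not taken from the dataset.
sample = [
    (datetime(2023, 11, 19, 9, 5), 0.6),
    (datetime(2023, 11, 19, 9, 40), 0.2),
    (datetime(2023, 11, 19, 15, 10), -0.5),
]
trend = mean_score_by_hour(sample)
for hour, mean in trend.items():
    print(hour.isoformat(), round(mean, 2))
```

A falling mean across consecutive windows would be consistent with the shift in mood described above, though a smaller bucket size may be needed to line it up with specific match events.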
**YOUR NAME**

Adithya Awati<br>
[LinkedIn](https://www.linkedin.com/in/adithya-awati-87b7541a3/)
2,828 changes: 2,828 additions & 0 deletions
Reddit Comments Analysis during CWC Final 2023/model/ind-vs-aus-wc-sentiment-analysis.ipynb
Large diffs are not rendered by default.
14 changes: 14 additions & 0 deletions
Reddit Comments Analysis during CWC Final 2023/requirements.txt
@@ -0,0 +1,14 @@
spacy==3.7.2
asent==0.8.3
vaderSentiment==3.3.2
textblob==0.17.1
transformers==4.35.2
pandas==2.0.3
numpy==1.24.3
matplotlib==3.7.4
seaborn==0.12.2
plotly==5.16.1
plotly-express==0.4.1
wordcloud==1.9.2