Commit
Merge pull request #388 from aindree-2005/disaster
Disaster Twitter Sentiment Analysis using NLP
Showing 16 changed files with 91 additions and 0 deletions.
2 changes: 2 additions & 0 deletions
Disaster Tweets Prediction using Deep Learning/Dataset/Readme.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
https://www.kaggle.com/competitions/nlp-getting-started/data | ||
Dataset |
7 changes: 7 additions & 0 deletions
Disaster Tweets Prediction using Deep Learning/Images/Readme.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
EDA compared tweets by word count, character count, and the keywords mentioned, using bar charts. | ||
|
||
Confusion matrices were used to compare the performance of the standard ML models. | ||
|
||
Graphs of training and test accuracy were used to compare the performance of the transformer-based models. | ||
|
||
Word clouds depict the keywords and the most frequent words in both kinds of tweets. |
Binary file added
BIN
+63.2 KB
Disaster Tweets Prediction using Deep Learning/Images/Screenshot (308).png
Binary file added
BIN
+35.7 KB
Disaster Tweets Prediction using Deep Learning/Images/Screenshot (310).png
Binary file added
BIN
+33.4 KB
Disaster Tweets Prediction using Deep Learning/Images/Screenshot (311).png
Binary file added
BIN
+26.1 KB
Disaster Tweets Prediction using Deep Learning/Images/Screenshot (312).png
Binary file added
BIN
+26.9 KB
Disaster Tweets Prediction using Deep Learning/Images/Screenshot (313).png
Binary file added
BIN
+21.2 KB
Disaster Tweets Prediction using Deep Learning/Images/Screenshot (314).png
Binary file added
BIN
+20.7 KB
Disaster Tweets Prediction using Deep Learning/Images/Screenshot (315).png
Binary file added
BIN
+903 KB
Disaster Tweets Prediction using Deep Learning/Images/Screenshot (316).png
Binary file added
BIN
+908 KB
Disaster Tweets Prediction using Deep Learning/Images/Screenshot (317).png
71 changes: 71 additions & 0 deletions
Disaster Tweets Prediction using Deep Learning/Models/Readme.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
# Disaster Twitter Sentiment Analysis NLP | ||
|
||
## PROJECT TITLE | ||
|
||
Disaster Twitter Sentiment Analysis NLP | ||
|
||
## GOAL | ||
|
||
The main goal of this project is to analyse tweets about disasters and classify them as real or fake using Transformers. | ||
|
||
## DATASET | ||
|
||
https://www.kaggle.com/competitions/nlp-getting-started/data | ||
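
For orientation, the sketch below reads rows in the shape of the competition's `train.csv`. It assumes the columns `id`, `keyword`, `location`, `text`, and `target`, with `target` set to 1 for a tweet about a real disaster and 0 otherwise. The two sample rows and the helper name `load_tweets` are invented for illustration and do not come from the dataset or the notebooks.

```python
import csv
import io

# Invented sample in the assumed layout of train.csv (not real dataset rows).
SAMPLE_CSV = """id,keyword,location,text,target
1,flood,,Flood waters are rising near the river bridge,1
2,fire,Austin,This new album is fire,0
"""


def load_tweets(handle):
    """Return a list of dicts holding the text, keyword, and integer target."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "text": row["text"],
            "keyword": row["keyword"] or None,  # an empty keyword becomes None
            "target": int(row["target"]),
        })
    return rows


tweets = load_tweets(io.StringIO(SAMPLE_CSV))
print(len(tweets), tweets[0]["target"], tweets[1]["keyword"])
```

To read the downloaded file instead, the same function could be called with `open("train.csv", newline="", encoding="utf-8")`, assuming the file sits in the working directory.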
|
||
## DESCRIPTION | ||
|
||
This project analyses tweets about disasters and classifies them as real or fake using Transformers. Standard ML models such as random forest, SVC, and logistic regression are also trained for comparison. | ||
|
||
## WHAT I HAVE DONE | ||
|
||
The neural network architecture is tailored to this text classification task. The Positional Embedding layer encodes the order of the words, which is crucial for understanding context. The Transformer Encoder captures the relationships between the words of the input sequence. Global Max Pooling 1D keeps the strongest activation of each feature across the sequence, reducing it to a fixed-size vector. Dropout mitigates overfitting and improves generalization. The Dense layer produces the final prediction. | ||
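
The data flow through these layers can be sketched in plain Python. This is a toy illustration under stated assumptions, not the Keras model from `transformer.ipynb`: the weights are random, the sizes `VOCAB`, `MAX_LEN`, and `DIM` and the dropout rate are made up, and the encoder is reduced to a single attention head with a residual connection (no learned query/key/value projections, layer normalization, or feed-forward sublayer).

```python
import math
import random

random.seed(0)

VOCAB, MAX_LEN, DIM = 50, 8, 4  # hypothetical sizes, chosen only for the demo


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


token_emb = rand_matrix(VOCAB, DIM)   # one vector per token id
pos_emb = rand_matrix(MAX_LEN, DIM)   # one vector per position
dense_w = [random.uniform(-0.5, 0.5) for _ in range(DIM)]
dense_b = 0.0


def positional_embedding(token_ids):
    """Add a position vector to each token vector so word order is visible."""
    return [[t + p for t, p in zip(token_emb[tok], pos_emb[i])]
            for i, tok in enumerate(token_ids)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Simplified single-head attention: every position attends to all others."""
    scale = math.sqrt(len(seq[0]))
    out = []
    for q in seq:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in seq])
        mixed = [sum(w * v[d] for w, v in zip(weights, seq)) for d in range(len(q))]
        out.append([x + y for x, y in zip(q, mixed)])  # residual connection
    return out


def global_max_pool(seq):
    """Keep the largest value of each feature across all positions."""
    return [max(vec[d] for vec in seq) for d in range(len(seq[0]))]


def dropout(vec, rate, training):
    """Randomly zero features during training; do nothing at inference."""
    if not training:
        return vec
    keep = 1.0 - rate
    return [x / keep if random.random() < keep else 0.0 for x in vec]


def predict(token_ids, training=False):
    x = positional_embedding(token_ids)
    x = self_attention(x)
    x = global_max_pool(x)
    x = dropout(x, rate=0.5, training=training)
    logit = sum(w * v for w, v in zip(dense_w, x)) + dense_b
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid gives a probability


prob = predict([3, 17, 42, 8])
print(round(prob, 3))
```

The pooling step is what lets tweets of different lengths map to one fixed-size vector before the Dense layer.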
|
||
## MODELS USED | ||
|
||
1. Random Forest | ||
2. SVC | ||
3. Logistic Regression | ||
4. Decision Tree | ||
5. Transformer-based Neural Network | ||
|
||
## LIBRARIES NEEDED | ||
- numpy | ||
- pandas | ||
- sklearn | ||
- tensorflow | ||
- keras | ||
- scipy | ||
|
||
## VISUALIZATION | ||
|
||
- EDA compared tweets by word count, character count, and the keywords mentioned, using bar charts. | ||
- Confusion matrices were used to compare the performance of the standard ML models. | ||
- Graphs of training and test accuracy were used to compare the performance of the transformer-based models. | ||
- Word clouds depict the keywords and the most frequent words in both kinds of tweets. | ||
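
The quantities behind these charts can be computed with a few lines of standard-library Python. The sketch below is illustrative only: the three tweets are invented, the label convention (1 for a real disaster, 0 otherwise) is an assumption, and the helper names are hypothetical rather than taken from `eda-disaster.ipynb`.

```python
from collections import Counter

# Invented (text, keyword, label) examples; 1 = real disaster, 0 = not.
tweets = [
    ("Forest fire spreading near the highway", "fire", 1),
    ("Evacuation ordered after the flood", "flood", 1),
    ("My mixtape is fire", "fire", 0),
]


def text_stats(text):
    """Word count and character count for a single tweet."""
    return {"words": len(text.split()), "chars": len(text)}


def mean(values):
    return sum(values) / len(values)


# Group the per-tweet statistics and keyword counts by label.
summary = {}
for label in (0, 1):
    group = [(t, k) for t, k, y in tweets if y == label]
    stats = [text_stats(t) for t, _ in group]
    summary[label] = {
        "mean_words": mean([s["words"] for s in stats]),
        "mean_chars": mean([s["chars"] for s in stats]),
        "top_keywords": Counter(k for _, k in group).most_common(3),
    }

print(summary)
```

The per-label means are the kind of values a bar chart would show, and the keyword counts are what a word cloud scales its words by.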
|
||
## EVALUATION METRICS | ||
|
||
A confusion matrix was created, and precision, recall, and F1 score were used as evaluation metrics. | ||
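
As a reminder of how these metrics follow from the confusion matrix, here is a minimal sketch in plain Python. The labels and predictions are made up, and the positive class is assumed to be 1; scikit-learn's `sklearn.metrics` module offers equivalent functions such as `confusion_matrix` and `precision_score`.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def precision_recall_f1(counts):
    """Derive precision, recall, and F1 from the confusion counts."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(counts)
print(counts, precision, recall, f1)
```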
|
||
## RESULTS | ||
|
||
The Transformer model reaches 95% accuracy, significantly higher than the 81% accuracy of the Logistic Regression and Decision Tree models. | ||
|
||
## CONCLUSION | ||
The provided neural network architecture is well-suited for classifying disaster tweets as fake or real in the context of natural language processing (NLP). Here's how each component contributes to this task: | ||
|
||
- Positional Embedding: Helps the model understand the order of words in tweets, capturing nuances and context essential for discerning fake or real information during a disaster. | ||
- Transformer Encoder: Enables the model to process the entire sequence of words, capturing intricate relationships and contextual information, which is crucial for distinguishing between authentic and misleading content in disaster-related tweets. | ||
- Global Max Pooling 1D: Extracts the most significant features from the encoded sequence, focusing on key information that might indicate whether a tweet is reporting a real disaster or spreading misinformation. | ||
- Dropout: Mitigates overfitting, enhancing the model's ability to generalize from training data to unseen examples, which is vital for accurately classifying diverse disaster-related tweets. | ||
- Dense Layer: Produces the final prediction, indicating whether a given tweet is likely to be real or fake based on the features extracted by the preceding layers. |
1 change: 1 addition & 0 deletions
Disaster Tweets Prediction using Deep Learning/Models/eda-disaster.ipynb
Large diffs are not rendered by default.
1 change: 1 addition & 0 deletions
Disaster Tweets Prediction using Deep Learning/Models/models-all.ipynb
Large diffs are not rendered by default.
1 change: 1 addition & 0 deletions
Disaster Tweets Prediction using Deep Learning/Models/transformer.ipynb
Large diffs are not rendered by default.
8 changes: 8 additions & 0 deletions
Disaster Tweets Prediction using Deep Learning/requirements.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
tensorflow | ||
keras | ||
nltk | ||
numpy | ||
scikit-learn | ||
pandas | ||
matplotlib | ||
seaborn |