The goal is to classify tweets as either not cyberbullying or as one of several types of cyberbullying, using NLP techniques and machine learning models.
The dataset contains over 47,000 tweets, each labeled with one of six categories: Age, Ethnicity, Gender, Religion, Other type of cyberbullying, and Not cyberbullying.
Link to the dataset: Cyberbullying Classification Dataset
Models:
- Logistic Regression
- Naive Bayes
- Random Forest Classifier
- Voting Classifier (Ensemble Model): Combines predictions from the above models using a majority voting scheme.
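
To illustrate the majority voting scheme, here is a minimal, library-free sketch. It is not taken from this project's code: the function name `majority_vote`, the tie-breaking rule, and the sample predictions are assumptions made for the example. If the models are built with scikit-learn, the equivalent is `VotingClassifier` with `voting="hard"`.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model predictions by hard (majority) voting.

    predictions: list of label lists, one list per model, all the same length.
    Returns one label per sample: the label predicted by the most models.
    Ties go to the tied label that appears first in model order
    (an assumption for this sketch, not a rule from the project).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    voted = []
    # zip(*predictions) groups the labels that each model gave to one sample
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # first label, in model order, that reaches the top vote count
        winner = next(label for label in labels if counts[label] == best)
        voted.append(winner)
    return voted


# Hypothetical predictions from the three base models for three tweets
logistic_regression = ["religion", "age", "not_cyberbullying"]
naive_bayes = ["religion", "gender", "not_cyberbullying"]
random_forest = ["ethnicity", "age", "gender"]

ensemble = majority_vote([logistic_regression, naive_bayes, random_forest])
print(ensemble)
```

Each tweet receives the label that at least two of the three models agree on. With three base models and six classes, all three can disagree, so some tie-breaking rule is always needed.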
Contributions are welcome! Feel free to submit issues, feature requests, or pull requests to improve the system.