Skip to content

anaya729/Twitter-sentiment-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Twitter-sentiment-analysis

Here's a README file for your GitHub repository based on the provided code:


Sentiment Analysis on Twitter Dataset

This repository contains code for performing sentiment analysis on a Twitter dataset using logistic regression and other NLP techniques. The dataset used in this project is the Sentiment140 dataset, which contains labeled tweets. The goal is to classify tweets as positive or negative.

Dataset

The dataset used in this project is downloaded from Kaggle's Sentiment140 dataset. It contains 1.6 million labeled tweets.

Prerequisites

  • Python 3.x
  • Kaggle API key (download and configure as kaggle.json)
  • Required Python libraries: numpy, pandas, re, scikit-learn, nltk, and pickle

Installation

  1. Clone the repository: git clone https://github.com/your-username/sentiment-analysis-twitter.git cd sentiment-analysis-twitter

  2. Install required libraries: pip install -r requirements.txt

  3. Download the Sentiment140 dataset from Kaggle: kaggle datasets download -d kazanova/sentiment140

  4. Extract the dataset: unzip sentiment140.zip

  5. Copy your Kaggle API key (kaggle.json) to the appropriate location:

    mkdir -p ~/.kaggle cp kaggle.json ~/.kaggle/ chmod 600 ~/.kaggle/kaggle.json

Project Structure

  • sentiment_analysis.py: The main script containing the data processing, model training, and evaluation code.
  • requirements.txt: Contains the list of dependencies required to run the project.
  • trained_model.sav: Saved trained logistic regression model (optional).
  • README.md: This file.

How to Run

  1. Execute the main script: python sentiment_analysis.py

  2. The script will process the dataset, train a logistic regression model, evaluate the model, and print the accuracy scores.

  3. You can save the trained model to a file for later use.

Features

  • Data Preprocessing: The script includes data preprocessing steps such as stemming, tokenization, and removing non-alphabetic characters.
  • Model Training: Logistic regression is used as the model for sentiment analysis.
  • Model Evaluation: Accuracy score for both training and testing data is calculated and printed.

Model Saving and Loading

  • The trained model is saved to a file (trained_model.sav) and can be loaded back for predictions.

Future Improvements

  • Consider using other advanced models such as neural networks for better performance.
  • Fine-tune hyperparameters of the model for optimal results.
  • Add support for other languages and expand the dataset.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published