Sentiment Analysis on Twitter using Differential Privacy

Environment:

notebooks/baseline contains the non-private sentiment analysis.
notebooks/dp contains the differentially private sentiment analysis.
notebooks/learning rate contains the code used to determine the learning rate of the LR-Model.
notebooks/preprocessing contains the preprocessing code, which also saves the resulting datasets to CSV files.

Download the dataset from https://www.kaggle.com/datasets/kazanova/sentiment140
Change the encoding of the dataset to UTF-8.
Run notebooks/preprocessing/remove_tweets.ipynb (set TRAINDATA_PATH to the file path of the downloaded training dataset). This creates the file train_tweets_removed.csv in notebooks/preprocessing/data.
Create an empty folder called csv_rows in notebooks/preprocessing/data.
Run notebooks/preprocessing/all-preprocessing.ipynb.
Run the desired experiments on the preprocessed datasets. These are saved in notebooks/preprocessing/data/csv_rows, so you may need to point the FILES_DIRECTORY variable at that folder.
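
The encoding change and the csv_rows folder creation from the steps above can also be scripted. The following is a minimal sketch and not part of the repository: the helper names convert_to_utf8 and ensure_csv_rows_dir are hypothetical, and the default source encoding of Latin-1 is an assumption about the Kaggle download that should be checked against the actual file.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="latin-1"):
    """Re-encode a text file to UTF-8, streaming it line by line.

    source_encoding is an assumption; verify it for your download.
    """
    # newline="" disables newline translation, so line endings are preserved.
    with open(src, "r", encoding=source_encoding, newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)


def ensure_csv_rows_dir(data_dir):
    """Create the csv_rows folder inside data_dir if it is missing."""
    target = Path(data_dir) / "csv_rows"
    target.mkdir(parents=True, exist_ok=True)
    return target
```

For example, convert_to_utf8("downloaded.csv", "downloaded_utf8.csv") followed by ensure_csv_rows_dir("notebooks/preprocessing/data") would cover both manual steps; the CSV file names here are placeholders, and TRAINDATA_PATH would then point at the converted file.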
