
LexiGuard

Detecting toxicity in online comments using an LSTM recurrent neural network

Capstone group project for the neue fische Data Science Bootcamp:

  • Presentation slides (PDF)
  • Presentation video

Installation

The most convenient way to run the notebooks in this repo is in the cloud, e.g. on Google Colab: open a notebook and click the Colab badge at the top.

If you would like to run the notebooks locally on your own machine, install the Anaconda distribution and create a virtual environment from the included environment.yml.
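For example (the environment name to activate is whatever environment.yml defines; "lexiguard" below is an assumption):

    conda env create -f environment.yml
    conda activate lexiguard   # replace with the name set in environment.yml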

To run the Streamlit prototype dashboard, use this command: python -m streamlit run lstm_dashboard.py.
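The dashboard is roughly of this shape (a minimal sketch, not the actual contents of lstm_dashboard.py; predict_toxicity is a hypothetical stand-in for loading and querying the trained LSTM):

    import streamlit as st

    def predict_toxicity(text: str) -> float:
        # Hypothetical placeholder: the real dashboard would load the
        # trained LSTM (e.g. via tf.keras.models.load_model) and return
        # its predicted probability for the "toxic" class.
        return 0.0

    st.title("LexiGuard")
    comment = st.text_area("Enter a comment to check for toxicity")
    if st.button("Analyze") and comment:
        score = predict_toxicity(comment)
        st.write(f"Estimated toxicity: {score:.2f}")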

Repo Contents

File                       Description
eda.ipynb                  Initial exploratory data analysis
data_preprocessing.ipynb   Create data file(s)
baseline_model.ipynb       Baseline model (BOW + logistic regression)
random_forest.ipynb        Random forest experiments
xgboost.ipynb              XGBoost experiments
lstm.ipynb                 Final LSTM model (TensorFlow; sketched below)
lstm_dashboard.py          Very basic prototype dashboard using Streamlit
functions.ipynb            Utility functions
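For orientation, here is a minimal sketch of a binary toxicity classifier of the kind built in lstm.ipynb. All hyperparameters (vocabulary size, sequence length, layer sizes) are illustrative assumptions, not the values used in the notebook:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    VOCAB_SIZE = 20_000  # assumed vocabulary size
    MAX_LEN = 200        # assumed padded sequence length

    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, 128),      # token ids -> dense vectors
        layers.Bidirectional(layers.LSTM(64)),  # recurrent encoder
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # P(comment is toxic)
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC()])
    model.summary()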

Notes

This project was the group's first foray into NLP, so it was primarily about learning and experimentation. Much of what we tried did not make it into the final version. Some examples of what we also experimented with:

  • SpaCy (for vectorization)
  • BERT / Hugging Face Transformers
  • fastText (for vectorization)
  • Gensim
  • Naive Bayes
  • random undersampling (sketched after this list)
  • POS tagging
  • stemming
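For illustration, random undersampling of the majority class takes only a few lines of pandas. This is a generic sketch; the file name and the binary "toxic" label column are assumptions about the data, not taken from this repo:

    import pandas as pd

    df = pd.read_csv("train.csv")  # assumed file name
    toxic = df[df["toxic"] == 1]   # assumed label column
    non_toxic = df[df["toxic"] == 0].sample(n=len(toxic), random_state=42)
    # Concatenate and shuffle into a balanced training set
    balanced = pd.concat([toxic, non_toxic]).sample(frac=1, random_state=42)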

Credits

Code based on collaborative work by:
André Oliveira (Bambuzera)
Eric Martinez (ericmartinez1189)
Purvi Parmar (PurviDParmar)
Michael Schickenberg (CalleRosa40)
