EMNLP 2020 Covid-19 Workshop Paper: Real-time Classification, Geolocation and Interactive Visualization of COVID-19 Information Shared on Social Media to Better Understand Global Developments (OpenReview).
Winner of the McHacks 7 Hackathon: using Natural Language Processing to categorize and map tweets in real time during the COVID-19 crisis.
Tweets are classified into one of the following categories:
affected_people
other_useful_information
disease_transmission
disease_signs_or_symptoms
prevention
treatment
not_related_or_irrelevant
deaths_reports
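As a rough illustration of how these labels might be consumed downstream (for example, to feed per-category counts to the dashboard), the sketch below tallies predicted labels. `CATEGORIES` simply mirrors the list above; `count_by_category` is a hypothetical helper and not code from this repository.

```python
from collections import Counter

# The eight labels listed above, in the same order.
CATEGORIES = [
    "affected_people",
    "other_useful_information",
    "disease_transmission",
    "disease_signs_or_symptoms",
    "prevention",
    "treatment",
    "not_related_or_irrelevant",
    "deaths_reports",
]


def count_by_category(predicted_labels):
    """Count predictions per category, rejecting labels outside the known set."""
    labels = list(predicted_labels)
    unknown = set(labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(labels)
    # Report every category, including those with zero tweets.
    return {label: counts.get(label, 0) for label in CATEGORIES}
```

Calling `count_by_category(["prevention", "treatment", "prevention"])` returns a dictionary with all eight categories as keys, so categories with no tweets still show up with a count of zero.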
- Download the trained AllenNLP model to /tweet_classifier/experiments/l2_balanced/
- Set up tweepy_auth.json with your Twitter API keys
- Download and run the Elasticsearch geonames gazetteer container
    docker pull elasticsearch:5.5.2
    wget https://s3.amazonaws.com/ahalterman-geo/geonames_index.tar.gz --output-file=wget_log.txt
    tar -xzf geonames_index.tar.gz
    docker run -d -p 127.0.0.1:9200:9200 -v $(pwd)/geonames_index/:/usr/share/elasticsearch/data elasticsearch:5.5.2
- Install requirements
    pip install -r requirements.txt
- Run the live Twitter scraper/classifier
    python stream_twitter.py
- Run the live dashboard
    python app.py
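The tweepy_auth.json step above can be sanity-checked before launching stream_twitter.py. This is a minimal sketch that assumes the file is a flat JSON object holding the four standard Twitter credentials. The key names in `REQUIRED_KEYS` and the helper `load_twitter_keys` are assumptions, so adjust them to whatever stream_twitter.py actually reads.

```python
import json

# Assumed key names; the real tweepy_auth.json may use different ones.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def load_twitter_keys(path="tweepy_auth.json"):
    """Load the credentials file and fail early if a key is missing or empty."""
    with open(path, encoding="utf-8") as f:
        keys = json.load(f)
    missing = [k for k in REQUIRED_KEYS if not keys.get(k)]
    if missing:
        raise KeyError(f"{path} is missing: {', '.join(missing)}")
    return keys
```

Failing here with a clear message is usually easier to debug than an authentication error raised partway through the stream.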
To train the classifier yourself:
allennlp train experiments/l2_balanced/config.json --serialization-dir experiments/l2_balanced/out
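Once training finishes, AllenNLP writes a metrics.json file into the serialization directory. The sketch below assumes the run above completed and uses a hypothetical helper name (`best_validation_metrics`). It pulls out the best-epoch validation figures so that runs can be compared. Which metrics appear depends on the model defined in config.json.

```python
import json
from pathlib import Path


def best_validation_metrics(serialization_dir="experiments/l2_balanced/out"):
    """Return the best epoch and every 'best_validation_*' entry from metrics.json."""
    metrics_path = Path(serialization_dir, "metrics.json")
    metrics = json.loads(metrics_path.read_text(encoding="utf-8"))
    # Keep only the figures recorded for the best validation epoch.
    summary = {k: v for k, v in metrics.items() if k.startswith("best_validation_")}
    summary["best_epoch"] = metrics.get("best_epoch")
    return summary
```

Running `best_validation_metrics()` from the directory that contains `experiments/` returns a small dictionary that can be printed or logged alongside the config used for the run.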
The following repositories made my life much easier by providing working examples of the different components I needed for this project:
- smacawi for tweet scraping / classification
- CrisisNLP for the training data
- mordecai for elasticsearch gazetteer
- dash uber app for dash frontend