This repository contains the official implementation of the Cura system from the paper "Cura: Curation at Social Media Scale".
The raw Reddit data and the checkpoint of the best model are available at: Google Drive.
Check out the code from our repository using
git clone https://github.com/Azure-Vision/Curation-Modeling.git
cd Curation-Modeling
git checkout Trans
Place the .env.development file in the root directory of the repository.
Install the dependencies using
conda env create -f environment.yml
Then activate the environment with
conda activate cr2
Train the curation model by running the following script, replacing CONFIG_PATH with the path to a configuration file
python train.py CONFIG_PATH
The configurations used in the paper “Cura: Curation at Social Media Scale” can be found at configs/subreddit_minority_no_peer_new.yml, and the configurations for the online experiment that includes more subreddits can be found at configs/subreddit_minority_no_peer_more_subs.yml.
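As a convenience, the training command above can also be assembled from Python. The following is a minimal sketch, not part of the repository: the helper names build_train_command, missing_configs, and run_training are hypothetical, and the only facts taken from this README are the train.py entry point and the two config paths.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Config paths quoted from this README.
PAPER_CONFIG = "configs/subreddit_minority_no_peer_new.yml"
ONLINE_CONFIG = "configs/subreddit_minority_no_peer_more_subs.yml"


def build_train_command(config_path: str) -> List[str]:
    """Return the argv list equivalent to `python train.py CONFIG_PATH`."""
    # sys.executable is the interpreter of the currently active environment.
    return [sys.executable, "train.py", config_path]


def missing_configs(repo_root: str, config_paths: List[str]) -> List[str]:
    """Return the config paths that are not files under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    return [p for p in config_paths if not (root / p).is_file()]


def run_training(repo_root: str, config_path: str) -> None:
    """Run train.py from the repository root; raises if the config is missing or training fails."""
    if missing_configs(repo_root, [config_path]):
        raise FileNotFoundError(config_path)
    subprocess.run(build_train_command(config_path), cwd=repo_root, check=True)
```

With the cr2 environment active, calling run_training(".", PAPER_CONFIG) from the repository root would be equivalent to running python train.py with the paper configuration.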
Evaluate the prediction accuracy and confidence of the curation model under different conditions: run test_model.ipynb.
Evaluate the change in prediction accuracy when the curation model receives more peer votes: run sim_new_votes.ipynb.
Perform curation on a selected subreddit given selected curators: run curation.ipynb.
Launch the interface for administrators to select curators and perform curation using
streamlit run curation_interface.py
Collect and preprocess posting and user data from the Curio app: run process_CURIO_data.ipynb.
Finetune the pretrained curation model on Curio data using
cd trained_models
mkdir finetune_CURIO_full_data
mkdir deploy_CURIO_full_data
cp subreddit_minority_no_peer_new/latest.pt finetune_CURIO_full_data/latest.pt
cd ..
python train.py configs/finetune_CURIO_full_data.yml
cp trained_models/finetune_CURIO_full_data/best.pt trained_models/deploy_CURIO_full_data/best.pt
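The file-handling steps around the finetuning run can be sketched in Python as well. This is an illustrative equivalent of the shell commands above, not code from the repository: the function names are hypothetical, and unlike a plain mkdir the sketch does not fail when the directories already exist.

```python
import shutil
from pathlib import Path


def prepare_finetune_dirs(models_root: str = "trained_models") -> Path:
    """Create the finetune and deploy directories and seed the pretrained checkpoint."""
    root = Path(models_root)
    finetune_dir = root / "finetune_CURIO_full_data"
    deploy_dir = root / "deploy_CURIO_full_data"
    # exist_ok=True makes the step repeatable (assumption: reruns are acceptable).
    finetune_dir.mkdir(parents=True, exist_ok=True)
    deploy_dir.mkdir(parents=True, exist_ok=True)
    # Start finetuning from the latest checkpoint of the pretrained model.
    src = root / "subreddit_minority_no_peer_new" / "latest.pt"
    dst = finetune_dir / "latest.pt"
    shutil.copy2(src, dst)
    return dst


def publish_best_checkpoint(models_root: str = "trained_models") -> Path:
    """Copy the best finetuned checkpoint into the deploy directory."""
    root = Path(models_root)
    src = root / "finetune_CURIO_full_data" / "best.pt"
    dst = root / "deploy_CURIO_full_data" / "best.pt"
    shutil.copy2(src, dst)
    return dst
```

In this sketch, prepare_finetune_dirs() would be called before python train.py configs/finetune_CURIO_full_data.yml and publish_best_checkpoint() after it finishes, both from the repository root.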
Launch the curation model backend for Curio with
uvicorn curation_backend:app --port 5000
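To confirm that the backend has started, one option is to check that something is listening on port 5000. The sketch below uses only the Python standard library and assumes uvicorn's default host of 127.0.0.1, since the command above passes only --port. It does not call any endpoint of curation_backend, because the routes are not described here, and the helper name is_port_open is hypothetical.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A True result only shows that some process accepts connections on that port; it does not verify that the curation model itself has loaded.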