This is the supplementary code for our paper "Uncertainty in Gradient Boosting via Ensembles" by Andrey Malinin, Liudmila Prokhorenkova, and Aleksei Ustimenko (ICLR 2021).
See also our tutorials on uncertainty estimation with CatBoost: a blog post with a synthetic regression example and a blog post with a practical classification example.
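As a rough illustration of the kind of ensemble-based uncertainty estimate these tutorials deal with, the sketch below splits the predictive uncertainty of a classification ensemble into total, data, and knowledge parts using the standard entropy-based decomposition. It is not taken from this repository: the function names `entropy` and `decompose_uncertainty` are hypothetical, and the input is assumed to be a list of class-probability vectors, one per ensemble member, for a single example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(member_probs):
    """Split ensemble uncertainty for one input.

    member_probs: one class-probability list per ensemble member
    (hypothetical input format, chosen for this sketch only).
    Returns (total, data, knowledge) uncertainty.
    """
    n_models = len(member_probs)
    n_classes = len(member_probs[0])

    # Average the members' predictions class by class.
    mean_probs = [
        sum(member[k] for member in member_probs) / n_models
        for k in range(n_classes)
    ]

    total = entropy(mean_probs)                                # entropy of the mean
    data = sum(entropy(m) for m in member_probs) / n_models    # mean of the entropies
    knowledge = total - data                                   # disagreement between members
    return total, data, knowledge


if __name__ == "__main__":
    # Two members that disagree strongly: most uncertainty is knowledge uncertainty.
    print(decompose_uncertainty([[0.9, 0.1], [0.1, 0.9]]))
```

When all members agree, the knowledge term is zero and the total uncertainty equals the data uncertainty; the more the members disagree, the larger the knowledge term becomes.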
Datasets can be found here.
python train_models.py regression 1
First argument options: regression, classification, regression_rf, classification_rf
Second argument (CatBoost only): 0 or 1, indicating whether to tune hyperparameters or to use the already obtained ones
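The argument convention above can be sketched as a small validation helper. This is an illustration only, not the actual logic of `train_models.py`: the function `parse_args` is hypothetical, and it assumes that the `_rf` modes are the non-CatBoost ones and therefore take no second argument.

```python
# Mode names come from the README; everything else is illustrative.
MODES = ("regression", "classification", "regression_rf", "classification_rf")


def parse_args(argv):
    """Validate arguments such as ["regression", "1"].

    Returns (mode, tune_flag); tune_flag is None for the *_rf modes,
    which are assumed here not to use CatBoost.
    """
    if not argv or argv[0] not in MODES:
        raise ValueError(f"first argument must be one of {MODES}")
    mode = argv[0]

    tune_flag = None
    if not mode.endswith("_rf"):
        # The 0/1 flag is only meaningful for the CatBoost modes.
        if len(argv) < 2 or argv[1] not in ("0", "1"):
            raise ValueError("second argument must be 0 or 1 for CatBoost modes")
        tune_flag = int(argv[1])
    return mode, tune_flag


if __name__ == "__main__":
    print(parse_args(["regression", "1"]))
```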
Regression:
python aggregate_results_regression.py prr_auc
Options: std, nll_rmse, prr_auc, rf_rmse, rf_prr_auc
Classification:
python aggregate_results_classification.py prr_auc
Options: nll_error, prr_auc, rf_nll_error, rf_prr_auc
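To produce every aggregate in one go, the two scripts can be looped over their options. The sketch below only builds and prints the command lines; the helper `build_commands` is hypothetical, and it assumes the scripts are invoked from the directory that contains them, as in the commands above.

```python
import sys

# Script names and options as listed in the README.
REGRESSION_OPTIONS = ("std", "nll_rmse", "prr_auc", "rf_rmse", "rf_prr_auc")
CLASSIFICATION_OPTIONS = ("nll_error", "prr_auc", "rf_nll_error", "rf_prr_auc")


def build_commands(python=sys.executable):
    """Return one argv list per (script, option) pair."""
    commands = [
        [python, "aggregate_results_regression.py", option]
        for option in REGRESSION_OPTIONS
    ]
    commands += [
        [python, "aggregate_results_classification.py", option]
        for option in CLASSIFICATION_OPTIONS
    ]
    return commands


# Print the commands; nothing is executed here.
for command in build_commands():
    print(" ".join(command))
```

Each printed list can be passed to `subprocess.run(command, check=True)` once the training step has produced its results.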
synthetic_regression.ipynb
synthetic_classification.ipynb
(not included in the paper)
gbdt_uncertainty/kdd/kdd.sh