Reference for computing bootstrap estimates of confidence intervals? #67

vadori · 2024-11-11T17:47:28Z

Hi!

Thank you so much for sharing this package. It’s so helpful. I have a couple of questions :)

1. Could you please confirm whether the bootstrap summary table's 5% and 95% columns display the 95% confidence intervals for accuracy, sensitivity, and specificity?
2. Additionally, could you suggest any references for understanding how to calculate in-bag and out-of-bag confidence intervals using bootstrapping? My current understanding is that out-of-bag estimates involve resampling the dataset N times, recalculating the optimal cutoff each time, and evaluating performance on the samples excluded in each round. Would it be correct to say that this approach does not yield a confidence interval for the metric specifically at the optimal cutoff but rather for the metric when "applying a cutoff on the given biomarker," as the cutoff may vary across bootstrap samples?

Thanks!

Thie1e · 2024-11-24T17:13:51Z

Hi vadori,

I'm glad the package was helpful. Regarding your questions:

1. No, those are the 5th and 95th percentiles of the bootstrap results, so it's a 90% confidence interval from a nonparametric bootstrap. You can generate the 95% (or any other) confidence interval using the boot_ci function. For example, boot_ci(my_result, optimal_cutpoint, alpha = 0.05) returns the 2.5th and 97.5th percentiles.
2. Yes, the out-of-bag metrics are a way to estimate the performance on unseen data, as the 'optimal' cutpoint is determined using only the in-bag data. So this is a type of cross-validation. Your interpretation is correct; I would phrase it as "an estimate of the expected performance when optimizing a cutoff on the given biomarker and given a specific optimization method". For more details on validation using the bootstrap, I would suggest 'Regression Modeling Strategies' by Frank Harrell.
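
To make the in-bag / out-of-bag idea concrete, here is a minimal sketch in base R on simulated data. It is not cutpointr's actual implementation: the helpers `youden_cutoff` and `accuracy_at`, the simulated biomarker, and the choice of accuracy as the metric are all assumptions made for illustration. It assumes a standard nonparametric bootstrap, where each run resamples the data with replacement, re-optimizes the cutoff on the in-bag observations, and evaluates it on the observations that were never drawn.

```r
set.seed(1)

# Simulated biomarker (hypothetical data): higher values in the positive class
n <- 200
y <- rep(c(0, 1), each = n / 2)
x <- rnorm(n, mean = ifelse(y == 1, 1.5, 0))

# Hypothetical helper: cutoff maximizing Youden's J = sensitivity + specificity - 1,
# predicting "positive" when x >= cutoff
youden_cutoff <- function(x, y) {
  candidates <- sort(unique(x))
  j <- vapply(candidates, function(cut) {
    pred <- x >= cut
    mean(pred[y == 1]) + mean(!pred[y == 0]) - 1
  }, numeric(1))
  candidates[which.max(j)]
}

# Hypothetical helper: accuracy when applying a fixed cutoff
accuracy_at <- function(cut, x, y) mean((x >= cut) == (y == 1))

boot_runs <- 500
res <- matrix(NA_real_, nrow = boot_runs, ncol = 4,
              dimnames = list(NULL, c("cutoff", "acc_in_bag", "acc_oob", "frac_oob")))

for (b in seq_len(boot_runs)) {
  in_bag <- sample.int(n, n, replace = TRUE)  # resample with replacement
  oob <- setdiff(seq_len(n), in_bag)          # observations never drawn in this run
  cut <- youden_cutoff(x[in_bag], y[in_bag])  # optimize on in-bag data only
  res[b, ] <- c(cut,
                accuracy_at(cut, x[in_bag], y[in_bag]),
                accuracy_at(cut, x[oob], y[oob]),
                length(oob) / n)
}

# 5th/95th percentiles give a 90% interval; 2.5th/97.5th give a 95% interval
ci90 <- quantile(res[, "acc_oob"], probs = c(0.05, 0.95))
ci95 <- quantile(res[, "acc_oob"], probs = c(0.025, 0.975))
mean_frac_oob <- mean(res[, "frac_oob"])
```

Note that the cutoff differs between runs, so the interval describes the performance of the whole procedure (optimizing a cutoff and then applying it) rather than the performance at one fixed cutoff. Also, in this kind of bootstrap the in-bag / out-of-bag split is not fixed in advance: because sampling is with replacement, on average roughly 63% of the distinct observations end up in-bag and roughly 37% out-of-bag.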

vadori · 2024-11-28T21:13:07Z

Dear @Thie1e

This is amazing, thank you so much for your reply! I am now experimenting. I will let you know how it goes.

Best,
VV

EDIT: Everything is going great, I love your package. I have some questions:

1. If I wanted to customize the plots (for example, to display the distributions of the two classes on the same plot in different colors, along with a vertical bar indicating the cutoff value), what would be the best approach? Custom code, right?
2. Also, what split is used between in-bag and out-of-bag samples for the bootstrap estimate of the CI? 80-20? I can't find this info online.
3. VERY IMPORTANT FOR ME: is there a way to increase the number of decimals in the bootstrap summary that I obtained, for example, with:

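library(cutpointr)
library(rlang)  # provides sym() for unquoting column names stored as strings

# predictor and outcome hold the column names as character strings
cp <- cutpointr(dataall, !!sym(predictor), !!sym(outcome),
  direction = "<=", pos_class = 1,
  method = maximize_metric, metric = youden, boot_runs = 100
)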

Thank you!!!
