quantile regression implementation notes #30

Draft · wants to merge 1 commit into base: main
Conversation

@topepo (Member) commented Aug 5, 2024

Added here to discuss

@topepo (Member Author) commented Aug 5, 2024

@ryantibs @dajmcdon please review and/or add discussion below.

@ryantibs left a comment

Wrote a couple of thoughts on terminology and evaluation metrics.

The format of the list column would be

```
tibble(.quantile = numeric(), .pred_quantile = numeric())
```
@ryantibs commented:

I'll just note that I find the use of "quantile" here (and in other places, used similarly) a bit confusing. I think that you are referring to a "probability level" for the quantile, such as 0.5 or 0.9. This is sometimes also called a "quantile level".

Somewhat confusingly, it's also sometimes called just "quantile". But then that can be easily confused with the associated quantile value itself. So I prefer something like "level" for that reason, to disambiguate.

@topepo (Member Author) commented:

We can use "level". That should be intuitive for people.


We've been using:

```
.pred_quantile = tibble(.value = numeric(), .quantile_level = numeric())
```

for specificity.

It may also be worth considering giving this a class or using `vctrs::vctrs_rcrd` (see the sketch after this list).

  1. A user might want to, for example, perform math on the .value.
  2. It might also help with the tune methods. When tuning, what if we end up comparing models fit with different sets of quantiles (possibly with an overlap)? Evaluation may then need to compare one set with .quantile_level = c(.2, .5, .8) and another with .quantile_level = c(.1, .2, .5, .8, .9).
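
For illustration, here is a minimal sketch of the vctrs_rcrd idea (the class name, constructor, and field names are hypothetical, not what parsnip settled on):

```
library(vctrs)

# Hypothetical record type: each element carries the predicted values
# together with their quantile levels, so methods can act on the pair.
new_quantile_pred <- function(values = double(), quantile_levels = double()) {
  new_rcrd(
    list(.value = values, .quantile_level = quantile_levels),
    class = "quantile_pred"
  )
}

# A format() method so the record prints compactly.
format.quantile_pred <- function(x, ...) {
  sprintf("%s [%s]", field(x, ".value"), field(x, ".quantile_level"))
}

new_quantile_pred(c(68215, 119658, 188359), c(.1, .5, .9))
```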


## yardstick

We need one or more model performance metrics for quantile loss that work across all quantile levels. They will need a new class too (say `"quantile_metric"`).
@ryantibs commented:

I think the most obvious metric is just to add up the quantile loss over all quantile levels. This is given in (for example) equation (10) of these notes: https://www.stat.berkeley.edu/~ryantibs/statlearn-s23/lectures/calibration.pdf.

As the notes explain, this is also equivalent (up to a factor of 2) to weighted interval score (WIS). That is defined in equation (8) and the equivalence is stated in equation (11).
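
As a concrete reference point, here is a minimal sketch of that summed quantile (pinball) loss, written directly from the definition in the linked notes (the function name and matrix layout are illustrative, not yardstick's API):

```
# truth: numeric vector of length n
# estimates: n x k matrix of predicted quantiles, one column per level
# quantile_levels: vector of k probability levels, e.g. c(.1, .5, .9)
quantile_loss <- function(truth, estimates, quantile_levels) {
  per_level <- vapply(
    seq_along(quantile_levels),
    function(i) {
      tau <- quantile_levels[i]
      u <- truth - estimates[, i]
      # pinball loss: tau * u when u >= 0, (tau - 1) * u otherwise
      mean(ifelse(u >= 0, tau * u, (tau - 1) * u))
    },
    numeric(1)
  )
  sum(per_level)
}
```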

Some people in the field do not know of this equivalence, and I've even seen papers reporting both quantile loss and WIS as separate columns of a predictive performance metrics table. So it'd be good for this package to clearly state that they are equivalent! Perhaps in the documentation.

WIS is nice because it offers a complementary view on the same equivalent metric. As explained in this paper: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008618, you can decompose the score by inspecting the two summands separately (summed over all levels) and interpret them loosely as a spread/sharpness measure (the first) and a calibration measure (the second). Some people like looking at those as constituents of error.

Other than quantile loss (and, equivalently, WIS), I'm not aware of any other score based on predicted quantiles that is proper, and people in forecast scoring care a lot about propriety. However, I'll note that it would be reasonable in general to additionally look at absolute error (AE) using the predicted 50% quantile (the predicted median), if it's in the list of quantiles, and to look at coverage (the empirical coverage of each quantile versus its nominal coverage). I think people often do this in practice.
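
For the coverage check, a minimal sketch (again with illustrative names, not a proposed API) comparing the empirical coverage of each predicted quantile against its nominal level:

```
# truth: numeric vector of length n; estimates: n x k matrix of quantiles.
# Proportion of observations at or below each predicted quantile,
# compared with the level it was supposed to achieve.
quantile_coverage <- function(truth, estimates, quantile_levels) {
  data.frame(
    nominal  = quantile_levels,
    observed = colMeans(truth <= estimates)
  )
}
```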

Finally, in an off-the-cuff remark, I'll note that I'm becoming more and more bothered by quantile loss/WIS as a general go-to metric. I think it can be too lenient for underdispersed forecasters. I have an ongoing project with students exploring this. I don't have anything concrete to recommend as an alternative at the moment, maybe we will at some future point.


* There could be confusion around the type of model and prediction being made. Suppose that `rpart` could make quantile predictions (it cannot). Would using an engine of `"rpart"` result in the regular CART model or one optimized via quantile loss?
* Since the list of quantiles would be specified upfront, it would be advantageous to do this with the new mode (e.g., `set_mode("quantile regression", quantile_level = (1:9) / 10)`).

To test using a new mode, there is a [parsnip branch](https://github.com/tidymodels/parsnip/tree/quantile-mode) that enables it.
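
For instance, here is a short usage sketch reconstructed from the reprex posted later in this thread (the mode string and quantile_level argument below come from that example):

```
# pak::pak(c("tidymodels/parsnip@quantile-mode"), ask = FALSE)
library(parsnip)

spec <-
  linear_reg() %>%
  set_engine("quantreg") %>%
  set_mode("quantile regression", quantile_level = c(.1, .5, .9))
```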
@dajmcdon commented:

I suspect this is the preferred way to do this, because of the cons listed above. Thanks for the testing branch. I'll try reimplementing some of our engines off this branch in the next few days and link back here describing any difficulties or benefits I encounter.

@topepo (Member Author) commented Aug 15, 2024

I took a few minutes and built a small hardhat template for making our own quantile regression models: topepo/salmon#1

@dajmcdon commented:

I made a small template for a few engines: https://github.com/dajmcdon/quantrines

@topepo (Member Author) commented Aug 29, 2024

@dajmcdon 😬 I didn't see your repo and did some implementing this afternoon in that parsnip branch.

I added some infrastructure to parsnip to pick up the quantile levels and insert them into the fit calls (instead of having to pass them in as engine arguments with potentially different names). The API adds them to set_mode() so that they can be validated at the same time as the model/engine/mode combination. There's a code example below.

We can combine both sets of work since they don't 100% overlap.

If you are good with it, I'll set up some weekly or every-other-week time to catch up on this.

```
# pak::pak(c("tidymodels/parsnip@quantile-mode"), ask = FALSE)

library(tidymodels)
qunt_reg_spec <- 
  linear_reg() %>% 
  set_engine("quantreg") %>% 
  set_mode("quantile regression", quantile_level = c(.1, .5, .9))
qunt_reg_spec
#> Linear Regression Model Specification (quantile regression)
#> 
#> Computational engine: quantreg
#> Quantile levels: 0.1, 0.5, and 0.9.
qunt_reg_f_fit <- 
  qunt_reg_spec %>% fit(price ~ sqft, data = Sacramento)
qunt_reg_f_fit
#> parsnip model object
#> 
#> Call:
#> quantreg::rq(formula = price ~ sqft, tau = quantile_level, data = data)
#> 
#> Coefficients:
#>                tau= 0.1  tau= 0.5   tau= 0.9
#> (Intercept) -21815.3846 5946.2931 29181.5323
#> sqft           107.6923  136.0187   190.4032
#> 
#> Degrees of freedom: 932 total; 930 residual

# or use:
qunt_reg_xy_fit <- 
  qunt_reg_spec %>% fit_xy(x = Sacramento %>% select(sqft), y = Sacramento$price)
pred_vals <- predict(qunt_reg_f_fit, Sacramento[1:2,], type = "quantile")
pred_vals$.pred_quantile[[1]]
#> # A tibble: 3 × 2
#>   .pred_quantile .quantile_level
#>            <dbl>           <dbl>
#> 1         68215.             0.1
#> 2        119658.             0.5
#> 3        188359.             0.9

# or just
predict(qunt_reg_xy_fit, Sacramento[1:2,])
#> # A tibble: 2 × 1
#>   .pred_quantile  
#>   <list>          
#> 1 <tibble [3 × 2]>
#> 2 <tibble [3 × 2]>
```

Created on 2024-08-29 with reprex v2.1.0

@dajmcdon commented:

@topepo That looks great! (I probably should have tagged you on the repo directly.) How would you like to proceed from here?

  1. I modified the rand_forest() implementation I have in that package to use your updates, and it works really well!

  2. I removed the .onLoad() for my quantreg version to avoid this error:

```
Error in `is_discordant_info()`:
! The combination of engine `quantreg`, mode `quantile regression`, and
  prediction type `quantile` already has predict data for model `linear_reg` and
  the new information being registered is different.
```

One thing with quantreg::rq() is that it produces different output objects with just one quantile_level requested. So the post function in the predictions will need a slight modification.

```
library(parsnip)
tib <- data.frame(
  y = rnorm(100), x = rnorm(100), z = rnorm(100),
  f = factor(sample(letters[1:3], 100, replace = TRUE))
)
spec <- linear_reg(engine = "quantreg") %>%
  set_mode(mode = "quantile regression", quantile_level = c(.1, .5, .9))
out <- fit(spec, formula = y ~ x + z, data = tib)
predict(out, new_data = tib[1:5, ])
#> # A tibble: 5 × 1
#>   .pred_quantile  
#>   <list>          
#> 1 <tibble [3 × 2]>
#> 2 <tibble [3 × 2]>
#> 3 <tibble [3 × 2]>
#> 4 <tibble [3 × 2]>
#> 5 <tibble [3 × 2]>

spec <- linear_reg(engine = "quantreg") %>%
  set_mode(mode = "quantile regression", quantile_level = c(.5))
out <- fit(spec, formula = y ~ x + z, data = tib)
predict(out, new_data = tib[1:5, ])
#> Warning in rep(quantile_level, each = n): first element used of 'each' argument
#> Error in 1:n: argument of length 0
```

Created on 2024-08-29 with reprex v2.1.1
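
For reference, a minimal sketch of the kind of post-processing tweak described above (a hypothetical helper, not the actual parsnip code): quantreg's predictions come back as a vector for a single tau but as a matrix for several, so coercing to a matrix keeps the downstream code uniform.

```
# Coerce single-level predictions to a one-column matrix so the rest of
# the prediction post-processing can treat every case the same way.
standardize_rq_pred <- function(pred, quantile_levels) {
  if (!is.matrix(pred)) {
    pred <- matrix(pred, ncol = length(quantile_levels))
  }
  pred
}
```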

@topepo (Member Author) commented Sep 1, 2024

> One thing with quantreg::rq() is that it produces different output objects with just one quantile_level

Yep! I have a TODO in there for that. I always check that and make sure it works with a one-row data frame/matrix.

For the random forest engine, do you want to branch off of this branch and make a PR into parsnip (after this one is merged)?

Happy to keep it somewhere else, but this puts the support overhead on us; either is fine with us. I only added quantreg to the PR since I needed an engine to test against. We can keep that in your repo too.

@topepo (Member Author) commented Sep 16, 2024

Some notes on changes for existing quantile prediction methods: tidymodels/parsnip#1203

@jrosell commented Sep 17, 2024
I currently use the lightgbm engine with the parameters objective = "quantile" and alpha in regression mode. Is the plan to support this approach with the new quantile mode too?

@topepo (Member Author) commented Sep 17, 2024

@jrosell I believe that @dajmcdon has an interest in including more engines (#30 (comment)).

lightgbm would require a separate model fit for each requested quantile level.
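
For instance, a rough sketch of that one-fit-per-level approach (parameter names follow the lightgbm R package; the helper itself is illustrative, not parsnip's implementation):

```
library(lightgbm)

# Fit one quantile-objective model per requested level.
fit_quantile_lgb <- function(x, y, quantile_levels, nrounds = 100) {
  lapply(quantile_levels, function(tau) {
    lightgbm(
      data = as.matrix(x),
      label = y,
      params = list(objective = "quantile", alpha = tau),
      nrounds = nrounds,
      verbose = -1L
    )
  })
}
```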
