Error with unique case.weights and quantile regression #564

Open
lindeloev opened this issue May 10, 2021 · 3 comments

@lindeloev

Here's an issue I can't quite narrow down. It seems that ranger fails when all three of these conditions hold: (1) there are unique case.weights, (2) quantreg = TRUE, and (3) keep.inbag = TRUE:

> mtcars$cyl[1] = 99
> ranger::ranger(mpg ~., mtcars, case.weights = mtcars$cyl, quantreg = TRUE, keep.inbag = TRUE)
# Error in ranger::ranger(mpg ~ ., mtcars, case.weights = mtcars$cyl, quantreg = TRUE,  : 
#   Error: Too few trees for out-of-bag quantile regression.

It runs with mtcars$cyl[1:2] = 99.

However, ranger has no problems with unique case.weights here:

ranger::ranger(Species ~., iris, case.weights = iris$Sepal.Width, keep.inbag = TRUE)
ranger::ranger(Sepal.Length ~., iris, case.weights = iris$Sepal.Width, keep.inbag = TRUE)

I'm experiencing the former error in a number of analyses I'm currently running. Thanks for an awesome package!

@mnwright
Member

That's expected behavior. The case.weights control how observations are sampled for each tree. If you set mtcars$cyl[1] = 99 and use that column as case weights, observation 1 is selected in practically every tree and is therefore almost never out-of-bag. Thus, we cannot do out-of-bag quantile regression for that observation.
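
To put a rough number on this, the sketch below estimates how unlikely it is for observation 1 to be out-of-bag in a single tree. It assumes each tree draws n observations with replacement, with selection probability proportional to case.weights; `oob_prob` is a hypothetical helper for illustration, not part of ranger.

```r
# Assumption: weighted bootstrap, n draws with replacement per tree,
# selection probability proportional to the case weights.
oob_prob <- function(weights) {
  p <- weights / sum(weights)   # per-draw selection probability
  (1 - p)^length(weights)       # chance of being missed by all n draws
}

w <- mtcars$cyl
w[1] <- 99                      # the dominant weight from the report

p_oob <- oob_prob(w)
p_oob[1]                        # per-tree OOB probability of observation 1
500 * p_oob[1]                  # expected OOB trees with num.trees = 500
range(p_oob[-1])                # OOB probabilities of all other observations
```

Under these assumptions the expected number of trees in which observation 1 is out-of-bag is far below one, while every other observation is out-of-bag in a large share of the trees.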

However, thinking about that, shouldn't we make this a warning instead of an error and just assign NA to the affected observations?
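
For illustration only, a warning plus NA could behave roughly like the following base-R simulation. This is not ranger's implementation: the weighted bootstrap, the `min_oob` threshold of 10, and all variable names are assumptions made for the sketch.

```r
set.seed(1)

w <- mtcars$cyl
w[1] <- 99
n <- length(w)
num_trees <- 500

# Assumption: each tree is a weighted bootstrap sample of size n.
# inbag is an n x num_trees matrix of in-bag counts.
inbag <- replicate(
  num_trees,
  tabulate(sample.int(n, n, replace = TRUE, prob = w), nbins = n)
)

# Number of trees in which each observation is out-of-bag.
oob_trees <- rowSums(inbag == 0)

min_oob <- 10                        # hypothetical minimum number of OOB trees
flagged <- which(oob_trees < min_oob)

# Placeholder OOB predictions; flagged observations would get NA.
oob_pred <- rep(0, n)
if (length(flagged) > 0) {
  warning("Too few out-of-bag trees for observations: ",
          paste(flagged, collapse = ", "), ". Returning NA for these.")
  oob_pred[flagged] <- NA
}
```

Only the observations that are almost never out-of-bag end up as NA; the rest keep their out-of-bag predictions.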

@lindeloev
Author

Ah, I see. Yes, a warning + NA would be great! I guess I'll just need to do some pmin()/pmax() or scale() on the weights to restrict them to a narrower range, and raise num.trees to increase the chance that the high-weight cases are out-of-bag in enough trees.
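
A minimal sketch of the capping idea, assuming the same weighted bootstrap as above (n draws with replacement, probability proportional to the weights). The cap of 8 and num.trees = 2000 are arbitrary illustrative values, not recommendations, and the ranger call only runs if the package is installed.

```r
w <- mtcars$cyl
w[1] <- 99

cap <- 8                          # hypothetical upper bound for the weights
w_capped <- pmin(w, cap)          # restrict weights to a narrower range

# Expected number of OOB trees per observation under the assumed sampling.
num_trees <- 2000
p <- w_capped / sum(w_capped)
expected_oob <- num_trees * (1 - p)^length(w_capped)
min(expected_oob)                 # should be comfortably large

if (requireNamespace("ranger", quietly = TRUE)) {
  fit <- ranger::ranger(mpg ~ ., data = mtcars, num.trees = num_trees,
                        case.weights = w_capped,
                        quantreg = TRUE, keep.inbag = TRUE)
}
```

Capping changes the relative importance of the observations, so it trades fidelity to the original weights for usable out-of-bag estimates.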

BTW, I just installed the dev version and I'm blown away by the speed improvements for quantile regression!

@mgrossouvre

mgrossouvre commented Dec 3, 2024

First of all, thank you @mnwright for this package. Its performance is especially noticeable for quantile regression.

Unless I missed something, the issue is still the same. NAs would help a great deal, as suggested. In particular, it can be very tricky to find an appropriate number of trees and, even then, there is no guarantee that it would work. In my case, there are 370,000 observations: 320,000 have a weight of 1 and the rest have weights ranging from 2 to 874.
