Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idea: move check_args() up into new_model_spec() #1098

Closed
EmilHvitfeldt opened this issue Apr 2, 2024 · 4 comments
Closed

Idea: move check_args() up into new_model_spec() #1098

EmilHvitfeldt opened this issue Apr 2, 2024 · 4 comments
Labels
feature a feature request or enhancement

Comments

@EmilHvitfeldt
Copy link
Member

EmilHvitfeldt commented Apr 2, 2024

Adding a call to check_args() on this line https://github.com/tidymodels/parsnip/blob/main/R/misc.R#L350 produces the following results. The only questionable use-case is the last one, which people might use.

Still looking at other places where this would fail

library(parsnip)
library(discrim)

discrim_linear(penalty = -2)
#> Error in `new_model_spec()`:
#> ! The amount of regularization, `penalty`, should be `>= 0`.

discrim_linear(penalty = tune()) %>% 
  set_args(penalty = 1)
#> Linear Discriminant Model Specification (classification)
#> 
#> Main Arguments:
#>   penalty = 1
#> 
#> Computational engine: MASS

discrim_linear(penalty = tune())
#> Linear Discriminant Model Specification (classification)
#> 
#> Main Arguments:
#>   penalty = tune()
#> 
#> Computational engine: MASS

discrim_linear(penalty = 1) %>% 
  set_args(penalty = -2)
#> Error in `new_model_spec()`:
#> ! The amount of regularization, `penalty`, should be `>= 0`.

discrim_linear(penalty = -2) %>% 
  set_args(penalty = 1)
#> Error in `new_model_spec()`:
#> ! The amount of regularization, `penalty`, should be `>= 0`.
@EmilHvitfeldt EmilHvitfeldt added the feature a feature request or enhancement label Apr 2, 2024
@EmilHvitfeldt
Copy link
Member Author

right now the only blocker we have found for this, is dependent on how we handle #1099

@simonpcouch
Copy link
Contributor

Very much agreed on the linked issue that we can do a better job checking arguments in check_args().

I think it's important to note here that the pattern in check_args() methods is to eval_tidy() each argument:

parsnip/R/boost_tree.R

Lines 167 to 169 in dc9a66e

check_args.boost_tree <- function(object) {
args <- lapply(object$args, rlang::eval_tidy)

If we're able to eval_tidy() each argument to model specification functions as soon as they're inputted, then there's no reason we should be capturing them as quosures in the first place. The reason we capture them as quosures is so that we can evaluate them in an environment that we construct, the eval_env, at fit() time:

parsnip/R/fit.R

Lines 156 to 158 in dc9a66e

# Create an environment with the evaluated argument objects. This will be
# used when a model call is made later.
eval_env <- rlang::env()

This environment supports defining arguments in relation to the data to be fit()ted to, like:

library(parsnip)

boost_tree(mtry = ncol(.x()) / 2, mode = "regression") %>% fit(mpg ~ ., mtcars)
#> parsnip model object
#> 
#> ##### xgb.Booster
#> raw: 21.6 Kb 
#> call:
#>   xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0, 
#>     colsample_bytree = 1, colsample_bynode = 0.5, min_child_weight = 1, 
#>     subsample = 1), data = x$data, nrounds = 15, watchlist = x$watchlist, 
#>     verbose = 0, nthread = 1, objective = "reg:squarederror")
#> params (as set within xgb.train):
#>   eta = "0.3", max_depth = "6", gamma = "0", colsample_bytree = "1", colsample_bynode = "0.5", min_child_weight = "1", subsample = "1", nthread = "1", objective = "reg:squarederror", validate_parameters = "TRUE"
#> xgb.attributes:
#>   niter
#> callbacks:
#>   cb.evaluation.log()
#> # of features: 10 
#> niter: 15
#> nfeatures : 10 
#> evaluation_log:
#>      iter training_rmse
#>     <num>         <num>
#>         1    14.9313149
#>         2    10.9841618
#> ---                    
#>        14     0.6109443
#>        15     0.5196238

...where .x() is a data descriptor. As of now, check_args() knows that it's unable to evaluate such an expression:

boost_tree(mtry = ncol(.x()) / 2, mode = "regression") %>% check_args()
#> Error in `descr_env$.x()`:
#> ! Descriptor context not set

Created on 2024-04-03 with reprex v2.1.0

Calling check_args() will not be able to give the same output when called inside of new_model_spec() because it doesn't yet have access to the data--the fact that the behavior might differ between these two contexts, to me, signals that we ought not to check_args() until we have access to the data at fit().

There are definitely improvements to be made to argument checking, as in the loophole that you pointed out in the linked issue. I think that's a good place to direct focus for now. :)

@simonpcouch
Copy link
Contributor

Given the implications for data descriptors, I'm going to go ahead and close. Still very much on board for #1095. :)

Copy link

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

2 participants