Idea: move `check_args()` up into `new_model_spec()` #1098

EmilHvitfeldt · 2024-04-02T17:00:57Z

Adding a call to check_args() on this line https://github.com/tidymodels/parsnip/blob/main/R/misc.R#L350 produces the following results. The only questionable use-case is the last one, which people might use.

Still looking at other places where this would fail

library(parsnip)
library(discrim)

discrim_linear(penalty = -2)
#> Error in `new_model_spec()`:
#> ! The amount of regularization, `penalty`, should be `>= 0`.

discrim_linear(penalty = tune()) %>% 
  set_args(penalty = 1)
#> Linear Discriminant Model Specification (classification)
#> 
#> Main Arguments:
#>   penalty = 1
#> 
#> Computational engine: MASS

discrim_linear(penalty = tune())
#> Linear Discriminant Model Specification (classification)
#> 
#> Main Arguments:
#>   penalty = tune()
#> 
#> Computational engine: MASS

discrim_linear(penalty = 1) %>% 
  set_args(penalty = -2)
#> Error in `new_model_spec()`:
#> ! The amount of regularization, `penalty`, should be `>= 0`.

discrim_linear(penalty = -2) %>% 
  set_args(penalty = 1)
#> Error in `new_model_spec()`:
#> ! The amount of regularization, `penalty`, should be `>= 0`.

The text was updated successfully, but these errors were encountered:

EmilHvitfeldt · 2024-04-03T02:01:59Z

right now the only blocker we have found for this, is dependent on how we handle #1099

simonpcouch · 2024-04-03T14:59:10Z

Very much agreed on the linked issue that we can do a better job checking arguments in check_args().

I think it's important to note here that the pattern in check_args() methods is to eval_tidy() each argument:

parsnip/R/boost_tree.R

Lines 167 to 169 in dc9a66e

    
           check_args.boost_tree <- function(object) { 
        
             args <- lapply(object$args, rlang::eval_tidy)

If we're able to eval_tidy() each argument to model specification functions as soon as they're inputted, then there's no reason we should be capturing them as quosures in the first place. The reason we capture them as quosures is so that we can evaluate them in an environment that we construct, the eval_env, at fit() time:

parsnip/R/fit.R

Lines 156 to 158 in dc9a66e

    
           # Create an environment with the evaluated argument objects. This will be 
        
           # used when a model call is made later. 
        
           eval_env <- rlang::env()

This environment supports defining arguments in relation to the data to be fit()ted to, like:

library(parsnip)

boost_tree(mtry = ncol(.x()) / 2, mode = "regression") %>% fit(mpg ~ ., mtcars)
#> parsnip model object
#> 
#> ##### xgb.Booster
#> raw: 21.6 Kb 
#> call:
#>   xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0, 
#>     colsample_bytree = 1, colsample_bynode = 0.5, min_child_weight = 1, 
#>     subsample = 1), data = x$data, nrounds = 15, watchlist = x$watchlist, 
#>     verbose = 0, nthread = 1, objective = "reg:squarederror")
#> params (as set within xgb.train):
#>   eta = "0.3", max_depth = "6", gamma = "0", colsample_bytree = "1", colsample_bynode = "0.5", min_child_weight = "1", subsample = "1", nthread = "1", objective = "reg:squarederror", validate_parameters = "TRUE"
#> xgb.attributes:
#>   niter
#> callbacks:
#>   cb.evaluation.log()
#> # of features: 10 
#> niter: 15
#> nfeatures : 10 
#> evaluation_log:
#>      iter training_rmse
#>     <num>         <num>
#>         1    14.9313149
#>         2    10.9841618
#> ---                    
#>        14     0.6109443
#>        15     0.5196238

...where .x() is a data descriptor. As of now, check_args() knows that it's unable to evaluate such an expression:

boost_tree(mtry = ncol(.x()) / 2, mode = "regression") %>% check_args()
#> Error in `descr_env$.x()`:
#> ! Descriptor context not set

^{Created on 2024-04-03 with reprex v2.1.0}

Calling check_args() will not be able to give the same output when called inside of new_model_spec() because it doesn't yet have access to the data--the fact that the behavior might differ between these two contexts, to me, signals that we ought not to check_args() until we have access to the data at fit().

There are definitely improvements to be made to argument checking, as in the loophole that you pointed out in the linked issue. I think that's a good place to direct focus for now. :)

simonpcouch · 2024-09-09T20:06:45Z

Given the implications for data descriptors, I'm going to go ahead and close. Still very much on board for #1095. :)

github-actions · 2024-09-25T01:06:35Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

EmilHvitfeldt added the feature a feature request or enhancement label Apr 2, 2024

simonpcouch closed this as completed Sep 9, 2024

github-actions bot locked and limited conversation to collaborators Sep 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Idea: move `check_args()` up into `new_model_spec()` #1098

Idea: move `check_args()` up into `new_model_spec()` #1098

EmilHvitfeldt commented Apr 2, 2024 •

edited

Loading

EmilHvitfeldt commented Apr 3, 2024

simonpcouch commented Apr 3, 2024

simonpcouch commented Sep 9, 2024

github-actions bot commented Sep 25, 2024

Idea: move check_args() up into new_model_spec() #1098

Idea: move check_args() up into new_model_spec() #1098

Comments

EmilHvitfeldt commented Apr 2, 2024 • edited Loading

EmilHvitfeldt commented Apr 3, 2024

simonpcouch commented Apr 3, 2024

simonpcouch commented Sep 9, 2024

github-actions bot commented Sep 25, 2024

Idea: move `check_args()` up into `new_model_spec()` #1098

Idea: move `check_args()` up into `new_model_spec()` #1098

EmilHvitfeldt commented Apr 2, 2024 •

edited

Loading