
[DRAFT] adding support for the {mmrm} package #1000

Open · wants to merge 26 commits into base: main

Conversation


@kkmann commented Jan 10, 2024

Hi,

this is an attempt to implement support for the {mmrm} package (https://github.com/openpharma/mmrm), following the steps outlined in the {marginaleffects} extensions vignette (https://marginaleffects.com/vignettes/extensions.html#marginaleffects-extension).

I am still unclear about the role of this option setting:

options("marginaleffects_model_classes" = "lm_manual")

Is it expected that the user triggers this every time they want to use {marginaleffects} with {mmrm}? This does not seem to be the case for other extensions.

My test case is:

library(mmrm)
fit <- mmrm(
  formula = FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data
)
predictions(fit, newdata = insight::get_data(fit))

Also tagging @lang-benjamin @danielinteractive

@kkmann (author) commented Jan 10, 2024

Currently my test case fails whether or not I set

options("marginaleffects_model_classes" = "mmrm")

Error: Unable to compute predicted values with this model. You can try to supply a
  different dataset to the `newdata` argument.
  
  Bug Tracker: https://github.com/vincentarelbundock/marginaleffects/issues

I did not open a separate issue in the bug tracker since this is likely a bug caused by my faulty implementation of the {mmrm} interface.

@vincentarelbundock force-pushed the main branch 2 times, most recently from f28ed19 to 632dfc5, on January 10, 2024 15:37
@vincentarelbundock (owner) commented

Thanks for taking the time to do this, I really appreciate it!

The options() call is only necessary for temporary user-defined models. When we merge support in marginaleffects itself, we can do the following:

  1. Add the package to the permanent list of supported models, to avoid calling options() in every session: R/sanitize_model.R
  2. Add the package to the vignette of supported models by editing: data-raw/supported_models.csv
  3. Add a bullet point to the news file: NEWS.md

Does your example work when you just supply the data frame directly: `newdata = fev_data`?

@lang-benjamin commented

Thanks, just adding some thoughts here.

> Does your example work when you just supply the data frame directly: newdata=fev_data?

Yes, that should work (fev_data has missing values for the response, and there will then be predictions for it).

The question to me is what the general default strategy regarding missing data should be. The element model$data does not omit missing data, whereas insight::get_data(model) does (i.e., it uses the actual data used to fit the model). If we are interested in the actual data used to fit the model, one could just use model.frame(model) instead of insight::get_data(); this would not rely on the insight package.

@vincentarelbundock (owner) commented

Ideally, on my end, I would like to go through insight::get_data() and use listwise deletion by default. This would be consistent with how all the other models are treated in the package. In any case, it's trivial for users to supply another dataset to the newdata argument if they want predictions for observations with a missing response.

Also, looking at the code quickly, I suspect that the reason predictions don't work in the current PR is that get_predict() just calls predict(). But that function should return a data frame with two columns: rowid and estimate. If you only return a vector, it will likely return this kind of error.

It might also be worth checking if you really need all those methods. Sometimes, the defaults work perfectly well. Just try something like:

library(marginaleffects)
get_predict.default(fit)
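As a sketch of the return format described above, a user-defined method along the lines of the extensions vignette could look like this. This is illustrative only, not the final implementation, and it assumes predict.mmrm() returns a plain numeric vector:

```r
# Illustrative sketch only: marginaleffects expects get_predict() methods
# to return a data frame with `rowid` and `estimate` columns, not a bare
# numeric vector.
get_predict.mmrm <- function(model, newdata = insight::get_data(model), ...) {
  yhat <- stats::predict(model, newdata = newdata)
  data.frame(rowid = seq_along(yhat), estimate = as.numeric(yhat))
}

# Temporary registration, only needed until support is merged upstream:
options("marginaleffects_model_classes" = "mmrm")
```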

@kkmann (author) commented Jan 24, 2024

Thanks for the feedback @vincentarelbundock. I made some adjustments and am starting to wrap my head around this. If I see it correctly, we still have two things to address here:

  • we still need to add the information to the CSV file of supported models @lang-benjamin
  • we need to look into omitting missing data, either via insight::get_data() or a custom approach

@vincentarelbundock (owner) commented

I was looking at this PR today and noticed that the current version does not return standard errors. The underlying reason is that the Jacobian is full of zeros:

library(marginaleffects)
library(mmrm)
fit <- mmrm(FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
            data = fev_data,
            weights = fev_data$WEIGHTS,
            method = "Kenward-Roger")
p <- predictions(fit)
attr(p, "jacobian")

The reason is that get_predict() does not appear to make different predictions after we change the coefficient values stored in the model object using set_coef(). Under the hood, we compute standard errors by slightly tweaking the coefficient estimates in a model and making new predictions with the modified model object. This doesn't seem to work at the moment.

One possibility is that this is related to the problem with glmmTMB: #1064

I don't actually know if this is related, but I noticed that both glmmTMB and mmrm use Template Model Builder (TMB). If that's the source of the issue, it might be bad news... See the thread on glmmTMB for links to different attempts and problem definition.
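For intuition, the standard-error machinery described above amounts to a forward finite difference over the coefficients, roughly like the sketch below. The names get_coef(), set_coef(), and get_predict() are marginaleffects generics; the real internals differ in the details:

```r
# Rough sketch of the numerical Jacobian computation, not the actual internals.
beta <- get_coef(model)
h <- 1e-7
baseline <- get_predict(model, newdata)$estimate
J <- matrix(NA_real_, nrow = length(baseline), ncol = length(beta))
for (k in seq_along(beta)) {
  beta_k <- beta
  beta_k[k] <- beta_k[k] + h
  perturbed <- set_coef(model, beta_k)          # modified model object
  J[, k] <- (get_predict(perturbed, newdata)$estimate - baseline) / h
}
# If set_coef() has no effect on subsequent predictions, every column of J
# is zero, and the delta-method standard errors collapse as well.
```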

@danielinteractive commented

Thanks @vincentarelbundock, @kkmann, and all. Just looking at this thread as a bystander: yes, structurally the two packages will work similarly. However, I also wonder whether in this case, because we use TMB, there would be much better automatic-differentiation ways to generate the Jacobian you need.

@vincentarelbundock (owner) commented Apr 7, 2024

@danielinteractive Thanks for chiming in!

Yes, that would certainly be more efficient. I think it makes sense for very common quantities like predictions. But the issue is that marginaleffects can compute literally dozens of quantities of interest (slopes, risk differences, marginal means, etc.), and we would have to build distinct Jacobians for all of these quantities.

My guess is that this would require a ton of work on the backend. But then again, I don't know TMB at all, so...

@clarkliming commented

PR created at kkmann#1 to fix the issue that the Jacobian is zero.

The reason is that predict() in mmrm always returns a conditional prediction (conditional on observed values).

@vincentarelbundock (owner) commented

Sorry I left this PR to linger for so long. I looked into it today, and things mostly seem to work, but there are two remaining issues. The first is probably easy to solve; I'm not sure about the second.

library("mmrm")

fit <- mmrm(
  formula = FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data
)

newdata=NULL needs to include the response variable by default

By default, when the user leaves newdata=NULL (the default), marginaleffects uses sanitize_newdata() to extract the original data from the model object. Typically, this does not include the response, but it appears that the predict() method supplied by mmrm requires the response to make predictions.

I’m not sure if that’s required by the method. If it’s not actually required, it would be best if this were fixed upstream in mmrm. If this is a strong requirement, then we can implement a hack in marginaleffects to include the response.

Default breaks:

avg_predictions(fit)
> Error: Unable to compute predicted values with this model. You can try to supply a
>   different dataset to the `newdata` argument. This error was also raised:
>   
>   object 'FEV1' not found
>   
>   Bug Tracker: https://github.com/vincentarelbundock/marginaleffects/issues

Supplying the dataset explicitly is a workaround:

avg_predictions(fit, newdata = fev_data)
> 
>  Estimate Std. Error   z Pr(>|z|)   S 2.5 % 97.5 %
>      42.4      0.329 129   <0.001 Inf  41.8   43.1
> 
> Type:  response 

Duplicate indices

Internally, marginaleffects computes comparisons/slopes for multiple terms or contrasts in one shot by stacking datasets. That way, we can compute the slope with respect to X and the slope with respect to Z together, which is much more computationally efficient, since it doesn’t require calling predict() many times. Unfortunately, this breaks with mmrm, because the predict() method supplied by that package requires unique indices, and “stacking” duplicates the indices.

This means that we can compute the “effect” of a focal variable by specifying that variable explicitly:

avg_comparisons(fit, 
    variables = "SEX",
    newdata = fev_data)
> 
>  Estimate Std. Error     z Pr(>|z|)   S  2.5 % 97.5 %
>     0.326      0.532 0.613     0.54 0.9 -0.717   1.37
> 
> Term: SEX
> Type:  response 
> Comparison: mean(Female) - mean(Male)

It also works if we specify a comparison between only two levels for a single focal variable:

avg_comparisons(fit, 
    variables = list("RACE" = c("Asian", "White")),
    newdata = fev_data)
> 
>  Estimate Std. Error    z Pr(>|z|)    S 2.5 % 97.5 %
>      5.64      0.666 8.48   <0.001 55.3  4.34   6.95
> 
> Term: RACE
> Type:  response 
> Comparison: mean(White) - mean(Asian)

But it doesn’t work if we try to compute the two possible contrasts at the same time: (1) Asian vs. White and (2) Black vs. White.

avg_comparisons(fit, 
    variables = "RACE",
    newdata = fev_data)
>   Error in h_mmrm_tmb_data(object$formula_parts, newdata, weights = rep(1, : time
>   points have to be unique for each subject, detected following duplicates in data:

This error is related to the lines of code after this one: https://github.com/openpharma/mmrm/blob/424810d88b0507e9e88242b61d343852358b2df2/R/tmb.R#L122

I don’t know mmrm well enough to know if this is essential. What happens if we bypass that check? Will the hardcoded ordering yield incorrect estimates?

@danielinteractive commented

Thanks @vincentarelbundock , looking into this now on the mmrm end - thanks for the detailed pointers!

@danielinteractive commented

Thanks again @vincentarelbundock! While we will be able to fix the first problem quickly (indeed this is a bug in mmrm), the second problem cannot easily be fixed on the mmrm side I think. For non-spatial covariance structures we need unique IDs/visit variable combinations here, otherwise downstream computations are not well defined. Is there maybe a point in the marginaleffects stack to define a method for mmrm objects such that in the stacking some kind of dummy renaming of the IDs is performed, ensuring unique IDs/visits for calling mmrm afterwards?
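If such a hook existed, one hypothetical approach would be to suffix the subject IDs with the index of each stacked copy before calling predict(). The helper name and the copy-index column below are invented for illustration; neither package provides them:

```r
# Hypothetical helper, not part of mmrm or marginaleffects: given stacked
# data and a column identifying which copy each row came from, make the
# subject IDs unique so mmrm's (subject, visit) uniqueness check passes.
make_ids_unique <- function(stacked, id = "USUBJID", copy_col = "copy_index") {
  stacked[[id]] <- paste(stacked[[id]], stacked[[copy_col]], sep = "_")
  stacked
}
```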

@vincentarelbundock (owner) commented

Cool. Thanks for taking a look and finding a solution for issue 1.

I just spent a bunch of time trying to find a place to insert a hack that would append a random string to the subject ID column. Unfortunately, it's very hard. The problem is that this stacking logic sits very deep in the marginaleffects code base, and there isn't just one stacking step: we stack when computing contrasts for different variables, but we also stack when computing several contrasts for a single variable (e.g., different factor levels). And the stacking code is separate for different data types: numeric, logical, factor, etc.

So in order to support this, I would have to insert complex hacks in a dozen places in the code base, and I don't think that's sensible design for software that aims to support 100+ model classes.

To be sure, that's less convenient, but maybe it's not the end of the world if users have to define the single contrast they are interested in, rather than getting all of them at once. This can probably be documented.

@danielinteractive commented

Hi @vincentarelbundock , I discussed the issue openpharma/mmrm#461 further with @clarkliming , and it seems that a more principled solution to that will likely also fix the issue 2 here. So let us first sort that out and maybe both issues 1 and 2 will be solved in mmrm.

@vincentarelbundock (owner) commented

Oh wow, OK. I really appreciate your help in working on this. It'll be really cool if we can push this through the finish line.
