
[DRAFT] adding support for the {mmrm} package #1000

Open · wants to merge 26 commits into base: main

Conversation


@kkmann commented Jan 10, 2024

Hi,

this is an attempt to implement support for the {mmrm} package (https://github.com/openpharma/mmrm), following the steps outlined in the {marginaleffects} extensions vignette (https://marginaleffects.com/vignettes/extensions.html#marginaleffects-extension).

I am still unclear about the role of this option setting:

options("marginaleffects_model_classes" = "lm_manual")

Is it expected that the user triggers this every time they want to use {marginaleffects} with {mmrm}? This does not seem to be the case for other extensions.

My test case is:

library(mmrm)
fit <- mmrm(
  formula = FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data
)
predictions(fit, newdata = insight::get_data(fit))

Also tagging @lang-benjamin @danielinteractive

@kkmann (author) commented Jan 10, 2024

Currently my test case fails whether or not I set

options("marginaleffects_model_classes" = "mmrm")

Error: Unable to compute predicted values with this model. You can try to supply a
  different dataset to the `newdata` argument.
  
  Bug Tracker: https://github.com/vincentarelbundock/marginaleffects/issues

I did not open a separate issue in the bug tracker since this is likely a bug caused by my faulty implementation of the {mmrm} interface.

@vincentarelbundock force-pushed the main branch 2 times, most recently from f28ed19 to 632dfc5, on January 10, 2024 15:37
@vincentarelbundock (owner) commented

Thanks for taking the time to do this, I really appreciate it!

The options() call is only necessary for temporary user-defined models. When we merge support in marginaleffects itself, we can do the following:

  1. Add the package to the permanent list of supported models, to avoid calling options() in every session: R/sanitize_model.R
  2. Add the package to the vignette of supported models by editing: data-raw/supported_models.csv
  3. Add a bullet point to the news file: NEWS.md

Does your example work when you just supply the data frame directly: `newdata = fev_data`?

@lang-benjamin commented

Thanks, just adding some thoughts here.

> Does your example work when you just supply the data frame directly: newdata=fev_data?

Yes, that should work (fev_data has missing values for the response, and there will then be predictions for it).

The question to me is what the general default strategy regarding missing data should be. The element model$data does not omit missing data, whereas insight::get_data(model) does (i.e., it uses the actual data used to fit the model). If we are interested in the actual data used to fit the model, one could just use model.frame(model) instead of insight::get_data(); this would not rely on the insight package.

@vincentarelbundock (owner) commented

Ideally, on my end, I would like to go through insight::get_data() and use listwise deletion by default. This would be consistent with how all the other models are treated in the package. In any case, it's trivial for users to supply another dataset to the newdata argument if they want predictions for observations with a missing response.

Also, looking at the code quickly, I suspect that the reason predictions don't work in the current PR is that get_predict() just calls predict(). But that function should return a data frame with two columns: rowid and estimate. If you only return a vector, it will likely return this kind of error.

It might also be worth checking if you really need all those methods. Sometimes, the defaults work perfectly well. Just try something like:

library(marginaleffects)
get_predict.default(fit)
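As a sketch of the return format described above, a user-defined method along the lines of the extensions vignette could look like this. This is illustrative only, not the final implementation, and it assumes predict.mmrm() returns a plain numeric vector:

```r
# Illustrative sketch only: marginaleffects expects get_predict() methods
# to return a data frame with `rowid` and `estimate` columns, not a bare
# numeric vector.
get_predict.mmrm <- function(model, newdata = insight::get_data(model), ...) {
  yhat <- stats::predict(model, newdata = newdata)
  data.frame(rowid = seq_along(yhat), estimate = as.numeric(yhat))
}

# Temporary registration, only needed until support is merged upstream:
options("marginaleffects_model_classes" = "mmrm")
```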

@kkmann (author) commented Jan 24, 2024

Thanks for the feedback @vincentarelbundock. I made some adjustments and am starting to wrap my head around this. If I see it correctly, we still have two things to address here:

  • we still need to add the information to the CSV file of supported models @lang-benjamin
  • we need to look into omitting missing data, either via insight::get_data() or a custom approach

@vincentarelbundock (owner) commented

I was looking at this PR today and noticed that the current version does not return standard errors. The underlying reason is that the Jacobian is full of zeros:

library(marginaleffects)
library(mmrm)
fit <- mmrm(FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
            data = fev_data,
            weights = fev_data$WEIGHTS,
            method = "Kenward-Roger")
p <- predictions(fit)
attr(p, "jacobian")

The reason is that get_predict() does not appear to make different predictions after we change the coefficient values stored in the model object using set_coef(). Under the hood, we compute standard errors by slightly tweaking the coefficient estimates in a model and making new predictions with the modified model object. This doesn't seem to work at the moment.

One possibility is that this is related to the problem with glmmTMB: #1064

I don't actually know if this is related, but I noticed that both glmmTMB and mmrm use Template Model Builder (TMB). If that's the source of the issue, it might be bad news... See the thread on glmmTMB for links to different attempts and problem definition.
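For intuition, the standard-error machinery described above amounts to a forward finite difference over the coefficients, roughly like the sketch below. The names get_coef(), set_coef(), and get_predict() are marginaleffects generics; the real internals differ in the details:

```r
# Rough sketch of the numerical Jacobian computation, not the actual internals.
beta <- get_coef(model)
h <- 1e-7
baseline <- get_predict(model, newdata)$estimate
J <- matrix(NA_real_, nrow = length(baseline), ncol = length(beta))
for (k in seq_along(beta)) {
  beta_k <- beta
  beta_k[k] <- beta_k[k] + h
  perturbed <- set_coef(model, beta_k)          # modified model object
  J[, k] <- (get_predict(perturbed, newdata)$estimate - baseline) / h
}
# If set_coef() has no effect on subsequent predictions, every column of J
# is zero, and the delta-method standard errors collapse as well.
```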

@danielinteractive commented

Thanks @vincentarelbundock, @kkmann, and all. Just looking at this thread as a bystander: yes, structurally the two packages will work similarly. However, I also wonder whether in this case, because we use TMB, there would be much better automatic-differentiation ways to generate the Jacobian you need.

@vincentarelbundock (owner) commented Apr 7, 2024

@danielinteractive Thanks for chiming in!

Yes, that would certainly be more efficient. I think it makes sense for very common quantities like predictions. But the issue is that marginaleffects can compute literally dozens of quantities of interest (slopes, risk differences, marginal means, etc.), and we would have to build distinct Jacobians for all of these quantities.

My guess is that this would require a ton of work on the backend. But then again, I don't know TMB at all, so...

@clarkliming commented

PR created at kkmann#1 to fix the issue that the Jacobian is zero.

The reason is that predict() in mmrm always returns a conditional prediction (conditional on observed values).

@vincentarelbundock (owner) commented

Sorry I left this PR to linger for so long. I looked into it today, and things mostly seem to work, but there are two remaining issues. The first is probably easy to solve; I'm not sure about the second.

library("mmrm")

fit <- mmrm(
  formula = FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data
)

newdata=NULL needs to include the response variable by default

By default, when the user leaves newdata=NULL (the default), marginaleffects uses sanitize_newdata() to extract the original data from the model object. Typically, this does not include the response, but it appears that the predict() method supplied by mmrm requires the response to make predictions.

I’m not sure if that’s required by the method. If it’s not actually required, it would be best if this were fixed upstream in mmrm. If this is a strong requirement, then we can implement a hack in marginaleffects to include the response.

Default breaks:

avg_predictions(fit)
> Error: Unable to compute predicted values with this model. You can try to supply a
>   different dataset to the `newdata` argument. This error was also raised:
>   
>   object 'FEV1' not found
>   
>   Bug Tracker: https://github.com/vincentarelbundock/marginaleffects/issues

Supplying the dataset explicitly is a workaround:

avg_predictions(fit, newdata = fev_data)
> 
>  Estimate Std. Error   z Pr(>|z|)   S 2.5 % 97.5 %
>      42.4      0.329 129   <0.001 Inf  41.8   43.1
> 
> Type:  response 

Duplicate indices

Internally, marginaleffects computes comparisons/slopes for multiple terms or contrasts in one shot by stacking datasets. That way, we can compute the slope with respect to X and the slope with respect to Z together, which is much more computationally efficient, since it doesn’t require calling predict() many times. Unfortunately, this breaks with mmrm, because the predict() method supplied by that package requires unique indices, and “stacking” duplicates the indices.

This means that we can compute the “effect” of a focal variable by specifying that variable explicitly:

avg_comparisons(fit, 
    variables = "SEX",
    newdata = fev_data)
> 
>  Estimate Std. Error     z Pr(>|z|)   S  2.5 % 97.5 %
>     0.326      0.532 0.613     0.54 0.9 -0.717   1.37
> 
> Term: SEX
> Type:  response 
> Comparison: mean(Female) - mean(Male)

It also works if we specify a comparison between only two levels for a single focal variable:

avg_comparisons(fit, 
    variables = list("RACE" = c("Asian", "White")),
    newdata = fev_data)
> 
>  Estimate Std. Error    z Pr(>|z|)    S 2.5 % 97.5 %
>      5.64      0.666 8.48   <0.001 55.3  4.34   6.95
> 
> Term: RACE
> Type:  response 
> Comparison: mean(White) - mean(Asian)

But it doesn’t work if we try to compute the two possible contrasts at the same time: (1) Asian vs. White and (2) Black vs. White.

avg_comparisons(fit, 
    variables = "RACE",
    newdata = fev_data)
>   Error in h_mmrm_tmb_data(object$formula_parts, newdata, weights = rep(1, : time
>   points have to be unique for each subject, detected following duplicates in data:

This error is related to the lines of code after this one: https://github.com/openpharma/mmrm/blob/424810d88b0507e9e88242b61d343852358b2df2/R/tmb.R#L122

I don’t know mmrm well enough to know if this is essential. What happens if we bypass that check? Will the hardcoded ordering yield incorrect estimates?

@danielinteractive commented

Thanks @vincentarelbundock , looking into this now on the mmrm end - thanks for the detailed pointers!

@danielinteractive commented

Thanks again @vincentarelbundock! While we will be able to fix the first problem quickly (indeed this is a bug in mmrm), the second problem cannot easily be fixed on the mmrm side I think. For non-spatial covariance structures we need unique IDs/visit variable combinations here, otherwise downstream computations are not well defined. Is there maybe a point in the marginaleffects stack to define a method for mmrm objects such that in the stacking some kind of dummy renaming of the IDs is performed, ensuring unique IDs/visits for calling mmrm afterwards?
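If such a hook existed, one hypothetical approach would be to suffix the subject IDs with the index of each stacked copy before calling predict(). The helper name and the copy-index column below are invented for illustration; neither package provides them:

```r
# Hypothetical helper, not part of mmrm or marginaleffects: given stacked
# data and a column identifying which copy each row came from, make the
# subject IDs unique so mmrm's (subject, visit) uniqueness check passes.
make_ids_unique <- function(stacked, id = "USUBJID", copy_col = "copy_index") {
  stacked[[id]] <- paste(stacked[[id]], stacked[[copy_col]], sep = "_")
  stacked
}
```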

@vincentarelbundock (owner) commented

Cool. Thanks for taking a look and finding a solution for issue 1.

I just spent a bunch of time trying to find a place to insert a hack that would append a random string to the subject ID column. Unfortunately, it's very hard. The problem is that this stacking logic sits very deep in the marginaleffects code base, and there isn't just one stacking step: we stack when computing contrasts for different variables, but we also stack when computing several contrasts for a single variable (e.g., different factor levels). And the stacking code is separate for different data types: numeric, logical, factor, etc.

So in order to support this, I would have to insert complex hacks in a dozen places in the code base, and I don't think that's sensible design for software that aims to support 100+ model classes.

To be sure, that's less convenient, but maybe it's not the end of the world if users have to define the single contrast they are interested in, rather than getting all of them at once. This can probably be documented.

@danielinteractive commented

Hi @vincentarelbundock , I discussed the issue openpharma/mmrm#461 further with @clarkliming , and it seems that a more principled solution to that will likely also fix the issue 2 here. So let us first sort that out and maybe both issues 1 and 2 will be solved in mmrm.

@vincentarelbundock (owner) commented

Oh wow, OK. I really appreciate your help in working on this. It'll be really cool if we can push this through the finish line.
