Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check_model() and other functions not working with only post-"hurdle" glmmTMB model #781

Open
MaBKo opened this issue Nov 26, 2024 · 5 comments · May be fixed by #782
Open

check_model() and other functions not working with only post-"hurdle" glmmTMB model #781

MaBKo opened this issue Nov 26, 2024 · 5 comments · May be fixed by #782
Assignees
Labels
Bug 🐛 Something isn't working

Comments

@MaBKo
Copy link

MaBKo commented Nov 26, 2024

(First time opening an issue on github, please bear with me)
Foreword: I'm a trained entomologist, with statistical and R-knowledge gathered through "learning by doing", so if this is just user error, I want to apologise in advance.

Stumbled upon this error, using a glmmTMB model with only the truncated "post hurdle" part (lacking a better descriptor):
Error: check_model() returned following error: 'data' must be of a vector type, was 'NULL'

As far as I understand it, the usage of the truncated_nbinom2 family argument leads to this error, if the zi-component is deactivated.

Creating the model which produces the error:

library(glmmTMB)
library(easystats)
#> Warning: package 'easystats' was built under R version 4.4.2
#> # Attaching packages: easystats 0.7.3.2
#> ✔ bayestestR  0.15.0.2    ✔ correlation 0.8.6    
#> ✔ datawizard  0.13.0.13   ✔ effectsize  0.8.9    
#> ✔ insight     1.0.0       ✔ modelbased  0.8.9    
#> ✔ performance 0.12.4.8    ✔ parameters  0.23.0.11
#> ✔ report      0.5.9       ✔ see         0.9.0.9
library(tidyverse)
library(reprex)
#> Warning: package 'reprex' was built under R version 4.4.2

mod_trunc_error<-glmmTMB(count~spp+mined+(1|site)+(1|spp), 
                         data=Salamanders%>%filter(count>0), family="truncated_nbinom2", 
                         ziformula = ~0, dispformula = ~1)

check_collinearity(mod_trunc_error)
#> Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of a vector type, was 'NULL'
check_model(mod_trunc_error, panel=F)
#> Error: `check_model()` returned following error: 'data' must be of a vector
#>   type, was 'NULL'
#>   
#> If the error message does not help identifying your problem, another
#>   reason why `check_model()` failed might be that models of class
#>   `glmmTMB` are not yet supported.

Created on 2024-11-26 with reprex v2.1.1

Creating a whole hurdle model working fine (plotting deactivated; works as intended):

library(glmmTMB)
library(easystats)
#> Warning: package 'easystats' was built under R version 4.4.2
#> # Attaching packages: easystats 0.7.3.2
#> ✔ bayestestR  0.15.0.2    ✔ correlation 0.8.6    
#> ✔ datawizard  0.13.0.13   ✔ effectsize  0.8.9    
#> ✔ insight     1.0.0       ✔ modelbased  0.8.9    
#> ✔ performance 0.12.4.8    ✔ parameters  0.23.0.11
#> ✔ report      0.5.9       ✔ see         0.9.0.9
library(tidyverse)
library(reprex)
#> Warning: package 'reprex' was built under R version 4.4.2

mod_whole<-glmmTMB(count~spp+mined+(1|site)+(1|spp), 
                   data=Salamanders, family="truncated_nbinom2", 
                   ziformula = ~1, dispformula = ~1)

check_collinearity(mod_whole)
#> # Check for Multicollinearity
#> 
#> * conditional component:
#> 
#> Low Correlation
#> 
#>   Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
#>    spp 1.03 [1.00, 1.37]         1.02      0.97     [0.73, 1.00]
#>  mined 1.03 [1.00, 1.37]         1.02      0.97     [0.73, 1.00]
check_model(mod_whole, panel=F)
#> `check_outliers()` does not yet support models of class `glmmTMB`.

Created on 2024-11-26 with reprex v2.1.1

Creating a post-hurdle model but with "normal" negbin2-family instead of the truncated version working fine (plotting deactivated; works as intended):

library(glmmTMB)
library(easystats)
#> Warning: package 'easystats' was built under R version 4.4.2
#> # Attaching packages: easystats 0.7.3.2
#> ✔ bayestestR  0.15.0.2    ✔ correlation 0.8.6    
#> ✔ datawizard  0.13.0.13   ✔ effectsize  0.8.9    
#> ✔ insight     1.0.0       ✔ modelbased  0.8.9    
#> ✔ performance 0.12.4.8    ✔ parameters  0.23.0.11
#> ✔ report      0.5.9       ✔ see         0.9.0.9
library(tidyverse)
library(reprex)
#> Warning: package 'reprex' was built under R version 4.4.2

mod_trunc_working<-glmmTMB(count~spp+mined+(1|site)+(1|spp), 
                           data=Salamanders%>%filter(count>0), family="nbinom2", 
                           ziformula = ~0, dispformula = ~1)

check_collinearity(mod_trunc_working)
#> # Check for Multicollinearity
#> 
#> Low Correlation
#> 
#>   Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
#>    spp 1.02 [1.00, 4.45]         1.01      0.98     [0.22, 1.00]
#>  mined 1.02 [1.00, 4.45]         1.01      0.98     [0.22, 1.00]
check_model(mod_trunc_working, panel=F)
#> `check_outliers()` does not yet support models of class `glmmTMB`.

Created on 2024-11-26 with reprex v2.1.1

Session info

Created on 2024-11-26 with reprex v2.1.1

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 (2024-04-24 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United Kingdom.utf8
#>  ctype    English_United Kingdom.utf8
#>  tz       Europe/Berlin
#>  date     2024-11-26
#>  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.4.0)
#>  digest        0.6.35  2024-03-11 [1] CRAN (R 4.4.0)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.4.0)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.4.0)
#>  fs            1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
#>  glue          1.8.0   2024-09-30 [1] CRAN (R 4.4.2)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#>  knitr         1.46    2024-04-06 [1] CRAN (R 4.4.0)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
#>  reprex        2.1.1   2024-07-06 [1] CRAN (R 4.4.2)
#>  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.4.2)
#>  rmarkdown     2.26    2024-03-05 [1] CRAN (R 4.4.0)
#>  rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.4.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
#>  withr         3.0.2   2024-10-28 [1] CRAN (R 4.4.2)
#>  xfun          0.43    2024-03-25 [1] CRAN (R 4.4.0)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.3)
#> 
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@strengejacke strengejacke added the Bug 🐛 Something isn't working label Nov 26, 2024
@strengejacke
Copy link
Member

Thanks for reporting, and the reproducible example! Will look into this.

@strengejacke
Copy link
Member

Seems to work now, but not sure if the output is sensible?

data(Salamanders, package = "glmmTMB")
mod_trunc_error <- glmmTMB::glmmTMB(
  count ~ spp + mined + (1 | site),
  data = Salamanders[Salamanders$count > 0, , drop = FALSE],
  family = glmmTMB::truncated_nbinom2(),
  ziformula = ~ 0,
  dispformula = ~ 1
)
performance::check_collinearity(mod_trunc_error)
#> # Check for Multicollinearity
#> 
#> * conditional component:
#> 
#> Low Correlation
#> 
#>   Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
#>    spp 1.03 [1.00, 2.32]         1.02      0.97     [0.43, 1.00]
#>  mined 1.03 [1.00, 2.32]         1.02      0.97     [0.43, 1.00]

Created on 2024-11-26 with reprex v2.1.1

@MaBKo
Copy link
Author

MaBKo commented Nov 27, 2024

Important:

In my initial post, I forgot to filter the dataset for zeros in the mod_whole part:

It should be

mod_whole<-glmmTMB(count~spp+mined+(1|site)+(1|spp), 
                   data=Salamanders%>%filter(count>0), family="truncated_nbinom2", 
                   ziformula = ~1, dispformula = ~1)

Instead of

mod_whole<-glmmTMB(count~spp+mined+(1|site)+(1|spp), 
                   data=Salamanders, family="truncated_nbinom2", 
                   ziformula = ~1, dispformula = ~1)

Whats the best practice here?
Leaving it in this comment or editing the original post?

@MaBKo
Copy link
Author

MaBKo commented Nov 27, 2024

To @strengejacke's solution:

As stated before I'm not a person with solid statistical knowledge, but I guess that did the trick.

My understanding is that the vif results of the post-hurdle only model should be the same as the conditional component of the whole hurdle model, which seems to be the case, comparing the output of the corrected mod_whole with the ouput from @strengejacke's model

library(glmmTMB)
library(easystats)
library(tidyverse)
library(reprex)
#> Warning: package 'reprex' was built under R version 4.4.2

mod_whole<-glmmTMB(count~spp+mined+(1|site)+(1|spp), 
                   data=Salamanders%>%filter(count>0), family="truncated_nbinom2", 
                   ziformula = ~1, dispformula = ~1)

check_collinearity(mod_whole)
#> # Check for Multicollinearity
#> 
#> * conditional component:
#> 
#> Low Correlation
#> 
#>   Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
#>    spp 1.03 [1.00, 2.30]         1.02      0.97     [0.43, 1.00]
#>  mined 1.03 [1.00, 2.30]         1.02      0.97     [0.43, 1.00]
check_model(mod_whole, panel=F)
#> `check_outliers()` does not yet support models of class `glmmTMB`.

Created on 2024-11-27 with reprex v2.1.1

But maybe someone with a deeper understanding of the glmmTMB package or general statistical knowledge could weigh in.

P.S: I also want to say a big thank you to all contributors of this awesome package; the package and its documentation helped me a lot.

@strengejacke
Copy link
Member

Whats the best practice here?
Leaving it in this comment or editing the original post?

Doesn't matter, I have a reproducible example and added a test, so no need to edit the posts.

My understanding is that the vif results of the post-hurdle only model should be the same as the conditional component of the whole hurdle model, which seems to be the case, comparing the output of the corrected mod_whole with the ouput from @strengejacke's model

Great! I'll merge the PR once all tests pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug 🐛 Something isn't working
Projects
None yet
2 participants