Issues with check_model() for Stan models #354

Open
bwiernik opened this issue Aug 18, 2021 · 11 comments
Labels
3 investigators ❔❓ Need to look further into this issue

Comments

@bwiernik
Contributor

I'm not sure what the thinking is behind the current behavior of check_model() for Stan models. It calls the internal function performance:::.check_assumptions_stan(), which returns a data frame with columns Group (Prior or Posterior), y (parameter), x (value), and id (posterior draw).

I'm not exactly sure what sort of plot this output is intended for (perhaps a scatterplot matrix of the parameters across posterior draws?). Could someone elaborate? @strengejacke @mattansb? In any case, it is very different from the result for lm/glm models, which is a list of the various diagnostic data frames. This causes several problems:

  1. see::plot.see_check_model() fails for check_model(stan_model) objects

    • This function expects to find the names of plots in a list (QQ, VIF, etc.). It errors when given the single data frame from performance:::.check_assumptions_stan().
  2. There is no way for users to generate the typical regression diagnostic plots, such as residuals-vs-fitted plots, QQ plots, etc.

  3. The current performance:::.check_assumptions_stan() function often fails for brmsfit models

    • If sample_prior = "no" (the default), it stops with an error. It could instead return a data frame with just the posterior.
    • If improper priors are used (the default), the reshaping fails. It should be robust to parameters whose priors are not sampled.
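To illustrate the fallback behavior suggested in point 3, here is a minimal base-R sketch. It is not the performance implementation: the helper name reshape_prior_posterior() is hypothetical, and it assumes the prior and posterior draws arrive as data frames (one column per parameter) whose column names match. Prior draws from brms may be named differently from the posterior parameters, so a real version would likely need a name-mapping step.

```r
# Hypothetical helper (not part of performance): reshape posterior and
# optional prior draws into the long format Group / y / x / id.
reshape_prior_posterior <- function(posterior, prior = NULL) {
  posterior <- as.data.frame(posterior)

  # Stack one data frame of draws (columns = parameters) into long format
  to_long <- function(draws, group) {
    data.frame(
      Group = group,
      y  = rep(names(draws), each = nrow(draws)),          # parameter name
      x  = unlist(draws, use.names = FALSE),               # draw value
      id = rep(seq_len(nrow(draws)), times = ncol(draws)), # draw index
      stringsAsFactors = FALSE
    )
  }

  out <- to_long(posterior, "Posterior")

  # No prior draws at all (e.g. sample_prior = "no"): return posterior only
  if (is.null(prior) || ncol(as.data.frame(prior)) == 0) {
    return(out)
  }

  # Only keep parameters that actually have prior draws; parameters with
  # improper (flat) priors have none and stay posterior-only
  prior <- as.data.frame(prior)
  shared <- intersect(names(prior), names(posterior))
  if (length(shared) > 0) {
    out <- rbind(to_long(prior[shared], "Prior"), out)
  }
  out
}

# Toy draws for illustration only (not from a fitted model)
post  <- data.frame(b_wt = c(-3.1, -3.3), b_cyl = c(-1.5, -1.4))
prior <- data.frame(b_wt = c(0.4, -2.0))  # b_cyl: flat prior, no draws
reshape_prior_posterior(post)         # posterior rows only
reshape_prior_posterior(post, prior)  # adds prior rows for b_wt only
```

The design choice here is to degrade gracefully: missing or partial prior draws shrink the output rather than raising an error.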

This leads to 2 big questions:

  1. What should check_model() do for Stan models?

    • My preference would be for it to return the same sort of plots as for lm/glm objects.
    • Could you describe what sort of plot was intended with the current output? Could we implement that as a different function (similar to pp_check())?
  2. What should we do in the mean time?

    • My suggestion would be to internally call bayestestR::convert_bayesian_as_frequentist() and then call check_model() on the result. I think this would be an okay stopgap until we implement the individual plots for Stan objects.

Thoughts?

@bwiernik
Contributor Author

Here is an example of the issues:

library(performance)
library(see) # provides the plot() methods for check_model() results

model_stan <- rstanarm::stan_glm(mpg ~ wt + cyl, data = mtcars)
model_brms <- brms::brm(mpg ~ wt + cyl, data = mtcars)
model_brms_wPrior <- brms::brm(mpg ~ wt + cyl, data = mtcars, sample_prior = "yes")
model_lm   <- lm(mpg ~ wt + cyl, data = mtcars)
model_glm  <- glm(mpg ~ wt + cyl, data = mtcars)

check_stan <- check_model(model_stan)
check_brms <- check_model(model_brms) # need to sample priors
check_brms_wPrior <- check_model(model_brms_wPrior) # still fails due to improper priors
check_lm   <- check_model(model_lm)
check_glm  <- check_model(model_glm)

plot(check_stan) # fails
plot(check_lm)
plot(check_glm)

@strengejacke
Member

I agree with your suggestions. My only question is whether the "typical" plots created by check_model() would also be useful / applicable in a Bayesian context.

@bwiernik
Contributor Author

I think residual plots, VIF, and normality plots absolutely apply. These same assumptions hold for linear models regardless of the inference framework.

Influence plots should probably use a Bayesian analogue of Cook's distance or a pointwise LOO-IC; there are some papers and existing functions I found yesterday that we can draw on.
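As a rough illustration of the pointwise LOO idea, here is a base-R sketch. It rests on stated assumptions: it takes a hypothetical S × N log-likelihood matrix (S posterior draws, N observations) and uses plain importance sampling, i.e. elpd_i ≈ -log((1/S) Σ_s exp(-log p(y_i | θ_s))), not the Pareto-smoothed importance sampling (PSIS) implemented by the loo package. Plain importance sampling is unstable for exactly the influential points we care about, so for real models loo::loo() on the fitted object (with its Pareto k values, e.g. via loo::pareto_k_values()) is the more robust option. The name loo_pointwise_is() is hypothetical.

```r
# Hypothetical helper: pointwise elpd_loo via plain importance sampling.
# log_lik: S x N matrix, rows = posterior draws, columns = observations.
loo_pointwise_is <- function(log_lik) {
  log_lik <- as.matrix(log_lik)
  S <- nrow(log_lik)
  apply(log_lik, 2, function(ll) {
    # log(mean(exp(-ll))) computed stably with the log-sum-exp trick
    m <- max(-ll)
    -(m + log(sum(exp(-ll - m))) - log(S))
  })
}

# Toy example (simulated, illustration only): normal model with known sd = 1
set.seed(42)
y  <- c(rnorm(20), 6)                 # observation 21 is an outlier
mu <- rnorm(1000, mean(y), 0.2)       # stand-in "posterior" draws for the mean
log_lik <- sapply(y, function(yi) dnorm(yi, mu, 1, log = TRUE))  # 1000 x 21
pointwise_looic <- -2 * loo_pointwise_is(log_lik)  # LOO-IC = -2 * elpd_loo
which.max(pointwise_looic)            # the outlier should stand out
```

Plotting pointwise_looic against observation index would give a simple influence-style plot.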

The stopgap of converting to a frequentist model for now should be generally reasonable for common cases.

@strengejacke
Member

btw, @DominiqueMakowski, this doc needs some more love, I think ;-)

strengejacke added a commit that referenced this issue Aug 19, 2021
@strengejacke
Member

I like the prior-posterior plot, but I think it would be best as another function. Thoughts on a name?

I don't think this is that urgent, since we can already add layers of prior samples to the plot:
https://easystats.github.io/see/articles/bayestestR.html#adding-prior-samples-4

mattansb added a commit that referenced this issue Sep 2, 2021