rope range for linear models #364
mmh I think the rationale we followed back then was quite simple: finding the equivalent of 0.1 * SD, and the most straightforward way was using the d -> log odds conversion formula... but indeed, there might be more appropriate ways of tackling this, what would you recommend? |
I thought this was based on Kruschke's book, no? |
afaik he only gave recommendations for linear models, and from there we derived the logistic one |
No, see (from Supplement https://doi.org/10.1177/2515245918771304) |
Reminder for ordinal regression: https://twitter.com/ChelseaParlett/status/1352367383243939841?s=19 |
From my empirical experience, I felt like the current default rope range for logistic was "a bit larger" than for lms (i.e., it was a bit harder to be outside the rope given similarish effects). Is the new value usually bigger or smaller than the old? |
1. The conditional sd will be smaller than the unconditional sd. So for the logistic case, the default we use is expected to be rather small - a change of 2.
2. We have an inconsistency in the default rope range between models: for linear models we use the unconditional sd, and for logistic we use the conditional sd.
|
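The point about the conditional sd being smaller than the unconditional sd can be checked numerically. A Python sketch (made-up model and numbers for illustration, not the package's R code): in `y = b*x + e`, the residual (conditional) sd is sigma, while the unconditional SD(y) also includes the variance explained by x.

```python
import numpy as np

# Illustrative simulation (assumed numbers, not from this thread):
# y = 0.8*x + e, with sigma = 1
rng = np.random.default_rng(42)
n = 100_000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=1.0, size=n)

sd_y = y.std()                    # unconditional: ~sqrt(0.8^2 + 1) ~ 1.28
b = np.cov(x, y)[0, 1] / x.var()  # OLS slope
resid_sd = (y - b * x).std()      # conditional: ~sigma = 1.0

print(round(sd_y, 2), round(resid_sd, 2))
```

So a rope based on the conditional sd will always be at most as wide as one based on the unconditional sd.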
@DominiqueMakowski not 100% sure what your simulation was doing, but you're (again 😂) using the OR to d conversion wrong - it's not for predicting group from y in a logistic regression, it's for predicting a dichotomized y from group in a logistic regression. Also, the warnings shouldn't be for logistic models (as I noted, there is no unconditional alternative for |
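For readers following along, the d <-> log-odds conversion under discussion uses the logistic sd as the scaling factor. A Python sketch of the standard Borenstein-style formula (the function names here are mine, not the package's API):

```python
import math

# Standard effect-size conversion: log(OR) = d * pi / sqrt(3),
# since pi/sqrt(3) is the sd of the standard logistic distribution.
def d_to_logodds(d):
    return d * math.pi / math.sqrt(3)

def logodds_to_d(log_or):
    return log_or * math.sqrt(3) / math.pi

d = 0.5
log_or = d_to_logodds(d)  # ~0.907
# round-trip recovers d
assert abs(logodds_to_d(log_or) - d) < 1e-12
```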
This is what I'm trying to say:

```r
get_results <- function(n = 100, d = 0.5) {
  df <- bayestestR::simulate_difference(n = n, d = d)

  # Logistic model predicting a dichotomized y from group,
  # and a linear model on the raw outcome
  m1 <- glm(I(V1 > median(V1)) ~ V0, data = df, family = "binomial")
  m2 <- lm(V1 ~ V0, data = df)

  data.frame(
    n = c(n, n, n),
    d = c(d, d, d),
    type = c("LM - current (S)", "LM - sigma", "GLM (rescaled)"),
    # rescaled: on the rescaled (d) scale, 0.1 * pi / sqrt(3) is just 0.1
    rope = c(bayestestR::rope_range(m2)[2], 0.1 * insight::get_sigma(m2), 0.1),
    coef = c(
      insight::get_parameters(m2)[2, 2],
      insight::get_parameters(m2)[2, 2],
      insight::get_parameters(m1)[2, 2]
    )
  )
}

library(purrr)
data <- map_dfr(seq(0, 2, length.out = 50), get_results, n = 100)

library(ggplot2)
ggplot(data, aes(x = d, y = rope, color = type)) +
  geom_line(size = 2) +
  coord_cartesian(ylim = c(0.05, 0.15))
```

Created on 2021-01-26 by the reprex package (v0.3.0) |
aaaaaah I entirely misread this thread then |
lol |
My priors made me think it was a rehash of #20, my deepest apologies 😅 |
Classic example of bad priors :P |
I cleaned up the code a bit, and put the warning in the right place. |
re-submitting, then |
resubmitted ^^ |
Dear maintainer, package bayestestR_0.8.2.tar.gz does not pass the incoming checks automatically, please see the following pre-tests: |
Because of the note?? The URL works fine... |
Do you have an attached text file that also contains another check log? |
Lol don't worry, I just pasted that here for reference, but I'll look into it as soon as I can :) |
I'll just reply saying it's prolly a false positive :) |
Or remove link, merge #367, and re-submit 8-) |
On CRAN |
help ... it's great that this warning exists (I guess), but it's super-opaque to figure out what to do about it if it pops up (apparently) out of nowhere ... I'm getting it from running reprex:
Session info has 111 packages (if you want, I can post the whole thing), but this is dotwhisker 0.6.0, rstanarm 2.21.1, bayestestR 0.8.2. Should this be posted as an issue somewhere upstream ... ?
|
Yes, this was a misnomer fixed in this commit, but not yet in the CRAN version: e70ec3e#diff-8a9dfe10a23f01c6067d481572ad8f67365347367637e87548e6fa48f1119d54 You can use |
I think it would perhaps be better to have this in the startup message, what do you think? Also, @bbolker while you're here... Care to chime in? 😅 Should the default rope be based on |
Or we resolve this issue and can then remove the message ;-) |
The question is whether the conditional "rope" makes more sense than the unconditional one... The idea behind the ROPE is to have a relationship/association/effectsize between predictors and outcome that can be considered as "large enough" to be relevant. From my understanding, "large enough" refers to a significant change in the outcome - i.e. the unconditional rope range (+/- 0.1 * SD(y)). |
I spontaneously tend to agree |
One could argue that, due to multiple regression, the "raw" effect of a predictor (i.e. the unconditional rope range) is not of interest, since coefficients are "adjusted" for each other. That could mean we should use the conditional rope range. However, in the real world we always have confounders and no "isolated" associations, but we still want a relevant change in our outcome - even if conditioned on / adjusted for other covariates. Then we would still look for +/- 0.1 * SD(y) - unless there's a good reason to think the unconditional rope range is biased and the conditional one is preferred. |
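The practical gap between the two candidate defaults only matters when predictors explain substantial variance. A Python sketch (simulated data and coefficients of my choosing, not the package's implementation) where a small coefficient is "negligible" under the marginal rope (+/- 0.1 * SD(y)) but not under the conditional one (+/- 0.1 * sigma):

```python
import numpy as np

# y = 2*x1 + 0.15*x2 + e: x1 explains a lot of variance, x2 very little
rng = np.random.default_rng(1)
n = 200_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 0.15 * x2 + rng.normal(size=n)

# OLS fit with intercept
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma = (y - X @ beta).std()

rope_marginal = 0.1 * y.std()   # ~0.22 (SD(y) ~ sqrt(4 + 0.0225 + 1))
rope_conditional = 0.1 * sigma  # ~0.10

b2 = beta[2]  # ~0.15: inside the marginal rope, outside the conditional one
assert rope_conditional < abs(b2) < rope_marginal
```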
I think there are reasons for each of the options (and, as you know, I think that defaults should be avoided anyway)... But my main concern (which is how this whole thing started: #364 (comment)) is that for GLMs we return the "conditional"-based rope, so there is an issue of consistency if we return the unconditional rope only for gaussian models. |
We could keep this inconsistency top secret, delete this issue and, to be sure, the whole internet as well... 😬 |
Should we have it as an option to toggle - fixed vs. conditional? In the first case, it would be 0.1 * SD(y) and the equivalent based on the d -> log-odds transformation for logit, and in the second we base it on sigma? |
So:
(There is no equivalent for the marginal option for GLMs... 😢) |
The above seems good! |
So that one?
|
Yes. So we should add a note in the startup message that the default rope for linear models is changed. |
but currently the default is |
Ah, you switched my table around! Should be this:
|
Oops, my bad. Something that will need to be explained and clarified (it might bug people's intuition) is that for lms, the better the model, the lower sigma and the smaller the rope; but for glms the rope doesn't change depending on model quality, it's always the same value |
That's true - for the same data. However, for different samples, over different ranges of Xs (say you sample everyone and I only sample children) sigma is (or should be) the same. I can add a note to the relevant vignette, that you can reference in the docs and startup message. Sounds good? |
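The sampling-range point above can be checked numerically. A Python sketch (illustrative numbers, not from this thread): a narrower design on x shrinks SD(y), but the residual sigma stays at ~1 in both designs.

```python
import numpy as np

rng = np.random.default_rng(7)

def fit(x):
    """Simulate y = 1.5*x + e (sigma = 1) and return (SD(y), residual sd)."""
    y = 1.5 * x + rng.normal(size=x.size)
    b = np.cov(x, y)[0, 1] / x.var()
    return y.std(), (y - b * x).std()

x_wide = rng.uniform(-3, 3, size=100_000)       # "sample everyone"
x_narrow = rng.uniform(-0.5, 0.5, size=100_000) # "only sample children"

sd_wide, sigma_wide = fit(x_wide)
sd_narrow, sigma_narrow = fit(x_narrow)
# SD(y) shrinks with the narrower design; sigma is ~1 in both
print(round(sd_wide, 2), round(sd_narrow, 2))
print(round(sigma_wide, 2), round(sigma_narrow, 2))
```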
Yeah. But wouldn't pi/sqrt(3) also apply for the marginal ROPE (for logit)? I'm trying to wrap my head around the implications |
Nope... GLMs don't have a straightforward way of getting the marginal variance of the latent variable 🤷‍♂️ |
I'm starting to think that this is actually a not-so-bad alternative 😁 |
To be honest, while I understand the crux of this issue (though I still have trouble understanding why pi/sqrt(3) is conditional, and why it is fundamentally different from the marginal linear version), the current behaviour (i.e., "marginal" for linear and "conditional" for logit) is really straightforward to explain and transparent to implement. We discuss(ed) it clearly in the docs, posts and papers, so it's really weird that nobody else seems to have noticed the discrepancy (especially if it leads to some problems)... I mean, the Bayesian framework has been under so much scrutiny lately, and ROPEs are even used outside of that context (btw, I wonder what Lakens's perspective on that would be)... This relative absence of agitation surrounding our previous default suggests that it's not a timely issue: we can probably run a bit more with it. So I'd suggest adding a |
Yes, I also find it weird that no one has pointed this out. However, that might be because (1) serious people don't use the default values, or (2) they tend towards BFs more than ROPEs? I agree that we can let this steam some more - no need to add the
This is the definition of a logistic regression (on the latent scale): y* = Xb + e, with e ~ Logistic(0, 1), so sd(e) = pi/sqrt(3).
This is also why
Here's some more complexity: In a (G)LMM, the default rope range should probably be conditioned on the level of the effect.
And remember kids - don't use defaults! |
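A rough Python sketch of the (G)LMM point above (simulated random-intercept data; this illustration is mine, not anything in bayestestR): the within-group sd is sigma, while the across-group sd also includes the random-intercept variance, so a rope conditioned on the level of the effect would differ.

```python
import numpy as np

# Random-intercept model: y_ij = u_j + e_ij, u ~ N(0, tau^2), e ~ N(0, sigma^2)
rng = np.random.default_rng(3)
n_groups, n_per = 200, 500
tau, sigma = 2.0, 1.0
group_eff = rng.normal(scale=tau, size=n_groups)
y = group_eff.repeat(n_per) + rng.normal(scale=sigma, size=n_groups * n_per)

within_sd = y.reshape(n_groups, n_per).std(axis=1).mean()  # ~sigma
total_sd = y.std()                                         # ~sqrt(tau^2 + sigma^2)

rope_within = 0.1 * within_sd  # rope for a within-group (level-1) effect
rope_total = 0.1 * total_sd    # rope based on the total sd
print(round(rope_within, 2), round(rope_total, 2))
```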
thanks for the explanations, (and your patience), it's much clearer now |
For logistic regression, the rope range is a function of pi/sqrt(3), as this is the sd of the logistic distribution (on the latent scale).
However, this is the conditional sd, which is akin to sigma in a linear model.
But in a linear model, the rope range is a function of the marginal sd, no?
This seems odd to me. Have I missed something?
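As a sanity check on the constant in question, the sd of the standard logistic distribution can be verified numerically (a Python sketch):

```python
import math
import numpy as np

# The sd of Logistic(0, 1) is pi/sqrt(3) ~ 1.814, which is where the
# 0.1 * pi/sqrt(3) default rope range for logit models comes from.
rng = np.random.default_rng(0)
draws = rng.logistic(loc=0, scale=1, size=2_000_000)

print(draws.std())             # ~1.814
print(math.pi / math.sqrt(3))  # 1.8137993642342178
```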