Sensitivity to Priors in Bayesian Inference #437

Open
mattansb opened this issue Nov 13, 2019 · 14 comments
Labels
What's your opinion 🙉 Collectively discuss something

Comments

@mattansb
Member

@strengejacke @DominiqueMakowski Let's open this up here officially.

I think this paper can be quite short, really. Here I am pre-registering my hypotheses:

  1. For posterior-based indices (which is all but the BFs), the degree to which they are affected by priors looks like this:
    image
    That is, weak-to-flat priors have close to no effect on the indices, but as the priors grow narrower, the indices come to reflect the prior (or a mix of the prior and the data) rather than the data alone. This is the reverse Jeffreys-Lindley-Bartlett paradox.

  2. For BFs, we will get:

    1. The classic Jeffreys-Lindley-Bartlett "paradox", where flatter priors result in BFs supporting the null.
    2. The more the priors do not match the data (e.g., prior centered around -1, but data centered around +2), the more strongly the BF will support the null.
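
Hypothesis 1 can be checked in closed form for the simplest case. A minimal sketch, assuming a single normal mean with known residual SD and a conjugate normal prior; the data values (observed mean 0.5, n = 20) and the function names are made up for illustration and are not from bayestestR:

```python
from math import sqrt
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, ybar, n, sigma=1.0):
    """Conjugate update for a normal mean with known residual SD `sigma`."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sigma**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * ybar)
    return post_mean, sqrt(post_var)


def prob_direction(post_mean, post_sd):
    """Share of the posterior mass on the dominant side of zero."""
    p_pos = 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)
    return max(p_pos, 1.0 - p_pos)


# Hypothetical data: observed mean 0.5 from n = 20; prior centred on 0.
results = {}
for prior_sd in (100.0, 10.0, 1.0, 0.1, 0.01):
    m, s = normal_posterior(0.0, prior_sd, ybar=0.5, n=20)
    results[prior_sd] = (m, prob_direction(m, s))
    print(f"prior sd {prior_sd:>6}: posterior mean {m:.3f}, pd {results[prior_sd][1]:.3f}")
```

With wide priors the posterior mean and pd barely move; once the prior SD drops below the standard error of the data, both are pulled toward what the prior says.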

We can then talk about why it is dangerous to treat the posterior-based indices as "objective", and why BFs should only be used when researchers have informative (weak or strong) priors they want to test. That is, posterior indices are descriptive of the posterior (= prior + data, and not the data alone), while BFs are not descriptive of the data or the posterior; they are indicative of the match between the data and the prior.
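
The two BF predictions in hypothesis 2 have a closed form in the simplest case. A sketch, assuming a point null at 0, a normal prior on the mean under the alternative, and a known residual SD; `bf01_point_null` and the example numbers are hypothetical, chosen only to show the direction of the effect:

```python
from math import sqrt
from statistics import NormalDist


def bf01_point_null(ybar, n, prior_mean, prior_sd, sigma=1.0):
    """BF for H0: mu = 0 against H1: mu ~ Normal(prior_mean, prior_sd)."""
    se = sigma / sqrt(n)
    # Marginal density of the sample mean under each hypothesis
    m0 = NormalDist(0.0, se).pdf(ybar)
    m1 = NormalDist(prior_mean, sqrt(prior_sd**2 + se**2)).pdf(ybar)
    return m0 / m1


# Hypothetical data: observed mean 0.5 from n = 20.
# (a) Prior centred on 0, increasingly flat.
widths = (0.5, 1.0, 10.0, 100.0)
bf_by_width = [bf01_point_null(0.5, 20, 0.0, w) for w in widths]

# (b) Prior SD fixed at 0.5, location moving away from the data.
locations = (0.5, 0.0, -1.0, -2.0)
bf_by_location = [bf01_point_null(0.5, 20, loc, 0.5) for loc in locations]

for w, bf in zip(widths, bf_by_width):
    print(f"prior sd {w:>5}: BF01 = {bf:.3g}")
for loc, bf in zip(locations, bf_by_location):
    print(f"prior mean {loc:>4}: BF01 = {bf:.3g}")
```

In both sweeps BF01 grows, once because the alternative spreads its predictions too thinly, and once because it predicts the wrong place.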

Where we might run into trouble is with reviewers asking that we suggest priors, so we should emphasize that our focus is not on correct prior selection, but on correct inference using Bayesian indices.


As for who should take the helm on this paper... My plan is to leave academia post PhD (but of course I will be an r-developer 4EVA!), so first-authored papers will serve you two better than they will serve me. So this is up to you guys 😁

@DominiqueMakowski
Member

One aspect that could be interesting is the interaction between prior width and location, in other words, the interaction between the expected effect and its certainty, which would be reflected by a simulation that modulates both aspects independently. Maybe (I don't have any more precise hypotheses here yet, though) some indices would be more sensitive to location and others to precision?

My plan is to leave academia post PhD

Sad to hear that, seems like you would be a great fit for it, but I am sure you have better plans in mind :)

Let's discuss it upon Daniel's return, which happens in SW Episode 6 (aka in a few days, I think)

@mattansb
Member Author

If by effect certainty you mean "how wide is the prior" or "how informative the data is", then I think this falls under my first hypothesis:
The strength of the prior is always relative to the strength of the data (that is, there is a trade-off in their combined effect on the posterior). So if the data is very strong (large N) or the prior is very weak (wide), then the posterior will reflect the data more than the prior, regardless of how "wrong" the location of the prior is.
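
For a normal mean with known residual SD this trade-off can be written out explicitly. A sketch, assuming a conjugate prior $\mathcal{N}(\mu_0, \tau_0^2)$ and $n$ observations with mean $\bar{y}$ and known $\sigma$ (the symbols are introduced here for illustration and do not come from the thread):

```latex
\mu \mid y \sim \mathcal{N}\!\left(\mu_n, \tau_n^2\right), \qquad
\tau_n^{-2} = \tau_0^{-2} + n\,\sigma^{-2}, \qquad
\mu_n = w\,\mu_0 + (1 - w)\,\bar{y}, \qquad
w = \frac{\tau_0^{-2}}{\tau_0^{-2} + n\,\sigma^{-2}}
```

The weight $w$ on the prior location goes to 0 as $n \to \infty$ or $\tau_0 \to \infty$, so the size of $\mu_0 - \bar{y}$ only matters to the extent that $w$ is not negligible.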


Sad to hear that, seems like you would be a great fit for it, but I am sure you have better plans in mind :)

Thanks, Dom! I do love research, but I equally hate all the academia fluff - reminds me of one of my favorite tweets:

I'm thinking of leaving academia so I can focus on my true passions: research and education.
Chris Ferrie, February 19, 2019

I'll head out to the real world, and start the credibility crisis in industry (oh boy...) 😋

@lindeloev

Many relevant thoughts here! We see the flat-priors-support-the-null effect very strongly for BIC-based Bayes Factors, which are close to the most uninformative thing you can get (Wagenmakers, 2007).

I have recently begun thinking about priors as the model itself. For example, a model with 10 parameters with wide priors can easily be more complex than a model with 12 parameters with narrow priors, if the prior predictive space is larger in the former. At a more extreme end, the very inclusion of a parameter in the model is a 100% prior that it exists, and excluding one is a 100% prior that it doesn't (as seen from the model).
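
One rough way to make the "prior predictive space" comparison concrete is to look at the prior predictive spread of a linear predictor. A minimal sketch, assuming independent zero-mean normal priors on all coefficients and using the prior predictive SD as a crude stand-in for complexity; the function name, the prior SDs (5 and 0.5) and the predictor values are made up for illustration:

```python
from math import sqrt


def prior_predictive_sd(prior_sds, x):
    """SD of sum(beta_j * x_j) when beta_j ~ Normal(0, prior_sds[j]) independently."""
    return sqrt(sum((sd * xj) ** 2 for sd, xj in zip(prior_sds, x)))


# Hypothetical models, evaluated at a point where every predictor equals 1.
wide_10 = prior_predictive_sd([5.0] * 10, [1.0] * 10)     # 10 parameters, wide priors
narrow_12 = prior_predictive_sd([0.5] * 12, [1.0] * 12)   # 12 parameters, narrow priors

print(f"10 wide priors:   prior predictive SD = {wide_10:.2f}")
print(f"12 narrow priors: prior predictive SD = {narrow_12:.2f}")
```

Under these assumptions the 10-parameter model spreads its predictions over a much larger range than the 12-parameter one, despite having fewer parameters.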

So the issue of how the prior affects the posterior, and the relation between the two, is really the same issue as choosing which parameters to include in the model. I am not good at reading up on the literature, so this may have been discussed extensively already without me noticing. But at least, it is my impression that this has not penetrated public discourse about models/priors yet, and I think it makes priors less terrifying (by making model building more terrifying, I guess...).

As a side note, I am finalizing a package where you can fix parameters to specific values (or other parameters) via the prior: https://lindeloev.github.io/mcp/articles/priors.html. Hoping to make it public today or tomorrow.

@strengejacke
Member

Sounds very interesting, and would fit into my current plans for a more methodology-based habilitation :-)

My plan is to leave academia post PhD

Indeed sad, but of course you know you can't leave the easystats project, no matter where you are! :-)

As for who should take the helm on this paper...

Since I am now starting work on a prior-tutorial paper, maybe @DominiqueMakowski can take the lead? Not sure about the time frame though, and if funding for open access publishing might be an issue, maybe we'll find a solution (don't see many problems here right now).

@mattansb
Member Author

Hi @lindeloev! Really interesting stuff!
On the one hand you really got me thinking about priors in an interesting way - on the other hand, you pushed me to the limits of my Bayesian knowledge and made me feel like an impostor, so... 🤷‍♂️

Here are my thoughts on your thoughts:

The BIC-based BF's approximate prior is far from flat - if it were, all resulting BFs would be highly supportive of the null. In Wagenmakers' (2007) Appendix B he describes the assumed prior, and though it is wide, it is not too wide (he calls it reasonable "noninformative", which made me laugh).

So the issue of how the prior affects the posterior, and the relation between the two, is really the same issue as choosing which parameters to include in the model.

I generally agree that the topic of "how do priors affect posteriors" should always include point priors (be they "null" points at 0, or fixed points as in your mcp package that looks wayyy above my head 😅). But since point priors result in point posteriors, this leaves very little to investigate, I think.
Also, I don't think we'll be looking at "models" per se, but at univariate parameter "spaces", like in our previous paper. So perhaps we should also address this in the paper:

We assume that the user wants to learn about a (fixed effect?) parameter from the data, which is done by setting a non-point prior. How then do the scale and location of such a prior affect the Bayesian indices, and as a result, what can be inferred?


but of course you know you can't leave the easystats project, no matter where you are! :-)

Should have known not to sign that piece of paper...👿

@lindeloev

@mattansb

The BIC-based BF's approximate prior is far from flat - if it were, all resulting BFs would be highly supportive of the null. In Wagenmakers' (2007) Appendix B he describes the assumed prior, and though it is wide, it is not too wide (he calls it reasonable "noninformative", which made me laugh).

Oh, I should have read that appendix (and https://pdfs.semanticscholar.org/36ee/f823310b020648d1b254ca1e35e3362655d1.pdf which discusses the implications of prior width on BFs) in more detail. Now I'm feeling like an imposter :-) It has just been very wide for the analyses I've been doing. I still need to find the formula for computing a BIC prior - though computationally it may just be like taking a flat prior, and updating it with one data point on a normal likelihood?
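
The "one data point" intuition can be probed numerically in the simplest case. A sketch, assuming a normal mean with known residual SD, a point null at 0, and a unit-information prior taken to be a normal prior centred on the null with the variance of a single observation (that centring is an assumption made here, other versions centre it elsewhere). Both function names are hypothetical:

```python
from math import exp, log, sqrt


def bf01_bic(ybar, n, sigma=1.0):
    """BIC approximation to BF01 for H0: mu = 0 vs H1: mu free (sigma known)."""
    # -2 * log-likelihood ratio between the fitted models
    deviance_diff = n * ybar**2 / sigma**2
    # BIC1 - BIC0 = log(n) - deviance_diff, since H1 has one extra parameter
    return exp(0.5 * (log(n) - deviance_diff))


def bf01_unit_information(ybar, n, sigma=1.0):
    """Exact BF01 when H1 places a Normal(0, sigma) prior on mu."""
    shrink = n / (n + 1.0)
    return sqrt(n + 1.0) * exp(-0.5 * shrink * n * ybar**2 / sigma**2)


# Hold the z-statistic fixed at 2 while the sample size grows.
ratios = {}
for n in (10, 100, 1000):
    ybar = 2.0 / sqrt(n)
    ratios[n] = bf01_unit_information(ybar, n) / bf01_bic(ybar, n)
    print(f"n = {n:>4}: exact / BIC = {ratios[n]:.3f}")
```

Under these assumptions the two BFs agree increasingly well as n grows, which is consistent with the BIC-based BF behaving like a prior worth roughly one observation rather than a flat one.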

Yeah, discussion of point priors may be beside the point on a paper like this (though I just find it really intriguing to think of a model as having point-null priors for all parameters in the universe except those "included"). I guess I was just replying to

We can then talk about why it is dangerous to treat the posterior-based indices as "objective",

saying that the subjectivity enters already in deciding which parameters to include in our model. Just to take some of that criticism/worry off the prior distributions. Though I'm worrying that I'm being way too philosophical now :-)

@mattansb
Member Author

mattansb commented Nov 14, 2019

Interesting reference - added to my "to read" list - thanks!

I still need to find the formula for computing a BIC prior - though computationally it may just be like taking a flat prior, and updating it with one data point on a normal likelihood?

That sounds super interesting... I wonder if that will be equivalent - keep us (me!) posted!

saying that the subjectivity enters already in deciding which parameters to include in our model.

Definitely need to address this in the paper.

@strengejacke
Member

There's a paragraph on exactly this topic (however, advocating for careful use of BF) here:
https://psyarxiv.com/zcf8s/

@DominiqueMakowski
Member

Since I am now starting work on a prior-tutorial paper, maybe @DominiqueMakowski can take the lead?

Since I feel in the force that @mattansb has some great ideas about that paper, I feel he should still be first, regardless of his eventual abandonment of academia (and you never know what the future has in store :)). Plus, I want to see the Ben-Shachar inverse paradox becoming real 😋 Me and Daniel can share the remaining last because why not 😁

Not sure about the time frame though, and if funding for open access publishing might be an issue, maybe we'll find a solution (don't see many problems here right now).

Although in a perfect world I'd prefer open access publishing, publishing is the priority over not publishing, so if we don't have funding we should still submit to a non-OA journal. And nowadays, with preprints and scihub, there will be ways to access it 🏴‍☠️

NOW LET'S START THINKING about the important


What do we compare

  • pd
  • p-map
  • p-rope (probability in rope)
  • bf-pointnull
  • bf-rope
  • ...? (you don't even imagine how much this lack of symmetry with pd makes me uncomfortable 😅)
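
For reference while reading the list, the three posterior-based indices can be computed in closed form when the posterior is (approximately) normal. A sketch based on my reading of the definitions: pd as the posterior mass on the dominant side of zero, p-MAP as the density at zero relative to the density at the mode, and p-ROPE as the posterior mass inside the ROPE. The function name and the ROPE of ±0.1 are assumptions for illustration, not bayestestR code:

```python
from statistics import NormalDist


def posterior_indices(post_mean, post_sd, rope=(-0.1, 0.1)):
    """pd, p-MAP and p-ROPE for a normal posterior (illustrative definitions)."""
    post = NormalDist(post_mean, post_sd)
    p_pos = 1.0 - post.cdf(0.0)
    return {
        # posterior mass on the dominant side of zero
        "pd": max(p_pos, 1.0 - p_pos),
        # density at the null value relative to density at the mode
        "p_map": post.pdf(0.0) / post.pdf(post_mean),
        # posterior mass inside the region of practical equivalence
        "p_rope": post.cdf(rope[1]) - post.cdf(rope[0]),
    }


# Hypothetical posterior: mean 0.476, SD 0.218.
example = posterior_indices(0.476, 0.218)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

The BF-based indices are left out here because they need the prior as well as the posterior.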

Of what

Logistic and Linear Model similar to paper1?

On what

  • True effect (presence 1, absence 0)

  • Amount of evidence (i.e., sample size, do we want to manipulate that or is it too much?)

  • Priors location: agnostic (0), believer (1), sceptic (-1), zealot (2), critic (-2)

  • Priors scale (i.e., "precision", "certainty", "confidence"): vague, flexible, precise

  • Interaction between them

Once we have a small running script, I can run it on the server.
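
A starting point for such a script could be the fully crossed design itself. A sketch, assuming three sample sizes and three numeric prior SDs that are placeholders of mine (the thread only names the levels "vague", "flexible", "precise"); the true effects and prior locations follow the list above:

```python
from itertools import product

true_effects = {"absence": 0.0, "presence": 1.0}
sample_sizes = (20, 100, 500)  # assumed values, not fixed in the thread
prior_locations = {
    "agnostic": 0.0,
    "believer": 1.0,
    "sceptic": -1.0,
    "zealot": 2.0,
    "critic": -2.0,
}
prior_scales = {"vague": 10.0, "flexible": 1.0, "precise": 0.1}  # assumed SDs

# One dictionary per simulation cell, crossing every factor with every other.
design = [
    {
        "true_effect": effect,
        "n": n,
        "prior_location": location,
        "prior_scale": scale,
    }
    for effect, n, location, scale in product(
        true_effects.values(),
        sample_sizes,
        prior_locations.values(),
        prior_scales.values(),
    )
]

print(len(design), "cells; first cell:", design[0])
```

Each cell can then be handed to whatever model-fitting and index-extraction step is chosen, which keeps the interaction between location and scale in the design by construction.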

@mattansb
Member Author

mattansb commented Dec 9, 2019

Unlike you guys, I am not a full jedi master PhD yet, and trying to become one is taking up most of my time. I feel like having me as sole lead on this might mean putting this on low heat for quite a while. I promise to fully participate in the writing and thinking processes even if I am not the lead.
(Can consider co-lead if you really insist...)


As I started saying above, I think this paper can be lighter on the simulations than the previous one. That is, we can explain with math the implications of the location and scale of priors on each of the measures, and then have a figure, derived from simulations, to drive the point home. (Like, the simulations aren't so much the leading result as a visual + semi-empirical aid.)

The effects of interest are:

  • The strength of the prior
    • The scale of the prior: non-informative / informative / strong.
  • The strength of the data
    • Sample size.
  • Deviation of the prior's location from the true value.
    This will be the conclusion of the parameters:
    • Effect is zero vs non-zero (how big?)
    • Location of prior: agnostic conservative (0), believer (1), skeptic (-1)
      Though I'm not sure these are the parameters I'd put in the simulation. We need to think about this point more....

As we can essentially write this whole paper up before a single simulation is made with maths / Bayesian logic alone (it could in fact be a semi-introduction to Bayesian concepts), we might want to start with that?

@DominiqueMakowski
Member

Unlike you guys, I am not a full jedi master PhD yet

at least you are on the council

in that case, I can take the lead (although this is a tough period for me, I don't have much time 😢), but I'd like you to at least be officially co-lead :)


The strength of the prior: The scale of the prior: non-informative / informative / strong.

I feel like "informativeness" of a prior, is a consequence of both location and scale. And more predominantly location, i.e., where we expect the effect to be with highest probability. Then scale is just how confident we are in our expectation. Hence we might want to clarify the terminology?

@mattansb
Member Author

mattansb commented Dec 9, 2019

in that case, I can take the lead (although this is a tough period for me, I don't have much time 😢), but I'd like you to at least be officially co-lead :)

image


I feel like "informativeness" of a prior, is a consequence of both location and scale.

Yes, you are right. How about:

  • informativeness = the sum of:
    • location
    • scale = strength of the prior: weak <-> strong.

@IndrajeetPatil
Member

Should this conversation move to bayestestR and continue there?

@IndrajeetPatil IndrajeetPatil transferred this issue from easystats/easystats Jun 28, 2021
@bwiernik
Contributor

Jeff Rouder and co recently posted a preprint comparing BF to information criteria like WAIC where they discussed sensitivity to priors somewhat. I disagree with the interpretation of their results somewhat, but it may be useful to refer to or cite. https://psyarxiv.com/e6g9d/

@IndrajeetPatil IndrajeetPatil added the What's your opinion 🙉 Collectively discuss something label Jul 18, 2021