Add sbc_hist #193
Conversation
Thanks a lot! Will review soon.

@jpritikin How do you feel about enabling this https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork so I can push directly to your PR branch?

Looks like it's already enabled. Go ahead with your changes.

Ok great thanks.
I made a bunch of edits so that it conforms to the conventions we've been using in bayesplot. I still need to add a few tests and I'd like to get @avehtari (and/or @dpsimpson) to take a look and comment on a few things:
Example:

```r
# create some fake inputs to use for sbc_hist()
set.seed(19)
pars <- paste0("beta[", 1:4, "]")
samples_per_prior <- 511
n_replications <- 500
ranks <- list()
for (n in 1:n_replications) {
  # one indicator matrix per replication: rows are posterior samples,
  # columns are parameters
  r1 <- matrix(0, nrow = samples_per_prior, ncol = length(pars),
               dimnames = list(NULL, pars))
  for (p1 in 1:length(pars)) {
    # mark a uniformly random number of the samples as indicators
    r1[sample.int(samples_per_prior, floor(runif(1, 0, samples_per_prior))), p1] <- 1
  }
  ranks[[n]] <- r1
}
color_scheme_set("purple")
sbc_hist(ranks)
```
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #193      +/-   ##
==========================================
- Coverage   99.31%   99.28%   -0.04%
==========================================
  Files          30       31       +1
  Lines        4088     4175      +87
==========================================
+ Hits         4060     4145      +85
- Misses         28       30       +2
```
And one other consideration: do we want the only allowed input type to be a list of matrices? This type of input could also be stored in a 3-D array or a (long) data frame. This doesn’t necessarily need to be decided now since we can always add that ability later without breaking anything.
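For what it's worth, here is a minimal sketch (not part of this PR) of how a 3-D array could be adapted to the list-of-matrices input; the dimension order [sample, parameter, replication] is just an assumption for illustration:

```r
# Hedged sketch: adapt a 3-D array to the list-of-matrices input that
# sbc_hist() currently expects. The dimension order
# [sample, parameter, replication] is an assumption, not a convention
# the PR defines.
a <- array(
  rbinom(511 * 4 * 10, size = 1, prob = 0.5),
  dim = c(511, 4, 10),
  dimnames = list(NULL, paste0("beta[", 1:4, "]"), NULL)
)
ranks_from_array <- lapply(seq_len(dim(a)[3]), function(n) a[, , n])
str(ranks_from_array[[1]])  # a 511 x 4 matrix with parameter names
```

A long data frame could be split on its replication column and reshaped the same way.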
Generally looks good, but why did you get rid of the horizontal dotted lines at the top and bottom of the butterfly? Especially on the lower side, the histogram bars can cover most of the butterfly and it's hard to see the boundary. If you don't like the top dotted line, at least keep the bottom one.
Oops. I meant to put that back in. I had removed it while trying to figure something else out. I’ll add it back.
I'm curious why you kept the horizontal line in the middle of the butterfly. I mean, what matters is whether the histogram bars are within the butterfly or not. It doesn't really matter whether they are close to the mean, correct?
I just did it to match the paper, and I think it looks nice. But now that you mention it I think it could draw the eye to that part of the plot which isn’t the most relevant. Thanks for the feedback.
Here's the result of the example from above after the following changes:

* remove segment in center of CI
* add segments at extremes of CI (lower one is in front of the histogram, upper is behind; I think it looks nice)
* add x-axis ticks and labels at 0, samples_per_prior/2, and samples_per_prior (this does convey some information)

[resulting plot]
Also rename samples_per_prior to thinned_sample_size in internal code.
Beautiful 👍
Thanks for working on this. Some comments: The butterfly histogram is also useful for PITs and MCMC rank plots (see #179), so this should be modularized so that there is only one butterfly histogram plotting function.
As @jpritikin noticed in another pull request, it's a waste of memory to save indicators; it would be better to provide just the ranks to this function.
Yes, we want to be able to select parameters by name, too. There are often specific parameters we care more about.
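As a hypothetical sketch (the actual pars handling in the PR may differ), selection by name could be as simple as subsetting the matrix columns before plotting:

```r
# Hypothetical sketch of selecting parameters by name, assuming the
# list-of-matrices input with parameter names as column names (as in
# the example above); select_pars() is not a real bayesplot function.
select_pars <- function(ranks, pars) {
  lapply(ranks, function(r) r[, pars, drop = FALSE])
}
sbc_hist(select_pars(ranks, c("beta[1]", "beta[3]")))
```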
In the paper and in discussions outside the paper, we don't recommend any particular quantity. KL is also known to miss some things that other diagnostics (there's a paper comparing 20+ uniformity tests) or a human can see.
The current arXiv version is wrong about this and the updated version is not complete in other parts. The difference is not big for thetic (non-antithetic) chains, but antithetic chains may have an effective sample size larger than the number of iterations. In this case the chain should be thinned, too. One way is to thin first by 3 (thinning by an odd lag removes the antithetic behavior) and then check the effective sample size again. IMO there is no need to warn about uneven samples per bin if the variation is small. For example, if we thin we may often have an uneven division, but if we have some bins with 50 and some with 51, the variation is 2%, which is visually very small. Alternatively it would be useful to round to even.
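A rough sketch of that thinning rule, using coda::effectiveSize() as one possible ESS estimate (the PR itself may use a different estimator):

```r
# Sketch only: thin an antithetic chain by an odd lag when ESS > S,
# using coda::effectiveSize() as a stand-in ESS estimate.
library(coda)
thin_antithetic <- function(draws) {
  S <- length(draws)
  if (effectiveSize(draws) > S) {
    # thinning by an odd lag removes the antithetic behavior
    draws <- draws[seq(1, S, by = 3)]
    # after thinning, the ESS should be checked again (and thinned
    # further if needed)
  }
  draws
}
```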
@avehtari Actually I realized that you can't change the thinning if you don't have the raw ranks data. It's not actually as much data as I originally thought. I don't think it's a problem to pass around the raw ranks.
If there are 1000s of parameters, are you supposed to inspect every plot? There's got to be a way to prioritize, even if it's not perfect.
Might be nice to cite. DOI 10.1080/02331880500178562 ?
Can this be (partially) automated?
I would not recommend that.
I just stated that we don't recommend anything because Jonah asked my opinion. The reason is that we don't know what to recommend. Now that I think about it, it might be useful to have different options, for example showing the most smiley ones, the most frowny ones, the most biased ones, specifically edge problems, etc. If I had to choose just one, I think I would continue testing something computed from the ecdf vs. an envelope for the uniform distribution. I'm sorry that I don't have a better answer for this. I also think that it's better to show the "worst" ones instead of just making a hypothesis test of whether all is fine or not. Visual inspection helps to learn more.
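To make the ecdf-vs-envelope idea concrete, here is a simplified sketch using a pointwise 95% envelope from binomial quantiles; a proper simultaneous envelope would be preferable, so treat this only as an illustration:

```r
# Simplified sketch: compare the ecdf of normalized SBC ranks to a
# pointwise 95% envelope implied by uniformity. Under uniformity,
# n * ecdf(z) ~ Binomial(n, z) at each evaluation point z.
u <- runif(500)                        # stand-in for normalized ranks
z <- seq(0, 1, length.out = 101)
e <- ecdf(u)(z)
n <- length(u)
lower <- qbinom(0.025, n, z) / n
upper <- qbinom(0.975, n, z) / n
any(e < lower | e > upper)             # TRUE flags a pointwise violation
```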
Yes. 34 different tests, and they don't recommend any one of them as the best for all cases. Good to cite, as it shows that there is no single good solution.
If ESS > S, thin by 3. Then proceed as before.
Agreed. How about if we add some other measures of uniformity in addition to KL and require the user to specify which measure to use in conjunction with
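As a sketch of what such a menu of measures might look like (the function and argument names are hypothetical, not part of this PR):

```r
# Hypothetical sketch: several uniformity measures computed from binned
# SBC rank counts; none of these names exist in bayesplot.
uniformity_stat <- function(counts, measure = c("kl", "chisq", "ks")) {
  measure <- match.arg(measure)
  n <- sum(counts)
  p_hat <- counts / n
  p0 <- rep(1 / length(counts), length(counts))
  switch(measure,
    kl    = sum(ifelse(p_hat > 0, p_hat * log(p_hat / p0), 0)),
    chisq = sum((counts - n * p0)^2 / (n * p0)),
    ks    = max(abs(cumsum(p_hat) - cumsum(p0)))
  )
}
uniformity_stat(c(50, 48, 55, 47), measure = "chisq")
```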
What's the point of the butterfly shape, as opposed to a ribbon? I don't think the bin boundaries are working correctly. The first one straddles 0 and the last one doesn't appear to line up evenly with 128, at least on my device. Histograms work on continuous data, so they don't respect the discrete boundaries of the rankings by default. The use of the ribbon expands the x-axis so that the nonsensical values are part of the binning, so I think we are getting 128 ranks pushed into 29 bins, which is causing uneven binning.

```r
sbc_hist(ranks) +
  geom_text(aes(label = stat(count)), stat = "bin", bins = 32)
```

This is probably a better sense of what 32 bins should be:

```r
sbc_hist(ranks) +
  stat_count(aes(x = plyr::round_any(u, 4, floor)))
```
Aah shoot, that last figure has 33 bins because of how 129 was being rounded. General problem still stands. |
Maybe using geom_bar() instead of geom_histogram() lets us get around that problem?
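Something like this might work: pre-bin the discrete ranks ourselves (mirroring the round_any() suggestion above) and let geom_bar() count them, so the bar boundaries respect the discrete rank values. The data here is made up for illustration:

```r
library(ggplot2)
# Sketch only: 128 possible ranks grouped into 32 discrete bins of 4,
# counted with geom_bar() instead of re-binned by geom_histogram().
d <- data.frame(u = sample(0:127, 2000, replace = TRUE))
d$bin <- (d$u %/% 4) * 4
ggplot(d, aes(x = bin)) +
  geom_bar(width = 3.5)
```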
Another modularization suggestion: we need independent draws also for MCMC rank ecdf difference plots, and maybe for other things. Provide a common helper function to do thinning with a couple of options. sbc_hist could use this thinning function, too, with the possibility of favoring thinning values which make the histogram division more even (ecdf difference plots are not sensitive to this).
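One possible interface for such a helper (all names hypothetical): return the indices of the retained draws so callers can thin whatever structure they hold, with an option to prefer a thinned size that divides the number of histogram bins evenly:

```r
# Hypothetical interface sketch; thin_indices() is not a real function.
thin_indices <- function(n_draws, target_size, n_bins = NULL) {
  by <- max(1L, n_draws %/% target_size)
  if (!is.null(n_bins)) {
    # favor a thinning factor whose resulting size divides the number of
    # histogram bins evenly (sbc_hist cares; ecdf difference plots don't)
    size <- function(k) length(seq.int(1L, n_draws, by = k))
    while (size(by) %% n_bins != 0L && size(by) > n_bins) by <- by + 1L
  }
  seq.int(1L, n_draws, by = by)
}
draws <- rnorm(1000)  # stand-in draws
thinned <- draws[thin_indices(length(draws), target_size = 250, n_bins = 25)]
```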
Consider using uniftest to rank uniformity. |
@jpritikin thanks for pointing out the package, although it doesn't solve the problem of which of the many tests to use.
Compared to the previous version, I added: