New draws_tensor format? #349

Open
paul-buerkner opened this issue Mar 6, 2024 · 4 comments

Comments

@paul-buerkner
Collaborator

The discussion about a data.table format reminded me of something I have wanted to bring up for a while. I wonder if we can have a non-rvar format that can handle high-dimensional arrays of posterior draws? It would otherwise behave like a draws_matrix, i.e., storing chains only as an attribute. If I remember correctly, this is kind of how we store draws in rvars behind the scenes without making this its own format.

The reason for bringing this up is that some brms post-processing functions return multi-dimensional output (+ the draws dim) and we don't really represent this in posterior so far except in rvar. Using rvar as a default output in brms, however, would be too much of a backwards-compatibility-breaking change (although it will become a non-default option in brms 3.0) and perhaps a bit too adventurous too. That is why I am looking for a "low-level" alternative that resembles what brms already does, namely just outputting high-dimensional arrays.

What are your thoughts on this? Specifically, what does @mjskay think?

@mjskay
Collaborator

mjskay commented Mar 8, 2024

Cool, yeah. If I understand the intention, it wouldn't use rvar's specialized indexing with [ and [[, but could probably implement many of the other functions of rvars?

This could be implemented as a codification of what draws_of(<rvar>) returns. We would have to modify it a bit because rvar keeps the nchains and weights attributes on the surrounding object rather than on the array it contains, but in principle I think that could be adjusted.
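For readers following along, here is a small sketch of what draws_of(<rvar>) gives you today. It only uses rvar(), draws_of() and nchains() from {posterior}; the variable shape and the simulated draws are made up for illustration.

```r
library(posterior)

set.seed(1)
# 400 simulated draws (4 chains of 100) of a 2 x 3 variable.
# rvar() expects the draws on the first dimension of the array.
x <- rvar(array(rnorm(400 * 2 * 3), dim = c(400, 2, 3)), nchains = 4)

# Dimensions of the variable itself (the draws dimension is hidden)
dim(x)

# The backing array: the draws dimension comes first, then the variable dims
arr <- draws_of(x)
dim(arr)

# Chain information is queried from the rvar, not from the plain array
nchains(x)
```

A standalone array format along the lines discussed here would need the chain information to travel with `arr` itself.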

I'm not sure what to call such a type --- I think it wouldn't be a "draws" type because the other "draws" types are collections of variables rather than a single multidimensional variable, so you wouldn't be able to convert any "draws" object to one of this new type (just as you can't convert other "draws" types to just a single rvar). Maybe this is a "raw_rvar" or a "rvar_array" or something? Or "variable_array"? Hmm.

This also raises another thought I've been mulling over, which is I wonder if it would be helpful to have some variations on rvar with different backing formats. E.g. one solution for #234 is to have a variant of rvar that is a list-column format. This would be less efficient for some operations (like matrix multiplication), but would allow storage in data.tables and may be more efficient for some other operations too (like storage in tibbles).
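To make the list-column idea concrete, here is a rough base R sketch. It is not an existing {posterior} format; the column names and the simulated draws are assumptions chosen for illustration. Each cell of the list column holds the full vector of draws for one variable, which is the kind of layout a data.table or tibble can store.

```r
set.seed(1)
ndraws <- 100

# Simulated draws for three scalar variables, one column per variable
draws <- matrix(
  rnorm(ndraws * 3),
  nrow = ndraws,
  dimnames = list(NULL, c("a", "b", "c"))
)

# List-column layout: one row per variable, one vector of draws per cell
df <- data.frame(variable = colnames(draws))
df$draws <- lapply(seq_len(ncol(draws)), function(j) draws[, j])

# Per-variable summaries are a simple map over the list column
df$mean <- vapply(df$draws, mean, numeric(1))

# Array-style operations (e.g. matrix multiplication) first need the
# draws bound back together into a single matrix
rebuilt <- do.call(cbind, df$draws)
```

The last step shows the trade-off mentioned above: summaries per cell are easy, but anything that wants one contiguous array has to reassemble it first.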

This perhaps suggests two families of formats within {posterior}: formats that represent collections of variables (the "draws" formats), and formats that represent single multidimensional variables (currently just "rvar", but maybe this family needs a name if we expand it).

@paul-buerkner
Collaborator Author

You are right. It is not a draws_ object but conceptually like rvar, and indeed what we already have in draws_of(<rvar>). If you add this new format, I agree it should probably be the same kind of object as is/will be stored in draws_of(<rvar>).

Do you have ideas for next steps we should take in this direction?

@mjskay
Collaborator

mjskay commented Apr 6, 2024

Hmm, I would probably wait on this until the weighted rvars stuff is merged, since that will add another attribute to rvar that will have to be dealt with.

Then, I would try moving any rvar attributes stored outside of the internal array (e.g. nchains and weights) to be attributes of the array itself, probably creating a simple wrapper type (call it var_array for now) around the array in the process. The process of doing that should reveal any hairy corner cases we might expect to encounter.

The var_array type will probably need to support factors as well, and it might need some subtypes to do that properly.

If all of that goes well, I'd try to figure out what rvar operations can be moved to var_array and turned into simple wrappers at the rvar level.

Somewhere in all of this we might also want to come up with a parent type for rvar and var_array, similar to how "draws" works.
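As a very rough illustration of the first step in that plan, here is a base R sketch of a wrapper that keeps nchains on the array itself. Everything in it is hypothetical: var_array is only the working name used in this thread, and new_var_array() is a made-up constructor, not part of {posterior}. Weights and factor support are left out.

```r
# Hypothetical constructor: an array with draws on the first dimension,
# carrying `nchains` as an attribute of the array itself.
new_var_array <- function(x, nchains = 1L) {
  stopifnot(is.array(x), length(dim(x)) >= 1L)
  stopifnot(dim(x)[1] %% nchains == 0)  # draws must split evenly into chains
  structure(x, nchains = as.integer(nchains), class = c("var_array", class(x)))
}

set.seed(1)
# 400 simulated draws (4 chains of 100) of a 2 x 3 variable
va <- new_var_array(array(rnorm(400 * 2 * 3), dim = c(400, 2, 3)), nchains = 4)

attr(va, "nchains")

# Summaries over the draws dimension keep the variable's shape
post_mean <- apply(unclass(va), c(2, 3), mean)
dim(post_mean)

# One corner case this surfaces: without a `[` method, base subsetting
# keeps dim and dimnames but drops the class and the nchains attribute.
sub <- va[1:10, , , drop = FALSE]
is.null(attr(sub, "nchains"))
```

The subsetting behaviour at the end is an example of the kind of corner case such a wrapper type would have to handle with its own methods.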

@paul-buerkner
Collaborator Author

paul-buerkner commented Apr 6, 2024 via email
