New draws_tensor format? #349

paul-buerkner · 2024-03-06T06:59:26Z

The discussion about a data.table format reminded me of something I have wanted to bring up for a while. I wonder if we can have a non-rvar format that can handle high-dimensional arrays of posterior draws while otherwise behaving like a draws_matrix, i.e., storing chains only as an attribute. If I remember correctly, this is roughly how we store draws in rvars behind the scenes, without it being its own format.

The reason for bringing this up is that some brms post-processing functions return multi-dimensional output (plus the draws dimension), and so far we don't really represent this in posterior except in rvar. Using rvar as the default output in brms, however, would be too much of a backwards-compatibility-breaking change (although it will become a non-default option in brms 3.0) and perhaps a bit too adventurous as well. That is why I am looking for a "low-level" alternative that resembles what brms already does, namely just outputting high-dimensional arrays.
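For illustration, here is a minimal base-R sketch of the kind of object described above: a plain array with draws in the first dimension and the number of chains stored only as an attribute. The dimensions (10 observations by 3 categories) and the attribute name `nchains` are assumptions made for the example, not an existing posterior format.

```r
# Hypothetical sketch: a plain array with draws in the first dimension
# and the chain count kept as an attribute (as draws_matrix does).
set.seed(1)
n_chains <- 4
n_iter <- 250
n_draws <- n_chains * n_iter

# e.g. predictions for 10 observations x 3 response categories
x <- array(rnorm(n_draws * 10 * 3), dim = c(n_draws, 10, 3))
attr(x, "nchains") <- n_chains

# summarise over the draws dimension, keeping the remaining dimensions
post_mean <- apply(x, c(2, 3), mean)
dim(post_mean)
```

Because the object is an ordinary array, existing code that indexes or summarises brms output with base R would keep working unchanged.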

What are your thoughts on this? Specifically, what does @mjskay think?

mjskay · 2024-03-08T19:58:33Z

Cool, yeah. If I understand the intention, it wouldn't use rvar's specialized indexing with [ and [[, but could probably implement many of the other functions of rvars?

This could be implemented as a codification of what draws_of(<rvar>) returns. We would have to modify it a bit, because rvar keeps the nchains and weights attributes on the surrounding object rather than on the array it contains, but in principle I think that could be adjusted.
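A short sketch of what this looks like today, assuming the posterior package is installed; the array shape and chain count are arbitrary choices for the example.

```r
library(posterior)

set.seed(1)
# 400 draws of a 2 x 3 variable, declared as coming from 4 chains
x <- rvar(array(rnorm(400 * 2 * 3), dim = c(400, 2, 3)), nchains = 4)

# the backing array: draws first, then the dimensions of the variable
a <- draws_of(x)
dim(a)

# chain information is queried on the rvar, not on the backing array
nchains(x)
```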

I'm not sure what to call such a type --- I think it wouldn't be a "draws" type because the other "draws" types are collections of variables rather than a single multidimensional variable, so you wouldn't be able to convert any "draws" object to one of this new type (just as you can't convert other "draws" types to just a single rvar). Maybe this is a "raw_rvar" or a "rvar_array" or something? Or "variable_array"? Hmm.

This also raises another thought I've been mulling over, which is I wonder if it would be helpful to have some variations on rvar with different backing formats. E.g. one solution for #234 is to have a variant of rvar that is a list-column format. This would be less efficient for some operations (like matrix multiplication), but would allow storage in data.tables and may be more efficient for some other operations too (like storage in tibbles).

This perhaps suggests two families of formats within {posterior}: formats that represent collections of variables (the "draws" formats), and formats that represent single multidimensional variables (currently just "rvar", but maybe this family needs a name if we expand it).

paul-buerkner · 2024-03-11T10:23:48Z

You are right. It is not a draws_ object but is conceptually like rvar, and indeed it is what we already have in draws_of(<rvar>). If you add this new format, I agree it should probably be the same kind of object as is/will be stored in draws_of(<rvar>).

Do you have ideas for next steps we should take in this direction?

mjskay · 2024-04-06T20:49:25Z

Hmm, I would probably wait on this until the weighted rvars stuff is merged, since that will add another attribute to rvar that will have to be dealt with.

Then, I would try moving any rvar attributes stored outside of the internal array (e.g. nchains and weights) to be attributes of the array itself, probably creating a simple wrapper type (call it var_array for now) around the array in the process. The process of doing that should reveal any hairy corner cases we might expect to encounter.
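As a rough illustration of that step, the sketch below wraps an array in a thin S3 class that carries `nchains` and `weights` itself. Everything here is hypothetical: `new_var_array()`, `draws_mean()`, and the attribute layout are placeholder names for this example and do not exist in posterior.

```r
# Hypothetical constructor: the array itself carries nchains and weights.
new_var_array <- function(x, nchains = 1L, weights = NULL) {
  stopifnot(is.array(x), length(dim(x)) >= 1L)
  ndraws <- dim(x)[1]
  # draws must split evenly into chains
  stopifnot(ndraws %% nchains == 0)
  if (!is.null(weights)) stopifnot(length(weights) == ndraws)
  attr(x, "nchains") <- as.integer(nchains)
  attr(x, "weights") <- weights
  class(x) <- c("var_array", class(x))
  x
}

# Hypothetical helper: unweighted mean over the draws (first) dimension.
draws_mean <- function(x) {
  d <- dim(x)
  if (length(d) == 1L) return(mean(unclass(x)))
  apply(unclass(x), seq_along(d)[-1], mean)
}

va <- new_var_array(array(1:24, dim = c(4, 3, 2)), nchains = 2)
draws_mean(va)
```

An rvar-level operation could then, in principle, become a thin wrapper that calls the corresponding var_array function on its backing array.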

The var_array type will probably need to support factors as well, and it might need some subtypes to do that properly.

If all of that goes well, I'd try to figure out what rvar operations can be moved to var_array and turned into simple wrappers at the rvar level.

Somewhere in all of this we might also want to come up with a parent type for rvar and var_array, similar to how "draws" works.

paul-buerkner · 2024-04-06T23:37:13Z

I fully agree with your thoughts. Thank you for looking into this!

