I recently had the pleasure of using R as part of a team in a data science project. Despite the best reproducibility intentions, we ended up in a mighty tangle with dataset versions, modelling result versions, and the matching up of the two.
It got me thinking about the issue of provenance and the tooling in R. I'd be keen to work on any of the following:
A possible component could be last year's suggested project of an 'R package to store/access metadata associated with data/functions': ropensci/auunconf#18
Wow Jono, you are right, this is a very similar idea to that one!
I have in mind (and remember, this is all purely brainstorming at this point) the case where you load some data from a trusted source, validate that it is indeed unchanged (`validate_checksum(data)`), print out the context (`context(data)$owner`; `context(data)$last_modified`), and so on. Ditto for functions, to check that they do what one thinks they do (`context(my_function)$assumptions`). The context travels with the data/function and can be tested against.
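To make the brainstorm a bit more concrete, here is a minimal sketch in base R of how a travelling context could work, assuming it is stored as an attribute on the object. None of this is an existing package API: `set_context()` and `object_checksum()` are hypothetical helpers, `context()` and `validate_checksum()` are only the names floated above, and the owner and date values are made up for illustration. The checksum is an MD5 of the serialised object, so it is only meaningful when compared under the same R version.

```r
# Hypothetical helper: fingerprint an object by serialising it to a
# temporary file (uncompressed) and hashing that file with MD5.
object_checksum <- function(x) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))
  saveRDS(x, path, compress = FALSE)
  unname(tools::md5sum(path))
}

# Hypothetical helper: attach arbitrary metadata plus a checksum of the
# object as it was *before* the context attribute was added.
set_context <- function(x, ...) {
  ctx <- list(...)
  ctx$checksum <- object_checksum(x)
  attr(x, "context") <- ctx
  x
}

# Read the context back.
context <- function(x) attr(x, "context")

# Recompute the checksum without the context attribute and compare it
# with the one recorded when the context was attached.
validate_checksum <- function(x) {
  recorded <- context(x)$checksum
  attr(x, "context") <- NULL
  identical(object_checksum(x), recorded)
}

# Illustrative data and metadata (made up for this example).
raw <- data.frame(id = c(1L, 2L, 3L), value = c(2.5, 3.1, 4.8))
tagged <- set_context(raw,
                      owner = "data team",
                      last_modified = as.Date("2017-01-01"))

context(tagged)$owner
validate_checksum(tagged)

# Simulate someone silently changing the data after it was tagged.
tampered <- tagged
tampered$value[1] <- 0
validate_checksum(tampered)
```

Because the context is an ordinary attribute, the same `set_context()` call could in principle tag a function with something like `assumptions = "input is non-negative"`, and a test suite could assert on `validate_checksum()` before any modelling step runs. One caveat of this design is that many R operations drop custom attributes, so a real tool would need a sturdier carrier than `attr()`.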
A much more long-winded proposal that motivates these ideas is available here: https://github.com/MilesMcBain/journalr/blob/master/Journalling_tool_proposal.Rmd