The {drake} package by Will Landau 'is a pipeline toolkit and a scalable, R-focused solution for reproducibility and high-performance computing.' Tagline: 'what gets done stays done'.
This repo contains a {drake} workflow for creating a demo statistics publication using the latest UK egg statistics as its input.
You can:
- read my blog post about how {drake} could be helpful for the UK government's Reproducible Analytical Pipelines approach to producing statistical publications
- launch the contents of this repo in a live browser instance of RStudio by clicking the 'launch binder' badge above (read another blog post about why this is awesome)
- take a look at a presentation I gave about {drake} at a couple of cross-government Coffee & Coding events
- download or clone this repo to serve as the basis for your own {drake} pipeline
- download a clean {drake} project template example with `drake::drake_example("main")`
In short:
- Run `make.R` to execute the workflow that creates the report
- Change stuff
- Run `make.R` again to bring everything up to date ({drake} only runs the things that are out-of-date)
You need only run the contents of `make.R` to execute the workflow. It sources files from the `R/` and `Rmd/` folders and creates the output report in the `Rmd/` folder (intermediate objects are added to the hidden `.drake/` folder).
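As a rough illustration of that flow, here is a minimal, self-contained sketch of what a script like `make.R` does. It is not the repo's actual file: `summarise_eggs()`, `raw_eggs` and `toy_plan` are hypothetical stand-ins, defined inline so the example runs without the `R/` scripts or the `.ods` data. It assumes {drake} is installed.

```r
library(drake)

# In the repo, functions would come from sourcing R/functions.R;
# this hypothetical one is defined inline to keep the sketch self-contained
summarise_eggs <- function(data) {
  tapply(data$eggs, data$year, sum)
}

# Toy input standing in for the real egg statistics (made-up numbers)
raw_eggs <- data.frame(
  year = c(2017, 2017, 2018, 2018),
  eggs = c(10, 12, 11, 14)
)

# The plan lists each target and the command that builds it
toy_plan <- drake_plan(
  eggs_summary = summarise_eggs(raw_eggs)
)

# Build whatever is out of date; results are stored in the hidden .drake/ cache
make(toy_plan)

# Read a built target back from the cache
eggs_summary <- readd(eggs_summary)
eggs_summary
```

Running `make(toy_plan)` a second time without changing anything should build nothing, because every target is already up to date.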
If you change part of the workflow (e.g. change the chart title in the `create_plot()` function of the `functions.R` script) you don't have to re-run everything, or try to remember exactly which files need to be re-run. {drake} does this for you.
Instead, you can re-run `source("R/functions.R")`, and `outdated(egg_plan)` will tell you which targets have been changed or are affected by that change. You can bring everything up to date by executing `make(egg_plan)` again.
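The change-and-rebuild cycle can be sketched with a toy plan. The names here (`make_title()`, `toy_plan`, `chart_title`, `n_chars`) are hypothetical and stand in for the repo's `create_plot()` and `egg_plan`. The sketch assumes a recent {drake} release in which `outdated()` accepts a plan directly; older releases expect a `drake_config()` object instead.

```r
library(drake)

# Hypothetical stand-in for a function defined in R/functions.R
make_title <- function() "Eggs packed per year"

toy_plan <- drake_plan(
  chart_title = make_title(),
  n_chars = nchar(chart_title)  # depends on chart_title
)

make(toy_plan)                  # first run builds both targets
fresh <- outdated(toy_plan)     # empty: everything is up to date

# Edit the function, as you would by changing functions.R and re-sourcing it
make_title <- function() "UK eggs packed per year"

stale <- outdated(toy_plan)     # chart_title and its downstream target n_chars
stale

make(toy_plan)                  # rebuilds only the outdated targets
```

Because `n_chars` is computed from `chart_title`, changing `make_title()` invalidates both targets, which is the same mechanism that marks the report as outdated when a plotting function changes.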
A helpful addition is that you can visualise the dependency network, including which parts of it are outdated, by passing the `drake_config()` object to `vis_drake_graph()` (code for this is supplied in the `make.R` file).
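For a quick feel of the graph without the full repo, the following sketch draws the network for a hypothetical two-target plan. It assumes the {visNetwork} package is installed and a recent {drake} release in which `vis_drake_graph()` accepts a plan directly; with older releases, pass `drake_config(toy_plan)` instead, as described above.

```r
library(drake)

# Hypothetical function and plan, for illustration only
make_title <- function() "Eggs packed per year"

toy_plan <- drake_plan(
  chart_title = make_title(),
  n_chars = nchar(chart_title)
)

# Build an interactive dependency graph; nothing has been built yet,
# so both targets are shown as outdated
graph <- vis_drake_graph(toy_plan)
graph  # prints the widget in the RStudio viewer or a browser
```

Calling `make(toy_plan)` and then redrawing the graph should show the same targets as up to date.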
For purposes of recreating this demo, the relevant file structure is as follows:
drake-egg-rap/
├── .drake/
├── data/
│ └── eggs-packers-02may19a.ods
├── egg-report.Rmd
├── R/
│ ├── functions.R
│ ├── packages.R
│ └── plan.R
└── make.R
All other files and folders are related to git, GitHub or Binder and can be ignored. It might be helpful to use the 'drake-egg-rap.Rproj' file if you're familiar with RStudio Projects.