🦆🥚 Demo/talk: {drake} for making a reproducible analytical pipeline for a stats report

matt-dray/drake-egg-rap

drake-egg-rap

Launch Rstudio Binder Blog post

Purpose

The {drake} package by Will Landau 'is a pipeline toolkit and a scalable, R-focused solution for reproducibility and high-performance computing.' Tagline: 'what gets done stays done'.

This repo contains a {drake} workflow for creating a demo statistics publication using the latest UK egg statistics as its input.

You can:

Execution

tl;dr

In short:

  1. Run make.R to execute the workflow that creates the report
  2. Change something (for example, a function in R/functions.R)
  3. Run make.R again to bring everything up to date ({drake} only runs the things that are out-of-date)

Full process

You only need to run the contents of make.R to execute the workflow. It sources files from the R/ and Rmd/ folders and creates the output report in the Rmd/ folder. Intermediate objects are stored in the hidden .drake/ folder.

If you change part of the workflow (e.g. change the chart title in the create_plot() function of the functions.R script) you don't have to re-run everything, or try to remember exactly which files need to be re-run. {drake} does this for you.

Instead, re-run source("R/functions.R") and then outdated(egg_plan), which will list the targets that have changed or are affected by the change. To bring them up to date, run make(egg_plan) again.
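The change, check and rebuild cycle described above can be tried on a toy pipeline. This is a minimal sketch and not the repo's actual code: `toy_plan`, `clean_data()`, `summarise_eggs()` and the inline data are hypothetical stand-ins for `egg_plan`, the functions in R/functions.R and the .ods input. It assumes a recent version of {drake} in which `outdated()` accepts a plan directly, and it runs in a temporary directory so its `.drake/` cache stays out of any real project.

```r
library(drake)

# Work in a throwaway directory so the toy .drake/ cache is kept separate
demo_dir <- file.path(tempdir(), "drake-toy")
dir.create(demo_dir, showWarnings = FALSE)
old_wd <- setwd(demo_dir)

# Hypothetical stand-ins for the functions in R/functions.R
clean_data <- function(raw) {
  raw[!is.na(raw$eggs), ]
}
summarise_eggs <- function(data) {
  list(title = "Eggs packed", total = sum(data$eggs))
}

# Hypothetical stand-in for the plan in R/plan.R
toy_plan <- drake_plan(
  raw_eggs = data.frame(quarter = 1:4, eggs = c(10, NA, 12, 11)),
  clean_eggs = clean_data(raw_eggs),
  egg_summary = summarise_eggs(clean_eggs)
)

# First run: every target is built and cached
make(toy_plan)
outdated_before <- outdated(toy_plan)  # nothing should be outdated yet

# "Change something": edit the title inside one function
summarise_eggs <- function(data) {
  list(title = "UK eggs packed", total = sum(data$eggs))
}

# Only the target that depends on the edited function should be listed
outdated_after <- outdated(toy_plan)

# Second run: rebuilds what is outdated and skips the rest
make(toy_plan)
final_title <- readd(egg_summary)$title

setwd(old_wd)
```

Printing `outdated_after` before the second `make()` shows which targets {drake} intends to rebuild. In this sketch `raw_eggs` and `clean_eggs` do not depend on `summarise_eggs()`, so they should be left alone.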

You can also visualise the dependency network, including which targets are outdated, by passing the drake_config() object to vis_drake_graph(). The code for this is supplied in the make.R file.

File structure

For the purposes of recreating this demo, the relevant file structure is as follows:

drake-egg-rap/
├── .drake/
├── data/
│   └── eggs-packers-02may19a.ods
├── egg-report.Rmd
├── R/
│   ├── functions.R
│   ├── packages.R
│   └── plan.R
└── make.R

All other files and folders are related to git, GitHub or Binder and can be ignored. It might be helpful to use the 'drake-egg-rap.Rproj' file if you're familiar with RStudio Projects.