This repository contains a minimal viable example of an R data visualisation and report generation workflow using ABS labour force open data.
The contents of this repository have been created to support the Automating R Markdown report generation - Part 2 tutorial in my `r_tips` repository.
- As referenced in this GitHub issue, path handling by `rmarkdown::render()` is currently not ideal, as the `output_dir` argument creates an absolute path for rendered figures. This can be resolved by using `xfun::in_dir("code", ...)` to render inside `.\code` and then moving the outputs into `.\output`.
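The render-then-move workaround described above can be sketched roughly as follows. This is a minimal sketch, not the repository's actual script: `report.Rmd`, `render_report()` and `move_rendered()` are hypothetical names, and the `.html` pattern assumes the report is knitted to a single HTML file.

```r
# Hypothetical helper: move rendered files from one folder to another.
# Returns (invisibly) the new paths of the files that were moved.
move_rendered <- function(from = "code", to = "output", pattern = "\\.html$") {
  dir.create(to, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(from, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) return(invisible(character(0)))
  targets <- file.path(to, basename(files))
  moved <- file.rename(files, targets)
  invisible(targets[moved])
}

# Render inside ./code so that figure paths stay relative,
# then move the rendered report into ./output.
# "report.Rmd" is a placeholder file name (assumption).
render_report <- function(rmd = "report.Rmd") {
  xfun::in_dir("code", rmarkdown::render(rmd))
  move_rendered(from = "code", to = "output")
}
```

If the report is not rendered as a self-contained file, any accompanying figure folder would need to be moved as well.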
- Use `renv` to manage package versions and commit your `renv.lock` file with your repository. The `renv` package will automatically create a second `.gitignore` file in `~/renv`, which prevents the private project library `~/renv/library` from being committed.
- Load the minimum set of packages required, i.e. load `dplyr` instead of `tidyverse` if you are just performing simple data transformations, and avoid using `pacman::p_load()`.
- The `renv` package uses static analysis to determine which packages are used, i.e. by scanning your code for calls to `library(pkg)`, `require(pkg)` or `pkg::`. Because of this, avoid mapping package loading with `lapply(packages, library, character.only = TRUE)`, as described here.

      # Recommended due to renv static analysis approach
      library("here")
      library("readr")

      # Also recommended for extra code reproducibility
      here::here(...)
      readr::read_csv(...)

      # Not recommended
      packages <- c("here", "readr")
      invisible(lapply(packages, library, character.only = TRUE))
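To illustrate why the `lapply()` pattern goes undetected, here is a toy scanner written in base R. It is not `renv`'s implementation, only a rough sketch of the idea of static analysis: `find_packages()` is a hypothetical function that looks for literal `library(`, `require(` and `pkg::` text without running the code.

```r
# Toy illustration only: NOT how renv is implemented.
# Scans code text for literal library()/require() calls and pkg:: prefixes.
find_packages <- function(code) {
  # Matches e.g. library("here  or  require(readr
  call_pattern <- "(library|require)\\([\"']?[A-Za-z][A-Za-z0-9.]*"
  calls <- regmatches(code, gregexpr(call_pattern, code, perl = TRUE))[[1]]
  from_calls <- sub("^(library|require)\\([\"']?", "", calls)

  # Matches the package name in front of a :: operator
  ns_pattern <- "[A-Za-z][A-Za-z0-9.]*(?=::)"
  from_ns <- regmatches(code, gregexpr(ns_pattern, code, perl = TRUE))[[1]]

  sort(unique(c(from_calls, from_ns)))
}

# Explicit calls are visible in the code text
find_packages('library("here")\nreadr::read_csv("x.csv")')

# Package names held in a character vector are not
find_packages('packages <- c("here", "readr")\ninvisible(lapply(packages, library, character.only = TRUE))')
```

The first call picks up `here` and `readr`, while the second finds nothing, because the package names only appear as ordinary strings. In a real project, `renv::dependencies()` reports which packages `renv` itself detects.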
- `pandoc` is not bundled with the `rmarkdown` package (it is provided by RStudio), so the correct version of `pandoc` needs to be manually specified in the YAML pipeline.

      steps:
        # Checks out your repository under $GITHUB_WORKSPACE, so your job can access it
        - uses: actions/checkout@v2
        # Sets up pandoc which is required for knitting HTML reports
        - uses: r-lib/actions/setup-pandoc@v2
          with:
            pandoc-version: '2.17.1'
- A virtual R environment first needs to be set up.

      steps:
        - name: Setup R version 4.1.2
          uses: r-lib/actions/setup-r@v2
          with:
            r-version: '4.1.2'
- The template CI/CD code for using `renv` to install R package dependencies is found here, based on a GitHub Actions `renv` cache issue recorded here.

      env:
        RENV_PATHS_ROOT: ~/.local/share/renv
      steps:
        # Set up R packages cache for workflow reruns
        - name: Cache R packages
          uses: actions/cache@v1
          with:
            path: ${{ env.RENV_PATHS_ROOT }}
            key: ${{ runner.os }}-renv-${{ hashFiles('**/renv.lock') }}
            restore-keys: |-
              ${{ runner.os }}-renv-
        # Install cURL to transfer data to virtual environment
        - run: sudo apt-get install -y --no-install-recommends libcurl4-openssl-dev
        # Install renv and project specific R packages
        - name: Restore R packages
          shell: Rscript {0}
          run: |
            if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
            renv::restore()
- Write scripts that are self-contained. This means avoiding a single separate script that loads all R libraries: each script should load the packages it needs itself, so that a job does not fail because it cannot access the outputs of another job.
- I personally prefer running scripts as separate steps, for better job progress monitoring.

      # Execute R scripts
      - name: Extract data from ABS labour force data API
        run: Rscript code/01_extract_data.R
      - name: Clean raw labour force data
        run: Rscript code/02_clean_data.R
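As a rough sketch of what "self-contained" could look like for one of these steps, the script below reads its input from disk, cleans it and writes its output, without relying on objects or packages loaded by another script. Everything here is an assumption for illustration: the function names, the file paths in the final comment and the `value` column are hypothetical and not taken from the repository. Base R is used only to keep the sketch dependency-free.

```r
# Hypothetical self-contained cleaning step (illustration only).
# Column names and file paths are placeholders, not the repository's own.

# Lower-case the column names and drop rows with a missing value
clean_labour_force <- function(raw) {
  names(raw) <- tolower(names(raw))
  raw[!is.na(raw$value), , drop = FALSE]
}

# Read the raw file written by the previous step, clean it, and write the result.
# Each step only communicates with the others through files on disk.
run_step <- function(in_path, out_path) {
  raw <- utils::read.csv(in_path, stringsAsFactors = FALSE)
  cleaned <- clean_labour_force(raw)
  dir.create(dirname(out_path), showWarnings = FALSE, recursive = TRUE)
  utils::write.csv(cleaned, out_path, row.names = FALSE)
  invisible(cleaned)
}

# In a real script, the last line would call the step with the project's paths, e.g.
# run_step("data/raw_labour_force.csv", "data/clean_labour_force.csv")
```

Because each step depends only on files, a failed step can be rerun on its own, and the step names in the workflow log show exactly which script failed.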