diff --git a/README.html b/README.html index 1fa5817..db24182 100644 --- a/README.html +++ b/README.html @@ -356,8 +356,38 @@
(https://framverse.github.io/coding-practices/)
Our evolving coding best practices document
+The goal of these best practices is to act as a guideline to produce +code and analyses that are highly transparent, transferable, +reproducible and approachable.
+Good coding practices make collaboration easier and faster, and +reduce the frequency and consequences of bugs and problems. At least +initially, adhering to good practices can feel like it adds unnecessary +steps that slow progress. In the long run, however, we find that these +practices save time. Further, they increase the transparency of our +code, which in turn increases the overall transparency of our work.
+Below, we outline best practices organized into related topics. When +it can be done succinctly, we provide explanations for why the +practices save time. After the guidelines, we include some short +tutorials and examples to show how to implement some of the +less obvious practices.
+For any kind of substantial work involving more than one file, use
Rprojects, the here
package, and renv
to make
scripts easily shareable. The goal is that you can zip up a folder, send
@@ -366,27 +396,34 @@
When developing a document to report results or findings to a general user, use Rmarkdown or Quarto to create a report that blends R code with explanations and graphics.
-When code is likely to be re-used (e.g. not a one-off analysis), +
When a script is likely to be re-used (i.e., not a one-off analysis),
create a commented version with instructions on use. This should be
-stored somewhere accessible. Collin maintains the snippets
-github repository
-for this kind of thing, or it could live in a Teams folder. It may also
-be appropriate to incorporate this code into an R package, or develop a
-new R package for this code. Converting code to packages is much more
-involved that storing a code snippet somewhere, but makes it much easier
-for incorporation into other code.
Need to give directions for starting Rprojects, using the here
-package, using renv
snippets
github repository for
+this kind of thing, or it could live in a Teams folder. If code is
+likely to be useful to the team or others, it may also be appropriate to
+incorporate this code into an R package, or develop a new R package for
+this code. Converting code to packages is much more involved than
+storing a code snippet or re-usable script somewhere, but makes the code much
+easier to incorporate into other code.
+To make scripts easier to re-use, replace hard-coded specifics with
+variables that are defined at the top of the script. For example, if
+Collin wrote a script to read in the Mortalities table of a FRAM
+database and plot the landed catch for a specific fishery, he would
+probably initially write that script using the file name and fishery
+name wherever he needed it (e.g.,
+connect_fram_db("FramDBExample.Mdb")
and
+data |> filter(fishery_id == 19) |> ...
). To make
+this script easier to re-use, he could add lines of code near the top of
+the script, with
file_use = "FramDBExample.Mdb"
+fishery_use = 19
+and then replace any hard-coded uses of the filename and fishery ID
+with those variables (e.g., connect_fram_db(file_use)
and
+data |> filter(fishery_id == fishery_use) |> ...
).
+This makes it very easy to re-use for a different case – simply update
+the lines defining file_use
and
+fishery_use
.
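As a rough sketch of this pattern, the script below puts the settings at the top and refers only to those variables afterwards. The `mortalities` table is a small hypothetical stand-in (its column names and values are assumptions for illustration); a real script would read the Mortalities table from the FRAM database instead.

```r
library(dplyr)

## ---- settings: the only lines to edit when re-using this script ----
fishery_use <- 19   # fishery ID to summarize

## ---- stand-in data (hypothetical; replace with the real Mortalities table) ----
mortalities <- tibble(
  fishery_id   = c(19, 19, 20),
  stock_id     = c(1, 2, 1),
  landed_catch = c(120, 80, 45)
)

## ---- analysis: uses the variables defined above, never hard-coded values ----
catch_summary <- mortalities |>
  filter(fishery_id == fishery_use) |>
  summarise(total_landed_catch = sum(landed_catch))

catch_summary
```

Re-running the analysis for another fishery then only requires changing `fishery_use`.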
When working across multiple projects, it can be helpful if each @@ -399,17 +436,22 @@
Draft file structure?
+Draft file structure? CBE: slightly updated. I like to keep the +raw data in a separate folder from where cleaned / intermediate data +files live.
project_folder
├── scripts
-│ ├── data_clean.R
+│ ├── data_clean.R # should save to `cleaned_data/`
│ └── analysis.R
-├── data
+├── original_data
│ ├── data.csv
│ └── more_data.xlsx
+├── cleaned_data
+│ └── data_cleaned.csv
├── figures
-├── results
│ └── some_figure.png
+├── results
+│ └── some_spreadsheet.xlsx
├── .gitignore
└── project_folder.Rproj
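One possible way to set up this skeleton from R is sketched below. The `project_root` location is a hypothetical placeholder (a temporary directory here, so the sketch is safe to run); the `.Rproj` file itself is normally created through RStudio's New Project dialog rather than by hand.

```r
# Hypothetical location for the new project; point this at your own folder
project_root <- file.path(tempdir(), "project_folder")

# Folder names taken from the draft structure above
project_dirs <- c("scripts", "original_data", "cleaned_data", "figures", "results")

# Create each folder (and the project root, if needed)
for (dir_name in project_dirs) {
  dir.create(file.path(project_root, dir_name),
             recursive = TRUE, showWarnings = FALSE)
}

# List what was created
list.dirs(project_root, full.names = FALSE, recursive = FALSE)
```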
Ensure that your code is reproducible by never saving / loading
+the environment. Scripts should include code to read in relevant files,
+and can save key objects for re-use later. In Rstudio, go to
Tools > Global Options
and in the General
section, make sure that “Restore .Rdata into workspace on startup” is
-NOT checked, and make sure that “Save worskpace to .Rdata on exit:”
-dropdown is set to “Never”
Code that is meant to be shared should not include a hard-coded +setwd() or file paths based on the local machine directory structure. +Function calls that require file paths should be relative, such that +someone with a copy of the project directory can run the script without +needing to change those file paths.
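A minimal sketch of both ideas, assuming the script is run from the project root (as it would be when the `.Rproj` file is opened): paths are built relative to the project, and a key object is saved explicitly with `saveRDS()` rather than relying on a saved workspace. The object `dat_clean` and the file name `dat_clean.rds` are hypothetical stand-ins.

```r
# Fragile (avoid): setwd() to a machine-specific folder, then read files from there.

# Portable: build paths relative to the project root
dir.create("cleaned_data", showWarnings = FALSE)
clean_path <- file.path("cleaned_data", "dat_clean.rds")
# here::here("cleaned_data", "dat_clean.rds") is an alternative if the here package is used

# Stand-in for the result of a data-cleaning step
dat_clean <- data.frame(stock = c("A", "B"), AEQ = c(10, 20))

# Save the key object explicitly instead of saving the whole environment
saveRDS(dat_clean, clean_path)

# A later script (or session) reads it back in explicitly
dat_reloaded <- readRDS(clean_path)
```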
Ensure that figure titles are correct. When copy-pasting
+figure-generation code to make comparable figures for different parts of
+the data (e.g., different stocks or different fisheries), it’s easy to
+accidentally leave old titles in place, leading to confusion. Consider
+using paste()
or glue()
with variable names or
+even R functions so that the figure title auto-updates
+appropriately.
## "fragile" version: the fishery name is hard-coded in both filter() and ggtitle(), so copy-pasting to plot a second fishery requires careful updating of ggtitle()
+dat.plot <- data |>
+ filter(fishery_title == "NT Area 10 Sport")
+ggplot(dat.plot, aes(x = stock, y = AEQ))+
+ geom_col()+
+ ggtitle("Chinook AEQ of NT Area 10 Sport")+
+ coord_flip()
+
+## robust version:
+fishery_plot <- "NT Area 10 Sport" ## define the fishery to plot in one place at the top
+dat.plot <- data |>
+ filter(fishery_title == fishery_plot) ## use variable in our filter function
+ggplot(dat.plot, aes(x = stock, y = AEQ))+
+ geom_col()+
+ ggtitle(paste("Chinook AEQ of", fishery_plot))+ ## use paste and variable name
+ coord_flip()
+
+## alternative robust version:
+dat.plot <- data |>
+ filter(fishery_title == "NT Area 10 Sport")
+ggplot(dat.plot, aes(x = stock, y = AEQ))+
+ geom_col()+
+ ggtitle(paste("Chinook AEQ of", dat.plot$fishery_title[1]))+ ## obtain the fishery name directly from dat.plot
+ coord_flip()
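The examples above use `paste()`; a `glue()` version might look like the sketch below. It only builds the title strings, which can then be passed to `ggtitle()`. The second fishery name is a hypothetical value added for illustration.

```r
library(glue)

fishery_plot <- "NT Area 10 Sport"

# paste() and glue() build the same title text
title_paste <- paste("Chinook AEQ of", fishery_plot)
title_glue  <- glue("Chinook AEQ of {fishery_plot}")

# glue() is vectorized, so titles for several fisheries can be built at once
fisheries <- c("NT Area 10 Sport", "NT Area 11 Sport")  # second name is hypothetical
titles <- glue("Chinook AEQ of {fisheries}")
titles
```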
+library()
rather than
+require()
. Put all library calls at the top of the script,
+so that users immediately encounter errors if they have not yet
+installed relevant libraries.
(Ty’s plan, Collin has regrets)
+Variables and columns of dataframes should be descriptive of their +contents while still being machine readable, e.g., lacking all whitespace +and special characters.
+# Good:
+mark_rate <- tibble()
+mortality.table <- read_csv('mortality_table.csv')
+
+# Bad:
+mr1.2 <- tibble()
+`Mortality Table` <- read_csv('mortality_table.csv')
+Often, column names imported into R from various sources have
+spaces, special characters, capitalization, or are just bizarre. The
+janitor
package’s clean_names()
function is a
+great automated solution to cleaning up dataframe names.
data <- readr::read_csv(here::here('data/ugly_column_names.csv'))
+
+data <- data |>
+ janitor::clean_names() # assign the result so the cleaned names are kept
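To see the effect without needing a file on disk, a small sketch with a made-up data frame is given below. The column names and values are hypothetical; `check.names = FALSE` is used so that base R does not alter the messy names before `clean_names()` sees them.

```r
library(janitor)

# Hypothetical data frame with awkward column names
messy <- data.frame(
  `Fishery ID`       = c(19, 20),
  `Landed Catch (N)` = c(120, 45),
  check.names = FALSE
)

names(messy)    # names as imported: spaces, capitals, parentheses

cleaned <- clean_names(messy)
names(cleaned)  # lower-case names using only letters, digits, and underscores
```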
+Naming +conventions are an important part of understanding code. Below are +some common examples:
+| Naming Convention | Example |
+|---|---|
+| Snake Case | big_red_dog |
+| Screaming Snake Case | BIG_RED_DOG |
+| Dot Case | big.red.dog |
+| Camel Case | bigRedDog |
+| Pascal Case | BigRedDog |
Although Hadley Wickham recommends snake case for R scripting, R has +no official naming convention. When writing a script, choose a naming convention +and use it consistently throughout the document's +entirety.
+There are a variety of assignment operators in the R scripting
+language, <-
, =
, <<-
as
+well as their directional reversals. The vast majority of assignments
+will use either <-
or =
. Whichever is chosen,
+the same one should be used throughout the entire
+document.
The original pipe %>%
is provided by the
+magrittr
package. The ‘native pipe’ |>
was
+introduced in R 4.1.0. These two perform essentially the same function,
+but they use different placeholders, which can lead to various errors in
+scripts when the two are mixed. Only one pipe should be used throughout a document.
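The placeholder difference can be sketched as follows, using the built-in `mtcars` data. This assumes R 4.2.0 or later, where the native pipe's `_` placeholder is available (and must be passed to a named argument); the `magrittr` pipe uses `.` instead.

```r
library(magrittr)

# magrittr pipe: the placeholder is a dot
fit_magrittr <- mtcars %>% lm(mpg ~ cyl, data = .)

# native pipe (R >= 4.2): the placeholder is an underscore and must be a named argument
fit_native <- mtcars |> lm(mpg ~ cyl, data = _)

# Both fits use the same data and formula, so the coefficients should agree
all.equal(coef(fit_magrittr), coef(fit_native))
```

Swapping the placeholders between the two pipes is a typical source of errors when a script mixes them.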
The following are good general practices, but specific style +choices are often a matter of taste. Consistency is the most important +part – use the same style throughout your script.
+chinook_landed_catch
.<-
for assignment rather than =
chinook_landed_catch
. Using separators in variable
+names makes them easier to read. Using periods as separators becomes
+ambiguous when dealing with S3 methods
+<-
for assignment rather than =
.
+Always ensure there is a space before and after the assignment operator.
+This helps with visually distinguishing the assignment
+x <- 10
and the test x < -10
.x_mean = mean(x)
instead of
+mean = mean(x)
, and cur_plot = ggplot(...
+instead of plot = ggplot(...
).mtcars$cyl
or
+mtcars[, "cyl"]
rather than mtcars[, 2]
)We often need to create graphics to show aspects of the data. There @@ -507,6 +694,35 @@
plot
)assign()
). If you need a function
+to create several objects, have the function return a list of those
+objects. (Note that file manipulation is an obvious exception to the
+general aim to avoid side-effects in functions; functions can read or
+write)source()
to call them from a single main
+script. This can be especially effective for scripts that contain many
+custom function definitions – moving the functions to a separate script
+that gets source()
ed at the top of the remaining code leads
+to a primary script that is easy to read, and a companion script that is
+just the definitions of functions.
When multiple people are collaborating on a project, it gets very @@ -550,6 +766,17 @@
here::here()
renv