diff --git a/.github/workflows/quarto-publish.yml b/.github/workflows/quarto-publish.yml
index d89462b..937f7e1 100644
--- a/.github/workflows/quarto-publish.yml
+++ b/.github/workflows/quarto-publish.yml
@@ -25,9 +25,12 @@ jobs:
- name: Install R packages
uses: r-lib/actions/setup-r-dependencies@v2
with:
- packages:
+ packages: |
any::knitr
any::rmarkdown
+ any::palmerpenguins
+ any::tidyr
+ any::dplyr
- name: Render and Publish
uses: quarto-dev/quarto-actions/publish@v2
diff --git a/_quarto.yml b/_quarto.yml
index b804089..38e52d4 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -40,11 +40,10 @@ website:
format:
html:
- theme: litera
- css: styles.css
+ theme: [litera,styles.scss]
code-copy: true
code-overflow: wrap
code-line-numbers: true
code-block-border-left: true
- highlight-style: zenburn
+ highlight-style: nord
diff --git a/posts/2024-11-19-the-apply-function-family/2024-11-19-the-apply-function-family.qmd b/posts/2024-11-19-the-apply-function-family/2024-11-19-the-apply-function-family.qmd
new file mode 100644
index 0000000..b2eb7ac
--- /dev/null
+++ b/posts/2024-11-19-the-apply-function-family/2024-11-19-the-apply-function-family.qmd
@@ -0,0 +1,373 @@
+---
+title: "The apply() function family"
+author: "Nicolas Casajus"
+date: "2024-11-19"
+categories: [r, apply, lapply, tapply, recursive, function, loop]
+image: ""
+toc: true
+draft: false
+lightbox: true
+code-overflow: scroll
+---
+
+In this post, we will discuss about the family of `apply()` functions. These functions allows you to recursively apply a function across all elements of a `vector`, `list`, `matrix`, or `data.frame`. The `apply()`
+family is an interesting alternative to the `for` loop because it wraps the loop into a simple function.
+
+
+The functions in the `apply()` family differ in their input and output types:
+
+| Function | Description |
+|:------------------------------------------------:|:--------------------------------------------------------------------------|
+| [`apply()`](https://rdrr.io/r/base/apply.html) | Applies a function to **margins** of an `array`, `matrix` or `data.frame` (2D objects) |
+| [`lapply()`](https://rdrr.io/r/base/lapply.html) | Applies a function over a `list` or `vector` and returns a `list` |
+| [`sapply()`](https://rdrr.io/r/base/lapply.html) | Wrapper of `lapply` but returns a `vector` or `matrix` (volatile) |
+| [`vapply()`](https://rdrr.io/r/base/lapply.html) | Similar to `sapply` but safer |
+| [`tapply()`](https://rdrr.io/r/base/tapply.html) | Applies a function to a **group** of data grouped by one or more factors and returns an `array` |
+
+
+
+**NB.** Here we won't talk about `sapply()` and `vapply()` as there are similar to `lapply()`.
+
+
+
+
+
+
+## Dataset
+
+To illustrate to use of `apply()` functions, we will use the [`palmerpenguins`](https://allisonhorst.github.io/palmerpenguins/articles/intro.html) package. It contains the `penguins` dataset with size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.
+
+![Atwork by [Allison Horst](https://allisonhorst.com/)](palmer-penguins.png){width=60%}
+
+> These data were collected from 2007 and 2009 by Dr. Kristen Gorman and are released under the [CC0](https://creativecommons.org/public-domain/cc0/) license.
+
+
+
+
+
+Let's install the released version of [`palmerpenguins`](https://allisonhorst.github.io/palmerpenguins/) package from [CRAN](https://cran.r-project.org/):
+
+```{r}
+#| echo: true
+#| eval: false
+
+## Install 'palmerpenguins' package ----
+install.packages("palmerpenguins")
+```
+
+
+Now, let's import the dataset:
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Import 'penguins' dataset ----
+library("palmerpenguins")
+
+penguins
+```
+
+For this post, we will use a subset of this dataset:
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Columns to keep ----
+cols <- c("species", "island", "bill_length_mm", "bill_depth_mm",
+ "body_mass_g")
+
+## Subset data ----
+penguins <- penguins[ , cols]
+penguins
+```
+
+
+
+## The `apply()` function
+
+The `apply()` lets you perform a function across rows or columns of a `data.frame` (or any types of 2-dimension objects).
+
+- the first argument `X` specifies the data
+- the second argument `MARGIN` specifies the direction (`1` for rows, `2` for columns)
+- the third argument `FUN` is the function to apply
+
+Let's compute the arithmetic mean of the columns `bill_length_mm`, `bill_depth_mm` and `body_mass_g` by applying the `mean()` function **across columns** 3 to 5 of the `penguins` dataset.
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Mean of columns 3, 4 and 5 ----
+apply(penguins[ , 3:5], 2, mean)
+```
+
+We can pass arguments to the function `mean()` by using the argument `...` of the function `apply()`. Let's remove missing values before computing the mean.
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Use additional arguments ----
+apply(penguins[ , 3:5], 2, mean, na.rm = TRUE)
+```
+
+Note that the `apply()` functions are pipe-friendly.
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Pipe version ----
+penguins[ , 3:5] |>
+ apply(2, mean, na.rm = TRUE)
+```
+
+We can also use a custom function.
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Custom function ----
+apply(penguins[ , 3:5], 2, function(x) mean(x, na.rm = TRUE))
+```
+
+Finally, we can define a custom function outside the `apply()` function.
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Custom function ----
+my_mean <- function(x, na_rm = FALSE) {
+ mean(x, na.rm = na_rm)
+}
+
+apply(penguins[ , 3:5], 2, my_mean, na_rm = TRUE)
+```
+
+
+The output is a vector, but in some cases it can be a `matrix` (or an `array`).
+
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Different output class ----
+apply(penguins[ , 3:5], 2, range, na.rm = TRUE)
+```
+
+
+
+
+## The `lapply()` function
+
+The `lapply()` function performs a function on each element of a `list` or `vector`.
+
+- the first argument `X` specifies the `list` or the `vector`
+- the second argument `FUN` is the function to apply
+
+Let's try to compute the arithmetic mean of the columns `bill_length_mm`, `bill_depth_mm` and `body_mass_g`.
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Column names ----
+columns <- c("bill_length_mm", "bill_depth_mm", "body_mass_g")
+
+## Mean of columns 3, 4 and 5 ----
+lapply(columns, function(x) {
+ penguins[ , x, drop = TRUE] |>
+ mean(na.rm = TRUE)
+})
+```
+
+The output is a `list` of same length as `X`, and we can simplified it by using `unlist()`. We can do this because the output for each iteration is a single value.
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Column names ----
+columns <- c("bill_length_mm", "bill_depth_mm", "body_mass_g")
+
+## Mean of columns 3, 4 and 5 ----
+values <- lapply(columns, function(x) {
+ penguins[ , x, drop = TRUE] |>
+ mean(na.rm = TRUE)
+})
+
+## Simplify output ----
+unlist(values)
+```
+
+And we can name values.
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Column names ----
+columns <- c("bill_length_mm", "bill_depth_mm", "body_mass_g")
+
+## Mean of columns 3, 4 and 5 ----
+values <- lapply(columns, function(x) {
+ penguins[ , x, drop = TRUE] |>
+ mean(na.rm = TRUE)
+})
+
+## Simplify output ----
+values <- unlist(values)
+
+## Name elements ----
+names(values) <- columns
+
+values
+```
+
+
+The `lapply()` allows you to perform complex tasks.
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Column names ----
+columns <- c("bill_length_mm", "bill_depth_mm", "body_mass_g")
+
+## Mean, min and max of columns 3, 4 and 5 ----
+values <- lapply(columns, function(x) {
+ column <- penguins[ , x, drop = TRUE]
+ data.frame("trait" = x,
+ "mean" = mean(column, na.rm = TRUE),
+ "min" = min(column, na.rm = TRUE),
+ "max" = max(column, na.rm = TRUE))
+})
+
+values
+```
+
+
+Let's simplify the output into a single `data.frame` by recursively applying (with `do.call()`) the function `rbind.data.frame()` to each `data.frame` of the `list`.
+
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Simplify output ----
+values <- do.call(rbind.data.frame, values)
+
+values
+```
+
+**NB.** Here the object `penguins` is retrieved from the global environment. But it's safer to explicitly use it like this:
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Column names ----
+columns <- c("bill_length_mm", "bill_depth_mm", "body_mass_g")
+
+## Mean, min and max of columns 3, 4 and 5 ----
+values <- lapply(columns, function(x, data) {
+ column <- data[ , x, drop = TRUE]
+ data.frame("trait" = x,
+ "mean" = mean(column, na.rm = TRUE),
+ "min" = min(column, na.rm = TRUE),
+ "max" = max(column, na.rm = TRUE))
+}, data = penguins)
+
+do.call(rbind.data.frame, values)
+```
+
+
+
+
+
+## The `tapply()` function
+
+The `tapply()` allows you to perform a function across specified groups in your data. For `dplyr` users, it's equivalent to the `group_by()` and `summarize()` functions.
+
+- the first argument `X` specifies the values
+- the second argument `INDEX` specifies the groups
+- the third argument `FUN` is the function to apply
+
+Lets' compute the mean of `bill_length_mm` for each species.
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Average bill length for each species ----
+tapply(penguins$"bill_length_mm", penguins$"species", function(x) {
+ mean(x, na.rm = TRUE)
+})
+```
+
+We can group values according to two variables.
+
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Average bill length for each species ----
+tapply(penguins$"bill_length_mm",
+ list(penguins$"species", penguins$"island"),
+ function(x) mean(x, na.rm = TRUE))
+```
+
+Here the output is a `matrix`. We can convert it to long `data.frame` w/ `tidyr::pivot_longer()`.
+
+
+
+```{r}
+#| echo: true
+#| eval: true
+
+## Load 'dplyr' package ----
+library("tidyr")
+
+## Average bill length for each species and island ----
+values <- tapply(penguins$"bill_length_mm",
+ list(penguins$"species", penguins$"island"),
+ function(x) mean(x, na.rm = TRUE))
+
+## Convert to data.frame ----
+values <- data.frame(values)
+values$"species" <- rownames(values)
+
+
+## Pivot data ----
+values |>
+ pivot_longer(cols = !species,
+ values_to = "bill_length_mm",
+ names_to = "island")
+```
+
+
+This is equivalent to `dplyr` approach.
+
+
+```{r}
+#| echo: true
+#| eval: true
+#| message: false
+#| warning: false
+
+## Load 'dplyr' package ----
+library("dplyr")
+
+## Summarise data ----
+penguins %>%
+ group_by(species, island) %>%
+ summarize(bill_length_mm = mean(bill_length_mm,
+ na.rm = TRUE)) %>%
+ ungroup()
+```
diff --git a/posts/2024-11-19-the-apply-function-family/palmer-penguins.png b/posts/2024-11-19-the-apply-function-family/palmer-penguins.png
new file mode 100644
index 0000000..736ae89
Binary files /dev/null and b/posts/2024-11-19-the-apply-function-family/palmer-penguins.png differ
diff --git a/styles.css b/styles.css
deleted file mode 100644
index 88bf42a..0000000
--- a/styles.css
+++ /dev/null
@@ -1,18 +0,0 @@
-.bouton {
- margin-left: 1%;
- margin-right: 1%;
-}
-
-.btn {
- border-radius: 0.5rem;
-}
-
-a.bouton {
- color: #375A7F;
- text-decoration: none;
-}
-
-a.bouton:hover {
- color: white;
- text-decoration: none;
-}
diff --git a/styles.scss b/styles.scss
new file mode 100644
index 0000000..600da17
--- /dev/null
+++ b/styles.scss
@@ -0,0 +1,26 @@
+/*-- scss:defaults --*/
+
+@import url('https://fonts.googleapis.com/css2?family=Fira+Code:wght@300;400;500;600;700&display=swap');
+
+$font-family-monospace: "Fira Code", monospace !default;
+
+/*-- scss:rules --*/
+
+.bouton {
+ margin-left: 1%;
+ margin-right: 1%;
+}
+
+.btn {
+ border-radius: 0.5rem;
+}
+
+a.bouton {
+ color: #375A7F;
+ text-decoration: none;
+}
+
+a.bouton:hover {
+ color: white;
+ text-decoration: none;
+}