easystats · etiennebacher · Jul 14, 2024 · Jul 13, 2024
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Type: Package
 Package: datawizard
 Title: Easy Data Wrangling and Statistical Transformations
-Version: 0.12.0
+Version: 0.12.0.1
 Authors@R: c(
     person("Indrajeet", "Patil", , "[email protected]", role = "aut",
            comment = c(ORCID = "0000-0003-1995-6531", Twitter = "@patilindrajeets")),
@@ -21,10 +21,10 @@ Authors@R: c(
     person("Robert", "Garrett", , "[email protected]", role = "rev")
   )
 Maintainer: Etienne Bacher <[email protected]>
-Description: A lightweight package to assist in key steps involved in any data 
-    analysis workflow: (1) wrangling the raw data to get it in the needed form, 
-    (2) applying preprocessing steps and statistical transformations, and 
-    (3) compute statistical summaries of data properties and distributions. 
+Description: A lightweight package to assist in key steps involved in any data
+    analysis workflow: (1) wrangling the raw data to get it in the needed form,
+    (2) applying preprocessing steps and statistical transformations, and
+    (3) computing statistical summaries of data properties and distributions.
     It is also the data wrangling backend for packages in 'easystats' ecosystem.
     References: Patil et al. (2022) <doi:10.21105/joss.04684>.
 License: MIT + file LICENSE
@@ -36,7 +36,7 @@ Imports:
     insight (>= 0.20.1),
     stats,
     utils
-Suggests: 
+Suggests:
     bayestestR,
     boot,
     brms,
@@ -68,7 +68,7 @@ Suggests:
     tibble,
     tidyr,
     withr
-VignetteBuilder: 
+VignetteBuilder:
     knitr
 Encoding: UTF-8
 Language: en-US

diff --git a/vignettes/tidyverse_translation.Rmd b/vignettes/tidyverse_translation.Rmd
@@ -1,6 +1,6 @@
 ---
 title: "Coming from 'tidyverse'"
-output: 
+output:
   rmarkdown::html_vignette:
     toc: true
 vignette: >
@@ -22,7 +22,8 @@
 pkgs <- c(
   "dplyr",
   "datawizard",
-  "tidyr"
+  "tidyr",
+  "htmltools"
 )
 
 # since we explicitely put eval = TRUE for some chunks, we can't rely on
@@ -33,9 +34,11 @@
 if (!all(vapply(pkgs, requireNamespace, quietly = TRUE, FUN.VALUE = logical(1L))) || getRversion() < "4.1.0") {
   evaluate_chunk <- FALSE
 }
+```
 
+```{r echo=FALSE, message=FALSE, eval=evaluate_chunk}
 row <- function(...) {
-  div(
+  htmltools::div(
     class = "custom_note",
     ...
   )
@@ -63,24 +66,24 @@
 
 # Introduction
 
-`{datawizard}` package aims to make basic data wrangling easier than 
+The `{datawizard}` package aims to make basic data wrangling easier than
 with base R. The data wrangling workflow it supports is similar to the one
 supported by the tidyverse package combination of `{dplyr}` and `{tidyr}`. However,
 one of its main features is that it has a very few dependencies: `{stats}` and `{utils}`
-(included in base R) and `{insight}`, which is the core package of the _easystats_ 
-ecosystem. This package grew organically to simultaneously satisfy the 
+(included in base R) and `{insight}`, which is the core package of the _easystats_
+ecosystem. This package grew organically to simultaneously satisfy the
 "0 non-base hard dependency" principle of _easystats_ and the data wrangling needs
-of the constituent packages in this ecosystem. It is also 
-important to note that `{datawizard}` was designed to avoid namespace collisions 
+of the constituent packages in this ecosystem. It is also
+important to note that `{datawizard}` was designed to avoid namespace collisions
 with `{tidyverse}` packages.
 
-In this article, we will see how to go through basic data wrangling steps with 
-`{datawizard}`. We will also compare it to the `{tidyverse}` syntax for achieving the same. 
+In this article, we will see how to go through basic data wrangling steps with
+`{datawizard}`. We will also compare it to the `{tidyverse}` syntax for achieving the same results.
 This way, if you decide to make the switch, you can easily find the translations here.
 This vignette is largely inspired from `{dplyr}`'s [Getting started vignette](https://dplyr.tidyverse.org/articles/dplyr.html).
 
 ```{r echo=FALSE}
 row("Note: In this vignette, we use the native pipe-operator, `|>`, which was introduced in R 4.1. Users of R version 3.6 or 4.0 should replace the native pipe by magrittr's one (`%>%`) so that examples work.")
 ```

 ```{r, eval = evaluate_chunk}
@@ -94,7 +97,7 @@
 
 # Workhorses
 
-Before we look at their *tidyverse* equivalents, we can first have a look at 
+Before we look at their *tidyverse* equivalents, we can first have a look at
 `{datawizard}`'s key functions for data wrangling:
 
 | Function          | Operation                                         |
@@ -147,14 +150,14 @@

 ```{r filter, class.source = "datawizard"}
 # ---------- datawizard -----------
 starwars |>
  data_filter(
    skin_color == "light",
    eye_color == "brown"
  )

 # or
 starwars |>
  data_filter(
    skin_color == "light" &
      eye_color == "brown"
@@ -166,7 +169,7 @@

 ```{r, class.source = "tidyverse"}
 # ---------- tidyverse -----------
 starwars |>
  filter(
    skin_color == "light",
    eye_color == "brown"
@@ -187,9 +190,9 @@
 
 ## Selecting {#selecting}
 
-`data_select()` is the equivalent of `dplyr::select()`. 
+`data_select()` is the equivalent of `dplyr::select()`.
 The main difference between these two functions is that `data_select()` uses two
-arguments (`select` and `exclude`) and requires quoted column names if we want to 
+arguments (`select` and `exclude`) and requires quoted column names if we want to
 select several variables, while `dplyr::select()` accepts any unquoted column names.
 
 :::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
@@ -207,7 +210,7 @@

 ```{r, class.source = "tidyverse"}
 # ---------- tidyverse -----------
 starwars |>
  select(hair_color, skin_color, eye_color)
 ```
 :::
@@ -251,7 +254,7 @@

 ```{r select3, class.source = "datawizard"}
 # ---------- datawizard -----------
 starwars |>
  data_select(select = -(hair_color:eye_color))
 ```
 :::
@@ -260,7 +263,7 @@

 ```{r, class.source = "tidyverse"}
 # ---------- tidyverse -----------
 starwars |>
  select(!(hair_color:eye_color))
 ```
 :::
@@ -303,7 +306,7 @@

 ```{r select5, class.source = "datawizard"}
 # ---------- datawizard -----------
 starwars |>
  data_select(select = is.numeric)
 ```
 :::
@@ -327,17 +330,17 @@
 
 ## Modifying {#modifying}
 
-`data_modify()` is a wrapper around `base::transform()` but has several additional 
-benefits: 
+`data_modify()` is a wrapper around `base::transform()` but has several additional
+benefits:
 
 * it allows us to use newly created variables in the following expressions;
 * it works with grouped data;
 * it preserves variable attributes such as labels;
 * it accepts expressions as character vectors so that it is easy to program with it
 
 
-This last point is also the main difference between `data_modify()` and 
-`dplyr::mutate()`. 
+This last point is also the main difference between `data_modify()` and
+`dplyr::mutate()`.
 
 :::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
 
@@ -420,7 +423,7 @@

 ```{r, class.source = "tidyverse"}
 # ---------- tidyverse -----------
 starwars |>
  arrange(hair_color, height)
 ```
 :::
@@ -430,7 +433,7 @@
 ```{r arrange1, eval = evaluate_chunk, echo = FALSE}
 ```
 
-You can also sort variables in descending order by putting a `"-"` in front of 
+You can also sort by variables in descending order by putting a `"-"` in front of
 their name, like below:
 
 :::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
@@ -459,15 +462,15 @@
 
 ## Extracting {#extracting}
 
-Although we mostly work on data frames, it is sometimes useful to extract a single 
-column as a vector. This can be done with `data_extract()`, which reproduces the 
+Although we mostly work on data frames, it is sometimes useful to extract a single
+column as a vector. This can be done with `data_extract()`, which reproduces the
 behavior of `dplyr::pull()`:
 
 :::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
 :::{}
 ```{r extract1, class.source = "datawizard"}
 # ---------- datawizard -----------
 starwars |>
  data_extract(gender)
 ```
 :::
@@ -499,9 +502,9 @@
 
 ## Renaming {#renaming}
 
-`data_rename()` is the equivalent of `dplyr::rename()` but the syntax between the 
+`data_rename()` is the equivalent of `dplyr::rename()`, but the syntax of the
 two is different. While `dplyr::rename()` takes new-old pairs of column
-names, `data_rename()` requires a vector of column names to rename, and then 
+names, `data_rename()` requires a vector of column names to rename, and then
 a vector of new names for these columns that must be of the same length.
 
 :::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
@@ -535,8 +538,8 @@
 ```{r rename1, eval = evaluate_chunk, echo = FALSE}
 ```
 
-The way `data_rename()` is designed makes it easy to apply the same modifications 
-to a vector of column names. For example, we can remove underscores and use 
+The way `data_rename()` is designed makes it easy to apply the same modifications
+to a vector of column names. For example, we can remove underscores and use
 TitleCase with the following code:
 
 ```{r rename2}
@@ -552,8 +555,8 @@
 ```{r rename2, eval = evaluate_chunk, echo = FALSE}
 ```
 
-It is also possible to add a prefix or a suffix to all or a subset of variables 
-with `data_addprefix()` and `data_addsuffix()`. The argument `select` accepts 
+It is also possible to add a prefix or a suffix to all or a subset of variables
+with `data_addprefix()` and `data_addsuffix()`. The argument `select` accepts
 all select helpers that we saw above with `data_select()`:
 
 ```{r rename3}
@@ -577,7 +580,7 @@
 Rather than typing many names in `data_select()`, we can use `data_relocate()`,
 which is the equivalent of `dplyr::relocate()`. Just like `data_select()`, we can
 specify a list of variables we want to relocate with `select` and `exclude`.
-Then, the arguments `before` and `after`^[Note that we use `before` and `after` 
+Then, the arguments `before` and `after`^[Note that we use `before` and `after`
 whereas `dplyr::relocate()` uses `.before` and `.after`.] specify where the selected columns should
 be relocated:
 
@@ -591,7 +594,7 @@
   data_relocate(sex:homeworld, before = "height")
 ```
 :::
-  
+
 ::: {}
 
 ```{r, class.source = "tidyverse"}
@@ -600,14 +603,14 @@
   relocate(sex:homeworld, .before = height)
 ```
 :::
-  
+
 ::::
 
 ```{r relocate1, eval = evaluate_chunk, echo = FALSE}
 ```
 
 In addition to column names, `before` and `after` accept column indices. Finally,
-one can use `before = -1` to relocate the selected columns just before the last 
+one can use `before = -1` to relocate the selected columns just before the last
 column, or `after = -1` to relocate them after the last column.
 
 ```{r eval = evaluate_chunk}
@@ -622,10 +625,10 @@
 ### Longer
 
 Reshaping data from wide to long or from long to wide format can be done with
-`data_to_long()` and `data_to_wide()`. These functions were designed to match 
-`tidyr::pivot_longer()` and `tidyr::pivot_wider()` arguments, so that the only 
-thing to do is to change the function name. However, not all of 
-`tidyr::pivot_longer()` and `tidyr::pivot_wider()` features are available yet. 
+`data_to_long()` and `data_to_wide()`. These functions were designed to match
+`tidyr::pivot_longer()` and `tidyr::pivot_wider()` arguments, so that the only
+thing to do is to change the function name. However, not all features of
+`tidyr::pivot_longer()` and `tidyr::pivot_wider()` are available yet.
 
 We will use the `relig_income` dataset, as in the [`{tidyr}` vignette](https://tidyr.tidyverse.org/articles/pivot.html).
 
@@ -634,11 +637,11 @@
 ```
 
 
-We would like to reshape this dataset to have 3 columns: religion, count, and 
-income. The column "religion" doesn't need to change, so we exclude it with 
-`-religion`. Then, each remaining column corresponds to an income category. 
-Therefore, we want to move all these column names to a single column called 
-"income". Finally, the values corresponding to each of these columns will be 
+We would like to reshape this dataset to have 3 columns: religion, count, and
+income. The column "religion" doesn't need to change, so we exclude it with
+`-religion`. Then, each remaining column corresponds to an income category.
+Therefore, we want to move all these column names to a single column called
+"income". Finally, the values corresponding to each of these columns will be
 reshaped to be in a single new column, called "count".
 
 :::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
@@ -765,12 +768,12 @@
 
 <!-- explain a bit more the args of data_join -->
 
-In `{datawizard}`, joining datasets is done with `data_join()` (or its alias 
-`data_merge()`). Contrary to `{dplyr}`, this unique function takes care of all 
+In `{datawizard}`, joining datasets is done with `data_join()` (or its alias
+`data_merge()`). Unlike `{dplyr}`, this single function takes care of all
 types of join, which are then specified inside the function with the argument
 `join` (by default, `join = "left"`).
 
-Below, we show how to perform the four most common joins: full, left, right and 
+Below, we show how to perform the four most common joins: full, left, right and
 inner. We will use the datasets `band_members`and `band_instruments` provided by `{dplyr}`:
 
 :::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
@@ -935,7 +938,7 @@
   )
 ```
 :::
-  
+
 ::: {}
 
 ```{r, class.source = "tidyverse"}
@@ -948,7 +951,7 @@
   )
 ```
 :::
-  
+
 ::::
 
 ```{r unite1, eval = evaluate_chunk, echo = FALSE}
@@ -969,7 +972,7 @@
   )
 ```
 :::
-  
+
 ::: {}
 
 ```{r, class.source = "tidyverse"}
@@ -983,7 +986,7 @@
   )
 ```
 :::
-  
+
 ::::
 
 ```{r unite2, eval = evaluate_chunk, echo = FALSE}
@@ -1017,7 +1020,7 @@
   )
 ```
 :::
-  
+
 ::: {}
 
 ```{r, class.source = "tidyverse"}
@@ -1029,7 +1032,7 @@
   )
 ```
 :::
-  
+
 ::::
 
 ```{r separate1, eval = evaluate_chunk, echo = FALSE}
@@ -1051,9 +1054,9 @@
 
 # Other useful functions
 
-`{datawizard}` contains other functions that are not necessarily included in 
-`{dplyr}` or `{tidyr}` or do not directly modify the data. Some of them are 
-inspired from the package `janitor`. 
+`{datawizard}` contains other functions that are not necessarily included in
+`{dplyr}` or `{tidyr}` or do not directly modify the data. Some of them are
+inspired by the `{janitor}` package.
 
 ## Work with rownames
 
@@ -1079,7 +1082,7 @@
 The main difference is when we use it with grouped data. While `tibble::rowid_to_column()`
 uses one distinct rowid for every row in the dataset, `rowid_as_column()` creates
 one id for every row *in each group*. Therefore, two rows in different groups
-can have the same row id. 
+can have the same row id.
 
 This means that `rowid_as_column()` is closer to using `n()` in `mutate()`, like
 the following: