Fix CRAN errors #525

Merged
merged 3 commits into from
Jul 14, 2024
14 changes: 7 additions & 7 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: datawizard
Title: Easy Data Wrangling and Statistical Transformations
-Version: 0.12.0
+Version: 0.12.0.1
Authors@R: c(
person("Indrajeet", "Patil", , "[email protected]", role = "aut",
comment = c(ORCID = "0000-0003-1995-6531", Twitter = "@patilindrajeets")),
@@ -21,10 +21,10 @@ Authors@R: c(
person("Robert", "Garrett", , "[email protected]", role = "rev")
)
Maintainer: Etienne Bacher <[email protected]>
Description: A lightweight package to assist in key steps involved in any data
analysis workflow: (1) wrangling the raw data to get it in the needed form,
(2) applying preprocessing steps and statistical transformations, and
(3) compute statistical summaries of data properties and distributions.
It is also the data wrangling backend for packages in 'easystats' ecosystem.
References: Patil et al. (2022) <doi:10.21105/joss.04684>.
License: MIT + file LICENSE
@@ -36,7 +36,7 @@ Imports:
insight (>= 0.20.1),
stats,
utils
Suggests:
bayestestR,
boot,
brms,
@@ -68,7 +68,7 @@ Suggests:
tibble,
tidyr,
withr
VignetteBuilder:
knitr
Encoding: UTF-8
Language: en-US
107 changes: 55 additions & 52 deletions vignettes/tidyverse_translation.Rmd
@@ -1,6 +1,6 @@
---
title: "Coming from 'tidyverse'"
output:
rmarkdown::html_vignette:
toc: true
vignette: >
@@ -22,7 +22,8 @@
pkgs <- c(
"dplyr",
"datawizard",
-"tidyr"
+"tidyr",
+"htmltools"
)

# since we explicitly put eval = TRUE for some chunks, we can't rely on
@@ -33,9 +34,11 @@
if (!all(vapply(pkgs, requireNamespace, quietly = TRUE, FUN.VALUE = logical(1L))) || getRversion() < "4.1.0") {
evaluate_chunk <- FALSE
}
```

```{r echo=FALSE, message=FALSE, eval=evaluate_chunk}
row <- function(...) {
-div(
+htmltools::div(
class = "custom_note",
...
)
@@ -63,24 +66,24 @@

# Introduction

The `{datawizard}` package aims to make basic data wrangling easier than
with base R. The data wrangling workflow it supports is similar to the one
supported by the tidyverse package combination of `{dplyr}` and `{tidyr}`. However,
one of its main features is that it has very few dependencies: `{stats}` and `{utils}`
(included in base R) and `{insight}`, which is the core package of the _easystats_
ecosystem. This package grew organically to simultaneously satisfy the
"0 non-base hard dependency" principle of _easystats_ and the data wrangling needs
of the constituent packages in this ecosystem. It is also
important to note that `{datawizard}` was designed to avoid namespace collisions
with `{tidyverse}` packages.

In this article, we will see how to go through basic data wrangling steps with
`{datawizard}`. We will also compare it to the `{tidyverse}` syntax for achieving the same result.
This way, if you decide to make the switch, you can easily find the translations here.
This vignette is largely inspired by `{dplyr}`'s [Getting started vignette](https://dplyr.tidyverse.org/articles/dplyr.html).

```{r echo=FALSE}
row("Note: In this vignette, we use the native pipe-operator, `|>`, which was introduced in R 4.1. Users of R version 3.6 or 4.0 should replace the native pipe by magrittr's one (`%>%`) so that examples work.")
```

```{r, eval = evaluate_chunk}
@@ -94,7 +97,7 @@

# Workhorses

Before we look at their *tidyverse* equivalents, we can first have a look at
`{datawizard}`'s key functions for data wrangling:

| Function | Operation |
@@ -147,14 +150,14 @@

```{r filter, class.source = "datawizard"}
# ---------- datawizard -----------
starwars |>
data_filter(
skin_color == "light",
eye_color == "brown"
)

# or
starwars |>
data_filter(
skin_color == "light" &
eye_color == "brown"
@@ -166,7 +169,7 @@

```{r, class.source = "tidyverse"}
# ---------- tidyverse -----------
starwars |>
filter(
skin_color == "light",
eye_color == "brown"
@@ -187,9 +190,9 @@

## Selecting {#selecting}

`data_select()` is the equivalent of `dplyr::select()`.
The main difference between these two functions is that `data_select()` uses two
arguments (`select` and `exclude`) and requires quoted column names if we want to
select several variables, while `dplyr::select()` accepts any unquoted column names.
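
As a minimal sketch of the two arguments, assuming `{datawizard}` is installed and using the built-in `iris` data (chosen here only for illustration, it is not used elsewhere in this vignette), several columns are passed as a quoted character vector:

```r
library(datawizard)

# Keep two columns: several names are passed as a quoted character vector
kept <- data_select(iris, select = c("Sepal.Length", "Species"))
names(kept)

# Drop one column with `exclude` instead of listing everything to keep
dropped <- data_select(iris, exclude = "Species")
names(dropped)
```

The object names `kept` and `dropped` are hypothetical and only serve the illustration.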

:::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
@@ -207,7 +210,7 @@

```{r, class.source = "tidyverse"}
# ---------- tidyverse -----------
starwars |>
select(hair_color, skin_color, eye_color)
```
:::
@@ -251,7 +254,7 @@

```{r select3, class.source = "datawizard"}
# ---------- datawizard -----------
starwars |>
data_select(select = -(hair_color:eye_color))
```
:::
@@ -260,7 +263,7 @@

```{r, class.source = "tidyverse"}
# ---------- tidyverse -----------
starwars |>
select(!(hair_color:eye_color))
```
:::
@@ -303,7 +306,7 @@

```{r select5, class.source = "datawizard"}
# ---------- datawizard -----------
starwars |>
data_select(select = is.numeric)
```
:::
@@ -327,17 +330,17 @@

## Modifying {#modifying}

`data_modify()` is a wrapper around `base::transform()` but has several additional
benefits:

* it allows us to use newly created variables in the following expressions;
* it works with grouped data;
* it preserves variable attributes such as labels;
* it accepts expressions as character vectors so that it is easy to program with it


This last point is also the main difference between `data_modify()` and
`dplyr::mutate()`.
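
The first benefit listed above (reusing a newly created variable in a later expression) can be sketched as follows. This is only an illustration, assuming `{datawizard}` is installed; the `mtcars` data and the column names `mpg_double` and `mpg_quad` are hypothetical choices, not part of this vignette:

```r
library(datawizard)

# `mpg_double` is created first and then reused to build `mpg_quad`
out <- data_modify(
  mtcars,
  mpg_double = mpg * 2,
  mpg_quad = mpg_double * 2
)
head(out[c("mpg", "mpg_double", "mpg_quad")])
```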

:::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}

@@ -420,7 +423,7 @@

```{r, class.source = "tidyverse"}
# ---------- tidyverse -----------
starwars |>
arrange(hair_color, height)
```
:::
@@ -430,7 +433,7 @@
```{r arrange1, eval = evaluate_chunk, echo = FALSE}
```

You can also sort variables in descending order by putting a `"-"` in front of
their name, like below:

:::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
@@ -459,15 +462,15 @@

## Extracting {#extracting}

Although we mostly work on data frames, it is sometimes useful to extract a single
column as a vector. This can be done with `data_extract()`, which reproduces the
behavior of `dplyr::pull()`:

:::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
:::{}
```{r extract1, class.source = "datawizard"}
# ---------- datawizard -----------
starwars |>
data_extract(gender)
```
:::
@@ -499,9 +502,9 @@

## Renaming {#renaming}

`data_rename()` is the equivalent of `dplyr::rename()`, but the two use different
syntax. While `dplyr::rename()` takes `new = old` pairs of column
names, `data_rename()` requires a vector of column names to rename, and then
a vector of new names for these columns that must be of the same length.
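
A minimal sketch of this two-vector pattern, assuming `{datawizard}` is installed. The `iris` data and the vectors `old_names` and `new_names` are hypothetical and used only for illustration; the two vectors are passed by position:

```r
library(datawizard)

# Columns to rename, and their replacements (same length, same order)
old_names <- c("Sepal.Length", "Sepal.Width")
new_names <- c("sepal_length", "sepal_width")

renamed <- data_rename(iris, old_names, new_names)
names(renamed)
```

Because both inputs are plain character vectors, the new names can be computed from the old ones before the call.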

:::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
@@ -535,8 +538,8 @@
```{r rename1, eval = evaluate_chunk, echo = FALSE}
```

The way `data_rename()` is designed makes it easy to apply the same modifications
to a vector of column names. For example, we can remove underscores and use
TitleCase with the following code:

```{r rename2}
@@ -552,8 +555,8 @@
```{r rename2, eval = evaluate_chunk, echo = FALSE}
```

It is also possible to add a prefix or a suffix to all or a subset of variables
with `data_addprefix()` and `data_addsuffix()`. The argument `select` accepts
all select helpers that we saw above with `data_select()`:

```{r rename3}
@@ -577,7 +580,7 @@
Rather than typing many names in `data_select()`, we can use `data_relocate()`,
which is the equivalent of `dplyr::relocate()`. Just like `data_select()`, we can
specify a list of variables we want to relocate with `select` and `exclude`.
Then, the arguments `before` and `after`^[Note that we use `before` and `after`
whereas `dplyr::relocate()` uses `.before` and `.after`.] specify where the selected columns should
be relocated:

@@ -591,7 +594,7 @@
data_relocate(sex:homeworld, before = "height")
```
:::

::: {}

```{r, class.source = "tidyverse"}
@@ -600,14 +603,14 @@
relocate(sex:homeworld, .before = height)
```
:::

::::

```{r relocate1, eval = evaluate_chunk, echo = FALSE}
```

In addition to column names, `before` and `after` accept column indices. Finally,
one can use `before = -1` to relocate the selected columns just before the last
column, or `after = -1` to relocate them after the last column.
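
As a small sketch of the two ways of giving a destination, assuming `{datawizard}` is installed and using the built-in `iris` data purely for illustration (the object names `first` and `last` are hypothetical):

```r
library(datawizard)

# Destination given by name: move `Species` in front of `Sepal.Length`
first <- data_relocate(iris, select = "Species", before = "Sepal.Length")
names(first)

# Destination given as -1: move `Sepal.Length` after the last column
last <- data_relocate(iris, select = "Sepal.Length", after = -1)
names(last)
```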

```{r eval = evaluate_chunk}
@@ -622,10 +625,10 @@
### Longer

Reshaping data from wide to long or from long to wide format can be done with
`data_to_long()` and `data_to_wide()`. These functions were designed to match
`tidyr::pivot_longer()` and `tidyr::pivot_wider()` arguments, so that the only
thing to do is to change the function name. However, not all of
`tidyr::pivot_longer()` and `tidyr::pivot_wider()` features are available yet.

We will use the `relig_income` dataset, as in the [`{tidyr}` vignette](https://tidyr.tidyverse.org/articles/pivot.html).

@@ -634,11 +637,11 @@
```


We would like to reshape this dataset to have 3 columns: religion, count, and
income. The column "religion" doesn't need to change, so we exclude it with
`-religion`. Then, each remaining column corresponds to an income category.
Therefore, we want to move all these column names to a single column called
"income". Finally, the values corresponding to each of these columns will be
reshaped to be in a single new column, called "count".
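
The same reshaping can be sketched on a tiny data frame before applying it to `relig_income`. This is only an illustration, assuming `{datawizard}` is installed; the data frame `wide` and its values are hypothetical, made up to mimic the layout described above:

```r
library(datawizard)

# Hypothetical toy data: one identifier column plus one column per income category
wide <- data.frame(
  religion = c("A", "B"),
  low = c(10, 20),
  high = c(30, 40)
)

# Every column except `religion` is stacked: former column names go to
# `income`, the corresponding cell values go to `count`
long <- data_to_long(
  wide,
  select = -religion,
  names_to = "income",
  values_to = "count"
)
long
```

Each of the two rows contributes one row per income column, so the long version has four rows.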

:::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
@@ -765,12 +768,12 @@

<!-- explain a bit more the args of data_join -->

In `{datawizard}`, joining datasets is done with `data_join()` (or its alias
`data_merge()`). Contrary to `{dplyr}`, this single function takes care of all
types of join, which are then specified inside the function with the argument
`join` (by default, `join = "left"`).
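
A minimal sketch of how the `join` argument switches between join types, assuming `{datawizard}` is installed. The data frames `x` and `y` are hypothetical toy data that share an `id` column:

```r
library(datawizard)

# Hypothetical toy data: ids 2 and 3 appear in both data frames
x <- data.frame(id = 1:3, a = c("a1", "a2", "a3"))
y <- data.frame(id = 2:4, b = c("b2", "b3", "b4"))

left <- data_join(x, y, by = "id") # default: join = "left"
inner <- data_join(x, y, join = "inner", by = "id")
full <- data_join(x, y, join = "full", by = "id")

# Compare how many rows each join type keeps
nrow(left)
nrow(inner)
nrow(full)
```

The left join keeps every row of `x`, the inner join keeps only the ids found in both data frames, and the full join keeps every id found in either one.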

Below, we show how to perform the four most common joins: full, left, right and
inner. We will use the datasets `band_members` and `band_instruments` provided by `{dplyr}`:

:::: {style="display: grid; grid-template-columns: 50% 50%; grid-column-gap: 10px;"}
@@ -935,7 +938,7 @@
)
```
:::

::: {}

```{r, class.source = "tidyverse"}
@@ -948,7 +951,7 @@
)
```
:::

::::

```{r unite1, eval = evaluate_chunk, echo = FALSE}
@@ -969,7 +972,7 @@
)
```
:::

::: {}

```{r, class.source = "tidyverse"}
@@ -983,7 +986,7 @@
)
```
:::

::::

```{r unite2, eval = evaluate_chunk, echo = FALSE}
@@ -1017,7 +1020,7 @@
)
```
:::

::: {}

```{r, class.source = "tidyverse"}
@@ -1029,7 +1032,7 @@
)
```
:::

::::

```{r separate1, eval = evaluate_chunk, echo = FALSE}
@@ -1051,9 +1054,9 @@

# Other useful functions

`{datawizard}` contains other functions that are not necessarily included in
`{dplyr}` or `{tidyr}` or do not directly modify the data. Some of them are
inspired by the package `janitor`.

## Work with rownames

@@ -1079,7 +1082,7 @@
The main difference is when we use it with grouped data. While `tibble::rowid_to_column()`
uses one distinct rowid for every row in the dataset, `rowid_as_column()` creates
one id for every row *in each group*. Therefore, two rows in different groups
can have the same row id.

This means that `rowid_as_column()` is closer to using `n()` in `mutate()`, like
the following: