Skip to content

Commit

Permalink
Revamp data_rename() (#568)
Browse files Browse the repository at this point in the history
* init

* update wordlist

* redoc

* fix vignette

* doc for text_format

* comments

* pkgdown

* lints

* seealso

* pkgdown

* do not use column indices as replacement

* forbid partially named vector

* add arg ifnotfound in select_nse()

* simplify

* simplify again [skip ci]

* fix tests and examples

* lint

* use dev styler

* fix
  • Loading branch information
etiennebacher authored Dec 2, 2024
1 parent 81dd0e0 commit 0512212
Show file tree
Hide file tree
Showing 27 changed files with 395 additions and 330 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Type: Package
Package: datawizard
Title: Easy Data Wrangling and Statistical Transformations
Version: 0.13.0.15
Version: 0.13.0.16
Authors@R: c(
person("Indrajeet", "Patil", , "[email protected]", role = "aut",
comment = c(ORCID = "0000-0003-1995-6531")),
Expand Down
17 changes: 13 additions & 4 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,18 @@
# datawizard (development)

BREAKING CHANGES

* Argument `drop_na` in `data_match()` is deprecated now. Please use `remove_na`
instead.
BREAKING CHANGES AND DEPRECATIONS

* Argument `drop_na` in `data_match()` is deprecated now. Please use
`remove_na` instead.

* In `data_rename()` (#567):
- argument `pattern` is deprecated. Use `select` instead.
- argument `safe` is deprecated. The function now errors when `select`
contains unknown column names.
- when `replacement` is `NULL`, an error is now thrown (previously, column
indices were used as new names).
- if `select` (previously `pattern`) is a named vector, then all elements
must be named, e.g. `c(length = "Sepal.Length", "Sepal.Width")` errors.

CHANGES

Expand Down
12 changes: 10 additions & 2 deletions R/data_addprefix.R
Original file line number Diff line number Diff line change
@@ -1,5 +1,13 @@
#' @rdname data_rename
#' Add a prefix or suffix to column names
#'
#' @rdname data_prefix_suffix
#' @inheritParams extract_column_names
#' @param pattern A character string, which will be added as prefix or suffix
#' to the column names.
#' @param ... Other arguments passed to or from other functions.
#'
#' @seealso
#' [data_rename()] for more fine-grained column renaming.
#' @examples
#' # Add prefix / suffix to all columns
#' head(data_addprefix(iris, "NEW_"))
Expand Down Expand Up @@ -29,7 +37,7 @@ data_addprefix <- function(data,
}


#' @rdname data_rename
#' @rdname data_prefix_suffix
#' @export
data_addsuffix <- function(data,
pattern,
Expand Down
143 changes: 70 additions & 73 deletions R/data_rename.R
Original file line number Diff line number Diff line change
@@ -1,36 +1,27 @@
#' @title Rename columns and variable names
#' @name data_rename
#'
#' @description Safe and intuitive functions to rename variables or rows in
#' data frames. `data_rename()` will rename column names, i.e. it facilitates
#' renaming variables `data_addprefix()` or `data_addsuffix()` add prefixes
#' or suffixes to column names. `data_rename_rows()` is a convenient shortcut
#' renaming variables. `data_rename_rows()` is a convenient shortcut
#' to add or rename row names of a data frame, but unlike `row.names()`, its
#' input and output is a data frame, thus, integrating smoothly into a possible
#' pipe-workflow.
#' input and output is a data frame, thus, integrating smoothly into a
#' possible pipe-workflow.
#'
#' @inheritParams extract_column_names
#' @param data A data frame, or an object that can be coerced to a data frame.
#' @param pattern Character vector.
#' - For `data_addprefix()` or `data_addsuffix()`, a character string, which
#' will be added as prefix or suffix to the column names.
#' - For `data_rename()`, indicates columns that should be selected for
#' renaming. Can be `NULL` (in which case all columns are selected).
#' `pattern` can also be a named vector. In this case, names are used as
#' values for the `replacement` argument (i.e. `pattern` can be a character
#' vector using `<new name> = "<old name>"` and argument `replacement` will
#' be ignored then).
#' @param replacement Character vector. Can be one of the following:
#' - A character vector that indicates the new names of the columns selected
#' in `pattern`. `pattern` and `replacement` must be of the same length.
#' - `NULL`, in which case columns are numbered in sequential order.
#' - A string (i.e. character vector of length 1) with a "glue" styled pattern.
#' Currently supported tokens are:
#' in `select`. `select` and `replacement` must be of the same length.
#' - A string (i.e. character vector of length 1) with a "glue" styled
#' pattern. Currently supported tokens are:
#' - `{col}` which will be replaced by the column name, i.e. the
#' corresponding value in `pattern`.
#' corresponding value in `select`.
#' - `{n}` will be replaced by the number of the variable that is replaced.
#' - `{letter}` will be replaced by alphabetical letters in sequential order.
#' - `{letter}` will be replaced by alphabetical letters in sequential
#' order.
#' If more than 26 letters are required, letters are repeated, but have
#' sequential numeric indices (e.g., `a1` to `z1`, followed by `a2` to `z2`).
#' sequential numeric indices (e.g., `a1` to `z1`, followed by `a2` to
#' `z2`).
#' - Finally, the name of a user-defined object that is available in the
#' environment can be used. Note that the object's name is not allowed to
#' be one of the pre-defined tokens, `"col"`, `"n"` and `"letter"`.
Expand All @@ -39,35 +30,32 @@
#' ```r
#' data_rename(
#' mtcars,
#' pattern = c("am", "vs"),
#' select = c("am", "vs"),
#' replacement = "new_name_from_{col}"
#' )
#' ```
#' ... which would return new column names `new_name_from_am` and
#' `new_name_from_vs`. See 'Examples'.
#'
#' If `pattern` is a named vector, `replacement` is ignored.
#' If `select` is a named vector, `replacement` is ignored.
#' @param rows Vector of row names.
#' @param safe Do not throw error if for instance the variable to be
#' renamed/removed doesn't exist.
#' @param verbose Toggle warnings and messages.
#' @param safe Deprecated. Passing unknown column names now always errors.
#' @param pattern Deprecated. Use `select` instead.
#' @param ... Other arguments passed to or from other functions.
#'
#' @details
#' `select` can also be a named character vector. In this case, the names are
#' used to rename the columns in the output data frame. See 'Examples'.
#'
#' @return A modified data frame.
#'
#' @examples
#' # Rename columns
#' head(data_rename(iris, "Sepal.Length", "length"))
#' # data_rename(iris, "FakeCol", "length", safe=FALSE) # This fails
#' head(data_rename(iris, "FakeCol", "length")) # This doesn't
#' head(data_rename(iris, c("Sepal.Length", "Sepal.Width"), c("length", "width")))
#'
#' # use named vector to rename
#' head(data_rename(iris, c(length = "Sepal.Length", width = "Sepal.Width")))
#'
#' # Reset names
#' head(data_rename(iris, NULL))
#'
#' # Change all
#' head(data_rename(iris, replacement = paste0("Var", 1:5)))
#'
Expand All @@ -80,8 +68,7 @@
#' x <- c("hi", "there", "!")
#' head(data_rename(mtcars[1:3], c("mpg", "cyl", "disp"), "col_{x}"))
#' @seealso
#' - Functions to rename stuff: [data_rename()], [data_rename_rows()],
#' [data_addprefix()], [data_addsuffix()]
#' - Add a prefix or suffix to column names: [data_addprefix()], [data_addsuffix()]
#' - Functions to reorder or remove columns: [data_reorder()], [data_relocate()],
#' [data_remove()]
#' - Functions to reshape, pivot or rotate data frames: [data_to_long()],
Expand All @@ -96,28 +83,48 @@
#'
#' @export
data_rename <- function(data,
pattern = NULL,
select = NULL,
replacement = NULL,
safe = TRUE,
verbose = TRUE,
pattern = NULL,
...) {
# change all names if no pattern specified
if (is.null(pattern)) {
pattern <- names(data)
# If the user does data_rename(iris, pattern = "Sepal.Length", "length"),
# then "length" is matched to select by position while it's the replacement
# => do the switch manually
if (!is.null(pattern)) {
.is_deprecated("pattern", "select")
if (!is.null(select)) {
replacement <- select
}
select <- pattern
}

if (!is.character(pattern)) {
insight::format_error("Argument `pattern` must be of type character.")
if (isFALSE(safe)) {
insight::format_warning("In `data_rename()`, argument `safe` is no longer used and will be removed in a future release.") # nolint
}

# check if `pattern` has names, and if so, use as "replacement"
if (!is.null(names(pattern))) {
replacement <- names(pattern)
# change all names if no pattern specified
select <- .select_nse(
select,
data,
exclude = NULL,
ignore_case = NULL,
regex = NULL,
allow_rename = TRUE,
verbose = verbose,
ifnotfound = "error"
)

# Forbid partially named "select",
# Ex: if select = c("foo" = "Species", "Sepal.Length") then the 2nd name and
# 2nd value are "Sepal.Length"
if (!is.null(names(select)) && any(names(select) == select)) {
insight::format_error("When `select` is a named vector, all elements must be named.")
}

# name columns 1, 2, 3 etc. if no replacement
if (is.null(replacement)) {
replacement <- paste0(seq_along(pattern))
# check if `select` has names, and if so, use as "replacement"
if (!is.null(names(select))) {
replacement <- names(select)
}

# coerce to character
Expand All @@ -126,22 +133,22 @@ data_rename <- function(data,
# check if `replacement` has no empty strings and no NA values
invalid_replacement <- is.na(replacement) | !nzchar(replacement)
if (any(invalid_replacement)) {
if (is.null(names(pattern))) {
# when user did not match `pattern` with `replacement`
if (is.null(names(select))) {
# when user did not match `select` with `replacement`
msg <- c(
"`replacement` is not allowed to have `NA` or empty strings.",
sprintf(
"Following values in `pattern` have no match in `replacement`: %s",
toString(pattern[invalid_replacement])
"Following values in `select` have no match in `replacement`: %s",
toString(select[invalid_replacement])
)
)
} else {
# when user did not name all elements of `pattern`
# when user did not name all elements of `select`
msg <- c(
"Either name all elements of `pattern` or use `replacement`.",
"Either name all elements of `select` or use `replacement`.",
sprintf(
"Following values in `pattern` were not named: %s",
toString(pattern[invalid_replacement])
"Following values in `select` were not named: %s",
toString(select[invalid_replacement])
)
)
}
Expand All @@ -163,30 +170,20 @@ data_rename <- function(data,
# check if we have "glue" styled replacement-string
glue_style <- length(replacement) == 1 && grepl("{", replacement, fixed = TRUE)

if (length(replacement) > length(pattern) && verbose) {
insight::format_alert(
paste0(
"There are more names in `replacement` than in `pattern`. The last ",
length(replacement) - length(pattern), " names of `replacement` are not used."
)
)
} else if (length(replacement) < length(pattern) && verbose && !glue_style) {
insight::format_alert(
paste0(
"There are more names in `pattern` than in `replacement`. The last ",
length(pattern) - length(replacement), " names of `pattern` are not modified."
)
)
if (length(replacement) > length(select)) {
insight::format_error("There are more names in `replacement` than in `select`.")
} else if (length(replacement) < length(select) && !glue_style) {
insight::format_error("There are more names in `select` than in `replacement`")
}

# if we have glue-styled replacement-string, create replacement pattern now
# if we have glue-styled replacement-string, create replacement select now
if (glue_style) {
replacement <- .glue_replacement(pattern, replacement)
replacement <- .glue_replacement(select, replacement)
}

for (i in seq_along(pattern)) {
for (i in seq_along(select)) {
if (!is.na(replacement[i])) {
data <- .data_rename(data, pattern[i], replacement[i], safe, verbose)
data <- .data_rename(data, select[i], replacement[i], safe, verbose)
}
}

Expand Down
Loading

1 comment on commit 0512212

@strengejacke
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why the pkgdown action was not triggered? Still the old docs.

Please sign in to comment.