Allow glue-styled pattern for data_rename() #563

Merged 13 commits on Nov 27, 2024
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: datawizard
Title: Easy Data Wrangling and Statistical Transformations
Version: 0.13.0.13
Version: 0.13.0.14
Authors@R: c(
person("Indrajeet", "Patil", , "[email protected]", role = "aut",
comment = c(ORCID = "0000-0003-1995-6531")),
7 changes: 5 additions & 2 deletions NEWS.md
@@ -19,8 +19,11 @@ CHANGES
* `data_read()` no longer shows warning about forthcoming breaking changes
in upstream packages when reading `.RData` files.

* `data_modify()` now recognizes `n()`, for example to create an index for data groups
with `1:n()` (#535).
* `data_modify()` now recognizes `n()`, for example to create an index for data
groups with `1:n()` (#535).

* The `replacement` argument in `data_rename()` now supports glue-styled
tokens (#563).

BUG FIXES

160 changes: 143 additions & 17 deletions R/data_rename.R
@@ -10,18 +10,43 @@
#' pipe-workflow.
#'
#' @param data A data frame, or an object that can be coerced to a data frame.
#' @param pattern Character vector. For `data_rename()`, indicates columns that
#' should be selected for renaming. Can be `NULL` (in which case all columns
#' are selected). For `data_addprefix()` or `data_addsuffix()`, a character
#' string, which will be added as prefix or suffix to the column names. For
#' `data_rename()`, `pattern` can also be a named vector. In this case, names
#' are used as values for the `replacement` argument (i.e. `pattern` can be a
#' character vector using `<new name> = "<old name>"` and argument `replacement`
#' will be ignored then).
#' @param replacement Character vector. Indicates the new name of the columns
#' selected in `pattern`. Can be `NULL` (in which case column are numbered
#' in sequential order). If not `NULL`, `pattern` and `replacement` must be
#' of the same length. If `pattern` is a named vector, `replacement` is ignored.
#' @param pattern Character vector.
#' - For `data_addprefix()` or `data_addsuffix()`, a character string, which
#' will be added as prefix or suffix to the column names.
#' - For `data_rename()`, indicates columns that should be selected for
#' renaming. Can be `NULL` (in which case all columns are selected).
#' `pattern` can also be a named vector. In this case, names are used as
#' values for the `replacement` argument (i.e. `pattern` can be a character
#' vector using `<new name> = "<old name>"` and argument `replacement` will
#' be ignored then).
#' @param replacement Character vector. Can be one of the following:
#' - A character vector that indicates the new names of the columns selected
#' in `pattern`. `pattern` and `replacement` must be of the same length.
#' - `NULL`, in which case columns are numbered in sequential order.
#' - A string (i.e. character vector of length 1) with a "glue" styled pattern.
#' Currently supported tokens are:
#' - `{col}` which will be replaced by the column name, i.e. the
#' corresponding value in `pattern`.
#' - `{n}` will be replaced by the number of the variable that is replaced.
#' - `{letter}` will be replaced by alphabetical letters in sequential order.
#' If more than 26 letters are required, letters are repeated, but have
#' sequential numeric indices (e.g., `a1` to `z1`, followed by `a2` to `z2`).
#' - Finally, the name of a user-defined object that is available in the
#' environment can be used. Note that the object's name is not allowed to
#' be one of the pre-defined tokens, `"col"`, `"n"` and `"letter"`.
#'
#' An example for the use of tokens is...
#' ```r
#' data_rename(
#' mtcars,
#' pattern = c("am", "vs"),
#' replacement = "new_name_from_{col}"
#' )
#' ```
#' ... which would return new column names `new_name_from_am` and
#' `new_name_from_vs`. See 'Examples'.
#'
#' If `pattern` is a named vector, `replacement` is ignored.
#' @param rows Vector of row names.
#' @param safe Do not throw error if for instance the variable to be
#' renamed/removed doesn't exist.
@@ -45,13 +70,26 @@
#'
#' # Change all
#' head(data_rename(iris, replacement = paste0("Var", 1:5)))
#'
#' # Use glue-styled patterns
#' head(data_rename(mtcars[1:3], c("mpg", "cyl", "disp"), "formerly_{col}"))
#' head(data_rename(mtcars[1:3], c("mpg", "cyl", "disp"), "{col}_is_column_{n}"))
#' head(data_rename(mtcars[1:3], c("mpg", "cyl", "disp"), "new_{letter}"))
#'
#' # User-defined glue-styled patterns from objects in environment
#' x <- c("hi", "there", "!")
#' head(data_rename(mtcars[1:3], c("mpg", "cyl", "disp"), "col_{x}"))
#' @seealso
#' - Functions to rename stuff: [data_rename()], [data_rename_rows()], [data_addprefix()], [data_addsuffix()]
#' - Functions to reorder or remove columns: [data_reorder()], [data_relocate()], [data_remove()]
#' - Functions to reshape, pivot or rotate data frames: [data_to_long()], [data_to_wide()], [data_rotate()]
#' - Functions to rename stuff: [data_rename()], [data_rename_rows()],
#' [data_addprefix()], [data_addsuffix()]
#' - Functions to reorder or remove columns: [data_reorder()], [data_relocate()],
#' [data_remove()]
#' - Functions to reshape, pivot or rotate data frames: [data_to_long()],
#' [data_to_wide()], [data_rotate()]
#' - Functions to recode data: [rescale()], [reverse()], [categorize()],
#' [recode_values()], [slide()]
#' - Functions to standardize, normalize, rank-transform: [center()], [standardize()], [normalize()], [ranktransform()], [winsorize()]
#' - Functions to standardize, normalize, rank-transform: [center()], [standardize()],
#' [normalize()], [ranktransform()], [winsorize()]
#' - Split and merge data frames: [data_partition()], [data_merge()]
#' - Functions to find or select columns: [data_select()], [extract_column_names()]
#' - Functions to filter rows: [data_match()], [data_filter()]
@@ -122,14 +160,17 @@ data_rename <- function(data,
}
}

# check if we have "glue" styled replacement-string
glue_style <- length(replacement) == 1 && grepl("{", replacement, fixed = TRUE)

if (length(replacement) > length(pattern) && verbose) {
insight::format_alert(
paste0(
"There are more names in `replacement` than in `pattern`. The last ",
length(replacement) - length(pattern), " names of `replacement` are not used."
)
)
} else if (length(replacement) < length(pattern) && verbose) {
} else if (length(replacement) < length(pattern) && verbose && !glue_style) {
insight::format_alert(
paste0(
"There are more names in `pattern` than in `replacement`. The last ",
@@ -138,6 +179,11 @@ data_rename <- function(data,
)
}

# if we have glue-styled replacement-string, create replacement pattern now
if (glue_style) {
replacement <- .glue_replacement(pattern, replacement)
}

for (i in seq_along(pattern)) {
if (!is.na(replacement[i])) {
data <- .data_rename(data, pattern[i], replacement[i], safe, verbose)
@@ -167,6 +213,86 @@ data_rename <- function(data,
}


.glue_replacement <- function(pattern, replacement) {
# This function replaces "glue" tokens with their corresponding
# names/values. Currently, the following tokens are accepted:
# - {col}: replaced by the name of the column (as given in "pattern")
# - {letter}: replaced by a lower-case letter, in alphabetical order
# - {n}: replaced by the position (1 to n) of the variable being renamed
out <- rep_len("", length(pattern))

# for alphabetical letters, we prepare a string if we have more than
# 26 columns to rename
if (length(out) > 26) {
long_letters <- paste0(
rep.int(letters[1:26], times = ceiling(length(out) / 26)),
rep(1:ceiling(length(out) / 26), each = 26)
)
} else {
long_letters <- letters[1:26]
}
long_letters <- long_letters[seq_len(length(out))]

for (i in seq_along(out)) {
# prepare pattern
column_name <- pattern[i]
out[i] <- replacement
# replace first pre-defined token
out[i] <- gsub(
"(.*)(\\{col\\})(.*)",
replacement = paste0("\\1", column_name, "\\3"),
x = out[i]
)
# replace second pre-defined token
out[i] <- gsub(
"(.*)(\\{n\\})(.*)",
replacement = paste0("\\1", i, "\\3"),
x = out[i]
)
# replace third pre-defined token
out[i] <- gsub(
"(.*)(\\{letter\\})(.*)",
replacement = paste0("\\1", long_letters[i], "\\3"),
x = out[i]
)
# extract all non-standard tokens
matches <- unlist(
regmatches(out[i], gregexpr("\\{([^}]*)\\}", out[i])),
use.names = FALSE
)
# do we have any additional tokens, i.e. variable names from the environment?
if (length(matches)) {
# if so, iterate all tokens
for (token in matches) {
# evaluate token-object from the environment
values <- tryCatch(
.dynEval(str2lang(gsub("\\{(.*)\\}", "\\1", token))),
error = function(e) {
insight::format_error(paste0(
"The object `", token, "` was not found. Please check if it really exists."
))
}
)
# check for correct length
if (length(values) != length(pattern)) {
insight::format_error(paste0(
"The number of values provided in `", token, "` (", length(values),
" values) does not match the number of columns to rename (",
length(pattern), " columns)."
))
}
# replace token with values from the object
if (length(values)) {
out[i] <- gsub(token, values[i], out[i], fixed = TRUE)
}
}
}
}
out
}
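
Note on the token expansion above: the following is a minimal base-R sketch of the same idea, not the code merged in this PR. `expand_tokens()` and its `env` argument are hypothetical names introduced only for illustration. It swaps in fixed-string substitution (`gsub(..., fixed = TRUE)`) for the capture-group regex used in the diff, which appears to substitute only one occurrence of a token per call, and it looks up user-defined tokens with `get()` rather than the internal `.dynEval()`.

```r
# Hypothetical helper for illustration only; not part of datawizard.
expand_tokens <- function(pattern, replacement, env = parent.frame()) {
  force(env) # resolve the caller's environment before doing anything else
  n <- length(pattern)

  # Letters a..z; with more than 26 columns, repeat them with a running
  # index (a1..z1, a2..z2, ...), as described in the roxygen docs above.
  if (n > 26) {
    blocks <- ceiling(n / 26)
    letter_pool <- paste0(
      rep.int(letters, times = blocks),
      rep(seq_len(blocks), each = 26)
    )
  } else {
    letter_pool <- letters
  }
  letter_pool <- letter_pool[seq_len(n)]

  # Every token other than the three pre-defined ones is treated as the
  # name of an object in `env`.
  tokens <- unique(unlist(
    regmatches(replacement, gregexpr("\\{[^}]*\\}", replacement))
  ))
  custom <- setdiff(tokens, c("{col}", "{n}", "{letter}"))

  out <- rep_len(replacement, n)
  for (i in seq_len(n)) {
    # fixed = TRUE replaces every literal occurrence of the token
    out[i] <- gsub("{col}", pattern[i], out[i], fixed = TRUE)
    out[i] <- gsub("{n}", as.character(i), out[i], fixed = TRUE)
    out[i] <- gsub("{letter}", letter_pool[i], out[i], fixed = TRUE)
    for (token in custom) {
      # strip the braces to get the object name, then look it up
      values <- get(substr(token, 2, nchar(token) - 1), envir = env)
      if (length(values) != n) {
        stop("Object for token ", token, " must have length ", n, ".")
      }
      out[i] <- gsub(token, as.character(values[i]), out[i], fixed = TRUE)
    }
  }
  out
}

expand_tokens(c("mpg", "cyl", "disp"), "{col}_is_column_{n}")
# "mpg_is_column_1" "cyl_is_column_2" "disp_is_column_3"

x <- c("hi", "there", "!")
expand_tokens(c("mpg", "cyl", "disp"), "col_{x}")
# "col_hi" "col_there" "col_!"
```

The sketch skips the package's error formatting and `verbose` handling. It is meant only to show how the three pre-defined tokens and a user-defined object map onto new column names.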


# Row.names ----------------------------------------------------------------

#' @rdname data_rename
12 changes: 8 additions & 4 deletions man/categorize.Rd

12 changes: 8 additions & 4 deletions man/data_match.Rd

12 changes: 8 additions & 4 deletions man/data_merge.Rd

12 changes: 8 additions & 4 deletions man/data_partition.Rd