Skip to content

Commit

Permalink
Merge branch 'main' into docs_smoothness
Browse files Browse the repository at this point in the history
  • Loading branch information
strengejacke authored Nov 27, 2024
2 parents 1ea2c3d + 5463c93 commit 6a91882
Show file tree
Hide file tree
Showing 49 changed files with 667 additions and 280 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Type: Package
Package: datawizard
Title: Easy Data Wrangling and Statistical Transformations
Version: 0.13.0.13
Version: 0.13.0.15
Authors@R: c(
person("Indrajeet", "Patil", , "[email protected]", role = "aut",
comment = c(ORCID = "0000-0003-1995-6531")),
Expand Down
10 changes: 8 additions & 2 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,14 @@ CHANGES
* `data_read()` no longer shows warning about forthcoming breaking changes
in upstream packages when reading `.RData` files.

* `data_modify()` now recognizes `n()`, for example to create an index for data groups
with `1:n()` (#535).
* `data_modify()` now recognizes `n()`, for example to create an index for data
groups with `1:n()` (#535).

* The `replacement` argument in `data_rename()` now supports glue-styled
tokens (#563).

* `data_summary()` also accepts the results of `bayestestR::ci()` as summary
function (#483).

BUG FIXES

Expand Down
158 changes: 141 additions & 17 deletions R/data_rename.R
Original file line number Diff line number Diff line change
Expand Up @@ -10,18 +10,43 @@
#' pipe-workflow.
#'
#' @param data A data frame, or an object that can be coerced to a data frame.
#' @param pattern Character vector. For `data_rename()`, indicates columns that
#' should be selected for renaming. Can be `NULL` (in which case all columns
#' are selected). For `data_addprefix()` or `data_addsuffix()`, a character
#' string, which will be added as prefix or suffix to the column names. For
#' `data_rename()`, `pattern` can also be a named vector. In this case, names
#' are used as values for the `replacement` argument (i.e. `pattern` can be a
#' character vector using `<new name> = "<old name>"` and argument `replacement`
#' will be ignored then).
#' @param replacement Character vector. Indicates the new name of the columns
#' selected in `pattern`. Can be `NULL` (in which case column are numbered
#' in sequential order). If not `NULL`, `pattern` and `replacement` must be
#' of the same length. If `pattern` is a named vector, `replacement` is ignored.
#' @param pattern Character vector.
#' - For `data_addprefix()` or `data_addsuffix()`, a character string, which
#' will be added as prefix or suffix to the column names.
#' - For `data_rename()`, indicates columns that should be selected for
#' renaming. Can be `NULL` (in which case all columns are selected).
#' `pattern` can also be a named vector. In this case, names are used as
#' values for the `replacement` argument (i.e. `pattern` can be a character
#' vector using `<new name> = "<old name>"` and argument `replacement` will
#' be ignored then).
#' @param replacement Character vector. Can be one of the following:
#' - A character vector that indicates the new names of the columns selected
#' in `pattern`. `pattern` and `replacement` must be of the same length.
#' - `NULL`, in which case columns are numbered in sequential order.
#' - A string (i.e. character vector of length 1) with a "glue" styled pattern.
#' Currently supported tokens are:
#' - `{col}` which will be replaced by the column name, i.e. the
#' corresponding value in `pattern`.
#' - `{n}` will be replaced by the number of the variable that is replaced.
#' - `{letter}` will be replaced by alphabetical letters in sequential order.
#' If more than 26 letters are required, letters are repeated, but have
#' sequential numeric indices (e.g., `a1` to `z1`, followed by `a2` to `z2`).
#' - Finally, the name of a user-defined object that is available in the
#' environment can be used. Note that the object's name is not allowed to
#' be one of the pre-defined tokens, `"col"`, `"n"` and `"letter"`.
#'
#' An example for the use of tokens is...
#' ```r
#' data_rename(
#' mtcars,
#' pattern = c("am", "vs"),
#' replacement = "new_name_from_{col}"
#' )
#' ```
#' ... which would return new column names `new_name_from_am` and
#' `new_name_from_vs`. See 'Examples'.
#'
#' If `pattern` is a named vector, `replacement` is ignored.
#' @param rows Vector of row names.
#' @param safe Do not throw error if for instance the variable to be
#' renamed/removed doesn't exist.
Expand All @@ -45,13 +70,26 @@
#'
#' # Change all
#' head(data_rename(iris, replacement = paste0("Var", 1:5)))
#'
#' # Use glue-styled patterns
#' head(data_rename(mtcars[1:3], c("mpg", "cyl", "disp"), "formerly_{col}"))
#' head(data_rename(mtcars[1:3], c("mpg", "cyl", "disp"), "{col}_is_column_{n}"))
#' head(data_rename(mtcars[1:3], c("mpg", "cyl", "disp"), "new_{letter}"))
#'
#' # User-defined glue-styled patterns from objects in environment
#' x <- c("hi", "there", "!")
#' head(data_rename(mtcars[1:3], c("mpg", "cyl", "disp"), "col_{x}"))
#' @seealso
#' - Functions to rename stuff: [data_rename()], [data_rename_rows()], [data_addprefix()], [data_addsuffix()]
#' - Functions to reorder or remove columns: [data_reorder()], [data_relocate()], [data_remove()]
#' - Functions to reshape, pivot or rotate data frames: [data_to_long()], [data_to_wide()], [data_rotate()]
#' - Functions to rename stuff: [data_rename()], [data_rename_rows()],
#' [data_addprefix()], [data_addsuffix()]
#' - Functions to reorder or remove columns: [data_reorder()], [data_relocate()],
#' [data_remove()]
#' - Functions to reshape, pivot or rotate data frames: [data_to_long()],
#' [data_to_wide()], [data_rotate()]
#' - Functions to recode data: [rescale()], [reverse()], [categorize()],
#' [recode_values()], [slide()]
#' - Functions to standardize, normalize, rank-transform: [center()], [standardize()], [normalize()], [ranktransform()], [winsorize()]
#' - Functions to standardize, normalize, rank-transform: [center()], [standardize()],
#' [normalize()], [ranktransform()], [winsorize()]
#' - Split and merge data frames: [data_partition()], [data_merge()]
#' - Functions to find or select columns: [data_select()], [extract_column_names()]
#' - Functions to filter rows: [data_match()], [data_filter()]
Expand Down Expand Up @@ -122,14 +160,17 @@ data_rename <- function(data,
}
}

# check if we have "glue" styled replacement-string
glue_style <- length(replacement) == 1 && grepl("{", replacement, fixed = TRUE)

if (length(replacement) > length(pattern) && verbose) {
insight::format_alert(
paste0(
"There are more names in `replacement` than in `pattern`. The last ",
length(replacement) - length(pattern), " names of `replacement` are not used."
)
)
} else if (length(replacement) < length(pattern) && verbose) {
} else if (length(replacement) < length(pattern) && verbose && !glue_style) {
insight::format_alert(
paste0(
"There are more names in `pattern` than in `replacement`. The last ",
Expand All @@ -138,6 +179,11 @@ data_rename <- function(data,
)
}

# if we have glue-styled replacement-string, create replacement pattern now
if (glue_style) {
replacement <- .glue_replacement(pattern, replacement)
}

for (i in seq_along(pattern)) {
if (!is.na(replacement[i])) {
data <- .data_rename(data, pattern[i], replacement[i], safe, verbose)
Expand Down Expand Up @@ -167,6 +213,84 @@ data_rename <- function(data,
}


.glue_replacement <- function(pattern, replacement) {
# this function replaces "glue" tokens into their related
# real names/values. Currently, following tokens are accepted:
# - {col}: replacement is the name of the column (indicated in "pattern")
# - {letter}: replacement is lower-case alphabetically letter, in sequential order
# - {n}: replacement is the number of the variable out of n, that should be renamed
out <- rep_len("", length(pattern))

# for alphabetical letters, we prepare a string if we have more than
# 26 columns to rename
if (length(out) > 26) {
long_letters <- paste0(
rep.int(letters[1:26], times = ceiling(length(out) / 26)),
rep(1:ceiling(length(out) / 26), each = 26)
)
} else {
long_letters <- letters[1:26]
}
long_letters <- long_letters[seq_len(length(out))]

for (i in seq_along(out)) {
# prepare pattern
column_name <- pattern[i]
out[i] <- replacement
# replace first pre-defined token
out[i] <- gsub(
"(.*)(\\{col\\})(.*)",
replacement = paste0("\\1", column_name, "\\3"),
x = out[i]
)
# replace second pre-defined token
out[i] <- gsub(
"(.*)(\\{n\\})(.*)",
replacement = paste0("\\1", i, "\\3"),
x = out[i]
)
# replace third pre-defined token
out[i] <- gsub(
"(.*)(\\{letter\\})(.*)",
replacement = paste0("\\1", long_letters[i], "\\3"),
x = out[i]
)
# extract all non-standard tokens
matches <- unlist(
regmatches(out[i], gregexpr("\\{([^}]*)\\}", out[i])),
use.names = FALSE
)
# do we have any additional tokens, i.e. variable names from the environment?
# users can also specify variable names, where the
if (length(matches)) {
# if so, iterate all tokens
for (token in matches) {
# evaluate token-object from the environment
values <- .dynEval(
str2lang(gsub("\\{(.*)\\}", "\\1", token)),
ifnotfound = insight::format_error(paste0(
"The object `", token, "` was not found. Please check if it really exists."
))
)
# check for correct length
if (length(values) != length(pattern)) {
insight::format_error(paste0(
"The number of values provided in `", token, "` (", length(values),
" values) do not match the number of columns to rename (",
length(pattern), " columns)."
))
}
# replace token with values from the object
if (length(values)) {
out[i] <- gsub(token, values[i], out[i], fixed = TRUE)
}
}
}
}
out
}


# Row.names ----------------------------------------------------------------

#' @rdname data_rename
Expand Down
27 changes: 19 additions & 8 deletions R/data_summary.R
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,9 @@
#' data frame or a matrix.
#'
#' @param x A (grouped) data frame.
#' @param by Optional character string, indicating the name of a variable in `x`.
#' If supplied, the data will be split by this variable and summary statistics
#' will be computed for each group.
#' @param by Optional character string, indicating the names of one or more
#' variables in the data frame. If supplied, the data will be split by these
#' variables and summary statistics will be computed for each group.
#' @param remove_na Logical. If `TRUE`, missing values are omitted from the
#' grouping variable. If `FALSE` (default), missing values are included as a
#' level in the grouping variable.
Expand Down Expand Up @@ -122,7 +122,7 @@ data_summary.data.frame <- function(x, ..., by = NULL, remove_na = FALSE) {
# bind grouping-variables and values
summarised_data <- cbind(s[1, by], summarised_data)
# make sure we have proper column names
colnames(summarised_data) <- c(by, vapply(summarise, names, character(1)))
colnames(summarised_data) <- c(by, unlist(lapply(summarise, names)))
summarised_data
})
out <- do.call(rbind, out)
Expand Down Expand Up @@ -187,18 +187,24 @@ data_summary.grouped_df <- function(x, ..., by = NULL, remove_na = FALSE) {

out <- lapply(seq_along(dots), function(i) {
new_variable <- .get_new_dots_variable(dots, i, data)
stats::setNames(new_variable, names(dots)[i])
if (inherits(new_variable, c("bayestestR_ci", "bayestestR_eti"))) {
stats::setNames(new_variable, c("CI", "CI_low", "CI_high"))
} else {
stats::setNames(new_variable, names(dots)[i])
}
})
}

# check for correct length of output - must be a single value!
if (any(lengths(out) != 1)) {
# Exception: bayestestR::ci()
wrong_length <- !sapply(out, inherits, what = c("bayestestR_ci", "bayestestR_eti")) & lengths(out) != 1 # nolint
if (any(wrong_length)) {
insight::format_error(
paste0(
"Each expression must return a single value. Following expression",
ifelse(sum(lengths(out) != 1) > 1, "s", " "),
ifelse(sum(wrong_length) > 1, "s", " "),
" returned more than one value: ",
text_concatenate(vapply(dots[lengths(out) != 1], insight::safe_deparse, character(1)), enclose = "\"")
text_concatenate(vapply(dots[wrong_length], insight::safe_deparse, character(1)), enclose = "\"")
)
)
}
Expand All @@ -214,6 +220,11 @@ print.dw_data_summary <- function(x, ...) {
if (nrow(x) == 0) {
cat("No matches found.\n")
} else {
if (all(c("CI", "CI_low", "CI_high") %in% colnames(x))) {
ci <- insight::format_table(x[c("CI", "CI_low", "CI_high")], ...)
x$CI <- x$CI_low <- x$CI_high <- NULL
x <- cbind(x, ci)
}
cat(insight::export_table(x, missing = "<NA>", ...))
}
}
16 changes: 10 additions & 6 deletions R/extract_column_names.R
Original file line number Diff line number Diff line change
Expand Up @@ -19,15 +19,16 @@
#' - a vector of negative integers, giving the positions counting from the
#' right (e.g., `-1` or `-1:-3`),
#' - one of the following select-helpers: `starts_with()`, `ends_with()`,
#' `contains()`, a range using `:` or `regex("")`. `starts_with()`,
#' `contains()`, a range using `:`, or `regex()`. `starts_with()`,
#' `ends_with()`, and `contains()` accept several patterns, e.g
#' `starts_with("Sep", "Petal")`.
#' - or a function testing for logical conditions, e.g. `is.numeric()` (or
#' `starts_with("Sep", "Petal")`. `regex()` can be used to define regular
#' expression patterns.
#' - a function testing for logical conditions, e.g. `is.numeric()` (or
#' `is.numeric`), or any user-defined function that selects the variables
#' for which the function returns `TRUE` (like: `foo <- function(x) mean(x) > 3`),
#' - ranges specified via literal variable names, select-helpers (except
#' `regex()`) and (user-defined) functions can be negated, i.e. return
#' non-matching elements, when prefixed with a `-`, e.g. `-ends_with("")`,
#' non-matching elements, when prefixed with a `-`, e.g. `-ends_with()`,
#' `-is.numeric` or `-(Sepal.Width:Petal.Length)`. **Note:** Negation means
#' that matches are _excluded_, and thus, the `exclude` argument can be
#' used alternatively. For instance, `select=-ends_with("Length")` (with
Expand All @@ -48,7 +49,7 @@
#' character string (or a variable containing a character string) and is not
#' allowed to be one of the supported select-helpers or a character vector
#' of length > 1. `regex = TRUE` is comparable to using one of the two
#' select-helpers, `select = contains("")` or `select = regex("")`, however,
#' select-helpers, `select = contains()` or `select = regex()`, however,
#' since the select-helpers may not work when called from inside other
#' functions (see 'Details'), this argument may be used as workaround.
#' @param verbose Toggle warnings.
Expand Down Expand Up @@ -131,7 +132,10 @@
#' numeric_mean_35 <- function(x) is.numeric(x) && mean(x, na.rm = TRUE) > 3.5
#' extract_column_names(iris, numeric_mean_35)
#'
#' # find range of colum names by range, using character vector
#' # find column names, using range
#' extract_column_names(mtcars, c(cyl:hp, wt))
#'
#' # find range of column names by range, using character vector
#' extract_column_names(mtcars, c("cyl:hp", "wt"))
#'
#' # rename returned columns for "data_select()"
Expand Down
11 changes: 6 additions & 5 deletions man/adjust.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 6 additions & 5 deletions man/assign_labels.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit 6a91882

Please sign in to comment.