Commit

also improve data_to_long
strengejacke committed May 20, 2024
1 parent 85ce67b commit 125c7c9
Showing 4 changed files with 99 additions and 36 deletions.
1 change: 1 addition & 0 deletions R/data_restoretype.R
@@ -1,5 +1,6 @@
 #' Restore the type of columns according to a reference data frame
 #'
+#' @param data A data frame for which to restore the column types.
 #' @inheritParams data_to_long
 #' @inheritParams data_rename
 #' @param reference A reference data frame from which to find the correct
72 changes: 51 additions & 21 deletions R/data_to_long.R
@@ -4,65 +4,95 @@
 #' the number of columns. This is a dependency-free base-R equivalent of
 #' `tidyr::pivot_longer()`.
 #'
-#' @param data A data frame to pivot.
-#' @param names_to The name of the new column that will contain the column
-#' names.
+#' @param data A data frame to convert to long format, so that it has more
+#' rows and fewer columns post-lengthening than pre-lengthening.
+#' @param names_to The name of the new column (variable) that will contain the
+#' _names_ from columns in `select` as values, to identify the source of the
+#' values.
 #' @param names_prefix A regular expression used to remove matching text from
 #' the start of each variable name.
 #' @param names_sep,names_pattern If `names_to` contains multiple values, this
 #' argument controls how the column name is broken up.
 #' `names_pattern` takes a regular expression containing matching groups, i.e. "()".
-#' @param values_to The name of the new column that will contain the values of
-#' the pivoted variables.
+#' @param values_to The name of the new column that will contain the _values_ of
+#' the columns in `select`.
 #' @param values_drop_na If `TRUE`, will drop rows that contain only `NA` in the
-#' `values_to` column. This effectively converts explicit missing values to
-#' implicit missing values, and should generally be used only when missing values
-#' in data were created by its structure.
+#' `values_to` column. This effectively converts explicit missing values to
+#' implicit missing values, and should generally be used only when missing values
+#' in data were created by its structure.
 #' @param rows_to The name of the column that will contain the row names or row
-#' numbers from the original data. If `NULL`, will be removed.
+#' numbers from the original data. If `NULL`, will be removed.
 #' @param ... Currently not used.
 #' @inheritParams extract_column_names
 #' @param cols Identical to `select`. This argument is here to ensure compatibility
-#' with `tidyr::pivot_longer()`. If both `select` and `cols` are provided, `cols`
-#' is used.
+#' with `tidyr::pivot_longer()`. If both `select` and `cols` are provided, `cols`
+#' is used.
 #'
+#' @details
+#' Reshaping data into long format usually means that the input data frame is
+#' in _wide_ format, where multiple measurements taken on the same subject are
+#' stored in multiple columns (variables). The long format stores the same
+#' information in a single column, with each measurement per subject stored in
+#' a separate row. All variables that are not in `select` will be repeated for
+#' each row that is lengthened.
+#'
+#' The necessary information for `data_to_long()` is:
+#'
+#' - The columns that contain the repeated measurements (`select`).
+#' - The name of the newly created column that will contain the names of the
+#' columns in `select` (`names_to`), to identify the source of the values.
+#' - The name of the newly created column that contains the values of the
+#' columns in `select` (`values_to`).
+#'
+#' In other words: Repeated measurements that are spread across several columns
+#' will be gathered into a single column (`values_to`), with the original column
+#' names, that identify the source of the gathered values, stored in a new column
+#' (`names_to`).
+#'
 #' @return If a tibble was provided as input, `reshape_longer()` also returns a
 #' tibble. Otherwise, it returns a data frame.
 #'
 #' @examplesIf requireNamespace("psych") && requireNamespace("tidyr")
-#' wide_data <- data.frame(replicate(5, rnorm(10)))
+#' wide_data <- setNames(
+#' data.frame(replicate(2, rnorm(8))),
+#' c("Time1", "Time2")
+#' )
+#' wide_data$ID <- 1:8
+#' wide_data
 #'
-#' # Default behaviour (equivalent to tidyr::pivot_longer(wide_data, cols = 1:5))
+#' # Default behaviour (equivalent to tidyr::pivot_longer(wide_data, cols = 1:3))
+#' # probably doesn't make much sense to mix "time" and "id"
 #' data_to_long(wide_data)
 #'
 #' # Customizing the names
-#' data_to_long(wide_data,
-#' select = c(1, 2),
-#' names_to = "Column",
-#' values_to = "Numbers",
-#' rows_to = "Row"
+#' data_to_long(
+#' wide_data,
+#' select = c("Time1", "Time2"),
+#' names_to = "Timepoint",
+#' values_to = "Score"
 #' )
 #'
 #' # Full example
 #' # ------------------
 #' data <- psych::bfi # Wide format with one row per participant's personality test
 #'
 #' # Pivot long format
-#' data_to_long(data,
+#' very_long_data <- data_to_long(data,
 #' select = regex("\\d"), # Select all columns that contain a digit
 #' names_to = "Item",
 #' values_to = "Score",
 #' rows_to = "Participant"
 #' )
+#' head(very_long_data)
 #'
-#' data_to_long(
+#' even_longer_data <- data_to_long(
 #' tidyr::who,
 #' select = new_sp_m014:newrel_f65,
 #' names_to = c("diagnosis", "gender", "age"),
 #' names_pattern = "new_?(.*)_(.)(.*)",
 #' values_to = "count"
 #' )
-#'
+#' head(even_longer_data)
 #' @inherit data_rename
 #' @export
 data_to_long <- function(data,
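The reshaping described in the new `@details` section can be sketched in plain base R. This is only an illustration of the idea, not the implementation in `data_to_long()`: the toy data values, the helper names `measure_cols`, `id_cols` and `blocks`, and the final row ordering are all assumptions made for this sketch.

```r
# Wide data: one row per subject, one column per measurement occasion.
# (Toy values chosen for illustration only.)
wide_data <- data.frame(
  ID = 1:3,
  Time1 = c(0.5, 1.2, -0.3),
  Time2 = c(0.9, 1.0, 0.4)
)

# Columns holding the repeated measurements (the role of `select`).
measure_cols <- c("Time1", "Time2")
# Everything else is repeated for each lengthened row.
id_cols <- setdiff(names(wide_data), measure_cols)

# One block per measurement column: the column name goes into "Timepoint"
# (the role of `names_to`), its values go into "Score" (the role of `values_to`).
blocks <- lapply(measure_cols, function(col) {
  data.frame(
    wide_data[id_cols],
    Timepoint = col,
    Score = wide_data[[col]],
    stringsAsFactors = FALSE
  )
})
long_data <- do.call(rbind, blocks)

# Sort by subject so each subject's measurements sit in adjacent rows
# (an ordering picked for readability in this sketch).
long_data <- long_data[order(long_data$ID), ]
rownames(long_data) <- NULL
long_data
```

The result has six rows and three columns (`ID`, `Timepoint`, `Score`): more rows and fewer columns than the wide input, with `ID` repeated once per measurement.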
2 changes: 1 addition & 1 deletion man/data_restoretype.Rd

Some generated files are not rendered by default.

60 changes: 46 additions & 14 deletions man/data_to_long.Rd

Some generated files are not rendered by default.
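The documented `names_pattern` argument takes a regular expression with matching groups, one per entry in `names_to`. The base-R sketch below shows how the pattern from the `tidyr::who` example splits two of that dataset's column names. It illustrates the regular expression only: how `data_to_long()` applies the pattern internally is not shown in this commit, and the use of `regexec()` with `perl = TRUE` is an assumption of this sketch.

```r
# Two column names from `tidyr::who`, plus the pattern and target names
# used in the roxygen example.
column_names <- c("new_sp_m014", "newrel_f65")
names_pattern <- "new_?(.*)_(.)(.*)"
names_to <- c("diagnosis", "gender", "age")

# regexec() returns the full match followed by one entry per capture group.
matches <- regmatches(
  column_names,
  regexec(names_pattern, column_names, perl = TRUE)
)

# Drop the full match and bind the groups into one row per column name.
parts <- do.call(rbind, lapply(matches, function(m) m[-1]))
colnames(parts) <- names_to
parts <- as.data.frame(parts, stringsAsFactors = FALSE)
parts
```

Here `new_sp_m014` splits into `sp`, `m` and `014`, and `newrel_f65` into `rel`, `f` and `65`, so each of the three groups feeds one of the three new columns.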
