Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add coefficient of variation function #236

Merged
merged 33 commits into from
Sep 15, 2022
Merged
Show file tree
Hide file tree
Changes from 28 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
3ca9b46
Add coefficient of variation function
bwiernik Aug 28, 2022
9be7427
Remove suggestion that `coef_var()` can be used with models.
bwiernik Aug 28, 2022
37da910
code-style, use namespace
strengejacke Aug 29, 2022
25d0a3f
Merge branch 'main' into cv
strengejacke Aug 29, 2022
b586bb0
typo in doc
etiennebacher Aug 29, 2022
82fbcc0
minor
strengejacke Aug 29, 2022
9e85075
Merge branch 'cv' of https://github.com/easystats/datawizard into cv
strengejacke Aug 29, 2022
34c0f9b
Fix logic
bwiernik Aug 29, 2022
4f94216
Move mode and CV to descriptives.R
bwiernik Aug 29, 2022
ac39f96
Bugfix in switch()
bwiernik Aug 29, 2022
98d4752
bug fixes and add tests
bwiernik Aug 29, 2022
cc84f46
Move processing to internal functions for reuse
bwiernik Aug 29, 2022
4d7675f
Fix documentation
bwiernik Aug 29, 2022
c2e7fe3
Merge branch 'main' into cv
strengejacke Aug 29, 2022
471df7f
Merge branch 'main' into cv
IndrajeetPatil Aug 31, 2022
22083dc
Merge branch 'main' into cv
IndrajeetPatil Sep 1, 2022
f75e0f7
Merge branch 'main' into cv
IndrajeetPatil Sep 2, 2022
0273879
Merge branch 'main' into cv
etiennebacher Sep 4, 2022
694dcc5
Merge branch 'main' into cv
IndrajeetPatil Sep 5, 2022
78f1e26
Merge branch 'main' into cv
IndrajeetPatil Sep 6, 2022
76b151e
remove CV alias
bwiernik Sep 6, 2022
6ae9050
Merge branch 'main' into cv
IndrajeetPatil Sep 12, 2022
cb353c4
Merge branch 'main' into cv
IndrajeetPatil Sep 14, 2022
d11f029
Include more digits in text expected values
bwiernik Sep 14, 2022
fa10fbc
Merge branch 'main' into cv
IndrajeetPatil Sep 15, 2022
53c701a
Merge branch 'main' into cv
IndrajeetPatil Sep 15, 2022
63b4edd
check and explain why x must be ratio scaled
mattansb Sep 15, 2022
243109e
Merge branch 'cv' of https://github.com/easystats/datawizard into cv
mattansb Sep 15, 2022
0d11c8a
fix
mattansb Sep 15, 2022
d9cb700
Update R/descriptives.R
mattansb Sep 15, 2022
3626721
backticks
strengejacke Sep 15, 2022
21ea852
msg
strengejacke Sep 15, 2022
a4cff3d
typo
strengejacke Sep 15, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ VignetteBuilder:
Encoding: UTF-8
Language: en-US
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.1
RoxygenNote: 7.2.1.9000
Config/testthat/edition: 3
Config/Needs/website:
rstudio/bslib,
Expand Down
4 changes: 4 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@ S3method(center,factor)
S3method(center,grouped_df)
S3method(center,logical)
S3method(center,numeric)
S3method(coef_var,default)
S3method(coef_var,numeric)
S3method(convert_na_to,character)
S3method(convert_na_to,data.frame)
S3method(convert_na_to,default)
Expand Down Expand Up @@ -163,6 +165,7 @@ export(center)
export(centre)
export(change_code)
export(change_scale)
export(coef_var)
export(coerce_to_numeric)
export(colnames_to_row)
export(column_as_rownames)
Expand Down Expand Up @@ -201,6 +204,7 @@ export(degroup)
export(demean)
export(describe_distribution)
export(detrend)
export(distribution_coef_var)
export(distribution_mode)
export(empty_columns)
export(empty_rows)
Expand Down
26 changes: 1 addition & 25 deletions R/describe_distribution.R
Original file line number Diff line number Diff line change
Expand Up @@ -448,7 +448,7 @@ describe_distribution.grouped_df <- function(x,
verbose = verbose
)

out <- do.call(rbind, lapply(1:length(groups), function(i) {
out <- do.call(rbind, lapply(seq_along(groups), function(i) {
d <- describe_distribution.data.frame(
groups[[i]][select],
centrality = centrality,
Expand Down Expand Up @@ -511,27 +511,3 @@ print.parameters_distribution <- function(x, digits = 2, ...) {
)
out[[1]]
}

# distribution_mode ----------------------------------

#' Compute mode for a statistical distribution
#'
#' @param x An atomic vector, a list, or a data frame.
#'
#' @return
#'
#' The value that appears most frequently in the provided data.
#' The returned data structure will be the same as the entered one.
#'
#' @examples
#'
#' distribution_mode(c(1, 2, 3, 3, 4, 5))
#' distribution_mode(c(1.5, 2.3, 3.7, 3.7, 4.0, 5))
#'
#' @export
distribution_mode <- function(x) {
uniqv <- unique(x)
tab <- tabulate(match(x, uniqv))
idx <- which.max(tab)
uniqv[idx]
}
179 changes: 179 additions & 0 deletions R/descriptives.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
# distribution_mode ----------------------------------

#' Compute mode for a statistical distribution
#'
#' @param x An atomic vector, a list, or a data frame.
#'
#' @return
#'
#' The value that appears most frequently in the provided data.
#' The returned data structure will be the same as the entered one.
#'
#' @seealso For continuous variables, the
#' **Highest Maximum a Posteriori probability estimate (MAP)** may be
#' a more useful way to estimate the most commonly-observed value
#' than the mode. See [bayestestR::map_estimate()].
#'
#' @examples
#'
#' distribution_mode(c(1, 2, 3, 3, 4, 5))
#' distribution_mode(c(1.5, 2.3, 3.7, 3.7, 4.0, 5))
#'
#' @export
distribution_mode <- function(x) {
# TODO: Add support for weights, trim, binned (method)
uniqv <- unique(x)
tab <- tabulate(match(x, uniqv))
idx <- which.max(tab)
uniqv[idx]
}

#' Compute the coefficient of variation
#'
#' Compute the coefficient of variation (CV, ratio of the standard deviation to
#' the mean, \eqn{\sigma/\mu}) for a set of numeric values.
#'
#' @return The computed coefficient of variation for `x`.
#' @export
#'
#' @examples
#' coef_var(1:10)
#' coef_var(c(1:10, 100), method = "median_mad")
#' coef_var(c(1:10, 100), method = "qcd")
#' coef_var(mu = 10, sigma = 20)
#' coef_var(mu = 10, sigma = 20, method = "unbiased", n = 30)
coef_var <- function(x, ...) {
UseMethod("coef_var")
}
#' @name distribution_cv
#' @rdname coef_var
#' @export
distribution_coef_var <- coef_var

#' @export
coef_var.default <- function(x, verbose = TRUE, ...) {
if (verbose) {
warning(insight::format_message(
paste0("Can't compute the coefficient of variation objects of class '", class(x)[1], "'.")
), call. = FALSE)
}
NULL
}

#' @param x A numeric vector of ratio scale (see details), or vector of values than can be coerced to one.
#' @param mu A numeric vector of mean values to use to compute the coefficient
#' of variation. If supplied, `x` is not used to compute the mean.
#' @param sigma A numeric vector of standard deviation values to use to compute the coefficient
#' of variation. If supplied, `x` is not used to compute the SD.
#' @param method Method to use to compute the CV. Can be `"standard"` to compute
#' by dividing the standard deviation by the mean, `"unbiased"` for the
#' unbiased estimator for normally distributed data, or one of two robust
#' alternatives: `"median_mad"` to divide the median by the [stats::mad()],
#' or `"qcd"` (quartile coefficient of dispersion, interquartile range divided
#' by the sum of the quartiles \[twice the midhinge\]: \eqn{(Q_3 - Q_1)/(Q_3 + Q_1)}.
#' @param trim the fraction (0 to 0.5) of values to be trimmed from
#' each end of `x` before the mean and standard deviation (or alternatvies)
#' are computed. Values of `trim` outside the range of (0 to 0.5) are taken
#' as the nearest endpoint.
#' @param na.rm Logical. Should `NA` values be removed before computing (`TRUE`)
#' or not (`FALSE`, default)?
#' @param n If `method = "unbiased"` and both `mu` and `sigma` are provided (not
#' computed from `x`), what sample size to use to adjust the computed CV
#' for small-sample bias?
#' @param ... Further arguments passed to computation functions.
#'
#' @details
#' CV is only applicable of values taken on a ratio scale: values that have a
#' *fixed* meaningfully defined 0 (which is either the lowest or highest
#' possible value), and that ratios between them are interpretable For example,
#' how many sandwiches have I eaten this week? 0 means "none" and 20 sandwiches
#' is 4 times more than 5 sandwiches. If I were to center the number of
#' sandwiches, it will no longer be on a ratio scale (0 is no "none" it is the
#' mean, and the ratio between 4 and -2 is not meaningful). Scaling a ratio
#' scale still results in a ratio scale. So I can re define "how many half
#' sandwiches did I eat this week ( = sandwiches * 0.5) and 0 would still mean
#' "none", and 20 half-sandwiches is still 4 times more than 5 half-sandwiches.
#'
#' This means that CV is **NOT** invariance to shifting, but it is to scaling:
mattansb marked this conversation as resolved.
Show resolved Hide resolved
#' ```{r}
#' sandwiches <- c(0, 4, 15, 0, 0, 5, 2, 7)
#' coef_var(sandwiches)
#'
#' coef_var(sandwiches / 2) # same
#'
#' coef_var(sandwiches + 4) # different! 0 is no longer meaningful!
#' ````
#'
#' @rdname coef_var
#'
#' @export
coef_var.numeric <- function(x, mu = NULL, sigma = NULL,
method = c("standard", "unbiased", "median_mad", "qcd"),
trim = 0, na.rm = FALSE, n = NULL, ...) {
# TODO: Support weights
if (all(c(-1, 1) %in% sign(x))){
stop("CV only applicable for ratio scale variables")
}
method <- match.arg(method, choices = c("standard", "unbiased", "median_mad", "qcd"))
if (is.null(mu) || is.null(sigma)) {
if (isTRUE(na.rm)) {
mattansb marked this conversation as resolved.
Show resolved Hide resolved
x <- .drop_na(x)
}
n <- length(x)
x <- .trim_values(x, trim = trim, n = n)
}
if (is.null(mu)) {
mu <- switch(
method,
standard = , unbiased = mean(x, ...),
median_mad = stats::median(x, ...),
qcd = unname(sum(stats::quantile(x, probs = c(.25, .75), ...)))
)
}
if (is.null(sigma)) {
sigma <- switch(
method,
standard = , unbiased = stats::sd(x, ...),
median_mad = stats::mad(x, center = mu, ...),
qcd = unname(diff(stats::quantile(x, probs = c(.25, .75), ...)))
)
}
out <- sigma / mu
if (method == "unbiased") {
if (is.null(n)) {
stop(insight::format_message(
"A value for `n` must be provided when `method = \"unbiased\"` and both `mu` and `sigma` are provided."
), call. = FALSE)
}
# from DescTools::CoefVar
out <- out * (1 - 1 / (4 * (n - 1)) + 1 / n * out^2 + 1 / (2 * (n - 1)^2))
}
return(out)
}




# descriptives helpers

.drop_na <- function(x) {
x[!is.na(x)]
}

.trim_values <- function(x, trim = 0, n = NULL, weights = NULL) {
# TODO: Support weights
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwiernik Can this already be merged, or do you also want to work in this PR on supporting weights here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should at least print a message if weights has no functionality yet.

if (!is.numeric(trim) || length(trim) != 1L) {
stop("`trim` must be a single numeric value.", call. = FALSE)
}
if (is.null(NULL)) {
n <- length(x)
}
if (trim > 0 && n) {
if (anyNA(x)) return(NA_real_)
if (trim >= 0.5) return(stats::median(x, na.rm = FALSE))
lo <- floor(n * trim) + 1
hi <- n + 1 - lo
x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
}
x
}
93 changes: 93 additions & 0 deletions man/coef_var.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 7 additions & 1 deletion man/distribution_mode.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions tests/testthat/test-coef_var.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
test_that("coefficient of variation works", {
expect_equal(coef_var(1:10), 0.5504818826)
expect_equal(coef_var(1:10, method = "unbiased"), 0.5552700246)
expect_equal(coef_var(c(1:10, 100), method = "median_mad"), 0.7413)
expect_equal(coef_var(c(1:10, 100), method = "qcd"), 0.4166666667)
expect_equal(coef_var(mu = 10, sigma = 20), 2)
expect_equal(coef_var(mu = 10, sigma = 20, method = "unbiased", n = 30), 2.250614348)
expect_equal(distribution_coef_var(1:10), 0.5504818826)
})