Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

difficult-to-diagnose errors using "difftime" response in a linear model #679

Merged
merged 6 commits into from
Feb 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Type: Package
Package: performance
Title: Assessment of Regression Models Performance
Version: 0.10.8.11
Version: 0.10.8.12
Authors@R:
c(person(given = "Daniel",
family = "Lüdecke",
Expand Down
3 changes: 3 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,9 @@
* `check_outliers()` now properly accept the `percentage_central` argument when
using the `"mcd"` method.

* Fixed edge cases in `check_collinearity()` and `check_outliers()` for models
with response variables of classes `Date`, `POSIXct`, `POSIXlt` or `difftime`.

# performance 0.10.8

## Changes
Expand Down
20 changes: 16 additions & 4 deletions R/check_collinearity.R
Original file line number Diff line number Diff line change
Expand Up @@ -403,8 +403,22 @@



.check_collinearity <- function(x, component, ci = 0.95, verbose = TRUE) {

Check warning on line 406 in R/check_collinearity.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/check_collinearity.R,line=406,col=1,[cyclocomp_linter] Reduce the cyclomatic complexity of this function from 43 to at most 40.

Check warning on line 406 in R/check_collinearity.R

View workflow job for this annotation

GitHub Actions / lint / lint

file=R/check_collinearity.R,line=406,col=1,[cyclocomp_linter] Reduce the cyclomatic complexity of this function from 43 to at most 40.
v <- insight::get_varcov(x, component = component, verbose = FALSE)

# sanity check
if (is.null(v)) {
if (isTRUE(verbose)) {
insight::format_alert(
paste(
sprintf("Could not extract the variance-covariance matrix for the %s component of the model.", component),
"Please try to run `vcov(model)`, which may help identifying the problem."

Check warning on line 415 in R/check_collinearity.R

View check run for this annotation

Codecov / codecov/patch

R/check_collinearity.R#L411-L415

Added lines #L411 - L415 were not covered by tests
)
)
}
return(NULL)

Check warning on line 419 in R/check_collinearity.R

View check run for this annotation

Codecov / codecov/patch

R/check_collinearity.R#L419

Added line #L419 was not covered by tests
}

term_assign <- .term_assignments(x, component, verbose = verbose)

# any assignment found?
Expand Down Expand Up @@ -432,10 +446,8 @@
if (insight::has_intercept(x)) {
v <- v[-1, -1]
term_assign <- term_assign[-1]
} else {
if (isTRUE(verbose)) {
insight::format_alert("Model has no intercept. VIFs may not be sensible.")
}
} else if (isTRUE(verbose)) {
insight::format_alert("Model has no intercept. VIFs may not be sensible.")

Check warning on line 450 in R/check_collinearity.R

View check run for this annotation

Codecov / codecov/patch

R/check_collinearity.R#L449-L450

Added lines #L449 - L450 were not covered by tests
}

f <- insight::find_formula(x)
Expand Down
12 changes: 9 additions & 3 deletions R/check_model.R
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@
#'
#' @details For Bayesian models from packages **rstanarm** or **brms**,
#' models will be "converted" to their frequentist counterpart, using
#' [`bayestestR::bayesian_as_frequentist`](https://easystats.github.io/bayestestR/reference/convert_bayesian_as_frequentist.html).

Check warning on line 52 in R/check_model.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/check_model.R,line=52,col=121,[line_length_linter] Lines should not be more than 120 characters. This line is 130 characters.
#' A more advanced model-check for Bayesian models will be implemented at a
#' later stage.
#'
Expand Down Expand Up @@ -77,7 +77,7 @@
#' plots are helpful to check model assumptions, they do not necessarily indicate
#' so-called "lack of fit", e.g. missed non-linear relationships or interactions.
#' Thus, it is always recommended to also look at
#' [effect plots, including partial residuals](https://strengejacke.github.io/ggeffects/articles/introduction_partial_residuals.html).

Check warning on line 80 in R/check_model.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/check_model.R,line=80,col=121,[line_length_linter] Lines should not be more than 120 characters. This line is 134 characters.
#'
#' @section Homogeneity of Variance:
#' This plot checks the assumption of equal variance (homoscedasticity). The
Expand Down Expand Up @@ -192,12 +192,18 @@
suppressWarnings(.check_assumptions_glm(x, minfo, verbose, ...))
},
error = function(e) {
NULL
e
}
)

if (is.null(ca)) {
insight::format_error(paste0("`check_model()` not implemented for models of class `", class(x)[1], "` yet."))
if (inherits(ca, c("error", "simpleError"))) {
pattern <- "(\n|\\s{2,})"
replacement <- " "
cleaned_string <- gsub(pattern, replacement, ca$message)
insight::format_error(
paste("`check_model()` returned following error:", cleaned_string),
paste0("\nIf the error message does not help identifying your problem, another reason why `check_model()` failed might be that models of class `", class(x)[1], "` are not yet supported.") # nolint
)
}

# try to find sensible default for "type" argument
Expand Down
21 changes: 19 additions & 2 deletions R/check_outliers.R
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
#' 'Details'). If a numeric value is given, it will be used as the threshold
#' for any of the method run.
#' @param ID Optional, to report an ID column along with the row number.
#' @param verbose Toggle warnings.
#' @param ... When `method = "ics"`, further arguments in `...` are passed
#' down to [ICSOutlier::ics.outlier()]. When `method = "mahalanobis"`,
#' they are passed down to [stats::mahalanobis()]. `percentage_central` can
Expand Down Expand Up @@ -309,7 +310,7 @@
#' group_iris <- datawizard::data_group(iris, "Species")
#' check_outliers(group_iris)
#'
#' @examplesIf require("see") && require("bigutilsr") && require("loo") && require("MASS") && require("ICSOutlier") && require("ICS") && require("dbscan")

Check warning on line 313 in R/check_outliers.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/check_outliers.R,line=313,col=121,[line_length_linter] Lines should not be more than 120 characters. This line is 154 characters.
#' \donttest{
#' # You can also run all the methods
#' check_outliers(data, method = "all", verbose = FALSE)
Expand Down Expand Up @@ -348,6 +349,7 @@
method = c("cook", "pareto"),
threshold = NULL,
ID = NULL,
verbose = TRUE,
...) {
# Check args
if (all(method == "all")) {
Expand Down Expand Up @@ -391,8 +393,23 @@
# Get data
my_data <- insight::get_data(x, verbose = FALSE)

# sanity check for date, POSIXt and difftime variables
if (any(vapply(my_data, inherits, FUN.VALUE = logical(1), what = c("Date", "POSIXt", "difftime"))) && verbose) {
insight::format_alert(
paste(
"Date variables are not supported for outliers detection. These will be ignored.",
"Make sure any date variables are converted to numeric or factor {.b before} fitting the model."
)
)
}

# Remove non-numerics
my_data <- datawizard::data_select(my_data, select = is.numeric)
my_data <- datawizard::data_select(my_data, select = is.numeric, verbose = FALSE)

# check if any data left
if (is.null(my_data) || ncol(my_data) == 0) {
insight::format_error("No numeric variables found. No data to check for outliers.")
}

# Thresholds
if (is.null(threshold)) {
Expand All @@ -409,7 +426,7 @@
)
}

if (!missing(ID)) {
if (!missing(ID) && verbose) {
insight::format_warning(paste0("ID argument not supported for model objects of class `", class(x)[1], "`."))
}

Expand Down Expand Up @@ -1175,7 +1192,7 @@

# Isolation Forest
# if ("iforest" %in% method) {
# out <- c(out, .check_outliers_iforest(x, threshold = thresholds$iforest))

Check warning on line 1195 in R/check_outliers.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/check_outliers.R,line=1195,col=7,[commented_code_linter] Remove commented code.
# }

# Local Outlier Factor
Expand Down Expand Up @@ -1559,7 +1576,7 @@
out <- cbind(out, ID.names)
}

# out$Distance_IQR <- Distance_IQR

Check warning on line 1579 in R/check_outliers.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/check_outliers.R,line=1579,col=5,[commented_code_linter] Remove commented code.

out$Distance_IQR <- vapply(as.data.frame(t(Distance_IQR)), function(x) {
ifelse(all(is.na(x)), NA_real_, max(x, na.rm = TRUE))
Expand Down Expand Up @@ -1741,7 +1758,7 @@
}

# check whether N to p ratio is not too large, else MCD flags too many outliers
# See #672: This does seem to be a function of the N/p (N = sample size; p =

Check warning on line 1761 in R/check_outliers.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/check_outliers.R,line=1761,col=5,[commented_code_linter] Remove commented code.
# number of parameters) ratio. When it is larger than 10, the % of outliers
# flagged is okay (in well behaved data). This makes sense: the MCD looks at
# the cov matrix of subsamples of the data - with high dimensional data, small
Expand Down Expand Up @@ -1882,14 +1899,14 @@


# .check_outliers_iforest <- function(x, threshold = 0.025) {
# out <- data.frame(Row = seq_len(nrow(x)))

Check warning on line 1902 in R/check_outliers.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/check_outliers.R,line=1902,col=5,[commented_code_linter] Remove commented code.
#
# # Install packages
# insight::check_if_installed("solitude")

Check warning on line 1905 in R/check_outliers.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/check_outliers.R,line=1905,col=4,[commented_code_linter] Remove commented code.
#
# # Compute
# if (utils::packageVersion("solitude") < "0.2.0") {
# iforest <- solitude::isolationForest(x)

Check warning on line 1909 in R/check_outliers.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/check_outliers.R,line=1909,col=7,[commented_code_linter] Remove commented code.
# out$Distance_iforest <- stats::predict(iforest, x, type = "anomaly_score")
# } else if (utils::packageVersion("solitude") == "0.2.0") {
# stop(paste("Must update package `solitude` (above version 0.2.0).",
Expand Down
3 changes: 3 additions & 0 deletions man/check_outliers.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions tests/testthat/test-check_collinearity.R
Original file line number Diff line number Diff line change
Expand Up @@ -211,3 +211,10 @@ test_that("check_collinearity, hurdle/zi models w/o zi-formula", {
)
expect_equal(out$VIF, c(1.05772, 1.05772, 1.06587, 1.06587), tolerance = 1e-4)
})

test_that("check_collinearity, invalid data", {
skip_if(packageVersion("insight") < "0.19.8.2")
dd <- data.frame(y = as.difftime(0:5, units = "days"))
m1 <- lm(y ~ 1, data = dd)
expect_error(check_collinearity(m1), "Can't extract variance-covariance matrix")
})
7 changes: 7 additions & 0 deletions tests/testthat/test-check_model.R
Original file line number Diff line number Diff line change
Expand Up @@ -36,3 +36,10 @@ test_that("`check_outliers()` works if convergence issues", {
x <- check_outliers(m, verbose = FALSE)
expect_s3_class(x, "check_outliers")
})

test_that("`check_model()` for invalid models", {
skip_if(packageVersion("insight") < "0.19.8.2")
dd <- data.frame(y = as.difftime(0:5, units = "days"))
m1 <- lm(y ~ 1, data = dd)
expect_error(check_model(m1))
})
13 changes: 13 additions & 0 deletions tests/testthat/test-check_outliers.R
Original file line number Diff line number Diff line change
Expand Up @@ -329,3 +329,16 @@ test_that("cook multiple methods which", {
c("setosa", "versicolor", "virginica")
)
})


test_that("check_outliers with invald data", {
dd <- data.frame(y = as.difftime(0:5, units = "days"))
m1 <- lm(y ~ 1, data = dd)
expect_error(
expect_message(
check_outliers(m1),
regex = "Date variables are not supported"
),
regex = "No numeric variables found"
)
})
Loading