-
-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Outlier detection in Linear mixed models failed? #711
Comments
Dear Niklas, thank you very much for following-up with this issue after your email. I was wondering if you could include some example code for both If using your own data, you probably cannot share it, however, you could share the code and output that generated it to give us an idea and test with a reproducible example on our end. Even better, would be if you would be able to generate a |
Hi, I am not familiar with reprex etc but I could generate a reproducible code that has the same issue. When you run below code it will generate the check_model plots and shows only a red dot for 40. However, all other points outside of the contour line are not in red even though they should be outliers? # Install necessary packages if they are not already installed
if (!require("lme4")) install.packages("lme4", dependencies = TRUE)
if (!require("performance")) install.packages("performance", dependencies = TRUE)
# Load the packages
library(lme4)
library(performance)
library(see)
# Create the data frame
set.seed(123) # for reproducibility
n <- 100
subjectID <- paste("Subject", rep(1:10, each = 10))
PN <- runif(n, 0, 100)
alpha <- 0.5 * PN + rnorm(n, mean = 50, sd = 10)
# Introducing outliers
alpha[c(20, 40)] <- c(150, 160) # arbitrarily chosen subjects for outliers
data <- data.frame(subjectID, PN, alpha)
# Fit the mixed model
model <- lmer(alpha ~ PN + (1 | subjectID), data = data)
outlier <- check_outliers(model)
outlier
plot(outlier)
check_model(model) |
Thank you for providing reproducing code. For future reference, I have turned it into a reprex using the # Load the packages
library(lme4)
library(performance)
library(see)
# Create the data frame
set.seed(123) # for reproducibility
n <- 100
subjectID <- paste("Subject", rep(1:10, each = 10))
PN <- runif(n, 0, 100)
alpha <- 0.5 * PN + rnorm(n, mean = 50, sd = 10)
# Introducing outliers
alpha[c(20, 40)] <- c(150, 160) # arbitrarily chosen subjects for outliers
data <- data.frame(subjectID, PN, alpha)
# Fit the mixed model
model <- lmer(alpha ~ PN + (1 | subjectID), data = data)
outlier <- check_outliers(model)
outlier
#> 1 outlier detected: case 40.
#> - Based on the following method and threshold: cook (0.7).
#> - For variable: (Whole model).
# plot(outlier)
x <- check_model(model)
plot1 <- plot(x$OUTLIERS)
plot1 plot2 <- plot(x)$OUTLIERS
plot2 Created on 2024-04-28 with reprex v2.1.0 |
We can see that # Prepare model
library(see)
library(lme4)
library(performance)
set.seed(123) # for reproducibility
n <- 100
subjectID <- paste("Subject", rep(1:10, each = 10))
PN <- runif(n, 0, 100)
alpha <- 0.5 * PN + rnorm(n, mean = 50, sd = 10)
alpha[c(20, 40)] <- c(150, 160) # arbitrarily chosen subjects for outliers
data <- data.frame(subjectID, PN, alpha)
model <- lmer(alpha ~ PN + (1 | subjectID), data = data)
x <- check_model(model)
# These two functions provide different results:
# `check_model()`
see:::.plot_diag_outliers_new(x$INFLUENTIAL) # `check_outliers()`
see:::plot.see_check_outliers(x$OUTLIERS) # Methods don't agree on observation 40
x$INFLUENTIAL[40, ]
#> Hat Cooks_Distance Predicted Residuals Std_Residuals Index
#> 40 0.04070025 0.9471816 65.41945 94.58055 94.58055 40
#> Influential
#> 40 Influential
attributes(x$OUTLIERS)$influential_obs[40, ]
#> Hat Cooks_Distance Predicted Residuals Std_Residuals Index
#> 40 0.04070025 0.9471816 65.41945 94.58055 94.58055 40
#> Influential
#> 40 OK
# Even though it is identified correctly here:
attributes(x$OUTLIERS)$data[40, ]
#> Row Distance_Cook Outlier_Cook Outlier
#> 40 40 0.9471816 1 1
# `check_model()`
attributes(x$INFLUENTIAL)$cook_levels
#> [1] 0.698073 As the cook’s distance is 0.94, which is higher than 0.7, it should be tagged as outlier though, so this appears like an internal bug from Created on 2024-04-28 with reprex v2.1.0 |
I have implemented a fix in PR #717, thank you for the report |
Hi,
-> When I look at the outlier plot, there are some samples that fall outside the contour line but are not flagged in red, as demonstrated in the performance tutorial. Also, when I check the model with the check_outliers function, it says I don't have any outliers.
So my question is why those samples are not flagged as outliers even though they are outside the contour lines. Has this something to do with that I am using mixed models or how could I explain this?
Thanks in advance and best,
Niklas
The text was updated successfully, but these errors were encountered: