Outlier detection in Linear mixed models failed? #711

tappituffi · 2024-04-16T15:07:23Z

Hi,
-> When I look at the outlier plot, there are some samples that fall outside the contour line but are not flagged in red, as demonstrated in the performance tutorial. Also, when I check the model with the check_outliers function, it says I don't have any outliers.
So my question is why those samples are not flagged as outliers even though they are outside the contour lines. Has this something to do with that I am using mixed models or how could I explain this?

Thanks in advance and best,
Niklas

rempsyc · 2024-04-18T09:31:06Z

Dear Niklas, thank you very much for following-up with this issue after your email. I was wondering if you could include some example code for both check_model() and check_outliers() so we are better able to investigate what is happening.

If using your own data, you probably cannot share it, however, you could share the code and output that generated it to give us an idea and test with a reproducible example on our end. Even better, would be if you would be able to generate a reprex using base R data, for example included in the help() examples of the linear mixed model function you use.

tappituffi · 2024-04-20T17:13:21Z

Hi,

I am not familiar with reprex etc but I could generate a reproducible code that has the same issue. When you run below code it will generate the check_model plots and shows only a red dot for 40. However, all other points outside of the contour line are not in red even though they should be outliers?
Also, when you run check_outlier, it only states the 40 as outlier and when you run plot(outlier) it doesn't even show the dot for 40 in red. So I guess this code shows the issue I have for my study. So my question is if points are outside the contour line, should they be regarded as outliers and hence colored in red or not?
Thanks in advance!

# Install necessary packages if they are not already installed
if (!require("lme4")) install.packages("lme4", dependencies = TRUE)
if (!require("performance")) install.packages("performance", dependencies = TRUE)

# Load the packages
library(lme4)
library(performance)
library(see)

# Create the data frame
set.seed(123)  # for reproducibility
n <- 100
subjectID <- paste("Subject", rep(1:10, each = 10))
PN <- runif(n, 0, 100)
alpha <- 0.5 * PN + rnorm(n, mean = 50, sd = 10)

# Introducing outliers
alpha[c(20, 40)] <- c(150, 160)  # arbitrarily chosen subjects for outliers

data <- data.frame(subjectID, PN, alpha)

# Fit the mixed model
model <- lmer(alpha ~ PN + (1 | subjectID), data = data)
outlier <- check_outliers(model)
outlier
plot(outlier)
check_model(model)

rempsyc · 2024-04-28T09:16:13Z

Thank you for providing reproducing code. For future reference, I have turned it into a reprex using the reprex package which allows us to also see the visual output of your code:

# Load the packages
library(lme4)
library(performance)
library(see)

# Create the data frame
set.seed(123)  # for reproducibility
n <- 100
subjectID <- paste("Subject", rep(1:10, each = 10))
PN <- runif(n, 0, 100)
alpha <- 0.5 * PN + rnorm(n, mean = 50, sd = 10)

# Introducing outliers
alpha[c(20, 40)] <- c(150, 160)  # arbitrarily chosen subjects for outliers

data <- data.frame(subjectID, PN, alpha)

# Fit the mixed model
model <- lmer(alpha ~ PN + (1 | subjectID), data = data)
outlier <- check_outliers(model)
outlier
#> 1 outlier detected: case 40.
#> - Based on the following method and threshold: cook (0.7).
#> - For variable: (Whole model).
# plot(outlier)

x <- check_model(model)

plot1 <- plot(x$OUTLIERS)
plot1

plot2 <- plot(x)$OUTLIERS
plot2

^{Created on 2024-04-28 with reprex v2.1.0}

rempsyc · 2024-04-28T10:34:52Z

We can see that plot(x$OUTLIERS) (check_outliers()) and plot(x)$OUTLIERS (from check_model()) provide different results. It seems that this difference comes from the see packages rather than the performance package. Specifically, it comes from the function used internally, .plot_diag_outliers_new().

# Prepare model
library(see)
library(lme4)
library(performance)
set.seed(123)  # for reproducibility
n <- 100
subjectID <- paste("Subject", rep(1:10, each = 10))
PN <- runif(n, 0, 100)
alpha <- 0.5 * PN + rnorm(n, mean = 50, sd = 10)
alpha[c(20, 40)] <- c(150, 160)  # arbitrarily chosen subjects for outliers
data <- data.frame(subjectID, PN, alpha)
model <- lmer(alpha ~ PN + (1 | subjectID), data = data)
x <- check_model(model)

# These two functions provide different results:
# `check_model()`
see:::.plot_diag_outliers_new(x$INFLUENTIAL)

# `check_outliers()`
see:::plot.see_check_outliers(x$OUTLIERS)

# Methods don't agree on observation 40
x$INFLUENTIAL[40, ]
#>           Hat Cooks_Distance Predicted Residuals Std_Residuals Index
#> 40 0.04070025      0.9471816  65.41945  94.58055      94.58055    40
#>    Influential
#> 40 Influential

attributes(x$OUTLIERS)$influential_obs[40, ]
#>           Hat Cooks_Distance Predicted Residuals Std_Residuals Index
#> 40 0.04070025      0.9471816  65.41945  94.58055      94.58055    40
#>    Influential
#> 40          OK

# Even though it is identified correctly here:
attributes(x$OUTLIERS)$data[40, ]
#>    Row Distance_Cook Outlier_Cook Outlier
#> 40  40     0.9471816            1       1

# `check_model()`
attributes(x$INFLUENTIAL)$cook_levels
#> [1] 0.698073

As the cook’s distance is 0.94, which is higher than 0.7, it should be tagged as outlier though, so this appears like an internal bug from check_outliers().

^{Created on 2024-04-28 with reprex v2.1.0}

rempsyc · 2024-04-28T11:21:32Z

I have implemented a fix in PR #717, thank you for the report

rempsyc added the Reprex 📊 We need a reproducible example for further investigation label Apr 18, 2024

rempsyc added a commit that referenced this issue Apr 28, 2024

fixes #711

8ab9b78

rempsyc mentioned this issue Apr 28, 2024

Fixes bug in outlier plot method #717

Merged

strengejacke closed this as completed in #717 Apr 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Outlier detection in Linear mixed models failed? #711

Outlier detection in Linear mixed models failed? #711

tappituffi commented Apr 16, 2024

rempsyc commented Apr 18, 2024

tappituffi commented Apr 20, 2024 •

edited by strengejacke

Loading

rempsyc commented Apr 28, 2024 •

edited

Loading

rempsyc commented Apr 28, 2024

rempsyc commented Apr 28, 2024

Outlier detection in Linear mixed models failed? #711

Outlier detection in Linear mixed models failed? #711

Comments

tappituffi commented Apr 16, 2024

rempsyc commented Apr 18, 2024

tappituffi commented Apr 20, 2024 • edited by strengejacke Loading

rempsyc commented Apr 28, 2024 • edited Loading

rempsyc commented Apr 28, 2024

rempsyc commented Apr 28, 2024

tappituffi commented Apr 20, 2024 •

edited by strengejacke

Loading

rempsyc commented Apr 28, 2024 •

edited

Loading