-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support: lme
#99
Comments
Are there updates on this development? Is the package now able to work with nlme::lme() models? |
@MrJerryTAO there has been no progress because the problem has not been resolved upstream in the library(nlme)
mod <- lme(distance ~ age, data = Orthodont, random = ~ Sex)
fm1 <- lme(distance ~ age, data = Orthodont)
fm2 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)
fm3 <- lme(distance ~ age, data = Orthodont, random = ~ Sex)
# missing Subject
insight::find_predictors(fm1)
# $conditional
# [1] "age"
# missing Subject
insight::find_predictors(fm2)
# $conditional
# [1] "age" "Sex"
# missing Subject and Sex
insight::find_predictors(fm3)
# $conditional
# [1] "age"
# error
insight::find_variables(fm3)
# Error in gregexpr(pattern = "\\|", re)[[1]]: subscript out of bounds |
Hi @vincentarelbundock, in my test of insight::find_, both works almost okay. insight::find_variables(fm3) had a problem in your case because the way specifying random effects were not recognized by {insight}, but using a complete formula for random effects solves the problem as following. Does it look promising to you? However, what is missing in both are terms specified in the weights = argument in lme() that models heteroscedasticity (how the residual standard deviation varies by a function of variables). But if the lme() model does not have a weights = {} component, I guess these insight retrieval functions should be fine.
It appears that insight::find_variables() returns more variable names than insight::find_predictors(). In my modeling as shown below, "milk" as an explanatory variable in weights = {} is not selected by either of the find_() functions. However, all other variables were retained by find_variables() and get_data()
|
No, we need full support, unfortunately. |
@vincentarelbundock, I just updated my example script output above with some new findings. If find_variables() returns all variable names relevant in a model, will it solve the problem? It does not seem so, as a lme() model without weights = {} argument still has NA as inferential statistics in predictions() which only provides point estimates. Or is it more about get_data() returning a data frame with all variables? It does not seem so either, as this function gives a data frame with the variables in find_variables(). Or is it because {marginaleffects} calls find_predictors() only and therefore miss many other variables? I had a hard time exploring how these insight::: functions are used in {marginaleffects}. |
We need all of those find_predictors |
Unfortunately I don't know what to do in this situation. I explored these {insight} functions, but I could not find out how {marginaleffects} uses them. It might be too complex for me to offer help, as I am not familiar with {insight} and how {marginaleffects} depends on it. I turned to {ggeffects} for nlme models. |
To be clear, the interaction between Glad to hear that |
Hi @vincentarelbundock, I reported the variable-left-out issue in {insight} easystats/insight#727. It appears to me that the only misbehaving function in {insights} of lme() objects is find_weights() as I illustrated in the issue above. Do you agree? However, when I fixed the find_weights() function, so it returns the variable name in the lme() weights = varFunc() argument and get_data() returns all required variables, using an earlier version of {marginaleffects} still gets only point estimates but no SE. See the following example summary(Model <- lme(
distance ~ age, data = Orthodont,
random = ~ 1 | Subject,
weights = varIdent(form = ~ 1 | Sex)))
find_predictors(Model)
'$conditional
[1] "age"'
find_variables(Model)
'$response
[1] "distance"
$conditional
[1] "age"
$random
[1] "Subject"'
find_weights(Model) # "varIdent(form = ~1 | Sex)" before fix
'"Sex"'
class(find_weights(Model)) # "character"
names(get_data(Model)) # Sex was left out
'"distance" "age" "Subject" "Sex"'
predictions(Model) # missing SE
predictions(Model, newdata = Orthodont) # same
" Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
25.6 NA NA NA NA NA
26.7 NA NA NA NA NA
27.8 NA NA NA NA NA
28.8 NA NA NA NA NA
21.8 NA NA NA NA NA
--- 98 rows omitted. See ?avg_predictions and ?print.marginaleffects ---" |
This sounds right, but I have not looked at it deeply enough to be 100% sure.
This is normal. I need to do other stuff in |
Would be great to see the updates in future versions! Please follow up on easystats/insight#727 if you need additional support in {insight}.
I prefer {marginaleffects} because the SE and the estimate are consistently (as far as I have found) on the same scale. In contrast with {ggeffects}, SE is usually on the linear predictor scale, no matter what estimate stands for. This remained a myth until I compared results between {marginaleffects} and {ggeffects} on a Poisson regression. The side effect of {ggeffects} depending on linear-predictor SEs is that {ggeffects} can give asymmetrical CIs (e.g. Poisson mean > 0, and probability within [0, 1]), whereas {marginaleffects} always gives symmetrical CIs that sometimes go beyond possible limits (e.g. Poisson mean < 0, and probability outside [0, 1]). My solution is to use response estimates (y) and response SEs SE(y) from {marginaleffects} and manually transform them onto the linear-predictor scale due to the link function y = g(x), recognizing that SE(y) = g'(x)SE(x), before building a CI based on SE(x). I do not use linear-predictor estimates (x) and SEs SE(x) from {marginaleffects} to find estimate and CI of mean response because the mean response does not equal to a nonlinear link function of mean linear predictor in glm() models, contrary to what you once suggested. It would be nice if {marginaleffects} produces asymmetrical, range-preserving CIs when necessary, using the method above. In my GitHub of some customized R functions, I proposed efficient dplyr programming to generate repeated-measurements-, variance-, and difference-adjusted, range-preserving CIs for means and proportions from individual scores and group estimates, which are robust to imbalanced factorial designs, unequal group variances, different group sizes, and missing observations. I used {marginaleffects} summary results as inputs of group estimates to my function, which produces difference-adjusted, range-preserving CIs. Though, the range-preserving technique here is based on functions like |
Thanks for the link and suggestions, but I am unlikely to implement that. You should feel free to publish an extension package. |
@MrJerryTAO there is now experimental support for library(nlme)
library(marginaleffects)
mod <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)
avg_slopes(mod)
#
# Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
# Sex Female - Male -2.32 0.7614 -3.05 0.0023 -3.813 -0.829
# age dY/dX 0.66 0.0616 10.72 <0.001 0.539 0.781
#
# Columns: term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high
avg_comparisons(mod)
#
# Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
# Sex Female - Male -2.32 0.7614 -3.05 0.0023 -3.813 -0.829
# age +1 0.66 0.0616 10.72 <0.001 0.539 0.781
#
# Columns: term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high
avg_predictions(mod, by = "Sex")
#
# Sex Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
# Male 25.0 0.486 51.4 <0.001 24.0 25.9
# Female 22.6 0.586 38.6 <0.001 21.5 23.8
#
# Columns: Sex, estimate, std.error, statistic, p.value, conf.low, conf.high |
Excellent, @vincentarelbundock. Did not expect it to be so prompt. The current lme support in marginaleffect works mostly okay except
I also tested {ggeffects} given {insight} update. It has another series of errors and mistakes that I reported at strengejacke/ggeffects#295. Nevertheless, it does differentiate between population SE and individual SE by type = "fixed" and "random". Model set up. {insight} find_ and get_ seem to work well. library(nlme)
library(insight)
library(marginaleffects)
data("Orthodont", package = "nlme")
ggplot( # spaghetti plot with LOESS group mean
data = Orthodont,
aes(x = age, y = distance, group = Subject, color = Sex, fill = Sex)) +
geom_smooth(aes(
group = Sex), alpha = 0.6, method = "loess", span = 1, show.legend = F) +
geom_line(size = 0.2, alpha = 0.6) +
scale_color_manual(
breaks = c("Female", "Male"),
values = c("#D14343", "#2197DF")) +
scale_fill_manual(
breaks = c("Female", "Male"),
values = c("#D14343", "#2197DF")) +
scale_x_continuous(breaks = c(8, 10, 12, 14), minor_breaks = NULL) +
labs(x = "Age (year)", y = "Distance") +
theme_minimal() +
theme(
strip.placement = "outside",
strip.text = element_text(size = rel(1)),
legend.title = element_blank(),
legend.position = c(0.1, 0.7), # "bottom"
legend.direction = "vertical", legend.box = "vertical",
legend.margin = margin(0, 0, 0, 0),
legend.spacing = unit(0, "line"),
legend.background = element_blank(),
legend.text = element_text(size = rel(0.8)),
legend.key = element_blank(),
legend.key.size = unit(0.8, "line"),
legend.key.width = unit(1.2, "line"),
panel.grid.major.y = element_line(colour = "grey90", size = 0.2),
panel.grid.minor.y = element_line(colour = "grey90", size = 0.2),
panel.grid.major.x = element_line(colour = "grey90", size = 0.2),
panel.grid.minor.x = element_line(colour = "grey90", size = 0.2)
)
summary(Model <- lme( # using all formula components
distance ~ age, data = Orthodont, # fixed effect
random = list(~ 1 + age | Subject), # correlated random int + age
correlation = corAR1( # error correlation over time within Subject
form = ~ as.integer(as.factor(age)) | Subject), # 1:4 as time index
weights = varIdent(form = ~ 1 | Sex))) # heterosc: M/F has diff var
" AIC BIC logLik
436.6792 457.9867 -210.3396
Random effects:
Formula: ~1 + age | Subject
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 2.3152914 (Intr)
age 0.2402225 -0.619
Residual 1.5037380
Correlation Structure: AR(1)
Formula: ~as.integer(as.factor(age)) | Subject
Parameter estimate(s):
Phi
-0.2325323
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Sex
Parameter estimates:
Male Female
1.000000 0.401668
Fixed effects: distance ~ age
Value Std.Error DF t-value p-value
(Intercept) 17.115492 0.6384924 80 26.8061 0
age 0.624408 0.0617834 80 10.1064 0"
vcov(Model) # only among fixed effect coefficients
" (Intercept) age
(Intercept) 0.4599040 -0.024615698
age -0.0246157 0.002237791"
find_predictors(Model)
'$conditional
[1] "age"
$correlation
[1] "age" "Subject"'
find_predictors(Model, effects = "all", component = "all")
'$conditional
[1] "age"
$random
[1] "Subject"
$correlation
[1] "age" "Subject"'
find_variables(Model)
'$response
[1] "distance"
$conditional
[1] "age"
$random
[1] "Subject"
$correlation
[1] "age" "Subject"'
find_weights(Model) # "varIdent(form = ~1 | Sex)" before fix
'"Sex"'
class(find_weights(Model)) # "character"
names(get_data(Model)) # Sex left out before fix
'"distance" "age" "Subject" "Sex"' predict.lme() as a reference. It has a level = c() argument to include or not random effects by clustering level. # Prediction using nlme
Orthodont[Orthodont$Subject %in% c("M01", "F01"), ]
"Grouped Data: distance ~ age | Subject
distance age Subject Sex
1 26.0 8 M01 Male
2 25.0 10 M01 Male
3 29.0 12 M01 Male
4 31.0 14 M01 Male
65 21.0 8 F01 Female
66 20.0 10 F01 Female
67 21.5 12 F01 Female
68 23.0 14 F01 Female"
predict(Model, newdata = expand.grid(
age = c(8, 10, 12, 14), Sex = c("Male", "Female")),
level = 0) # mean of population. No SE or CI option.
# Male has the same prediction as Female: Sex is only in weights
"[1] 22.11075 23.35957 24.60839 25.85720 22.11075 23.35957 24.60839
[8] 25.85720"
predict(Model, newdata = Orthodont[
Orthodont$Subject %in% c("M01", "F01"), ],
level = 0) # same result as above
predict(Model, newdata = expand.grid(
age = c(8, 10, 12, 14), Sex = c("Male", "Female")),
level = 1) # must supply cluster variables
"Error in predict.lme:
cannot evaluate groups for desired levels on 'newdata'"
predict(Model, newdata = expand.grid( # level = max() by default
age = c(8, 10, 12, 14), Sex = c("Male", "Female"))) # same error
predict(Model, newdata = Orthodont[
Orthodont$Subject %in% c("M01", "F01"), ],
level = 1) # different when including random effects
" M01 M01 M01 M01 F01 F01 F01
24.77898 26.53948 28.29998 30.06047 20.10894 20.92936 21.74977
F01
22.57018"
predict(Model, newdata = Orthodont[
Orthodont$Subject %in% c("M01", "F01"), ],
level = 0:1) # level can be a vector, 0 for population
" Subject predict.fixed predict.Subject
1 M01 22.11075 24.77898
2 M01 23.35957 26.53948
3 M01 24.60839 28.29998
4 M01 25.85720 30.06047
65 F01 22.11075 20.10894
66 F01 23.35957 20.92936
67 F01 24.60839 21.74977
68 F01 25.85720 22.57018" {marginaleffect} results. See comments inside. # Marginal effects using marginaleffects
predictions( # includes random effects (level=1) by default
Model, newdata = Orthodont[ # individual mean but SE constant by Subject
Orthodont$Subject %in% c("M01", "F01"), ]) # incorrect SE!!!
"Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % age Subject Sex
24.8 0.394 63.0 <0.001 24.0 25.6 8 M01 Male
26.5 0.410 64.7 <0.001 25.7 27.3 10 M01 Male
28.3 0.460 61.5 <0.001 27.4 29.2 12 M01 Male
30.1 0.535 56.2 <0.001 29.0 31.1 14 M01 Male
20.1 0.394 51.1 <0.001 19.3 20.9 8 F01 Female
20.9 0.410 51.1 <0.001 20.1 21.7 10 F01 Female
21.7 0.460 47.3 <0.001 20.8 22.7 12 F01 Female
22.6 0.535 42.2 <0.001 21.5 23.6 14 F01 Female"
predictions( # cannot predict individual mean without cluster variable
Model, newdata = expand.grid( # this should be desired and expected
age = c(8, 10, 12, 14), Sex = c("Male", "Female")))
"Error: Unable to compute predicted values with this model. You can try
to supply a different dataset to the `newdata` argument. This
error was also raised:
cannot evaluate groups for desired levels on 'newdata'"
predictions( # warning for using level =, but argument worked
Model, newdata = Orthodont[
Orthodont$Subject %in% c("M01", "F01"), ],
level = 0) # gives population mean and ?perhaps population SE
"Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % age Subject Sex
22.1 0.394 56.2 <0.001 21.3 22.9 8 M01 Male
23.4 0.410 57.0 <0.001 22.6 24.2 10 M01 Male
24.6 0.460 53.5 <0.001 23.7 25.5 12 M01 Male
25.9 0.535 48.4 <0.001 24.8 26.9 14 M01 Male
22.1 0.394 56.2 <0.001 21.3 22.9 8 F01 Female
23.4 0.410 57.0 <0.001 22.6 24.2 10 F01 Female
24.6 0.460 53.5 <0.001 23.7 25.5 12 F01 Female
25.9 0.535 48.4 <0.001 24.8 26.9 14 F01 Female
Warning message:
These arguments are not supported for models of class `lme`: level. "
avg_predictions(Model) # random effect should average out; incorrect SE???
" Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
24 0.431 55.6 <0.001 23.1 24.8"
avg_predictions(Model, level = 0) # same result as above; but warning
avg_predictions( # different means; but why? possibly an individual mean
Model, by = "Sex") # with different random effects across ind by Sex
" Sex Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
Male 24.9 0.431 57.7 <0.001 24.0 25.7
Female 22.7 0.431 52.5 <0.001 21.8 23.5"
avg_predictions( # same population mean by Sex
Model, by = "Sex", level = 0) # but SE does not vary by Sex!!!
" Sex Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
Male 24 0.431 55.6 <0.001 23.1 24.8
Female 24 0.431 55.6 <0.001 23.1 24.8
Warning message:
These arguments are not supported for models of class `lme`: level."
plot_predictions(Model, condition = "Sex") # equal CI by Sex. Incorrect!!!
comparisons(Model, by = T) # Sex in weights not included?
# correct to do so: does not affect mean
" Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
age +1 0.624 0.0618 10.1 <0.001 0.503 0.746"
avg_slopes(Model) # same result as above
comparisons( # two rows different due to random age effects differ by Sex
Model, by = "Sex") # but equal SE by Sex. Incorrect!!!
" Term Contrast Sex Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
age mean(+1) Male 0.713 0.0618 11.53 <0.001 0.592 0.834
age mean(+1) Female 0.496 0.0618 8.03 <0.001 0.375 0.617"
avg_slopes(Model, by = "Sex") # same result as above
comparisons( # same issue: equal SE by Sex. Incorrect!!!
Model, by = "Sex", level = 0) # level = worked but warning
"Term Contrast Sex Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
age mean(+1) Male 0.624 0.0618 10.1 <0.001 0.503 0.746
age mean(+1) Female 0.624 0.0618 10.1 <0.001 0.503 0.746
Warning message:
These arguments are not supported for models of class `lme`: level. "
avg_slopes(Model, by = "Sex", level = 0) # same as above
plot_comparisons( # plots comparisons(Model, by = "Sex"); incorrect equal CI
Model, variables = c("age"), by = c("Sex"))
plot_comparisons( # mean at ~0.88; neither population of individual
Model, variables = c("age"), condition = c("Sex")) # what is it???
marginal_means(Model) # individual means; strange ordering
" Term Value Mean Std. Error z Pr(>|z|) 2.5 % 97.5 %
Subject M16 23.0 0.431 53.4 <0.001 22.2 23.9
Subject M05 23.1 0.431 53.6 <0.001 22.3 24.0
...
Subject F11 26.4 0.431 61.1 <0.001 25.5 27.2"
marginal_means(Model, level = 1) # same result as above with warning
# same SE for all Subject? Does not look right!!!
marginal_means(Model, level = 0) # equal population mean with warning
" Term Value Mean Std. Error z Pr(>|z|) 2.5 % 97.5 %
Subject M16 24 0.431 55.6 <0.001 23.1 24.8
Subject M05 24 0.431 55.6 <0.001 23.1 24.8
....
Subject F11 24 0.431 55.6 <0.001 23.1 24.8"
hypotheses(Model) # no test on other parameters, like correlation structure
# coefficient and heteroscedasticity in variance function
" Term Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
b1 17.115 0.6385 26.8 <0.001 15.864 18.367
b2 0.624 0.0618 10.1 <0.001 0.503 0.746"
hypotheses(Model, "age = 1") # worked
" Term Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
age = 1 -0.376 0.0618 -6.08 <0.001 -0.497 -0.254"
hypotheses(Model, "(Intercept) = 1") # could not refer to intercept
"object 'Intercept' not found"
hypotheses( # Error: Assertion on 'df' failed: Must have length 1
Model, df = insight::get_df(Model)) # df can be coef-specific |
Thanks for the reports.
hypotheses(Model, "`(Intercept)` = 1") |
Thank you for the update.
I don't think this is theoretically true. It depends one whether the prediction is for a population mean (fixed effect SD) or cluster mean (both fixed and random effect SD), corresponding to a confidence interval vs. prediction interval.
Weights in nlme::gls() and lme() may be defined very differently from those in other packages: both model the heteroscedasticity form to allow population SE changes with some predictors, and estimate coefficients in the variance effect; therefore, both confidence interval and prediction interval should change in width according to the variance function in weight = varFunc() if the predictor changes, not just in observed sample, but also in counterfactual predictions. See examples in lme() easystats/insight#727 (comment). In contrast, ggeffect() takes an incorrect approach to allow population SE to vary by some predictor as in the very last execution in strengejacke/ggeffects#295 (comment) where CI widens with age, but age has only a linear fixed and random effect on distance. Not sure how these age-varying SE and CI were calculated, but they look incorrect: weight = varFunc() coefficient estimates say that population SE and CI width should reduce with age in a power function. |
Thanks for the clarification. I don't know these model objects well enough to implement a whole new infrastructure to support all this. In your opinion, would it be sufficient to:
|
Possibly, but probably better to issue a warning instead because the SE the current infrastructure already got should be quite helpful. See predict.lm() help file with weights argument. It allows user to specify a numeric vector or formula to apply on new data. Warning is printed in certain cases.
The most useful resource I have found so far is To make a CI with random effects, so it measures the CI of cluster means (wider than that of the population mean), enlarge the SE to be Within each cluster, the error term is allowed to correlate through To make a prediction interval with observation-level uncertainty, so it measures the CI of possible Y (random-effects, cluster mean CI), further enlarge the SE to be |
Awwwww, I think I understand now! You are not saying that the standard errors are statistically incorrect. What you are saying is that these are not the quantities you want. To be clear:
So yes, you are right that the intervals do not account for the random effects structure. This is known. In fact, you should know that this is the same approach that we use everywhere in the package, including for For now, prediction intervals are out of scope, as are confidence intervals that account for the covariance structure of REs. |
I have not tried marginaleffects with lmer() models. What do you think it indicates if I use population mean standard errors for cluster mean predictions? i.e., in predictions(lme(), level = 1, 2, ...) for cluster mean estimates with SE equal to those of population mean estimates. The strategy ggeffects take in ggemmeans() and ggeffect()
Regarding comparisons() and slopes() with level = 0 vs level = 1, 2, ... |
If you want to really nail down the correct interpretation, I recommend you talk to a local |
Not supported by
insight
: easystats/insight#66User request:
https://twitter.com/aosmith16/status/1444733963151544321?s=21
Working
get_coef
get_vcov
get_predict
set_coef
Not working
insight::get_data
predict()
function.Example
The text was updated successfully, but these errors were encountered: