find_ and get_ methods with models of multiple formula components like clm() & lme() #727

DrJerryTAO · 2023-03-10T11:24:20Z

Hi @strengejacke, I spotted some leaving-out-variable issues with find_...() and get_...() in certain model objects with multiple formula components, especially in the following functions

find_predictors()
find_variables()
find_weights()
get_data()

which causes issues in other packages that depend one {insight}.

With nlme::lme(), variables in the weight = varFunc() are not retained. The resolution may require an update to the insight:::find_weights.default() method or adding a new method to find_weights(); or rather treat variables in weight = varFunc() as predictors and update the insight:::find_predictors.default() method or adding a new method to find_predictors().

summary(Model <- lme(
  distance ~ age, data = Orthodont, 
  random = ~ 1 | Subject, 
  weights = varIdent(form = ~ 1 | Sex)))
summary(Model <- lme(
  distance ~ age, data = Orthodont, 
  random = ~ Sex | Subject))
find_predictors(Model)
'$conditional
[1] "age"'
find_variables(Model)
'$response
[1] "distance"
$conditional
[1] "age"
$random
[1] "Subject"'
find_weights(Model) # Did not extract Sex as a variable
'"varIdent(form = ~1 | Sex)"'
class(find_weights(Model)) # "character"
names(get_data(Model)) # Sex was left out
'"distance" "age"      "Subject" '

I think the weights argument in lme() may behave very differently from those in other models, as it is not a numeric vector by which all variables are divided nor an integer vector of observation counts but a function modeling the heteroscedasticity. This is why I suspect that a good solution is to get the variable names through find_predictors() instead of find_weights(). The lme() document shows that weights is

an optional varFunc object or one-sided formula describing the within-group heteroscedasticity structure. If given as a formula, it is used as the argument to varFixed, corresponding to fixed variance weights. See the documentation on varClasses for a description of the available varFunc classes. Defaults to NULL, corresponding to homoscedastic within-group errors.

With ordinal::clm(), variables in the scale = ~ ... and nominal = ~ ... formulae are not retained. The resolution may require an update to the insight:::find_predictors.default() method or adding a new method to find_predictors().

summary(Model <- clm( # nominal + scale effects
  cyl ~ wt, scale = ~ vs, nominal = ~ hp, 
  data = mtcars %>% mutate(cyl = factor(cyl))))
coef(summary(Model))
"                     Estimate   Std. Error       z value  Pr(>|z|)
4|6.(Intercept) 96990.6137075 8.972213e+08  1.081011e-04 0.9999137
6|8.(Intercept) 16987.7165352 4.252071e+11  3.995163e-08 1.0000000
4|6.hp           -772.5798577 7.137124e+06 -1.082481e-04 0.9999136
6|8.hp             -9.4092060 1.757182e+09 -5.354714e-09 1.0000000
wt               4591.3466175 4.280643e+07  1.072583e-04 0.9999144
vs                 -0.3962198 1.837861e+04 -2.155874e-05 0.9999828"
find_variables(Model)
'$response
[1] "cyl"
$conditional
[1] "wt"'
find_predictors(Model)
'$conditional
[1] "wt"'
find_weights(Model) # NULL
names(get_data(Model)) # only "cyl" "wt" got picked

strengejacke · 2023-03-12T17:10:40Z

@vincentarelbundock I guess it would be good to have this issue resolved before the next CRAN submission?

vincentarelbundock · 2023-03-12T17:18:59Z

Yes, would be great, but I'm not sure I'll have much time in the coming weeks, unfortunately.

…e clm() & lme() Fixes #727

strengejacke · 2023-03-12T17:56:41Z

find_weights() now correctly extracts the name of the related weight-variable:

library(insight)
library(nlme)
data(Orthodont)
Orthodont$w <- abs(rnorm(nrow(Orthodont)))

m1 <- lme(
  distance ~ age, data = Orthodont, 
  random = ~ 1 | Subject, 
  weights = varIdent(form = ~ 1 | Sex))

m2 <- lme(
  distance ~ age, data = Orthodont, 
  random = ~ 1 | Subject)

m3 <- lme(
  distance ~ age,
  data = Orthodont,
  random = ~ 1 | Subject,
  weights = ~w
)

find_weights(m1)
#> [1] "Sex"
find_weights(m2)
#> NULL
find_weights(m3)
#> [1] "w"

^{Created on 2023-03-12 with reprex v2.0.2}

strengejacke · 2023-03-12T18:16:07Z

I think the clm-issues are also resolved:

library(insight)
library(ordinal)
data(mtcars)
m2 <- suppressWarnings(clm( # nominal + scale effects
  cyl ~ wt,
  scale = ~vs, nominal = ~hp,
  data = transform(mtcars, cyl = factor(cyl))
))

find_formula(m2)
#> $conditional
#> cyl ~ wt
#> 
#> $scale
#> ~vs
#> 
#> $nominal
#> ~hp
#> 
#> attr(,"class")
#> [1] "insight_formula" "list"

find_variables(m2)
#> $response
#> [1] "cyl"
#> 
#> $conditional
#> [1] "wt"
#> 
#> $scale
#> [1] "vs"
#> 
#> $nominal
#> [1] "hp"

head(get_data(m2))
#>                   cyl    wt vs  hp
#> Mazda RX4           6 2.620  0 110
#> Mazda RX4 Wag       6 2.875  0 110
#> Datsun 710          4 2.320  1  93
#> Hornet 4 Drive      6 3.215  1 110
#> Hornet Sportabout   8 3.440  0 175
#> Valiant             6 3.460  1 105

find_predicors(m2)
#> Error in find_predicors(m2): could not find function "find_predicors"

head(get_predictors(m2))
#>                      wt vs  hp
#> Mazda RX4         2.620  0 110
#> Mazda RX4 Wag     2.875  0 110
#> Datsun 710        2.320  1  93
#> Hornet 4 Drive    3.215  1 110
#> Hornet Sportabout 3.440  0 175
#> Valiant           3.460  1 105

get_parameters(m2)
#>         Parameter      Estimate
#> 1 4|6.(Intercept) 96990.6137075
#> 2 6|8.(Intercept) 16987.7165352
#> 3          4|6.hp  -772.5798577
#> 4          6|8.hp    -9.4092060
#> 5              wt  4591.3466175
#> 6              vs    -0.3962198

parameters::model_parameters(m2, effects = "all", components = "all")
#> # Intercept
#> 
#> Parameter       | Log-Odds |       SE |                       95% CI |         z |      p
#> -----------------------------------------------------------------------------------------
#> 4|6 (Intercept) | 96990.61 | 8.97e+08 | [       -1.76e+09, 1.76e+09] |  1.08e-04 | > .999
#> 6|8 (Intercept) | 16987.72 | 4.25e+11 | [       -8.33e+11, 8.33e+11] |  4.00e-08 | > .999
#> 4|6 hp          |  -772.58 | 7.14e+06 | [       -1.40e+07, 1.40e+07] | -1.08e-04 | > .999
#> 6|8 hp          |    -9.41 | 1.76e+09 | [       -3.44e+09, 3.44e+09] | -5.35e-09 | > .999
#> 
#> # Location Parameters
#> 
#> Parameter | Log-Odds |       SE |                   95% CI |        z |      p
#> ------------------------------------------------------------------------------
#> wt        |  4591.35 | 4.28e+07 | [   -8.39e+07, 8.39e+07] | 1.07e-04 | > .999
#> 
#> # Scale Parameters
#> 
#> Parameter | Log-Odds |       SE |                95% CI |         z |      p
#> ----------------------------------------------------------------------------
#> vs        |    -0.40 | 18378.61 | [-36021.82, 36021.02] | -2.16e-05 | > .999
#> 
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
#>   using a Wald z-distribution approximation.
#> 
#> The model has a log- or logit-link. Consider using `exponentiate =
#>   TRUE` to interpret coefficients as ratios.

^{Created on 2023-03-12 with reprex v2.0.2}

strengejacke · 2023-03-12T18:21:33Z

@vincentarelbundock can you check if #729 fixes the marginal effects issues?

DrJerryTAO · 2023-03-12T20:13:25Z

Great. Just a heads-up: in clm() and other discrete choice models, when scale effects are included, which do not measure log-odds, coefficients in location and intercept parameters no longer measure log-odds directly, and exp(coef) are no longer odds ratios. So the following parameter interpretation can be misleading.

Parameter | Log-Odds | SE | 95% CI | z | p

vincentarelbundock · 2023-03-12T21:07:21Z

@strengejacke it looks like it is working for ordinal::clm, but I won't know if it works with lme until I try to add full support for it. For that, I at least need to wait for the functionality to be merged in insight master, so don't wait for a review of those before merging.

Thanks for the updates!

…e clm() & lme() (#729) * find_ and get_ methods with models of multiple formula components like clm() & lme() Fixes #727 * fix lme issue * news, version * fix for clm * fix test

DrJerryTAO · 2023-03-13T11:29:08Z

Hi @vincentarelbundock,
For lme() model objects, find_weights() does find all variable names used in weights = ... and get_data() retrieve all variables as expected. However, because weight = varFunc() can contain multiple terms, it might be better to treat it as a predictor or formula component to be listed in find_variables() instead. Otherwise, it has caused problems like strengejacke/ggeffects#295 in other packages that expect one single character from find_weights().

strengejacke · 2023-03-13T11:31:44Z

Do you have an example where it fails for multiple predictors in find_weights()? From a conceptional perspective, I'm a bit hesitant to include all weight-predictors in the output of find_variables().

DrJerryTAO · 2023-03-13T12:11:37Z

See strengejacke/ggeffects#295 (comment). The problem is in other packages that expect one single character from find_weights().

strengejacke · 2023-03-13T12:26:51Z

what would get_weights() to be expected to return? This here?

library(nlme)

data("Orthodont", package = "nlme")
m <- lme( # a model of variance only
  distance ~ 1, data = Orthodont, # grand mean
  weights = varConstPower(form = ~ age | Sex))

attributes(m$modelStruct$varStruct)$weights
#>       Male       Male       Male       Male       Male       Male       Male 
#> 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197 
#>       Male       Male       Male       Male       Male       Male       Male 
#> 0.03039196 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622 
#>       Male       Male       Male       Male       Male       Male       Male 
#> 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142 
#>       Male       Male       Male       Male       Male       Male       Male 
#> 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197 0.03039196 
#>       Male       Male       Male       Male       Male       Male       Male 
#> 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197 
#>       Male       Male       Male       Male       Male       Male       Male 
#> 0.03039196 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622 
#>       Male       Male       Male       Male       Male       Male       Male 
#> 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142 
#>       Male       Male       Male       Male       Male       Male       Male 
#> 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197 0.03039196 
#>       Male       Male       Male       Male       Male       Male       Male 
#> 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197 
#>       Male     Female     Female     Female     Female     Female     Female 
#> 0.03039196 0.07470279 0.07491239 0.07506866 0.07519109 0.07470279 0.07491239 
#>     Female     Female     Female     Female     Female     Female     Female 
#> 0.07506866 0.07519109 0.07470279 0.07491239 0.07506866 0.07519109 0.07470279 
#>     Female     Female     Female     Female     Female     Female     Female 
#> 0.07491239 0.07506866 0.07519109 0.07470279 0.07491239 0.07506866 0.07519109 
#>     Female     Female     Female     Female     Female     Female     Female 
#> 0.07470279 0.07491239 0.07506866 0.07519109 0.07470279 0.07491239 0.07506866 
#>     Female     Female     Female     Female     Female     Female     Female 
#> 0.07519109 0.07470279 0.07491239 0.07506866 0.07519109 0.07470279 0.07491239 
#>     Female     Female     Female     Female     Female     Female     Female 
#> 0.07506866 0.07519109 0.07470279 0.07491239 0.07506866 0.07519109 0.07470279 
#>     Female     Female     Female 
#> 0.07491239 0.07506866 0.07519109

^{Created on 2023-03-13 with reprex v2.0.2}

DrJerryTAO · 2023-03-13T15:39:04Z

what would get_weights() to be expected to return?

I am not entirely sure. It depends on how other functions and packages use get_weights(). I think you are on the right path but what you got are in-sample estimates. But I suspect that it should be coefficients of the variance function to be evaluated with future values of predictors. So, it will be coef(object$modelStruct$varStruct) plus several other $modelStruct$varStruct components. Using the same model

summary(Model <- lme( # a model of variance only
  distance ~ 1, data = Orthodont, # grand mean
  weights = varConstPower(form = ~ age | Sex))) # var = b0 + |age|^b1
"       AIC      BIC    logLik
  514.5723 533.2821 -250.2861

Random effects:
 Formula: ~1 | Subject
        (Intercept) Residual
StdDev:     1.84895 0.109692

Variance function:
 Structure: Constant plus power of variance covariate, different strata
 Formula: ~age | Sex 
 Parameter estimates:
              Male     Female
const 0.0001144912 13.0138472
power 1.3237960288 -0.4748515

Fixed effects:  distance ~ 1 
               Value Std.Error DF  t-value p-value
(Intercept) 23.42394 0.4046777 81 57.88296       0"

The model above in Variance function describes that the weights for the first Subject over four measurements are

1 / ( # same as varWeights
  exp(Model$modelStruct$varStruct[["const"]]['Male']) + 
    Model$data$age[1:4] ^
    Model$modelStruct$varStruct[["power"]]['Male'] )

As confirmed below

class(Model$modelStruct$varStruct) # "varConstPower" "varFunc"
Model$sigma # 0.109692
Model$modelStruct$varStruct # prints exp(const) and power
summary(Model$modelStruct$varStruct) # prints exp(const) and power
str(summary(Model$modelStruct$varStruct)) # many components
attributes(Model$modelStruct$varStruct) # # many components

varWeights(Model$modelStruct$varStruct)[1:4] # sample estimates
"      Male       Male       Male       Male 
0.06375142 0.04744622 0.03727197 0.03039196"
attributes(Model$modelStruct$varStruct)$weights[5:8] # Subject M02, same
"      Male       Male       Male       Male 
0.06375142 0.04744622 0.03727197 0.03039196"
coef( # coef() const different from Parameter estimates in summary() print
  Model$modelStruct$varStruct) 
"  const.Male const.Female   power.Male power.Female 
  -9.0750128    2.5660140    1.3237960   -0.4748515"
Model$modelStruct$varStruct[["const"]] # same as coef($varStruct)
exp(Model$modelStruct$varStruct[["const"]]) # same in summary() print
Model$modelStruct$varStruct[["power"]] # same in summary() print
1 / ( # same as varWeights
  exp(Model$modelStruct$varStruct[["const"]]['Male']) + 
    Model$data$age[1:4] ^
    Model$modelStruct$varStruct[["power"]]['Male'] ) 
"0.06375142 0.04744622 0.03727197 0.03039196"

I am not entirely sure, but I think that the population SE is related to both age and Sex in the way that

For Male, population SE = residual_SE * (0.0001144912 + age^1.3237960288)
For Female, population SE = residual_SE * (13.0138472 + age^1.3237960288)

mean(Model$sigma / varWeights(Model$modelStruct$varStruct)) # 2.132228
summary(Model <- lme(distance ~ 1, data = Orthodont))
Model$sigma # 2.220312 similar to above

Given that age could change in a future prediction and the prediction could be made for either Sex group, the population SE of the Male group at age 7 and 15 (out of sample range) should be calculated as

Model$sigma * (0.0001144912 + 7^1.3237960288) # 1.441837
Model$sigma * (0.0001144912 + 15^1.3237960288) # 3.954406

where Model$sigma / varWeights(Model$modelStruct$varStruct) cannot be used because it is static and does not change with age.

For weights = varIdent(), the coefficient must be interpreted differently. So the evaluation of weights = in prediction needs to be varClass specific.

data("Orthodont", package = "nlme")
summary(Model <- lme( # a model of variance only
  distance ~ 1, data = Orthodont, # grand mean
  weights = varIdent(form = ~ 1 | Sex)))
"Random effects:
 Formula: ~1 | Subject
        (Intercept) Residual
StdDev:    2.016065 2.591358

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Sex 
 Parameter estimates:
     Male    Female 
1.0000000 0.5651101"

Model$sigma # 2.591358
str(summary(Model$modelStruct$varStruct)) # many components
varWeights(Model$modelStruct$varStruct) # sample estimates
"Male 1.000000 Female 1.769567"
1/varWeights(Model$modelStruct$varStruct) # Male has larger SE
"1.0000000 0.5651101 " # matches Parameter estimates above
coef(Model$modelStruct$varStruct) # -0.5707348
exp(coef(Model$modelStruct$varStruct)) # 0.5651101 same as SE ratio

vincentarelbundock · 2023-03-13T17:49:10Z

I had never used these model-fitting functions before, and am I just now learning that weights does something very different in those models.

I do not believe that we should use the same get_weights() function for a different functionality, even if that different functionality shares the same name. get_weights() should behave consistently, and simply not work for models where weights play a different role.

strengejacke · 2023-03-13T18:13:58Z

So what does this mean for get_weights() for this example:

lme( distance ~ 1, data = Orthodont,  weights = varConstPower(form = ~ age | Sex))

Should it return NULL, and only work when weights is a one-sided formula indicating an actual weight-variable in the data?
Should it return a data frame with variables age and Sex? (and their original values from the data frame)
Should it return a vector of weights that is saved as attribute?

vincentarelbundock · 2023-03-13T18:17:53Z

What does the function return for all other models? Just a numeric vector, right? If that's irrelevant, I vote for raising an informative error. But I don't have an especially strong view, except against proposals like "returning coefficients of the variance function", which would be very different from what we have now.

strengejacke · 2023-03-13T18:22:17Z

Just a numeric vector, right?

Yes. that's what usually is returned.

vincentarelbundock · 2023-03-13T18:26:51Z

Right, so I would say return a numeric vector or an error. But again, I can't have a strong view because I don't know if/when that's a relevant data structure.

DrJerryTAO · 2023-03-14T10:38:24Z

Yes. that's what usually is returned.
Right, so I would say return a numeric vector or an error. But again, I can't have a strong view because I don't know if/when that's a relevant data structure.

The current resolution seems the best before some more work is done to extract estimated variance functions from lme() as a regular function like w = function(x, coef){} where coef is estimated in lme(). I tried but couldn't figure out how to form a function w. There are multiple different varFunc classes and each seems to behave differently. Once done, it would be possible to calculate SE for prediction intervals by making an additive term to fixed-effect SE. Or customize predict.lm(..., weights = 1/w) when generating prediction intervals, because predict.lm() does allow weights to be a user-supplied function of variables supplied in newdata.

Should it return NULL, and only work when weights is a one-sided formula indicating an actual weight-variable in the data?

No. When weights is a one-sided formula, it is still a varFunc class, and the variable could be assessed very differently depending on the specific varFunc used, as shown https://fw8051statistics4ecologists.netlify.app/gls.html#variance-increasing-with-x_i

Should it return a data frame with variables age and Sex? (and their original values from the data frame)

Not sure. This is stored in the attribute() though.

Should it return a vector of weights that is saved as attribute?

Seems best. Not sure how other functions and packages use the weights. In WLS and GLS models, weight are mostly useful to get the correct vcov estimates. But since vcov() can easily extract the already estimated vcov, I don't know if weights can be useful again. But perhaps when making in-sample prediction intervals, where sigma/weights of lme() and gls() can be used to adjust SE and CI over the sample ranges. But it won't work correctly (or as intended) for newdata where weights-variables are given but actual weights cannot be evaluated to find the prediction variance.

This was referenced Mar 10, 2023

Incorrect standard errors in predictions() of clm() with scale effects vincentarelbundock/marginaleffects#718

Closed

Support: lme vincentarelbundock/marginaleffects#99

Closed

strengejacke added the Bug 🐛 Something isn't working label Mar 12, 2023

strengejacke self-assigned this Mar 12, 2023

strengejacke added a commit that referenced this issue Mar 12, 2023

find_ and get_ methods with models of multiple formula components lik…

19ddd2f

…e clm() & lme() Fixes #727

strengejacke mentioned this issue Mar 12, 2023

find_ and get_ methods with models of multiple formula components like clm() & lme() #729

Merged

strengejacke mentioned this issue Mar 12, 2023

parameter interpretation can be misleading for clm easystats/parameters#855

Closed

strengejacke closed this as completed in #729 Mar 12, 2023

florentdery mentioned this issue Mar 24, 2023

Irregularities in ggpredict() with clm object ? strengejacke/ggeffects#298

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

find_ and get_ methods with models of multiple formula components like clm() & lme() #727

find_ and get_ methods with models of multiple formula components like clm() & lme() #727

DrJerryTAO commented Mar 10, 2023 •

edited

Loading

strengejacke commented Mar 12, 2023

vincentarelbundock commented Mar 12, 2023

strengejacke commented Mar 12, 2023

strengejacke commented Mar 12, 2023

strengejacke commented Mar 12, 2023

DrJerryTAO commented Mar 12, 2023 •

edited

Loading

vincentarelbundock commented Mar 12, 2023

DrJerryTAO commented Mar 13, 2023

strengejacke commented Mar 13, 2023

DrJerryTAO commented Mar 13, 2023

strengejacke commented Mar 13, 2023

DrJerryTAO commented Mar 13, 2023 •

edited

Loading

vincentarelbundock commented Mar 13, 2023

strengejacke commented Mar 13, 2023

vincentarelbundock commented Mar 13, 2023

strengejacke commented Mar 13, 2023

vincentarelbundock commented Mar 13, 2023

DrJerryTAO commented Mar 14, 2023

find_ and get_ methods with models of multiple formula components like clm() & lme() #727

find_ and get_ methods with models of multiple formula components like clm() & lme() #727

Comments

DrJerryTAO commented Mar 10, 2023 • edited Loading

strengejacke commented Mar 12, 2023

vincentarelbundock commented Mar 12, 2023

strengejacke commented Mar 12, 2023

strengejacke commented Mar 12, 2023

strengejacke commented Mar 12, 2023

DrJerryTAO commented Mar 12, 2023 • edited Loading

vincentarelbundock commented Mar 12, 2023

DrJerryTAO commented Mar 13, 2023

strengejacke commented Mar 13, 2023

DrJerryTAO commented Mar 13, 2023

strengejacke commented Mar 13, 2023

DrJerryTAO commented Mar 13, 2023 • edited Loading

vincentarelbundock commented Mar 13, 2023

strengejacke commented Mar 13, 2023

vincentarelbundock commented Mar 13, 2023

strengejacke commented Mar 13, 2023

vincentarelbundock commented Mar 13, 2023

DrJerryTAO commented Mar 14, 2023

DrJerryTAO commented Mar 10, 2023 •

edited

Loading

DrJerryTAO commented Mar 12, 2023 •

edited

Loading

DrJerryTAO commented Mar 13, 2023 •

edited

Loading