find_ and get_ methods with models of multiple formula components like clm() & lme() #727
@vincentarelbundock I guess it would be good to have this issue resolved before the next CRAN submission?
Yes, would be great, but I'm not sure I'll have much time in the coming weeks, unfortunately.
library(insight)
library(nlme)
data(Orthodont)
Orthodont$w <- abs(rnorm(nrow(Orthodont)))
m1 <- lme(
distance ~ age, data = Orthodont,
random = ~ 1 | Subject,
weights = varIdent(form = ~ 1 | Sex))
m2 <- lme(
distance ~ age, data = Orthodont,
random = ~ 1 | Subject)
m3 <- lme(
distance ~ age,
data = Orthodont,
random = ~ 1 | Subject,
weights = ~w
)
find_weights(m1)
#> [1] "Sex"
find_weights(m2)
#> NULL
find_weights(m3)
#> [1] "w" Created on 2023-03-12 with reprex v2.0.2 |
I think the clm issues are also resolved:

library(insight)
library(ordinal)
data(mtcars)
m2 <- suppressWarnings(clm( # nominal + scale effects
cyl ~ wt,
scale = ~vs, nominal = ~hp,
data = transform(mtcars, cyl = factor(cyl))
))
find_formula(m2)
#> $conditional
#> cyl ~ wt
#>
#> $scale
#> ~vs
#>
#> $nominal
#> ~hp
#>
#> attr(,"class")
#> [1] "insight_formula" "list"
find_variables(m2)
#> $response
#> [1] "cyl"
#>
#> $conditional
#> [1] "wt"
#>
#> $scale
#> [1] "vs"
#>
#> $nominal
#> [1] "hp"
head(get_data(m2))
#> cyl wt vs hp
#> Mazda RX4 6 2.620 0 110
#> Mazda RX4 Wag 6 2.875 0 110
#> Datsun 710 4 2.320 1 93
#> Hornet 4 Drive 6 3.215 1 110
#> Hornet Sportabout 8 3.440 0 175
#> Valiant 6 3.460 1 105
find_predicors(m2)
#> Error in find_predicors(m2): could not find function "find_predicors"
head(get_predictors(m2))
#> wt vs hp
#> Mazda RX4 2.620 0 110
#> Mazda RX4 Wag 2.875 0 110
#> Datsun 710 2.320 1 93
#> Hornet 4 Drive 3.215 1 110
#> Hornet Sportabout 3.440 0 175
#> Valiant 3.460 1 105
get_parameters(m2)
#> Parameter Estimate
#> 1 4|6.(Intercept) 96990.6137075
#> 2 6|8.(Intercept) 16987.7165352
#> 3 4|6.hp -772.5798577
#> 4 6|8.hp -9.4092060
#> 5 wt 4591.3466175
#> 6 vs -0.3962198
parameters::model_parameters(m2, effects = "all", components = "all")
#> # Intercept
#>
#> Parameter | Log-Odds | SE | 95% CI | z | p
#> -----------------------------------------------------------------------------------------
#> 4|6 (Intercept) | 96990.61 | 8.97e+08 | [ -1.76e+09, 1.76e+09] | 1.08e-04 | > .999
#> 6|8 (Intercept) | 16987.72 | 4.25e+11 | [ -8.33e+11, 8.33e+11] | 4.00e-08 | > .999
#> 4|6 hp | -772.58 | 7.14e+06 | [ -1.40e+07, 1.40e+07] | -1.08e-04 | > .999
#> 6|8 hp | -9.41 | 1.76e+09 | [ -3.44e+09, 3.44e+09] | -5.35e-09 | > .999
#>
#> # Location Parameters
#>
#> Parameter | Log-Odds | SE | 95% CI | z | p
#> ------------------------------------------------------------------------------
#> wt | 4591.35 | 4.28e+07 | [ -8.39e+07, 8.39e+07] | 1.07e-04 | > .999
#>
#> # Scale Parameters
#>
#> Parameter | Log-Odds | SE | 95% CI | z | p
#> ----------------------------------------------------------------------------
#> vs | -0.40 | 18378.61 | [-36021.82, 36021.02] | -2.16e-05 | > .999
#>
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
#> using a Wald z-distribution approximation.
#>
#> The model has a log- or logit-link. Consider using `exponentiate =
#> TRUE` to interpret coefficients as ratios.

Created on 2023-03-12 with reprex v2.0.2
@vincentarelbundock can you check if #729 fixes the marginal effects issues?
Great. Just a heads-up: in clm() and other discrete-choice models, when scale effects are included, the location and intercept coefficients no longer measure log-odds directly, and exp(coef) values are no longer odds ratios. So the parameter interpretation shown above can be misleading.
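To make that concrete, here is a small illustrative sketch (the numbers are made up, not the estimates above): in a location-scale CLM the cumulative logit is (theta_j - x*beta) / exp(z*zeta), so beta is rescaled by a term that varies with z, and exp(beta) is an odds ratio only when the scale part is zero.

theta_j <- 1.5  # threshold (illustrative value)
beta    <- 0.8  # location coefficient (illustrative value)
zeta    <- 0.4  # scale coefficient (illustrative value)
x <- 2; z <- 1
plogis((theta_j - x * beta) / exp(z * zeta))  # cumulative probability with scale effects
plogis(theta_j - x * beta)                    # what a plain log-odds reading would imply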
@strengejacke it looks like it is working for the marginal effects issues. Thanks for the updates!
Hi @vincentarelbundock,
Do you have an example where it fails for multiple predictors in find_weights()?
See strengejacke/ggeffects#295 (comment). The problem is in other packages that expect a single character value from find_weights().
What would get_weights() return for a model like this?

library(nlme)
data("Orthodont", package = "nlme")
m <- lme( # a model of variance only
distance ~ 1, data = Orthodont, # grand mean
weights = varConstPower(form = ~ age | Sex))
attributes(m$modelStruct$varStruct)$weights
#> Male Male Male Male Male Male Male
#> 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197
#> Male Male Male Male Male Male Male
#> 0.03039196 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622
#> Male Male Male Male Male Male Male
#> 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142
#> Male Male Male Male Male Male Male
#> 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197 0.03039196
#> Male Male Male Male Male Male Male
#> 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197
#> Male Male Male Male Male Male Male
#> 0.03039196 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622
#> Male Male Male Male Male Male Male
#> 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142
#> Male Male Male Male Male Male Male
#> 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197 0.03039196
#> Male Male Male Male Male Male Male
#> 0.06375142 0.04744622 0.03727197 0.03039196 0.06375142 0.04744622 0.03727197
#> Male Female Female Female Female Female Female
#> 0.03039196 0.07470279 0.07491239 0.07506866 0.07519109 0.07470279 0.07491239
#> Female Female Female Female Female Female Female
#> 0.07506866 0.07519109 0.07470279 0.07491239 0.07506866 0.07519109 0.07470279
#> Female Female Female Female Female Female Female
#> 0.07491239 0.07506866 0.07519109 0.07470279 0.07491239 0.07506866 0.07519109
#> Female Female Female Female Female Female Female
#> 0.07470279 0.07491239 0.07506866 0.07519109 0.07470279 0.07491239 0.07506866
#> Female Female Female Female Female Female Female
#> 0.07519109 0.07470279 0.07491239 0.07506866 0.07519109 0.07470279 0.07491239
#> Female Female Female Female Female Female Female
#> 0.07506866 0.07519109 0.07470279 0.07491239 0.07506866 0.07519109 0.07470279
#> Female Female Female
#> 0.07491239 0.07506866 0.07519109

Created on 2023-03-13 with reprex v2.0.2
I am not entirely sure. It depends on how other functions and packages use get_weights(). I think you are on the right path, but what you got are in-sample estimates. I suspect it should instead be the coefficients of the variance function, to be evaluated with future values of the predictors. So it would be:

summary(Model <- lme( # a model of variance only
  distance ~ 1, data = Orthodont, # grand mean
  weights = varConstPower(form = ~ age | Sex))) # var = b0 + |age|^b1
" AIC BIC logLik
514.5723 533.2821 -250.2861
Random effects:
Formula: ~1 | Subject
(Intercept) Residual
StdDev: 1.84895 0.109692
Variance function:
Structure: Constant plus power of variance covariate, different strata
Formula: ~age | Sex
Parameter estimates:
Male Female
const 0.0001144912 13.0138472
power 1.3237960288 -0.4748515
Fixed effects: distance ~ 1
Value Std.Error DF t-value p-value
(Intercept) 23.42394 0.4046777 81 57.88296 0"

The model above specifies the within-group SD as sigma * (const + |age|^power), with separate const and power parameters per Sex stratum, as confirmed below:

class(Model$modelStruct$varStruct) # "varConstPower" "varFunc"
Model$sigma # 0.109692
Model$modelStruct$varStruct # prints exp(const) and power
summary(Model$modelStruct$varStruct) # prints exp(const) and power
str(summary(Model$modelStruct$varStruct)) # many components
attributes(Model$modelStruct$varStruct) # many components
varWeights(Model$modelStruct$varStruct)[1:4] # sample estimates
" Male Male Male Male
0.06375142 0.04744622 0.03727197 0.03039196"
attributes(Model$modelStruct$varStruct)$weights[5:8] # Subject M02, same
" Male Male Male Male
0.06375142 0.04744622 0.03727197 0.03039196"
coef(Model$modelStruct$varStruct) # coef() const differs from the Parameter estimates in the summary() print
" const.Male const.Female power.Male power.Female
-9.0750128 2.5660140 1.3237960 -0.4748515"
Model$modelStruct$varStruct[["const"]] # same as coef($varStruct)
exp(Model$modelStruct$varStruct[["const"]]) # same in summary() print
Model$modelStruct$varStruct[["power"]] # same in summary() print
1 / ( # same as varWeights
exp(Model$modelStruct$varStruct[["const"]]['Male']) +
Model$data$age[1:4] ^
Model$modelStruct$varStruct[["power"]]['Male'] )
"0.06375142 0.04744622 0.03727197 0.03039196" I am not entirely sure, but I think that the population SE is related to both age and Sex in the way that
Given that age could change in a future prediction and the prediction could be made for either Sex group, the population SE of the Male group at age 7 and 15 (out of sample range) should be calculated as
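If I have the parameterization right (coef() stores const on the log scale, as checked in the code above), a rough sketch of that out-of-sample calculation would be:

vs    <- Model$modelStruct$varStruct
sigma <- Model$sigma
const <- exp(coef(vs)["const.Male"])   # back-transform from the log scale
power <- coef(vs)["power.Male"]
sigma * (const + abs(c(7, 15))^power)  # population SE for Males at age 7 and 15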
For weights = varIdent(), the coefficient must be interpreted differently. So the evaluation of weights = in prediction needs to be varFunc-class specific.

data("Orthodont", package = "nlme")
summary(Model <- lme( # a model of variance only
distance ~ 1, data = Orthodont, # grand mean
weights = varIdent(form = ~ 1 | Sex)))
"Random effects:
Formula: ~1 | Subject
(Intercept) Residual
StdDev: 2.016065 2.591358
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Sex
Parameter estimates:
Male Female
1.0000000 0.5651101"
Model$sigma # 2.591358
str(summary(Model$modelStruct$varStruct)) # many components
varWeights(Model$modelStruct$varStruct) # sample estimates
"Male 1.000000 Female 1.769567"
1/varWeights(Model$modelStruct$varStruct) # Male has larger SE
"1.0000000 0.5651101 " # matches Parameter estimates above
coef(Model$modelStruct$varStruct) # -0.5707348
exp(coef(Model$modelStruct$varStruct)) # 0.5651101 same as SE ratio
I had never used these model-fitting functions before, and I am just now learning that their weights are not simple numeric vectors. I do not believe that we should use the same return value for them.
So what does this mean for the following model?

lme(distance ~ 1, data = Orthodont, weights = varConstPower(form = ~ age | Sex))
What does the function return for all other models? Just a numeric vector, right? If that's irrelevant, I vote for raising an informative error. But I don't have an especially strong view, except against proposals like "returning coefficients of the variance function", which would be very different from what we have now.
Yes, that's what is usually returned.
Right, so I would say return a numeric vector or an error. But again, I can't have a strong view because I don't know if/when that's a relevant data structure.
The current resolution seems best until more work is done to extract the estimated variance function from lme() as a regular function, like w = function(x, coef) {}, where coef is estimated by lme(). I tried but couldn't figure out how to form such a function w: there are multiple varFunc classes, and each seems to behave differently. Once that is done, it would be possible to calculate the SE for prediction intervals by adding a term to the fixed-effect SE, or to customize predict.lm(..., weights = 1/w) when generating prediction intervals, because predict.lm() allows weights to be a user-supplied expression of variables in newdata.
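For what it's worth, a rough sketch of what such a w could look like for the varConstPower case specifically (the helper name is made up, and this does not generalize to other varFunc classes):

make_w <- function(model, stratum = "Male") {        # hypothetical helper, varConstPower only
  vs    <- model$modelStruct$varStruct
  const <- exp(coef(vs)[paste0("const.", stratum)])  # coef() stores const on the log scale
  power <- coef(vs)[paste0("power.", stratum)]
  function(x) 1 / (const + abs(x)^power)             # weights = 1 / SD multiplier
}
w_male <- make_w(Model, "Male")
w_male(Orthodont$age[1:4])  # should reproduce varWeights() for the first Male subject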
No. When weights is a one-sided formula, it is still a varFunc class (the formula is passed to varFixed()), and the variable can be treated very differently depending on the specific varFunc used, as shown at https://fw8051statistics4ecologists.netlify.app/gls.html#variance-increasing-with-x_i
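For example (per the lme() documentation, a one-sided formula is shorthand for varFixed(), i.e. variance proportional to the covariate):

library(nlme)
m_a <- lme(distance ~ age, data = Orthodont, random = ~ 1 | Subject, weights = ~ age)
m_b <- lme(distance ~ age, data = Orthodont, random = ~ 1 | Subject, weights = varFixed(~ age))
all.equal(fixef(m_a), fixef(m_b))  # TRUE: the two specifications are equivalent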
Not sure. This is stored in the object's attributes, though.
Seems best. I am not sure how other functions and packages use the weights. In WLS and GLS models, weights are mostly useful for getting correct vcov estimates. But since vcov() can easily extract the already-estimated vcov, I don't know whether the weights are useful again. Perhaps when making in-sample prediction intervals, where the sigma/weights of lme() and gls() can be used to adjust the SE and CI over the sample ranges. But it won't work correctly (or as intended) for newdata, where the weights variables are given but the actual weights cannot be evaluated to find the prediction variance.
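As a reference point for the predict.lm() remark above, a toy example on a plain weighted lm() (nothing to do with lme(); the one-sided weights formula is evaluated in newdata):

fit <- lm(dist ~ speed, data = cars, weights = 1 / speed)
nd  <- data.frame(speed = c(10, 25))
predict(fit, newdata = nd, interval = "prediction", weights = ~ 1 / speed)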
Hi @strengejacke,
I spotted some issues where variables are left out by certain find_...() and get_...() functions for model objects with multiple formula components, which causes issues in other packages that depend on {insight}.
With nlme::lme(), variables in the weights = varFunc() specification are not retained. The resolution may require updating the insight:::find_weights.default() method or adding a new method to find_weights(); alternatively, the variables in weights = varFunc() could be treated as predictors, which would mean updating insight:::find_predictors.default() or adding a new method to find_predictors().
I think the weights argument in lme() may behave very differently from that of other models: it is neither a numeric vector by which all variables are divided nor an integer vector of observation counts, but a function modeling the heteroscedasticity. This is why I suspect that a good solution is to get the variable names through find_predictors() instead of find_weights(). The lme() documentation describes weights as an optional varFunc object, or a one-sided formula (which is passed to varFixed()), describing the within-group heteroscedasticity structure.
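A minimal sketch of the kind of extraction that might work here (not insight's implementation; the helper name is hypothetical, and it relies on varFunc objects carrying their formula as an attribute):

find_varfunc_variables <- function(model) {
  vs <- model$modelStruct$varStruct
  if (is.null(vs)) return(NULL)
  all.vars(attr(vs, "formula"))  # e.g. "Sex" for varIdent(~ 1 | Sex); c("age", "Sex") for varConstPower(~ age | Sex)
}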
With ordinal::clm(), variables in the scale = ~ ... and nominal = ~ ... formulas are not retained. The resolution may require updating the insight:::find_predictors.default() method or adding a new method to find_predictors().
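Similarly for clm(), a hedged sketch (not insight code) that pulls the extra formula parts from the stored call:

find_clm_component_variables <- function(model) {
  list(scale   = all.vars(model$call$scale),    # "vs" for the m2 example above
       nominal = all.vars(model$call$nominal))  # "hp" for the m2 example above
}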