Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are signifcane asterisks reliable when using robust linear models (rlm) in combination with estimate plots (plot_model)? #942

Open
eimichae85 opened this issue Jun 21, 2024 · 3 comments

Comments

@eimichae85
Copy link

Hello,
due to some outliers I have to work with robust linear models (using the rlm() function, MASS package). I would then like to plot my model results with the plot_model function (package: sjPlot) as I find it a neat way to visualize results.

The summary of the rlm()-based model only provides t-values but not p-values (or significance asterisks). However, when I subject the rlm()-model to the plot_model function, plot_model() somehow provides significance asterisks.

  1. How is this possible?
  2. Are these significance levels (asterisks indicating p-value ranges) reliable? if so why would I not get them in the summary output of the rlm() model?

Thank you very much for your feedback
Michael

Below is an example code and the data to run it

`library(sjPlot)
library(MASS)

#DATA
tab<-structure(list(Site = structure(c(2L, 1L, 1L, 1L, 2L, 2L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L,
1L), levels = c("Alpthal", "Birmensdorf"), class = "factor", contrasts = "contr.sum"),
Treatment = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), levels = c("control", "4000_100", "4000_30", "4000_50"
), class = "factor"), C50 = c(490139.1569, 210684.3876, 177220.7609,
144855.0927, 143589.8017, 346457.7683, 208220.2951, 763146.691,
337140.0098, 473545.559, 133118.6341, 196323.6835, 324372.0377,
660669.6405, 221500.1217, 130437.8918, 228093.4435, 64947.9752,
98456.18456, 108942.3415, 258502.6415, 38607.72435, 60942.19915,
276165.4695, 193098.7689, 215195.3215, 175787.9858, 200215.9341,
80513.00079, 120383.9266, 127098.0298, 0, 36507.1922, 63337.42105,
119012.168, 136744.0301, 243138.5785, 237136.1503, 143608.6254,
196767.0193, 72837.33976, 37430.74448, 180981.6339, 21085.30347,
62652.19874, 430516.4452, 101975.2863, 286981.5731, 174188.796,
148825.2492, 376042.6975, 0, 361023.8245, 73044.75764, 317343.18,
155962.495)), row.names = c(NA, -56L), class = "data.frame")

##Define the factors
tab$Site<-as.factor(tab$Site)
tab$Treatment<-as.factor(tab$Treatment)

contrasts(tab$Site) <- "contr.sum"
contrasts(tab$Treatment)<- "contr.sum"

tab$Treatment <- relevel(tab$Treatment, ref="control") #only use for graphics, NOT statistics

modRob<-rlm(C50~Treatment*Site,data=tab,psi=psi.huber,method="M",maxit=50)
summary(modRob)

plot_model(modRob, vline.color = "red", colors = c("black","black"),
sort.est=F,show.values = TRUE,
value.offset = .45,value.size = 6,title = "",
dot.size=5.3, line.size =0.8 ,
transform = NULL, order.terms = c(2,3,1,4,6,7,5), auto.label = FALSE)
`

@strengejacke
Copy link
Owner

See discussion here:
https://stat.ethz.ch/pipermail/r-help/2006-July/108659.html

I'd say, as long as you're transparent about the approximation method (and therefore, you could try parameters::model_parameters() and look at the output, which should indicate the approximation method), it's ok to compute them. Compare to mixed models, where you have many different methods (wald, satterthwaite, kenward-rogers, ...), none of which one would say it's perfectly accurate - they're all approximations.

@eimichae85
Copy link
Author

Thanks a lot for your swift reply.
I was not aware of the possibility to use parameters::model_parameters(model)
It works perfectly and gives me indeed p-values.
Also the approximation method is mentioned: "p-values (two-tailed) computed using a Wald t-distribution
approximation."

I will mention this assuming that this is also what the plot_model() output displays.

Cheers!

@strengejacke
Copy link
Owner

Yes, plot_model() uses the parameters package to compute the summary tables

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants