Issue generating adjusted predictions with wbm-models #622

Open
Tan2525 opened this issue Dec 17, 2024 · 2 comments

Tan2525 commented Dec 17, 2024

I'm having trouble generating adjusted predictions with wbm models. I want to create an interaction plot with ggpredict(), but it keeps failing with the following error:

Error in complete.cases(data[[variable]]) : 
  no input has determined the number of cases
In addition: Warning message:
In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded

I haven't been able to resolve this error. Is the ggeffects package compatible with wbm panel models?

I've included a replication dataset here: dataset__wide.csv.
Below is the code to prepare the dataset and run the model.

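# Load the packages used below.
library(dplyr)
library(tidyr)
library(stringr)
library(panelr)
library(lme4) # glmerControl()
library(ggeffects)

# Read csv.
dataset__wide <- read.csv(file = "dataset__wide.csv")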

# Pivot data to long.
dataset__long <- dataset__wide %>%
  tidyr::pivot_longer(
    # Exclude the time-invariant variables
    !c(
      ID, 
      Control,
    ),
    names_to = "Variables", values_to = "Values"
  ) %>%
  dplyr::mutate(
    
    # Create a variable keeping track of the waves.
    Wave = case_when(
      str_detect(string = Variables, pattern = "t1") ~ 0,
      str_detect(string = Variables, pattern = "t2") ~ 1,
      TRUE ~ NA_real_
    ),

    # Create a variable to standardize the variable names.
    Variable = case_when(
      !(is.na(Variables)) ~ str_replace_all(string = Variables, pattern = "(_+((t1)|(t2)))", replacement = ""),
      TRUE ~ Variables
    ),
    
  ) %>% 
  dplyr::select(
    !Variables
  ) %>%
  tidyr::pivot_wider(names_from = Variable, values_from = "Values", values_fill = NA_real_)

# Create a panel data frame. 
dataset__long__panel <- panel_data(data = dataset__long, id = ID, wave = Wave)

# Fit the panel model
panel_model <- wbm(
  formula = DV ~ IV + M | Control | IV*M, 
  data = dataset__long__panel,
  family = binomial(link = "logit"),
  use.wave = TRUE,
  wave.factor = TRUE,
  weights = Weights,
  scale = TRUE,
  model = "between",
  control = glmerControl(optimizer = "bobyqa")
)

# Compute adjusted predictions
ggpredict(panel_model, terms = c("IV", "M"), bias_correction = TRUE) %>% plot()
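
For readers without the attached CSV, the reshaping step above can be tried on a small made-up data frame. This is only a sketch: the column names and values in `toy_wide` are hypothetical (chosen to mimic the `_t1`/`_t2` suffix pattern), and the `Variable` step is simplified to a plain `str_replace_all()` call.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy data: two respondents, one time-invariant column (Control),
# and two time-varying variables measured at waves t1 and t2.
toy_wide <- data.frame(
  ID = c(1, 2),
  Control = c(0, 1),
  DV_t1 = c(0, 1),
  DV_t2 = c(1, 1),
  IV_t1 = c(2.5, 3.0),
  IV_t2 = c(2.0, 3.5)
)

toy_long <- toy_wide |>
  # Stack every time-varying column into name/value pairs.
  pivot_longer(!c(ID, Control), names_to = "Variables", values_to = "Values") |>
  mutate(
    # Derive the wave index from the column suffix.
    Wave = case_when(
      str_detect(Variables, "t1") ~ 0,
      str_detect(Variables, "t2") ~ 1,
      TRUE ~ NA_real_
    ),
    # Strip the wave suffix so both waves share one variable name.
    Variable = str_replace_all(Variables, "(_+((t1)|(t2)))", "")
  ) |>
  select(!Variables) |>
  # Spread back out: one row per ID and wave, one column per variable.
  pivot_wider(names_from = Variable, values_from = Values)

print(toy_long)
```

The result has one row per ID and wave, with columns ID, Control, Wave, DV and IV, which is the layout that panel_data() is given above.
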
@strengejacke
Owner

This could be an issue in panelr's predict() method. Maybe @jacob-long can help find out whether the problem lies in predict() or in the ggeffects package?

FWIW, ggemmeans() works.

dataset__wide <- read.csv(file = "~/../Downloads/dataset__wide.csv")
library(panelr)
# Pivot data to long.
dataset__long <- dataset__wide |>
  tidyr::pivot_longer(
    # Exclude the time-invariant variables
    !c(
      ID, 
      Control,
    ),
    names_to = "Variables", values_to = "Values"
  ) |>
  dplyr::mutate(
    
    # Create a variable keeping track of the waves.
    Wave = dplyr::case_when(
      stringr::str_detect(string = Variables, pattern = "t1") ~ 0,
      stringr::str_detect(string = Variables, pattern = "t2") ~ 1,
      TRUE ~ NA_real_
    ),

    # Create a variable to standardize the variable names.
    Variable = dplyr::case_when(
      !(is.na(Variables)) ~ stringr::str_replace_all(string = Variables, pattern = "(_+((t1)|(t2)))", replacement = ""),
      TRUE ~ Variables
    ),
    
  ) |> 
  dplyr::select(
    !Variables
  ) |>
  tidyr::pivot_wider(names_from = Variable, values_from = "Values", values_fill = NA_real_)

# Create a panel data frame. 
dataset__long__panel <- panel_data(data = dataset__long, id = ID, wave = Wave)

# Fit the panel model
panel_model <- wbm(
  formula = DV ~ IV + M | Control | IV*M, 
  data = dataset__long__panel,
  family = binomial(link = "logit"),
  use.wave = TRUE,
  wave.factor = TRUE,
  weights = Weights,
  scale = TRUE,
  model = "between",
  control = glmerControl(optimizer = "bobyqa")
)

d <- expand.grid(lapply(dataset__long__panel[c("IV", "M")], unique))
predict(panel_model, newdata = d)
#> Error in complete.cases(data[[variable]]): no input has determined the number of cases

d <- ggeffects::data_grid(panel_model, c("IV", "M"))
predict(panel_model, newdata = d)
#> Unordered factor wave variable was converted to ordered. You should check
#> that the order is correct.
#> Error in complete.cases(data[[variable]]): no input has determined the number of cases

Created on 2024-12-17 with reprex v2.1.1
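
As background on the error message itself, here is a minimal base R sketch of how it can arise. It does not confirm what happens inside panelr's predict(); it only shows that complete.cases() raises this error when it is handed NULL, which is what `[[` returns for a column that does not exist. The name `grid` and the missing `Wave` column are assumptions for illustration. Note that both grids in the reprex above contain only IV and M, while the model also refers to Control, Weights and the wave variable.

```r
# Hypothetical prediction grid that only contains the focal terms.
grid <- data.frame(IV = c(0, 1), M = c(0, 1))

# `[[` returns NULL for a column that is not in the data frame.
missing_col <- grid[["Wave"]]
print(is.null(missing_col))

# complete.cases() cannot infer a number of cases from NULL and stops.
msg <- tryCatch(
  complete.cases(missing_col),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this is indeed the mechanism, supplying every variable the model uses in newdata might avoid the error, but that is untested here.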

@Tan2525
Author

Tan2525 commented Dec 18, 2024

Thanks, @strengejacke, for the suggestion to use ggemmeans(). It generated the adjusted predictions, and with these I was able to create an interaction plot.
[Image: Rplot]

# Compute estimated predictions, with margin = "marginalmeans".
int_dat <- predict_response(
  model = panel_model,
  terms = c("IV", "M"),
  margin = "marginalmeans"
)

# Generate the interaction plot. 
ggplot(data = int_dat) + 
  geom_line(aes(x = x, y = predicted, colour = group))
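
The same plotting approach can be extended with confidence bands, since the object returned by predict_response() also carries `conf.low` and `conf.high` columns. The sketch below uses a stand-in lm() fit on the built-in mtcars data with the default margin, not the wbm model from this thread, so it runs without the replication data; `cars`, `fit`, `pr_df` and `p` are illustrative names.

```r
library(ggeffects)
library(ggplot2)

# Stand-in model on a built-in dataset (not the wbm model discussed here).
cars <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ wt * cyl, data = cars)

# Adjusted predictions for wt, grouped by cyl.
pr <- predict_response(fit, terms = c("wt", "cyl"))
pr_df <- as.data.frame(pr)

# Lines for the predictions, ribbons for the confidence intervals.
p <- ggplot(pr_df, aes(x = x, y = predicted, colour = group, fill = group)) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2, colour = NA) +
  geom_line()
print(p)
```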

You can consider my initial issue (generating adjusted predictions) solved. However, to shed more light on the earlier problem: I was interested in the methodology applied in this paper: https://doi.org/10.1177/1940161224129270. The authors have graciously provided the replication dataset (.Rdata, https://drive.google.com/file/d/1a-OPCFA3N0ZC8MNvtA2yp-AzVyXCrV75/view?usp=sharing) and the code to generate the interaction plot online. I have extracted the relevant portions of the code below. I find it odd that ggpredict() ran successfully for their model and dataset but failed in my case, even though the model type is the same (wbm).

####Code for models of trust as a moderator####
library(panelr)
library(ggeffects)
library(ggplot2)

# Load the replication dataset. 
load(file = "replication.RData")

####main models reported in the paper for H3####
mtrust1 <- wbm(aff_pol_PT3 ~ partisan_right2_log + partisan_left2_log + no_partisan2_log + POL_INTEREST| 
                 partisan_right2_log*TRUST_GENERAL + partisan_left2_log*TRUST_GENERAL + no_partisan2_log*TRUST_GENERAL + 
                 GENDER + EDU + AGE + christians + race_rc, use.wave = TRUE, wave.factor = TRUE, data=replication)

ggpredict(mtrust1, terms = c("partisan_right2_log", "TRUST_GENERAL[1,3,5]")) %>% plot() + ylim(-4, 4) + 
  labs(y = "Social polarization", 
       x = "Frequency of use of right-leaning news sources",
       title = "Impact of trust and right-leaning news consumption on social polarization", 
       color = "Trust in news")  +
  scale_fill_manual(values = c("#F8766D", "#619CFF", "#00BA38")) +
  scale_color_manual(
    values = c("#F8766D", "#619CFF", "#00BA38"),
    labels = c("Do not trust at all", "Neither", "Trust completely"))

Anyway, I just wanted to say that this package is really valuable for academic research, and I would like to express my sincerest thanks to you (and the other maintainers) for all your efforts on this amazing package!
