Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NAs and 1s in ancombc2 primary result table #266

Open
lmartinezgili opened this issue May 28, 2024 · 6 comments
Open

NAs and 1s in ancombc2 primary result table #266

lmartinezgili opened this issue May 28, 2024 · 6 comments
Labels
help wanted Extra attention is needed

Comments

@lmartinezgili
Copy link

Dear Huang,

Thank you for the ANCOMBC package.

When I run ancombc2 - with fixed and random effects (in R 4.4.0) - I noticed that for some of the features all the "lfc_" columns appear as NA and the "p_" and "q_" columns as 1.

In addition, I get a warning message:
"In rbind(c((Intercept) = 83.9999999980089, age = 83.9999999980259, :
number of columns of result is not a multiple of vector length (arg 18)"

This happened independent of taxa level tested (I tried genus, family and order). I could not find any posted issue describing that, and I cannot see any particular problem with the features that appear as NA. I used only default settings.

Do you know what could be the issue?

Thank you

@Maggie8888
Copy link
Collaborator

It is possible your mix-effect model did not fit your data well. You can try to change the fixed effect variables and random effect variable in your model.

@lmartinezgili
Copy link
Author

I have removed the random effect to see if it was related to that and used a different dataset and still get NA for some taxa when using ancombc2.

What would you suggest is the potential problem?

@Maggie8888
Copy link
Collaborator

Could you please specify your model and provide some summary statistics of the variables you included in your model?

@Maggie8888 Maggie8888 added the help wanted Extra attention is needed label Sep 27, 2024
@simone-anza
Copy link

Hi @Maggie8888 ,

I got the same error as well. Unfortunately, I can't reproduce the errror because I work with sensitive data. However, here is my formula

out = ancombc2(data = physeq,
              fix_formula = "prenatal_SD * breast_milk_cat + 
                         delivery+ timepoint+
                         Child_sex",
              rand_formula = "(1|dyad_id)",
              p_adj_method = "BH",
              prv_cut = 0.1,
              lib_cut = 0, 
              group = "timepoint",
              struc_zero = T,
              neg_lb = T,
              em_control = list(tol = 1e-05, max_iter = 100),
              alpha = 0.05,
              global = F)

I did some testing and it is not related to structural zeros andd neg_lb parameters or grouping var, it seems associated with a specific vairabile, in my case breast_milk_cat , my best guessing is that there is not enough variability in my breast_milk_cat and that causes some convergence problem perhaps. When I remove it as in the interaction and code it is as a main variable all goes smoothly

@kvaher
Copy link

kvaher commented Dec 5, 2024

Hi @Maggie8888,
I am having similar issues as described above. When including more than two covariates in the fix_formula in addition to my main variable of interest (qchat_score_corrected), I observe NAs in lfc and se columns, and values of 1.00 in p and q columns.
This is the formula I am using, all variables in the fix_formula are continuous:

output <- ancombc2(data=physeq_filtered,
                             assay_name = "otu_table",
                             fix_formula = "qchat_score_corrected + GA_birth + PMA_sample_w + SIMD_perinatal + bmi_at_booking + maternal_age + birthweight_z_score",
                             struc_zero = FALSE, # setting structural_zero as FALSE because we are not doing a group comparison
                             group = NULL,
                             prv_cut = 0, # we have species already filtered so don't need to add another filtering here
                             lib_cut = 0, # do not filter out samples based on library size (this is also the default)
                             p_adj_method = "BH", # benjamini-hochberg method so it's the same as in other differential abundance methods we are using (Maaslin2)
                             alpha = 0.25, # set significance threshold 0.25 to be in line with Maaslin2
                             pseudo_sens = TRUE
                             )

When fix_formula = "qchat_score_corrected + GA_birth + PMA_sample_w" everything seems to be running smoothly.

Any advice is highly appreciated, thank you!

@aimirza
Copy link

aimirza commented Dec 10, 2024

I resolved the issue! Through simulated data, I discovered that random effect models are incompatible with the ANCOMBC2 global test (see #262 ). Additionally, the global test can fail if the model includes too many variables relative to the number of observations—each category within a variable is treated as a separate parameter. In my case, removing a single continuous variable resolved the issue.

I suspect the failure occurred because the variance-covariance matrix (vcov_hat) becomes too complex or ill-conditioned when the model is overparameterized. This matrix is critical for the global test, and if it's singular or poorly estimated, the test cannot compute valid statistics AKA fails. Simplifying the model by removing non-critical variables or categories can prevent this issue.

Global tests in general are inherently more sensitive to overparameterization because they depend on the joint behavior of all model parameters, while non-global tests evaluate parameters independently. That explains why pairwise comparisons and Dunn's-like test did not fail whereas on the global test failed, in my case at least. In general, it has been long suggested to use the "10 events per variable" rule otherwise models with fewer than 10 observations per variable tend to overfit and produce unstable estimates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

5 participants