test radEmu
hansvancalster committed Dec 2, 2024
1 parent aede881 commit 598af28
Showing 1 changed file with 24 additions and 3 deletions.
27 changes: 24 additions & 3 deletions source/rmarkdown/compositional_analysis/test_rademu.Rmd
@@ -168,12 +168,16 @@ Identify taxa for which you want robust score tests:

For instance, which taxa at depth 0 - 10 are more likely to occur or not occur in Natuurgrasland compared to the reference category Akker:

The following code takes a long time to run.
The following code chunk takes a very long time to run and is therefore not evaluated.
It can, however, be run in parallel; see https://statdivlab.github.io/radEmu/articles/parallel_radEmu.html

In a trial, the score test for a single taxon was still running after more than 30 minutes, at which point it was stopped.
Multiply this by the number of taxa of interest for a rough lower bound on the run time of the chunk below.


```{r}


```{r eval=FALSE}
coefselection <- coeftab |>
filter(
covariate == "Landgebruik_MBAGNatuurgrasland",
@@ -191,7 +195,7 @@ m_refit <- emuFit(
Y = physeq_Olig01_Annelida_genus,
test_kj = data.frame(
k = covariate_to_test,
j = taxa_to_test),
j = taxa_to_test[1]),
fitted_model = m_fit,
refit = FALSE,
run_score_tests = TRUE
@@ -227,5 +231,22 @@ as_tibble(m_refit$coef) |>



More details of the radEmu approach:

- ‘we introduce a Firth penalty on β derived from a formal equivalence between our model and the multinomial logistic model’ (Clausen and Willis, 2024, p. 6)

- ‘we first introduce a Poisson log likelihood for our model.’ (Clausen and Willis, 2024, p. 6)

- ‘Note that the profile likelihood (8) is equal to a multinomial log likelihood with a logistic link (up to a constant). This accords with both known results about the marginal Poisson distribution of multinomial random variables (Birch, 1963), and our observations on identifiability of β (only p × (J − 1) parameters can be identified in a multinomial logistic regression of a J-dimensional outcome on p regressors)’ (Clausen and Willis, 2024, p. 6)

- they show that the model is robust against model misspecification: for simulated data generated under a zero-inflated negative binomial (instead of a Poisson), the type I error was still favourable
- but a drawback is that you cannot rely on the confidence intervals to judge significance: the score test needs to be computed. In case of model misspecification, I think the confidence intervals will likely be too narrow (given that data are usually overdispersed compared to the Poisson)

- `sccomp` takes a different approach:
    - focusses on the compositional nature of the data
    - independent beta-binomial models under the constraint that the sum of probabilities across taxa equals 1
    - logit-based fold differences
    - compared to the multinomial logistic model, sccomp can cope directly with excess variance through the beta-binomial (instead of relying on 'robust' procedures to make inferences). This is important for this type of data.
- see Table 1 of [Mangiola et al. 2023](https://doi.org/10.1073/pnas.2203828120) for where the radEmu approach fits in (according to Clausen and Willis 2024, it is most similar to ANCOM-BC2).
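
A sketch of why the Poisson profile likelihood quoted above reduces to a multinomial logistic likelihood. The notation is an assumption made for this note and need not match the paper's: $W_{ij}$ is the count of taxon $j$ in sample $i$, $W_{i\cdot} = \sum_j W_{ij}$, $X_i$ is the covariate row for sample $i$, $\beta^j$ the coefficient vector for taxon $j$, and $z_i$ a sample-specific nuisance intercept with $\log \mathrm{E}[W_{ij}] = z_i + X_i\beta^j$.

$$
\begin{aligned}
\ell(\beta, z) &= \sum_{i=1}^{n}\sum_{j=1}^{J}\left[ W_{ij}\,(z_i + X_i\beta^j) - \exp(z_i + X_i\beta^j) \right] + \text{const} \\
\frac{\partial \ell}{\partial z_i} = 0 \;&\Rightarrow\; \hat z_i(\beta) = \log W_{i\cdot} - \log \sum_{j'=1}^{J} \exp(X_i\beta^{j'}) \\
\ell\big(\beta, \hat z(\beta)\big) &= \sum_{i=1}^{n}\sum_{j=1}^{J} W_{ij}\left[ X_i\beta^j - \log \sum_{j'=1}^{J} \exp(X_i\beta^{j'}) \right] + \sum_{i=1}^{n}\left( W_{i\cdot}\log W_{i\cdot} - W_{i\cdot} \right) + \text{const}
\end{aligned}
$$

The first sum in the last line is the multinomial log likelihood with a logistic (softmax) link; the second sum does not depend on $\beta$. Adding the same vector to every $\beta^j$ leaves the first sum unchanged, which is consistent with only $p \times (J - 1)$ parameters being identifiable.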

