test radEmu

inbo · Dec 2, 2024 · 598af28
1 parent aede881
commit 598af28
Showing 1 changed file with 24 additions and 3 deletions.
diff --git a/source/rmarkdown/compositional_analysis/test_rademu.Rmd b/source/rmarkdown/compositional_analysis/test_rademu.Rmd
@@ -168,12 +168,16 @@ Identify taxa for which you want robust score tests:
 
 For instance, which taxa at depth 0 - 10 are more likely to occur or not occur in Natuurgrasland compared to the reference category Akker:
 
-The following code takes a long time to run.
+The following code chunk takes a very long time to run and is therefore not evaluated.
 But it can be run in parallel: https://statdivlab.github.io/radEmu/articles/parallel_radEmu.html
 
+A trial run of the score test for a single taxon had not finished after more than 30 minutes (it was stopped before completion).
+The total run time of the chunk below is therefore at least this duration multiplied by the number of taxa of interest.
 
 
-```{r}
+
+
+```{r eval=FALSE}
 coefselection <- coeftab |> 
   filter(
     covariate == "Landgebruik_MBAGNatuurgrasland",
@@ -191,7 +195,7 @@ m_refit <- emuFit(
   Y = physeq_Olig01_Annelida_genus,
   test_kj = data.frame(
     k = covariate_to_test, 
-    j = taxa_to_test),
+    j = taxa_to_test[1]),
   fitted_model = m_fit,
   refit = FALSE,
   run_score_tests = TRUE
@@ -227,5 +231,22 @@ as_tibble(m_refit$coef) |>
 
 
 
+More details on the radEmu approach:
+
+- ‘we introduce a Firth penalty on β derived from a formal equivalence between our model and the multinomial logistic model’ (Clausen and Willis, 2024, p. 6)
+
+- ‘we first introduce a Poisson log likelihood for our model.’ (Clausen and Willis, 2024, p. 6)
+
+- ‘Note that the profile likelihood (8) is equal to a multinomial log likelihood with a logistic link (up to a constant). This accords with both known results about the marginal Poisson distribution of multinomial random variables (Birch, 1963), and our observations on identifiability of β (only p × (J − 1) parameters can be identified in a multinomial logistic regression of a J-dimensional outcome on p regressors)’ (Clausen and Willis, 2024, p. 6)
+
+- They show that the approach is robust to model misspecification: in simulations where data were generated under a zero-inflated negative binomial distribution (instead of a Poisson), the tests still had favourable type I error rates.
+    - A drawback is that you cannot rely on the confidence intervals to judge significance: the robust score test needs to be computed. Under model misspecification, I expect the confidence intervals to be too narrow, given that these data are usually overdispersed relative to the Poisson.
+
+- `sccomp` takes a different approach:
+    - it focuses on the compositional nature of the data
+    - it fits independent beta-binomial models under the constraint that the probabilities across taxa sum to 1
+    - it estimates logit-based fold differences
+    - compared to radEmu's multinomial logistic model, sccomp handles excess variance directly through the beta-binomial distribution, instead of relying on 'robust' procedures for inference. This matters for this type of data.
+    - see Table 1 of [Mangiola et al., 2023](https://doi.org/10.1073/pnas.2203828120); radEmu could be placed in that comparison next to ANCOM-BC2, the method it is most similar to according to Clausen and Willis (2024).
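The quoted equivalence (the profiled Poisson likelihood equals a multinomial log likelihood with a logistic link, up to a constant) can be checked in a few lines. This is a sketch in our own notation, not the paper's equation (8). It assumes the mean model $\log \mu_{ij} = z_i + \mathbf{x}_i^\top \boldsymbol\beta_j$ for sample $i$ and taxon $j$, with a free sample-specific offset $z_i$ (for example sequencing depth). See Clausen and Willis (2024) for their exact parameterization and identifiability constraint. The Poisson log likelihood of sample $i$ is

$$
\ell_i(z_i, \boldsymbol\beta) = \sum_{j=1}^{J} \left[ Y_{ij}\left(z_i + \mathbf{x}_i^\top \boldsymbol\beta_j\right) - e^{z_i + \mathbf{x}_i^\top \boldsymbol\beta_j} \right] + \text{const}.
$$

Setting $\partial \ell_i / \partial z_i = 0$ gives

$$
e^{\hat z_i} = \frac{Y_{i\cdot}}{\sum_{k=1}^{J} e^{\mathbf{x}_i^\top \boldsymbol\beta_k}}, \qquad Y_{i\cdot} = \sum_{j=1}^{J} Y_{ij},
$$

and plugging $\hat z_i$ back in yields

$$
\ell_i(\hat z_i, \boldsymbol\beta) = \sum_{j=1}^{J} Y_{ij} \log p_{ij}(\boldsymbol\beta) + Y_{i\cdot} \log Y_{i\cdot} - Y_{i\cdot} + \text{const},
\qquad
p_{ij}(\boldsymbol\beta) = \frac{e^{\mathbf{x}_i^\top \boldsymbol\beta_j}}{\sum_{k=1}^{J} e^{\mathbf{x}_i^\top \boldsymbol\beta_k}}.
$$

Only the first term depends on $\boldsymbol\beta$, and it is the multinomial log likelihood with a logistic (softmax) link. Adding the same vector $\mathbf{c}$ to every $\boldsymbol\beta_j$ leaves $p_{ij}$ unchanged, which is why only $p \times (J - 1)$ parameters are identifiable.

For comparison with the sccomp bullets, consider a generic mean/overdispersion parameterization (not necessarily the one sccomp uses). Suppose a count $Y$ out of a total $n$ is beta-binomial with mean proportion $\pi$ and intra-class correlation $\rho$. Then

$$
\operatorname{Var}(Y) = n\pi(1 - \pi)\left[1 + (n - 1)\rho\right],
$$

which exceeds the binomial variance $n\pi(1 - \pi)$ implied by the multinomial model whenever $\rho > 0$ and $n > 1$. This extra variance is modelled explicitly by a beta-binomial likelihood. The Poisson/multinomial working model above has no such term, which is why radEmu relies on robust score tests for inference.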