Skip to content

Data, scripts, and outputs for the Nat. Comm. paper "Analysis of Composition of Microbiomes with Bias Correction"

License

Notifications You must be signed in to change notification settings

simone-anza/ANCOM-BC-Code-Archive

 
 

Repository files navigation

User Manual for ANCOM-BC Function

DOI

This is the repository archiving data and scripts for reproducing results presented in the Nat. Comm. paper ANCOM-BC.

For the corresponding R package, refer to ANCOMBC repository.

The current code implements ANCOM-BC in cross-sectional datasets for comparing the change of absolute abundance for each taxon among different experimental groups.

R-package dependencies

The following libraries need to be included for the R code to run:

library(dplyr)
library(nloptr)

Instructions for use

Data preprocess

Usage

  • feature_table_pre_process(feature.table, meta.data, sample.var, group.var, zero.cut, lib.cut, neg.lb)

Arguments

  • feature.table: Data frame or matrix representing observed OTU table with OTUs (or taxa) in rows and samples in columns.
  • meta.data: Data frame or matrix of all variables and covariates of interest.
  • sample.var: Character. The name of column storing sample IDs.
  • group.var: Character. The name of the main variable of interest. ANCOM-BC v1.0 only supports discrete group.var and aims to compare the change of absolute abundance across different levels of group.var.
  • zero.cut: Numerical fraction between 0 and 1. Taxa with proportion of zeroes greater than zero.cut are not included in the analysis.
  • lib.cut: Numeric. Samples with library size less than lib.cut are not included in the analysis.
  • neg.lb: Logical. TRUE indicates a taxon would be classified as a structural zero in the corresponding experimental group using its asymptotic lower bound.

Value

  • feature.table: A data frame of pre-processed OTU table.
  • library.size: A numeric vector of library sizes after pre-processing.
  • group.name: A character vector of levels of group.var.
  • group.ind: A numeric vector. Each sample is assigned to a number indicating its group label for better internal process.
  • structure.zeros: A matrix consists of 0 and 1s with 1 indicating the taxon is identified as a structural zero in the corresponding group.

ANCOM-BC main function

Usage:

  • ANCOM_BC(feature.table, grp.name, grp.ind, struc.zero, adj.method, tol.EM, max.iterNum, perNum, alpha)

Arguments:

  • feature.table: Data frame or matrix representing the pre-processed OTU table with OTUs (or taxa) in rows and samples in columns.
  • grp.name: A character vector indicating the levels of group.
  • grp.ind: A numeric vector indicating group assignment for each sample. 1 corresponds to the 1st level of grp.name, 2 corresponds to the 2nd level of grp.name, etc.
  • struc.zero: A matrix consists of 0 and 1s with 1 indicating the taxon is identified as a structural zero in the corresponding group.
  • adj.method: Character. Returns p-values adjusted using the specified method, including "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none".
  • tol.EM: Numeric. The iteration convergence tolerance for E-M algorithm.
  • max.iterNum: Numeric. The maximum number of iterations for E-M algorithm.
  • perNum: Numeric. The maximum number of permutations. This argument is active only if there exist more than 2 groups.
  • alpha: Numeric. Level of significance.

Value:

  • feature.table: Data frame or matrix. Return the input feature.table.
  • res: Data frame. The primary result of ANCOM-BC consisting of:
    • mean.difference: Numeric. The estimated mean difference of absolute abundance between groups in log scale (natural log);
    • se: Numeric. The standard error of mean.difference;
    • W: Numeric. mean.difference/se, which is the test statistic of ANCOM-BC.
    • p.val: Numeric. P-value obtained from two-sided Z-test using the test statistic W.
    • q.val. Numeric. Q-value obtained by applying adj.method to p-val.
    • diff.abn. Logical. TRUE if the taxon has q.val less than alpha.
  • d: A numeric vector. Estimated sampling fractions in log scale (natural log).
  • mu: A numeric vector. Estimated log (natural log) mean absolute abundance for each group.
  • bias.em: Numeric. Estimated mean difference of log (natural log) sampling fractions between groups through E-M algorithm.
  • bias.wls: Numeric. Estimated mean difference of log (natural log) sampling fractions between groups through weighted least squares.

Flowchart of ANCOM-BC

Examples

# Load example data
data(dietswap)
pseq = dietswap
n_taxa = ntaxa(pseq)
n_samp = nsamples(pseq)
# Metadata
meta_data = meta(pseq)
# Taxonomy table
taxonomy = tax_table(pseq)
# Absolute abundances
otu_absolute = abundances(pseq)

Two-group comparison

# Pre-processing
feature.table = otu_absolute; sample.var = "sample"; group.var = "nationality"; 
zero.cut = 0.90; lib.cut = 1000; neg.lb = TRUE
pre.process = feature_table_pre_process(feature.table, meta_data, sample.var, 
                                        group.var, zero.cut, lib.cut, neg.lb)
feature.table = pre.process$feature.table
group.name = pre.process$group.name
group.ind = pre.process$group.ind
struc.zero = pre.process$structure.zeros

# Paras for ANCOM-BC
grp.name = group.name; grp.ind = group.ind; adj.method = "bonferroni"
tol.EM = 1e-5; max.iterNum = 100; perNum = 1000; alpha = 0.05

out = ANCOM_BC(feature.table, grp.name, grp.ind, struc.zero,
               adj.method, tol.EM, max.iterNum, perNum, alpha)
res = cbind(taxon = rownames(out$feature.table), out$res)
write_csv(res, "demo_two_group.csv")

Expected run time: 6s (R version 3.5.1 (2018-07-02); Platform: x86_64-apple-darwin15.6.0 (64-bit); Running under: macOS 10.15.1.)

Multi-group comparison

# Pre-processing
feature.table = otu_absolute; sample.var = "sample"; group.var = "bmi_group"; 
zero.cut = 0.90; lib.cut = 1000; neg.lb = TRUE
pre.process = feature_table_pre_process(feature.table, meta_data, sample.var, 
                                        group.var, zero.cut, lib.cut, neg.lb)
feature.table = pre.process$feature.table
group.name = pre.process$group.name
group.ind = pre.process$group.ind
struc.zero = pre.process$structure.zeros

# Paras for ANCOM-BC
grp.name = group.name; grp.ind = group.ind; adj.method = "bonferroni"
tol.EM = 1e-5; max.iterNum = 100; perNum = 1000; alpha = 0.05

out = ANCOM_BC(feature.table, grp.name, grp.ind, struc.zero,
               adj.method, tol.EM, max.iterNum, perNum, alpha)
res = cbind(taxon = rownames(out$feature.table), out$res)
write_csv(res, "demo_multi_group.csv")

Expected run time: 19s (R version 3.5.1 (2018-07-02); Platform: x86_64-apple-darwin15.6.0 (64-bit); Running under: macOS 10.15.1.)

About

Data, scripts, and outputs for the Nat. Comm. paper "Analysis of Composition of Microbiomes with Bias Correction"

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • HTML 97.8%
  • R 2.2%