diff --git a/README.md b/README.md
index de6520e..d715967 100644
--- a/README.md
+++ b/README.md
@@ -68,24 +68,20 @@ Before you download the Tourmaline commands and directory structure from GitHub,
 
 #### Option 1: Native installation
 
-To run Tourmaline natively on a Mac (Intel) or Linux system, start with a Conda installation of Snakemake v7.30.1. We recommend using [Miniconda with a python >= 3.8](https://docs.conda.io/en/latest/miniconda.html):
+To run Tourmaline natively on a Mac (Intel) or Linux system, start with a Conda installation of QIIME 2 (for Linux, change "osx" to "linux"):
 
 ```bash
-conda install -c conda-forge -c bioconda snakemake=7.30.1
-```
-
-Then install QIIME 2 with conda (**for Linux, change "osx" to "linux"**):
-
-```bash
-wget https://data.qiime2.org/distro/core/qiime2-2023.5-py38-osx-conda.yml
-conda env create -n qiime2-2023.5 --file qiime2-2023.5-py38-osx-conda.yml
+wget https://data.qiime2.org/distro/core/qiime2-2023.2-py38-osx-conda.yml
+# make sure you are using the most updated conda
+conda update conda
+conda env create -n qiime2-2023.2 --file qiime2-2023.2-py38-osx-conda.yml
 ```
 
 Activate the environment and install the other Conda- or PIP-installable dependencies:
 
 ```
-conda activate qiime2-2023.5
-conda install -c conda-forge -c bioconda biopython muscle clustalo tabulate
+conda activate qiime2-2023.2
+conda install -c conda-forge -c bioconda snakemake biopython muscle clustalo tabulate
 conda install -c conda-forge deicode
 pip install empress
 qiime dev refresh-cache
@@ -99,9 +95,9 @@ Follow these instructions for Macs with M1/M2 chips.
 **First, set your Terminal application to run in [Rosetta mode](https://academy.bigbinary.com/learn-rubyonrails/setting-up-macos).**
 
 ```bash
-wget https://data.qiime2.org/distro/core/qiime2-2023.5-py38-osx-conda.yml
-CONDA_SUBDIR=osx-64 conda env create -n qiime2-2023.5 --file qiime2-2023.5-py38-osx-conda.yml
-conda activate qiime2-2023.5
+wget https://data.qiime2.org/distro/core/qiime2-2023.2-py38-osx-conda.yml
+CONDA_SUBDIR=osx-64 conda env create -n qiime2-2023.2 --file qiime2-2023.2-py38-osx-conda.yml
+conda activate qiime2-2023.2
 conda config --env --set subdir osx-64
 ```
 
@@ -158,8 +154,8 @@ Download reference database sequence and taxonomy files, named `refseqs.qza` and
 
 ```bash
 cd tourmaline/01-imported
-wget https://data.qiime2.org/2023.5/common/silva-138-99-seqs-515-806.qza
-wget https://data.qiime2.org/2023.5/common/silva-138-99-tax-515-806.qza
+wget https://data.qiime2.org/2023.2/common/silva-138-99-seqs-515-806.qza
+wget https://data.qiime2.org/2023.2/common/silva-138-99-tax-515-806.qza
 ln -s silva-138-99-seqs-515-806.qza refseqs.qza
 ln -s silva-138-99-tax-515-806.qza reftax.qza
 ```
@@ -197,12 +193,12 @@ Now edit, replace, or store the required input files as described here:
 
 Shown here is the DADA2 paired-end workflow. See the Wiki's [Run](https://github.com/aomlomics/tourmaline/wiki/4-Run) page for complete instructions on all steps, denoising methods, and filtering modes.
 
-Note that any of the commands below can be run with various options, including `--printshellcmds` to see the shell commands being executed and `--dryrun` to display which rules would be run but not execute them. To generate a graph of the rules that will be run from any Snakemake command, see the section "Directed acyclic graph (DAG)" on the [Run](https://github.com/aomlomics/tourmaline/wiki/4-Run) page. **Always include the --use-conda option.**
+Note that any of the commands below can be run with various options, including `--printshellcmds` to see the shell commands being executed and `--dryrun` to display which rules would be run but not execute them. To generate a graph of the rules that will be run from any Snakemake command, see the section "Directed acyclic graph (DAG)" on the [Run](https://github.com/aomlomics/tourmaline/wiki/4-Run) page.
 
 From the `tourmaline` directory (which you may rename), run Snakemake with the *denoise* rule as the target, changing the number of cores to match your machine:
 
 ```bash
-snakemake --use-conda dada2_pe_denoise --cores 4
+snakemake dada2_pe_denoise --cores 4
 ```
 
 Pausing after the *denoise* step allows you to make changes before proceeding:
@@ -216,19 +212,19 @@ Pausing after the *denoise* step allows you to make changes before proceeding:
 Continue the workflow without filtering (for now). If you are satisfied with your parameters and files, run the *taxonomy* rule (for unfiltered data):
 
 ```bash
-snakemake --use-conda dada2_pe_taxonomy_unfiltered --cores 4
+snakemake dada2_pe_taxonomy_unfiltered --cores 4
 ```
 
 Next, run the *diversity* rule (for unfiltered data):
 
 ```bash
-snakemake --use-conda dada2_pe_diversity_unfiltered --cores 4
+snakemake dada2_pe_diversity_unfiltered --cores 4
 ```
 
 Finally, run the *report* rule (for unfiltered data):
 
 ```bash
-snakemake --use-conda dada2_pe_report_unfiltered --cores 4
+snakemake dada2_pe_report_unfiltered --cores 4
 ```
 
 #### Filtered mode
@@ -243,19 +239,19 @@ After viewing the *unfiltered* results—the taxonomy summary and taxa barplot,
 Now we are ready to filter the representative sequences and feature table, generate new summaries, and generate a new taxonomy bar plot, by running the *taxonomy* rule (for filtered data):
 
 ```bash
-snakemake --use-conda dada2_pe_taxonomy_filtered --cores 4
+snakemake dada2_pe_taxonomy_filtered --cores 4
 ```
 
 Next, run the *diversity* rule (for filtered
data): ```bash -snakemake --use-conda dada2_pe_diversity_filtered --cores 4 +snakemake dada2_pe_diversity_filtered --cores 4 ``` Finally, run the *report* rule (for filtered data): ```bash -snakemake --use-conda dada2_pe_report_filtered --cores 1 +snakemake dada2_pe_report_filtered --cores 1 ``` ### View output diff --git a/Snakefile b/Snakefile index 7d23de6..3512178 100644 --- a/Snakefile +++ b/Snakefile @@ -1,9 +1,9 @@ -#import pandas as pd -#import numpy as np -#from qiime2 import Artifact -#import matplotlib.pyplot as plt -#import seaborn as sns -#from tabulate import tabulate +import pandas as pd +import numpy as np +from qiime2 import Artifact +import matplotlib.pyplot as plt +import seaborn as sns +from tabulate import tabulate # GLOBALS ---------------------------------------------------------------------- @@ -298,8 +298,6 @@ rule deblur_se_report_filtered: rule check_metadata: output: touch("01-imported/check_metadata.done") - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "if [ -r '00-data/metadata.tsv' ]; then " @@ -314,11 +312,31 @@ rule summarize_metadata: output: "01-imported/metadata_summary.md", "01-imported/metadata_columns.txt" - conda: - "qiime2-2023.5" threads: config["other_threads"] - shell: - "python scripts/summarize_metadata.py {input.metadata} {output[0]} {output[1]}" + run: + df = pd.read_csv(input.metadata, sep='\t') + cols = df.columns + df2 = pd.DataFrame(columns =[0,1], index=cols) + for col in cols: + if col in df.columns: + vc = df[col].value_counts() + if vc.index.shape == (0,): + df2.loc[col, 0] = '(no values in column)' + df2.loc[col, 1] = '--' + else: + df2.loc[col, 0] = vc.index[0] + df2.loc[col, 1] = vc.values[0] + else: + df2.loc[col, 0] = '(column not provided)' + df2.loc[col, 1] = '--' + df2.columns = ['Most common value', 'Count'] + df2.index.name = 'Column name' + outstr = tabulate(df2, tablefmt="pipe", headers="keys") + with open(output[0], 'w') as target: + target.write(outstr) + target.write('\n') + 
with open(output[1], 'w') as target: + [target.write('%s\n' % i) for i in cols] rule check_inputs_params_pe: input: @@ -326,11 +344,8 @@ rule check_inputs_params_pe: check="01-imported/check_metadata.done" params: column=config["beta_group_column"], - classifymethod=config["classify_method"], output: touch("01-imported/check_inputs_params_pe.done") - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "if [ -r '01-imported/fastq_pe.qza' ]; then " @@ -339,28 +354,17 @@ rule check_inputs_params_pe: " echo 'OK: FASTQ manifest file 00-data/manifest_pe.csv found; it will be used to create FASTQ archive 01-imported/fastq_pe.qza.'; " "else echo 'Error: FASTQ sequence data not found; either 00-data/manifest_pe.csv or 01-imported/fastq_pe.qza is required.' && exit 1; " "fi; " - "if [ {params.classifymethod} = naive-bayes ]; then " - " if [ -r '01-imported/classifier.qza' ]; then " - " echo 'OK: Reference sequences classifier 01-imported/classifier.qza found; reference sequences FASTA file 00-data/refseqs.fna not required.'; " - " elif [ -r '01-imported/refseqs.qza' ]; then " - " echo 'OK: Reference sequences archive 01-imported/refseqs.qza found; reference sequences FASTA file 00-data/refseqs.fna not required.'; " - " elif [ -r '00-data/refseqs.fna' ]; then " - " echo 'OK: Reference sequences FASTA file 00-data/refseqs.fna found; it will be used to create reference sequences archive 01-imported/refseqs.qza.'; " - " else echo 'Error: Reference sequences not found; either 01-imported/classifier.qza or 00-data/refseqs.fna or 01-imported/refseqs.qza is required.' 
&& exit 1; " - " fi; " - "elif [ -r '01-imported/refseqs.qza' ]; then " + "if [ -r '01-imported/refseqs.qza' ]; then " " echo 'OK: Reference sequences archive 01-imported/refseqs.qza found; reference sequences FASTA file 00-data/refseqs.fna not required.'; " "elif [ -r '00-data/refseqs.fna' ]; then " " echo 'OK: Reference sequences FASTA file 00-data/refseqs.fna found; it will be used to create reference sequences archive 01-imported/refseqs.qza.'; " "else echo 'Error: Reference sequences not found; either 00-data/refseqs.fna or 01-imported/refseqs.qza is required.' && exit 1; " "fi; " - "if [ {params.classifymethod} != naive-bayes ]; then " - " if [ -r '01-imported/reftax.qza' ]; then " - " echo 'OK: Reference taxonomy archive 01-imported/reftax.qza found; reference taxonomy file 00-data/reftax.tsv not required.'; " - " elif [ -r '00-data/reftax.tsv' ]; then " - " echo 'OK: Reference taxonomy file 00-data/reftax.tsv found; it will be used to create reference taxonomy archive 01-imported/reftax.qza.'; " - " else echo 'Error: Reference taxonomy not found; either 00-data/reftax.tsv or 01-imported/reftax.qza is required.' && exit 1; " - " fi; " + "if [ -r '01-imported/reftax.qza' ]; then " + " echo 'OK: Reference taxonomy archive 01-imported/reftax.qza found; reference taxonomy file 00-data/reftax.tsv not required.'; " + "elif [ -r '00-data/reftax.tsv' ]; then " + " echo 'OK: Reference taxonomy file 00-data/reftax.tsv found; it will be used to create reference taxonomy archive 01-imported/reftax.qza.'; " + "else echo 'Error: Reference taxonomy not found; either 00-data/reftax.tsv or 01-imported/reftax.qza is required.' 
&& exit 1; " "fi; " "if grep -q ^{params.column}$ {input}; then " " echo 'OK: Metadata contains the column \"{params.column}\" that is specified as beta_group_column in config.yaml.'; " @@ -374,8 +378,6 @@ rule check_inputs_params_se: column=config["beta_group_column"], output: touch("01-imported/check_inputs_params_se.done") - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "if [ -r '01-imported/fastq_se.qza' ]; then " @@ -408,8 +410,6 @@ rule import_ref_seqs: "00-data/refseqs.fna" output: "01-imported/refseqs.qza" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime tools import " @@ -422,8 +422,6 @@ rule import_ref_tax: "00-data/reftax.tsv" output: "01-imported/reftax.qza" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime tools import " @@ -438,8 +436,6 @@ rule import_fastq_demux_pe: "01-imported/check_inputs_params_pe.done" output: "01-imported/fastq_pe.qza" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime tools import " @@ -454,8 +450,6 @@ rule import_fastq_demux_se: "01-imported/check_inputs_params_se.done" output: "01-imported/fastq_se.qza" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime tools import " @@ -472,8 +466,6 @@ rule summarize_fastq_demux_pe: "01-imported/check_inputs_params_pe.done" output: "01-imported/fastq_summary.qzv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime demux summarize " @@ -486,8 +478,6 @@ rule summarize_fastq_demux_se: "01-imported/check_inputs_params_se.done" output: "01-imported/fastq_summary.qzv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime demux summarize " @@ -499,8 +489,6 @@ rule export_fastq_summary_to_counts: "01-imported/fastq_summary.qzv" output: "01-imported/fastq_counts.tsv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "unzip -qq -o {input} -d temp0; " @@ -512,11 +500,14 @@ rule describe_fastq_counts: "01-imported/fastq_counts.tsv" 
output: "01-imported/fastq_counts_describe.md" - conda: - "qiime2-2023.5" threads: config["other_threads"] - shell: - "python scripts/describe_fastq_counts.py {input} {output}" + run: + s = pd.read_csv(input[0], index_col=0, sep='\t') + t = s.describe() + outstr = tabulate(pd.DataFrame(t.iloc[1:,0]), tablefmt="pipe", headers=['Statistic (n=%s)' % t.iloc[0,0].astype(int), 'Fastq sequences per sample']) + with open(output[0], 'w') as target: + target.write(outstr) + target.write('\n') # RULES: DENOISE --------------------------------------------------------------- @@ -541,8 +532,6 @@ rule denoise_dada2_pe: table="02-output-dada2-pe-unfiltered/00-table-repseqs/table.qza", repseqs="02-output-dada2-pe-unfiltered/00-table-repseqs/repseqs.qza", stats="02-output-dada2-pe-unfiltered/00-table-repseqs/dada2_stats.qza" - conda: - "qiime2-2023.5" threads: config["dada2pe_threads"] shell: "qiime dada2 denoise-paired " @@ -583,10 +572,6 @@ rule denoise_dada2_se: table="02-output-dada2-se-unfiltered/00-table-repseqs/table.qza", repseqs="02-output-dada2-se-unfiltered/00-table-repseqs/repseqs.qza", stats="02-output-dada2-se-unfiltered/00-table-repseqs/dada2_stats.qza" - conda: - "qiime2-2023.5" - conda: - "qiime2-2023.5" threads: config["dada2se_threads"] shell: "qiime dada2 denoise-single " @@ -624,10 +609,6 @@ rule denoise_deblur_se: table="02-output-deblur-se-unfiltered/00-table-repseqs/table.qza", repseqs="02-output-deblur-se-unfiltered/00-table-repseqs/repseqs.qza", stats="02-output-deblur-se-unfiltered/00-table-repseqs/deblur_stats.qza" - conda: - "qiime2-2023.5" - conda: - "qiime2-2023.5" threads: config["deblur_threads"] shell: "qiime deblur denoise-other " @@ -655,10 +636,6 @@ rule summarize_feature_table: metadata="00-data/metadata.tsv" output: "02-output-{method}-{filter}/00-table-repseqs/table_summary.qzv" - conda: - "qiime2-2023.5" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime feature-table summarize " @@ -666,29 +643,11 @@ rule 
summarize_feature_table: "--m-sample-metadata-file {input.metadata} " "--o-visualization {output}" -rule summarize_repseqs: - input: - stats="02-output-{method}-{filter}/00-table-repseqs/dada2_stats.qza" - output: - "02-output-{method}-{filter}/00-table-repseqs/dada2_stats.qzv" - conda: - "qiime2-2023.5" - conda: - "qiime2-2023.5" - threads: config["other_threads"] - shell: - "qiime metadata tabulate " - "--m-input-file {input.stats} " - "--o-visualization {output}" - - rule export_table_to_biom: input: "02-output-{method}-{filter}/00-table-repseqs/table.qza" output: "02-output-{method}-{filter}/00-table-repseqs/table.biom" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime tools export " @@ -701,8 +660,6 @@ rule summarize_biom_samples: "02-output-{method}-{filter}/00-table-repseqs/table.biom" output: "02-output-{method}-{filter}/00-table-repseqs/table_summary_samples.txt" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "biom summarize-table " @@ -716,8 +673,6 @@ rule summarize_biom_features: "02-output-{method}-{filter}/00-table-repseqs/table.biom" output: "02-output-{method}-{filter}/00-table-repseqs/table_summary_features.txt" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "biom summarize-table " @@ -732,8 +687,6 @@ rule visualize_repseqs: "02-output-{method}-{filter}/00-table-repseqs/repseqs.qza" output: "02-output-{method}-{filter}/00-table-repseqs/repseqs.qzv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime feature-table tabulate-seqs " @@ -745,8 +698,6 @@ rule export_repseqs_to_fasta: "02-output-{method}-{filter}/00-table-repseqs/repseqs.qza" output: "02-output-{method}-{filter}/00-table-repseqs/repseqs.fasta" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime tools export " @@ -759,8 +710,6 @@ rule repseqs_detect_amplicon_locus: "02-output-{method}-{filter}/00-table-repseqs/repseqs.fasta" output: 
"02-output-{method}-{filter}/00-table-repseqs/repseqs_amplicon_type.txt" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "python scripts/detect_amplicon_locus.py -i {input} > {output}" @@ -770,8 +719,6 @@ rule repseqs_lengths: "02-output-{method}-{filter}/00-table-repseqs/repseqs.fasta" output: "02-output-{method}-{filter}/00-table-repseqs/repseqs_lengths.tsv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "perl scripts/fastaLengths.pl {input} > {output}" @@ -781,11 +728,14 @@ rule repseqs_lengths_describe: "02-output-{method}-{filter}/00-table-repseqs/repseqs_lengths.tsv" output: "02-output-{method}-{filter}/00-table-repseqs/repseqs_lengths_describe.md" - conda: - "qiime2-2023.5" threads: config["other_threads"] - shell: - "python scripts/repseqs_lengths_describe.py {input} {output}" + run: + s = pd.read_csv(input[0], header=None, index_col=0, sep='\t') + t = s.describe() + outstr = tabulate(t.iloc[1:], tablefmt="pipe", headers=['Statistic (n=%s)' % t.iloc[0].values[0].astype(int), 'Sequence length']) + with open(output[0], 'w') as target: + target.write(outstr) + target.write('\n') # RULES: TAXONOMY -------------------------------------------------------------- @@ -800,8 +750,6 @@ rule feature_classifier: classifymethod=config["classify_method"], classifyparams=config["classify_parameters"], searchout="02-output-{method}-unfiltered/01-taxonomy/search_results.qza" - conda: - "qiime2-2023.5" threads: config["feature_classifier_threads"] shell: "echo classify_method: {params.classifymethod}; " @@ -842,8 +790,6 @@ rule visualize_taxonomy: "02-output-{method}-{filter}/01-taxonomy/taxonomy.qza" output: "02-output-{method}-{filter}/01-taxonomy/taxonomy.qzv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime metadata tabulate " @@ -857,8 +803,6 @@ rule taxa_barplot: metadata="00-data/metadata.tsv" output: "02-output-{method}-{filter}/01-taxonomy/taxa_barplot.qzv" - conda: - "qiime2-2023.5" threads: 
config["other_threads"] shell: "qiime taxa barplot " @@ -875,8 +819,6 @@ rule export_taxa_biom: "02-output-{method}-{filter}/01-taxonomy/taxa_sample_table.tsv" params: taxalevel=config["classify_taxalevel"] - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime taxa collapse " @@ -898,8 +840,6 @@ rule export_taxonomy_to_tsv: "02-output-{method}-{filter}/01-taxonomy/taxonomy.qza" output: "02-output-{method}-{filter}/01-taxonomy/taxonomy.tsv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime tools export " @@ -912,8 +852,6 @@ rule import_taxonomy_to_qza: "02-output-{method}-{filter}/01-taxonomy/taxonomy.tsv" output: "02-output-{method}-{filter}/01-taxonomy/taxonomy.qza" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime tools import " @@ -934,8 +872,6 @@ rule align_repseqs: params: method=config["alignment_method"], muscle_iters=config["alignment_muscle_iters"] - conda: - "qiime2-2023.5" threads: config["alignment_threads"], shell: "if [ {params.method} = muscle ]; then " @@ -988,8 +924,6 @@ rule phylogeny_fasttree: "02-output-{method}-{filter}/02-alignment-tree/aligned_repseqs.qza" output: "02-output-{method}-{filter}/02-alignment-tree/unrooted_tree.qza" - conda: - "qiime2-2023.5" threads: config["phylogeny_fasttree_threads"] shell: "qiime phylogeny fasttree " @@ -1002,8 +936,6 @@ rule phylogeny_midpoint_root: "02-output-{method}-{filter}/02-alignment-tree/unrooted_tree.qza" output: "02-output-{method}-{filter}/02-alignment-tree/rooted_tree.qza" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime phylogeny midpoint-root " @@ -1019,8 +951,6 @@ rule visualize_tree: outliers="02-output-{method}-{filter}/02-alignment-tree/outliers.qza" output: "02-output-{method}-{filter}/02-alignment-tree/rooted_tree.qzv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime empress community-plot " @@ -1036,8 +966,6 @@ rule alignment_count_gaps: 
"02-output-{method}-{filter}/02-alignment-tree/aligned_repseqs.fasta" output: "02-output-{method}-{filter}/02-alignment-tree/aligned_repseqs_gaps.tsv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "bash scripts/alignment_count_gaps.sh < {input} > {output}" @@ -1064,8 +992,6 @@ rule alignment_detect_outliers: threshold = config["odseq_threshold"] output: "02-output-{method}-{filter}/02-alignment-tree/aligned_repseqs_outliers.tsv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "Rscript --vanilla scripts/run_odseq.R {input} {params.metric} {params.replicates} {params.threshold} temp_odseq; " @@ -1084,20 +1010,46 @@ rule tabulate_plot_repseq_properties: propdescribe="02-output-{method}-{filter}/02-alignment-tree/repseqs_properties_describe.md", proppdf="02-output-{method}-{filter}/02-alignment-tree/repseqs_properties.pdf", outliersforqza="02-output-{method}-{filter}/02-alignment-tree/outliers.tsv" - conda: - "qiime2-2023.5" threads: config["other_threads"] - shell: - "python scripts/plot_repseq_properties.py {input.lengths} {input.gaps} {input.outliers} {input.taxonomy} " - "{input.table} {output.proptsv} {output.propdescribe} {output.proppdf} {output.outliersforqza}" + run: + lengths = pd.read_csv(input['lengths'], names=['length'], index_col=0, sep='\t') + gaps = pd.read_csv(input['gaps'], names=['gaps'], index_col=0, sep='\t') + outliers = pd.read_csv(input['outliers'], names=['outlier'], index_col=0, sep='\t') + taxonomy = Artifact.load(input['taxonomy']) + taxonomydf = taxonomy.view(view_type=pd.DataFrame) + taxonomydf['level_1'] = [x.split(';')[0] for x in taxonomydf['Taxon']] + table = Artifact.load(input['table']) + tabledf = table.view(view_type=pd.DataFrame) + merged = pd.merge(lengths, gaps, left_index=True, right_index=True, how='outer') + merged = pd.merge(merged, outliers, left_index=True, right_index=True, how='outer') + merged = pd.merge(merged, taxonomydf['Taxon'], left_index=True, right_index=True, how='outer') 
+ merged = pd.merge(merged, taxonomydf['level_1'], left_index=True, right_index=True, how='outer') + merged = pd.merge(merged, tabledf.sum().rename('observations'), left_index=True, right_index=True, how='outer') + merged.columns = ['length', 'gaps', 'outlier', 'taxonomy', 'taxonomy_level_1', 'observations'] + merged.index.name = 'featureid' + merged['log10(observations)'] = [np.log10(x) for x in merged['observations']] + merged.sort_values('log10(observations)', ascending=False, inplace=True) + merged.to_csv(output['proptsv'], index=True, sep='\t') + t = merged.describe() + tcolumns = t.columns + tcolumns = tcolumns.insert(0, 'Statistic (n=%s)' % t.iloc[0].values[0].astype(int)) + outstr = tabulate(t.iloc[1:], tablefmt="pipe", headers=tcolumns) + with open(output['propdescribe'], 'w') as target: + target.write(outstr) + target.write('\n') + g = sns.relplot(data=merged, x='length', y='gaps', col='outlier', hue='taxonomy_level_1', size='log10(observations)', sizes=(1,500), edgecolor = 'none', alpha=0.7) + g.set_axis_labels('length (bp) not including gaps', 'gaps (bp) in multiple sequence alignment') + plt.savefig(output['proppdf'], bbox_inches='tight') + outliers.columns = ['Outlier'] + outliers.index.name = 'Feature ID' + outliers = outliers*1 + outliers.to_csv(output['outliersforqza'], index=True, sep='\t') rule import_outliers_to_qza: input: "02-output-{method}-{filter}/02-alignment-tree/outliers.tsv" output: "02-output-{method}-{filter}/02-alignment-tree/outliers.qza" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime tools import " @@ -1111,8 +1063,6 @@ rule tabulate_repseqs_to_filter: output: outliers="02-output-{method}-{filter}/02-alignment-tree/repseqs_to_filter_outliers.tsv", unassigned="02-output-{method}-{filter}/02-alignment-tree/repseqs_to_filter_unassigned.tsv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "cat {input.proptsv} | grep -i 'outlier\|true' | cut -f1,4 > {output.outliers}; " @@ -1140,8 +1090,6 @@ 
rule filter_sequences_table: output: repseqs="02-output-{method}-filtered/00-table-repseqs/repseqs.qza", table="02-output-{method}-filtered/00-table-repseqs/table.qza" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: # FILTER SEQUENCES BY TAXONOMY @@ -1210,11 +1158,17 @@ rule filter_taxonomy: output: taxonomytsv="02-output-{method}-filtered/01-taxonomy/taxonomy.tsv", taxonomyqza="02-output-{method}-filtered/01-taxonomy/taxonomy.qza" - conda: - "qiime2-2023.5" threads: config["other_threads"] - shell: - "python scripts/filter_taxonomy.py {input.taxonomy} {input.repseqs} {output.taxonomytsv} {output.taxonomyqza}" + run: + df_taxonomy = pd.read_csv(input['taxonomy'], index_col=0, sep='\t') + df_repseqs = pd.read_csv(input['repseqs'], header=None, index_col=0, sep='\t') + keep_ids = df_repseqs.index + df_taxonomy_filtered = df_taxonomy.loc[list(keep_ids)] + df_taxonomy_filtered.to_csv(output['taxonomytsv'], sep='\t') + artifact_taxonomy_filtered = Artifact.import_data('FeatureData[Taxonomy]', df_taxonomy_filtered) + artifact_taxonomy_filtered.save(output['taxonomyqza']) + + # RULES: DIVERSITY ------------------------------------------------------------- @@ -1228,8 +1182,6 @@ rule diversity_alpha_rarefaction: maxdepth=config["alpha_max_depth"] output: "02-output-{method}-{filter}/03-alpha-diversity/alpha_rarefaction.qzv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime diversity alpha-rarefaction " @@ -1268,8 +1220,6 @@ rule diversity_core_metrics_phylogenetic: weightedunifracemperor="02-output-{method}-{filter}/04-beta-diversity/weighted_unifrac_emperor.qzv", jaccardemperor="02-output-{method}-{filter}/04-beta-diversity/jaccard_emperor.qzv", braycurtisemperor="02-output-{method}-{filter}/04-beta-diversity/bray_curtis_emperor.qzv" - conda: - "qiime2-2023.5" threads: config["diversity_core_metrics_phylogenetic_threads"] shell: "qiime diversity core-metrics-phylogenetic " @@ -1302,8 +1252,6 @@ rule 
diversity_alpha_group_significance: metadata="00-data/metadata.tsv" output: "02-output-{method}-{filter}/03-alpha-diversity/{metric}_group_significance.qzv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime diversity alpha-group-significance " @@ -1321,8 +1269,6 @@ rule diversity_beta_group_significance: pairwise=config["beta_group_pairwise"] output: "02-output-{method}-{filter}/04-beta-diversity/{metric}_group_significance.qzv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime diversity beta-group-significance " @@ -1344,8 +1290,6 @@ rule deicode_auto_rpca: output: biplot="02-output-{method}-{filter}/04-beta-diversity/deicode_biplot.qza", distancematrix="02-output-{method}-{filter}/04-beta-diversity/deicode_distance_matrix.qza" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime deicode auto-rpca " @@ -1365,8 +1309,6 @@ rule emperor_biplot: numfeatures=config["deicode_num_features"] output: emperor="02-output-{method}-{filter}/04-beta-diversity/deicode_biplot_emperor.qzv" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "qiime emperor biplot " @@ -1415,8 +1357,6 @@ rule generate_report_md: refdatabase=config["database_name"] output: "03-reports/report_{method}_{filter}.md" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "echo '# Tourmaline Report' > {output};" @@ -1624,8 +1564,6 @@ rule generate_report_html: theme=config["report_theme"] output: "03-reports/report_{method}_{filter}.html" - conda: - "qiime2-2023.5" threads: config["other_threads"] shell: "pandoc -i {input} -o {output};" diff --git a/config.yaml b/config.yaml index 8db9524..3e9ab9b 100644 --- a/config.yaml +++ b/config.yaml @@ -1,5 +1,5 @@ # config.yaml - configuration file for the Tourmaline Snakemake workflow -# Compatable with qiime2-2023.5 +# Compatable with qiime2-2023.2 # User MUST edit these parameters before running their own data. 
 # Detailed instructions: https://github.com/aomlomics/tourmaline/wiki.
 
diff --git a/docker/Dockerfile b/docker/Dockerfile
index c835a65..5bc4a7b 100644
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -1,9 +1,9 @@
 # Use the official QIIME 2 image as a parent image
-FROM quay.io/qiime2/core:2023.5
+FROM quay.io/qiime2/core:2023.2
 
 # Label information
 LABEL maintainer="Luke Thompson"
-LABEL description="Docker image to build the Tourmaline Snakemake workflow for QIIME 2 v.2023.5"
+LABEL description="Docker image to build the Tourmaline Snakemake workflow for QIIME 2 v.2023.2"
 
 # Set up bash environment: aliases, colors, history
 RUN echo "alias cd..='cd ..'" >> ~/.bashrc
@@ -29,20 +29,14 @@ RUN apt-get update -y && \
 RUN pip install empress
 
 # Add conda installation dir to $PATH (instead of doing 'conda activate')
-#ENV PATH="/opt/conda/envs/snakemake/bin:$PATH"
-ENV PATH /opt/conda/bin:${PATH}
-
-# Install snakemake environment
-RUN /bin/bash -c "conda update -n base -c defaults conda"
-RUN /bin/bash -c "conda create -y -c conda-forge -c bioconda -n snakemake snakemake snakemake-minimal --only-deps"
-RUN echo "source activate snakemake" > ~/.bashrc
-ENV PATH /opt/conda/envs/snakemake/bin:${PATH}
+ENV PATH="/opt/conda/envs/qiime2-2023.2/bin:$PATH"
 
 # This is necessary to install snakemake using conda
-#SHELL ["conda", "run", "-n", "qiime2-2023.2", "/bin/bash", "-c"]
+SHELL ["conda", "run", "-n", "qiime2-2023.2", "/bin/bash", "-c"]
 
-# Install tourmaline dependencies using conda
-RUN conda install -n qiime2-2023.5 -c conda-forge -c bioconda biopython muscle clustalo tabulate
-RUN conda install -n qiime2-2023.5 -c conda-forge deicode
-#RUN qiime dev refresh-cache
-RUN conda install -n qiime2-2023.5 -c bioconda bioconductor-msa bioconductor-odseq
+# Install snakemake and other dependencies using conda
+RUN conda update -n base -c defaults conda
+RUN conda install -c conda-forge -c bioconda snakemake biopython muscle clustalo tabulate
+RUN conda install -c conda-forge deicode
+RUN qiime dev refresh-cache
+RUN conda install -c bioconda bioconductor-msa bioconductor-odseq
diff --git a/scripts/describe_fastq_counts.py b/scripts/describe_fastq_counts.py
deleted file mode 100644
index 161ece5..0000000
--- a/scripts/describe_fastq_counts.py
+++ /dev/null
@@ -1,30 +0,0 @@
-#!/usr/bin/env python
-
-import pandas as pd
-import sys
-from tabulate import tabulate
-
-# usage
-usage = '''
-describe_fastq_counts.py fastq_counts_tsv output1_md
-
-'''
-
-if len(sys.argv) < 1:
-    print(usage)
-    sys.exit()
-
-# input paths
-IN = sys.argv[1]
-
-# output paths
-OUT = sys.argv[2] # 'fastq_counts_describe.md'
-
-
-s = pd.read_csv(IN, index_col=0, sep='\t')
-t = s.describe()
-outstr = tabulate(pd.DataFrame(t.iloc[1:,0]), tablefmt="pipe", headers=['Statistic (n=%s)' % t.iloc[0,0].astype(int), 'Fastq sequences per sample'])
-with open(OUT, 'w') as target:
-    target.write(outstr)
-    target.write('\n')
-
diff --git a/scripts/filter_taxonomy.py b/scripts/filter_taxonomy.py
deleted file mode 100644
index 3d145cd..0000000
--- a/scripts/filter_taxonomy.py
+++ /dev/null
@@ -1,30 +0,0 @@
-#!/usr/bin/env python
-
-import pandas as pd
-import sys
-from qiime2 import Artifact
-
-# usage
-usage = '''
-filter_taxonomy.py taxonomy repseqs taxonomytsv taxonomyqza
-'''
-
-if len(sys.argv) < 3:
-    print(usage)
-    sys.exit()
-
-# input paths
-taxonomy = sys.argv[1]
-repseqs = sys.argv[2]
-
-# output paths
-taxonomytsv = sys.argv[3] #
-taxonomyqza = sys.argv[4]
-
-df_taxonomy = pd.read_csv(taxonomy, index_col=0, sep='\t')
-df_repseqs = pd.read_csv(repseqs, header=None, index_col=0, sep='\t')
-keep_ids = df_repseqs.index
-df_taxonomy_filtered = df_taxonomy.loc[list(keep_ids)]
-df_taxonomy_filtered.to_csv(taxonomytsv, sep='\t')
-artifact_taxonomy_filtered = Artifact.import_data('FeatureData[Taxonomy]', df_taxonomy_filtered)
-artifact_taxonomy_filtered.save(taxonomyqza)
\ No newline at end of file
diff --git a/scripts/plot_repseq_properties.py b/scripts/plot_repseq_properties.py
deleted file mode 100644
index e9a4f29..0000000
--- a/scripts/plot_repseq_properties.py
+++ /dev/null
@@ -1,66 +0,0 @@
-#!/usr/bin/env python
-
-import pandas as pd
-import numpy as np
-import sys
-from tabulate import tabulate
-from qiime2 import Artifact
-import seaborn as sns
-import matplotlib.pyplot as plt
-
-# usage
-usage = '''
-plot_repseq_properties.py repseqs_lengths_tsv aligned_repseqs_gaps aligned_repseqs_outliers taxonomy_qza table_qza repseqs_properties repseqs_properties_describe repseqs_properties_pdf outliers_tsv
-
-'''
-
-if len(sys.argv) < 8:
-    print(usage)
-    sys.exit()
-
-# input paths
-lengths = sys.argv[1]
-gaps = sys.argv[2]
-outliers = sys.argv[3]
-taxonomy = sys.argv[4]
-table = sys.argv[5]
-
-# output paths
-proptsv = sys.argv[6] #
-propdescribe = sys.argv[7]
-proppdf = sys.argv[8]
-outliersforqza = sys.argv[9]
-
-lengths = pd.read_csv(lengths, names=['length'], index_col=0, sep='\t')
-gaps = pd.read_csv(gaps, names=['gaps'], index_col=0, sep='\t')
-outliers = pd.read_csv(outliers, names=['outlier'], index_col=0, sep='\t')
-taxonomy = Artifact.load(taxonomy)
-taxonomydf = taxonomy.view(view_type=pd.DataFrame)
-taxonomydf['level_1'] = [x.split(';')[0] for x in taxonomydf['Taxon']]
-table = Artifact.load(table)
-tabledf = table.view(view_type=pd.DataFrame)
-merged = pd.merge(lengths, gaps, left_index=True, right_index=True, how='outer')
-merged = pd.merge(merged, outliers, left_index=True, right_index=True, how='outer')
-merged = pd.merge(merged, taxonomydf['Taxon'], left_index=True, right_index=True, how='outer')
-merged = pd.merge(merged, taxonomydf['level_1'], left_index=True, right_index=True, how='outer')
-merged = pd.merge(merged, tabledf.sum().rename('observations'), left_index=True, right_index=True, how='outer')
-merged.columns = ['length', 'gaps', 'outlier', 'taxonomy', 'taxonomy_level_1', 'observations']
-merged.index.name = 'featureid'
-merged['log10(observations)'] = [np.log10(x) for x in merged['observations']]
-merged.sort_values('log10(observations)', ascending=False, inplace=True)
-merged.to_csv(proptsv, index=True, sep='\t')
-t = merged.describe()
-tcolumns = t.columns
-tcolumns = tcolumns.insert(0, 'Statistic (n=%s)' % t.iloc[0].values[0].astype(int))
-outstr = tabulate(t.iloc[1:], tablefmt="pipe", headers=tcolumns)
-with open(propdescribe, 'w') as target:
-    target.write(outstr)
-    target.write('\n')
-g = sns.relplot(data=merged, x='length', y='gaps', col='outlier', hue='taxonomy_level_1', size='log10(observations)', sizes=(1,500), edgecolor = 'none', alpha=0.7)
-g.set_axis_labels('length (bp) not including gaps', 'gaps (bp) in multiple sequence alignment')
-plt.savefig(proppdf, bbox_inches='tight')
-outliers.columns = ['Outlier']
-outliers.index.name = 'Feature ID'
-outliers = outliers*1
-outliers.to_csv(outliersforqza, index=True, sep='\t')
-
diff --git a/scripts/repseqs_lengths_describe.py b/scripts/repseqs_lengths_describe.py
deleted file mode 100644
index ca9549a..0000000
--- a/scripts/repseqs_lengths_describe.py
+++ /dev/null
@@ -1,29 +0,0 @@
-#!/usr/bin/env python
-
-import pandas as pd
-import sys
-from tabulate import tabulate
-
-# usage
-usage = '''
-repseqs_lengths_describe.py repseqs_lengths_tsv repseqs_lengths_md
-
-'''
-
-if len(sys.argv) < 1:
-    print(usage)
-    sys.exit()
-
-# input paths
-IN = sys.argv[1]
-
-# output paths
-OUT = sys.argv[2] # 'repseqs_lengths_describe.md'
-
-s = pd.read_csv(IN, header=None, index_col=0, sep='\t')
-t = s.describe()
-outstr = tabulate(t.iloc[1:], tablefmt="pipe", headers=['Statistic (n=%s)' % t.iloc[0].values[0].astype(int), 'Sequence length'])
-with open(OUT, 'w') as target:
-    target.write(outstr)
-    target.write('\n')
-
diff --git a/scripts/summarize_metadata.py b/scripts/summarize_metadata.py
deleted file mode 100644
index 2bc7f5d..0000000
--- a/scripts/summarize_metadata.py
+++ /dev/null
@@ -1,46 +0,0 @@
-#!/usr/bin/env python
-
-import pandas as pd
-from tabulate import tabulate
-import sys
-
-# usage
-usage = '''
-summarize_metadata.py metadata output1_md output2_txt
-
-'''
-
-if len(sys.argv) < 2:
-    print(usage)
-    sys.exit()
-
-# input paths
-metadata = sys.argv[1]
-
-# output paths
-output1 = sys.argv[2] # 'manifest_pe.csv'
-output2 = sys.argv[3] # 'manifest_se.csv'
-
-df = pd.read_csv(metadata, sep='\t')
-cols = df.columns
-df2 = pd.DataFrame(columns =[0,1], index=cols)
-for col in cols:
-    if col in df.columns:
-        vc = df[col].value_counts()
-        if vc.index.shape == (0,):
-            df2.loc[col, 0] = '(no values in column)'
-            df2.loc[col, 1] = '--'
-        else:
-            df2.loc[col, 0] = vc.index[0]
-            df2.loc[col, 1] = vc.values[0]
-    else:
-        df2.loc[col, 0] = '(column not provided)'
-        df2.loc[col, 1] = '--'
-df2.columns = ['Most common value', 'Count']
-df2.index.name = 'Column name'
-outstr = tabulate(df2, tablefmt="pipe", headers="keys")
-with open(output1, 'w') as target:
-    target.write(outstr)
-    target.write('\n')
-with open(output2, 'w') as target:
-    [target.write('%s\n' % i) for i in cols]
diff --git a/tourmaline-qiime2-2023.2-py38-osx-M1-conda-minimal-sorted.yml b/tourmaline-qiime2-2023.2-py38-osx-M1-conda-minimal-sorted.yml
new file mode 100644
index 0000000..fa0617d
--- /dev/null
+++ b/tourmaline-qiime2-2023.2-py38-osx-M1-conda-minimal-sorted.yml
@@ -0,0 +1,635 @@
+channels:
+  - qiime2/label/r2023.2
+  - conda-forge
+  - bioconda
+  - defaults
+dependencies:
+  - _r-mutex=1.0.1
+  - altair=4.2.2
+  - anyio=3.6.2
+  - appnope=0.1.3
+  - argcomplete=2.0.0
+  - argon2-cffi-bindings=21.2.0
+  - argon2-cffi=21.3.0
+  - astor=0.8.1
+  - asttokens=2.2.1
+  - atpublic=3.0.1
+  - attrs=22.2.0
+  - backcall=0.2.0
+  - backports.functools_lru_cache=1.6.4
+  - backports=1.0
+  - beautifulsoup4=4.11.2
+  - bibtexparser=1.4.0
+  - bioconductor-ancombc=2.0.1
+  - bioconductor-beachmat=2.14.0
+  - bioconductor-biobase=2.58.0
+  - bioconductor-biocbaseutils=1.0.0
+  - bioconductor-biocgenerics=0.44.0
+  - bioconductor-biocneighbors=1.16.0
+  - bioconductor-biocparallel=1.32.5
+  - bioconductor-biocsingular=1.14.0
+  - bioconductor-biomformat=1.26.0
+  - bioconductor-biostrings=2.66.0
+  - bioconductor-dada2=1.26.0
+  - bioconductor-data-packages=20230202
+  - bioconductor-decipher=2.26.0
+  - bioconductor-decontam=1.18.0
+  - bioconductor-delayedarray=0.24.0
+  - bioconductor-delayedmatrixstats=1.20.0
+  - bioconductor-dirichletmultinomial=1.40.0
+  - bioconductor-genomeinfodb=1.34.8
+  - bioconductor-genomeinfodbdata=1.2.9
+  - bioconductor-genomicalignments=1.34.0
+  - bioconductor-genomicranges=1.50.0
+  - bioconductor-iranges=2.32.0
+  - bioconductor-matrixgenerics=1.10.0
+  - bioconductor-mia=1.6.0
+  - bioconductor-msa
+  - bioconductor-multiassayexperiment=1.24.0
+  - bioconductor-multtest=2.54.0
+  - bioconductor-odseq
+  - bioconductor-phyloseq=1.42.0
+  - bioconductor-rhdf5=2.42.0
+  - bioconductor-rhdf5filters=1.10.0
+  - bioconductor-rhdf5lib=1.20.0
+  - bioconductor-rhtslib=2.0.0
+  - bioconductor-rsamtools=2.14.0
+  - bioconductor-s4vectors=0.36.0
+  - bioconductor-scaledmatrix=1.6.0
+  - bioconductor-scater=1.26.0
+  - bioconductor-scuttle=1.8.0
+  - bioconductor-shortread=1.56.0
+  - bioconductor-singlecellexperiment=1.20.0
+  - bioconductor-sparsematrixstats=1.10.0
+  - bioconductor-summarizedexperiment=1.28.0
+  - bioconductor-treeio=1.22.0
+  - bioconductor-treesummarizedexperiment=2.6.0
+  - bioconductor-xvector=0.38.0
+  - bioconductor-zlibbioc=1.44.0
+  - biom-format=2.1.12
+  - biopython
+  - blast=2.12.0
+  - bleach=6.0.0
+  - bokeh=3.0.3
+  - bowtie2=2.5.1
+  - brotli-bin=1.0.9
+  - brotli=1.0.9
+  - brotlipy=0.7.0
+  - bwidget=1.9.14
+  - bzip2=1.0.8
+  - c-ares=1.18.1
+  - ca-certificates=2022.12.7
+  - cachecontrol=0.12.11
+  - cached_property=1.5.2
+  - cairo=1.16.0
+  - cctools_osx-64=973.0.1
+  - certifi
+  - cffi=1.15.1
+  - charset-normalizer=2.1.1
+  - clang-14=14.0.6
+  - clang=14.0.6
+  - clang_osx-64=14.0.6
+  - clangxx=14.0.6
+  - clangxx_osx-64=14.0.6
+  - click=8.1.3
+  - clustalo
+  - colorama=0.4.6
+  - comm=0.1.2
+  - compiler-rt=14.0.6
+  - compiler-rt_osx-64=14.0.6
+  - contourpy=1.0.7
+  - cryptography=39.0.1
+  - curl=7.88.1
+  - cutadapt=4.2
+  - cycler=0.11.0
+  - cython=0.29.33
+  - deblur=1.1.1
+  - debugpy=1.6.6
+  - decorator=4.4.2
+  - defusedxml=0.7.1
+  - deicode
+  - dendropy=4.5.2
+  - dill=0.3.6
+  - dnaio=0.10.0
+  - emperor=1.0.3
+  - entrez-direct=16.2
+  - entrypoints=0.4
+  - exceptiongroup=1.1.0
+  - executing=1.2.0
+  - expat=2.5.0
+  - fastcluster=1.2.6
+  - fasttree=2.1.11
+  - flit-core=3.8.0
+  - flufl.lock=7.1
+  - font-ttf-dejavu-sans-mono=2.37
+  - font-ttf-inconsolata=3.000
+  - font-ttf-source-code-pro=2.038
+  - font-ttf-ubuntu=0.83
+  - fontconfig=2.14.2
+  - fonts-conda-ecosystem=1
+  - fonts-conda-forge=1
+  - fonttools=4.38.0
+  - formulaic=0.5.2
+  - freetype=2.12.1
+  - fribidi=1.0.10
+  - future=0.18.3
+  - gettext=0.21.1
+  - gfortran_impl_osx-64=11.3.0
+  - gfortran_osx-64=11.3.0
+  - glpk=5.0
+  - gmp=6.2.1
+  - gneiss=0.4.6
+  - graphite2=1.3.13
+  - graphlib-backport=1.0.3
+  - gsl=2.7
+  - h5py=2.10.0
+  - harfbuzz=6.0.0
+  - hdf5=1.10.6
+  - hdmedians=0.14.2
+  - hmmer=3.1b2
+  - htslib=1.16
+  - icu=70.1
+  - idna=3.4
+  - ijson=3.2.0.post0
+  - importlib-metadata=4.8.3
+  - importlib_metadata=4.8.3
+  - importlib_resources=5.12.0
+  - iniconfig=2.0.0
+  - interface_meta=1.3.0
+  - iow=1.0.5
+  - ipykernel=6.21.2
+  - ipython=8.10.0
+  - ipython_genutils=0.2.0
+  - ipywidgets=8.0.4
+  - iqtree=2.2.0.3
+  - isa-l=2.30.0
+  - isl=0.25
+  - jedi=0.18.2
+  - jinja2=3.1.2
+  - joblib=1.2.0
+  - jpeg=9e
+  - jq=1.6
+  - jsonschema=4.17.3
+  - jupyter_client=8.0.3
+  - jupyter_core=5.2.0
+  - jupyter_events=0.6.3
+  - jupyter_server=2.3.0
+  - jupyter_server_terminals=0.4.4
+  - jupyterlab_pygments=0.2.2
+  - jupyterlab_widgets=3.0.5
+  - kiwisolver=1.4.4
+  - krb5=1.20.1
+  - lcms2=2.14
+  - ld64_osx-64=609
+  - lerc=4.0.0
+  - libblas=3.9.0
+  - libbrotlicommon=1.0.9
+  - libbrotlidec=1.0.9
+  - libbrotlienc=1.0.9
+  - libcblas=3.9.0
+  - libclang-cpp14=14.0.6
+  - libcurl=7.88.1
+  - libcxx=15.0.7
+  - libdeflate=1.13
+  - libedit=3.1.20191231
+  - libev=4.33
+  - libffi=3.4.2
+  - libgfortran-devel_osx-64=11.3.0
+  - libgfortran5=11.3.0
+  - libgfortran=5.0.0
+  - libglib=2.74.1
+  - libiconv=1.17
+  - libidn2=2.3.4
+  - liblapack=3.9.0
+  - liblapacke=3.9.0
+  - libllvm11=11.1.0
+  - libllvm14=14.0.6
+  - libnghttp2=1.51.0
+  - libopenblas=0.3.21
+  - libpng=1.6.39
+  - libsodium=1.0.18
+  - libsqlite=3.40.0
+  - libssh2=1.10.0
+  - libtiff=4.4.0
+  - libunistring=0.9.10
+  - libwebp-base=1.2.4
+  - libxcb=1.13
+  - libxml2=2.10.3
+  - libxslt=1.1.37
+  - libzlib=1.2.13
+  - llvm-openmp=15.0.7
+  - llvm-tools=14.0.6
+  - llvmlite=0.39.1
+  - lockfile=0.12.2
+  - lxml=4.9.2
+  - lz4-c=1.9.4
+  - lz4=4.3.2
+  - mafft=7.515
+  - make=4.3
+  - markupsafe=2.1.2
+  - matplotlib-base=3.6.0
+  - matplotlib-inline=0.1.6
+  - matplotlib=3.6.0
+  - mistune=2.0.5
+  - mpc=1.3.1
+  - mpfr=4.1.0
+  - msgpack-python=1.0.4
+  - munkres=1.1.4
+  - muscle
+  - natsort=8.3.0
+  - nbclassic=0.5.2
+  - nbclient=0.7.2
+  - nbconvert-core=7.2.9
+  - nbconvert-pandoc=7.2.9
+  - nbconvert=7.2.9
+  - nbformat=5.7.3
+  - ncurses=6.3
+  - nest-asyncio=1.5.6
+  - networkx=3.0
+  - nlopt=2.7.1
+  - nose=1.3.7
+  - notebook-shim=0.2.2
+  - notebook=6.5.2
+  - numba=0.56.4
+  - numpy=1.23.5
+  - oniguruma=6.9.8
+  - openjdk=17.0.3
+  - openjpeg=2.5.0
+  - openssl=3.0.8
+  - packaging=23.0
+  - pandas=1.5.3
+  - pandoc=2.19.2
+  - pandocfilters=1.5.0
+  - pango=1.50.13
+  - parso=0.8.3
+  - patsy=0.5.3
+  - pbzip2=1.1.13
+  - pcre2=10.40
+  - pcre=8.45
+  - perl-archive-tar=2.40
+  - perl-carp=1.50
+  - perl-common-sense=3.75
+  - perl-compress-raw-bzip2=2.201
+  - perl-compress-raw-zlib=2.202
+  - perl-encode=3.19
+  - perl-exporter-tiny=1.002002
+  - perl-exporter=5.74
+  - perl-extutils-makemaker=7.66
+  - perl-io-compress=2.201
+  - perl-io-zlib=1.14
+  - perl-json-xs=2.34
+  - perl-json=4.10
+  - perl-list-moreutils-xs=0.430
+  - perl-list-moreutils=0.430
+  - perl-parent=0.241
+  - perl-pathtools=3.75
+  - perl-scalar-list-utils=1.63
+  - perl-storable=3.15
+  - perl-types-serialiser=1.01
+  - perl=5.32.1
+  - pexpect=4.8.0
+  - pickleshare=0.7.5
+  - pigz=2.6
+  - pillow=9.2.0
+  - pip=23.0.1
+  - pip:
+    - empress
+  - pixman=0.40.0
+  - pkgutil-resolve-name=1.3.10
+  - platformdirs=3.0.0
+  - pluggy=1.0.0
+  - prometheus_client=0.16.0
+  - prompt-toolkit=3.0.36
+  - psutil=5.9.4
+  - pthread-stubs=0.4
+  - ptyprocess=0.7.0
+  - pure_eval=0.2.2
+  - pycparser=2.21
+  - pygments=2.14.0
+  - pynndescent=0.5.8
+  - pyopenssl=23.0.0
+  - pyparsing=3.0.9
+  - pyrsistent=0.19.3
+  - pysocks=1.7.1
+  - pytest=7.2.1
+  - python-dateutil=2.8.2
+  - python-fastjsonschema=2.16.3
+  - python-isal=1.1.0
+  - python-json-logger=2.0.7
+  - python=3.8.16
+  - python_abi=3.8
+  - pytz=2022.7.1
+  - pyyaml=6.0
+  - pyzmq=25.0.0
+  - q2-alignment=2023.2.0
+  - q2-composition=2023.2.0
+  - q2-cutadapt=2023.2.0
+  - q2-dada2=2023.2.0
+  - q2-deblur=2023.2.0
+  - q2-demux=2023.2.0
+  - q2-diversity-lib=2023.2.0
+  - q2-diversity=2023.2.0
+  - q2-emperor=2023.2.0
+  - q2-feature-classifier=2023.2.0
+  - q2-feature-table=2023.2.0
+  - q2-fragment-insertion=2023.2.0
+  - q2-gneiss=2023.2.0
+  - q2-longitudinal=2023.2.0
+  - q2-metadata=2023.2.0
+  - q2-mystery-stew=2023.2.0
+  - q2-phylogeny=2023.2.0
+  - q2-quality-control=2023.2.0
+  - q2-quality-filter=2023.2.0
+  - q2-sample-classifier=2023.2.0
+  - q2-taxa=2023.2.0
+  - q2-types=2023.2.0
+  - q2-vsearch=2023.2.0
+  - q2cli=2023.2.0
+  - q2galaxy=2023.2.0
+  - q2templates=2023.2.0
+  - qiime2=2023.2.0
+  - r-acepack=1.4.1
+  - r-ade4=1.7_22
+  - r-ape=5.7
+  - r-askpass=1.1
+  - r-assertthat=0.2.1
+  - r-backports=1.4.1
+  - r-base=4.2.2
+  - r-base64enc=0.1_3
+  - r-beeswarm=0.4.0
+  - r-bh=1.81.0_1
+  - r-bibtex=0.5.1
+  - r-bit64=4.0.5
+  - r-bit=4.0.5
+  - r-bitops=1.0_7
+  - r-blob=1.2.3
+  - r-boot=1.3_28.1
+  - r-broom=1.0.3
+  - r-bslib=0.4.2
+  - r-cachem=1.0.7
+  - r-cairo=1.6_0
+  - r-callr=3.7.3
+  - r-cellranger=1.1.0
+  - r-checkmate=2.1.0
+  - r-class=7.3_21
+  - r-cli=3.6.0
+  - r-clipr=0.8.0
+  - r-cluster=2.1.4
+  - r-codetools=0.2_19
+  - r-colorspace=2.1_0
+  - r-cpp11=0.4.3
+  - r-crayon=1.5.2
+  - r-curl=4.3.3
+  - r-cvxr=1.0_11
+  - r-data.table=1.14.8
+  - r-dbi=1.1.3
+  - r-dbplyr=2.3.1
+  - r-deldir=1.0_6
+  - r-desctools=0.99.48
+  - r-digest=0.6.31
+  - r-doparallel=1.0.17
+  - r-dorng=1.8.6
+  - r-dplyr=1.1.0
+  - r-dqrng=0.3.0
+  - r-dtplyr=1.3.0
+  - r-e1071=1.7_13
+  - r-ecosolver=0.5.4
+  - r-ellipsis=0.3.2
+  - r-emmeans=1.8.4_1
+  - r-energy=1.7_11
+  - r-estimability=1.4.1
+  - r-evaluate=0.20
+  - r-exact=3.2
+  - r-expm=0.999_7
+  - r-fansi=1.0.4
+  - r-farver=2.1.1
+  - r-fastmap=1.1.1
+  - r-fnn=1.1.3.1
+  - r-forcats=1.0.0
+  - r-foreach=1.5.2
+  - r-foreign=0.8_84
+  - r-formatr=1.14
+  - r-formula=1.2_5
+  - r-frictionless=1.0.2
+  - r-fs=1.6.1
+  - r-futile.logger=1.4.3
+  - r-futile.options=1.0.1
+  - r-gargle=1.3.0
+  - r-gbrd=0.4_11
+  - r-generics=0.1.3
+  - r-getopt=1.20.3
+  - r-ggbeeswarm=0.7.1
+  - r-ggplot2=3.4.1
+  - r-ggrastr=1.0.1
+  - r-ggrepel=0.9.3
+  - r-gld=2.6.6
+  - r-glue=1.6.2
+  - r-gmp=0.7_1
+  - r-googledrive=2.0.0
+  - r-googlesheets4=1.0.1
+  - r-gridextra=2.3
+  - r-gsl=2.1_8
+  - r-gtable=0.3.1
+  - r-haven=2.5.1
+  - r-highr=0.10
+  - r-hmisc=4.8_0
+  - r-hms=1.1.2
+  - r-htmltable=2.4.1
+  - r-htmltools=0.5.4
+  - r-htmlwidgets=1.6.1
+  - r-httr=1.4.5
+  - r-hwriter=1.3.2.1
+  - r-ids=1.0.1
+  - r-igraph=1.4.1
+  - r-interp=1.1_3
+  - r-irlba=2.3.5.1
+  - r-isoband=0.2.7
+  - r-iterators=1.0.14
+  - r-jpeg=0.1_10
+  - r-jquerylib=0.1.4
+  - r-jsonlite=1.8.4
+  - r-knitr=1.42
+  - r-labeling=0.4.2
+  - r-lambda.r=1.2.4
+  - r-lattice=0.20_45
+  - r-latticeextra=0.6_30
+  - r-lazyeval=0.2.2
+  - r-lifecycle=1.0.3
+  - r-lme4=1.1_31
+  - r-lmertest=3.1_3
+  - r-lmom=2.9
+  - r-lubridate=1.9.2
+  - r-magrittr=2.0.3
+  - r-mass=7.3_58.2
+  - r-matrix=1.5_3
+  - r-matrixstats=0.63.0
+  - r-memoise=2.0.1
+  - r-mgcv=1.8_41
+  - r-mime=0.12
+  - r-minqa=1.2.5
+  - r-modelr=0.1.10
+  - r-munsell=0.5.0
+  - r-mvtnorm=1.1_3
+  - r-nlme=3.1_162
+  - r-nloptr=2.0.3
+  - r-nnet=7.3_18
+  - r-numderiv=2016.8_1.1
+  - r-openssl=2.0.5
+  - r-optparse=1.7.3
+  - r-osqp=0.6.0.8
+  - r-permute=0.9_7
+  - r-pheatmap=1.0.12
+  - r-pillar=1.8.1
+  - r-pixmap=0.4_12
+  - r-pkgconfig=2.0.3
+  - r-pkgmaker=0.32.8
+  - r-plogr=0.2.0
+  - r-plyr=1.8.8
+  - r-png=0.1_8
+  - r-prettyunits=1.1.1
+  - r-processx=3.8.0
+  - r-progress=1.2.2
+  - r-proxy=0.4_27
+  - r-ps=1.7.2
+  - r-purrr=1.0.1
+  - r-r6=2.5.1
+  - r-ragg=1.2.4
+  - r-rappdirs=0.3.3
+  - r-rbibutils=2.2.13
+  - r-rcolorbrewer=1.1_3
+  - r-rcpp=1.0.10
+  - r-rcppannoy=0.0.20
+  - r-rcppeigen=0.3.3.9.3
+  - r-rcpphnsw=0.4.1
+  - r-rcppml=0.3.7
+  - r-rcppparallel=5.1.6
+  - r-rcppprogress=0.4.2
+  - r-rcurl=1.98_1.10
+  - r-rdpack=2.4
+  - r-readr=2.1.4
+  - r-readxl=1.4.2
+  - r-registry=0.5_1
+  - r-rematch2=2.1.2
+  - r-rematch=1.0.1
+  - r-reprex=2.0.2
+  - r-reshape2=1.4.4
+  - r-rlang=1.0.6
+  - r-rmarkdown=2.20
+  - r-rmpfr=0.9_1
+  - r-rngtools=1.5.2
+  - r-rootsolve=1.8.2.3
+  - r-rpart=4.1.19
+  - r-rspectra=0.16_1
+  - r-rsqlite=2.2.20
+  - r-rstudioapi=0.14
+  - r-rsvd=1.0.5
+  - r-rtsne=0.16
+  - r-rvest=1.0.3
+  - r-sass=0.4.5
+  - r-scales=1.2.1
+  - r-scs=3.0_1
+  - r-selectr=0.4_2
+  - r-sitmo=2.0.2
+  - r-snow=0.4_4
+  - r-sp=1.6_0
+  - r-statmod=1.5.0
+  - r-stringi=1.7.12
+  - r-stringr=1.5.0
+  - r-survival=3.5_3
+  - r-sys=3.4.1
+  - r-systemfonts=1.0.4
+  - r-textshaping=0.3.6
+  - r-tibble=3.1.8
+  - r-tidyr=1.3.0
+  - r-tidyselect=1.2.0
+  - r-tidytree=0.4.2
+  - r-tidyverse=1.3.2
+  - r-timechange=0.2.0
+  - r-tinytex=0.44
+  - r-tzdb=0.3.0
+  - r-utf8=1.2.3
+  - r-uuid=1.1_0
+  - r-uwot=0.1.14
+  - r-vctrs=0.5.2
+  - r-vegan=2.6_4
+  - r-vipor=0.4.5
+  - r-viridis=0.6.2
+  - r-viridislite=0.4.1
+  - r-vroom=1.6.1
+  - r-withr=2.5.0
+  - r-xfun=0.37
+  - r-xml2=1.3.3
+  - r-xtable=1.8_4
+  - r-yaml=2.3.7
+  - r-yulab.utils=0.0.6
+  - raxml=8.2.12
+  - readline=8.1.2
+  - requests=2.28.2
+  - rfc3339-validator=0.1.4
+  - rfc3986-validator=0.1.1
+  - samtools=1.16.1
+  - scikit-bio=0.5.7
+  - scikit-learn=0.24.1
+  - scipy=1.8.1
+  - seaborn-base=0.12.2
+  - seaborn=0.12.2
+  - send2trash=1.8.0
+  - sepp=4.3.10
+  - setuptools=67.4.0
+  - sigtool=0.1.3
+  - six=1.16.0
+  - snakemake
+  - sniffio=1.3.0
+  - sortmerna=2.0
+  - soupsieve=2.3.2.post1
+  - stack_data=0.6.2
+  - statsmodels=0.13.5
+  - tabulate
+  - tapi=1100.0.11
+  - tbb=2021.8.0
+  - terminado=0.17.1
+  - threadpoolctl=3.1.0
+  - tinycss2=1.2.1
+  - tk=8.6.12
+  - tktable=2.10
+  - toml=0.10.2
+  - tomli=2.0.1
+  - toolz=0.12.0
+  - tornado=6.2
+  - tqdm=4.64.1
+  - traitlets=5.9.0
+  - typing-extensions=4.4.0
+  - typing_extensions=4.4.0
+  - tzlocal=2.1
+  - umap-learn=0.5.3
+  - unicodedata2=15.0.0
+  - unifrac-binaries=1.1.1
+  - unifrac=1.1.1
+  - urllib3=1.26.14
+  - vsearch=2.22.1
+  - wcwidth=0.2.6
+  - webencodings=0.5.1
+  - websocket-client=1.5.1
+  - wget=1.20.3
+  - wheel=0.38.4
+  - widgetsnbextension=4.0.5
+  - wrapt=1.15.0
+  - xmltodict=0.13.0
+  - xopen=1.7.0
+  - xorg-kbproto=1.0.7
+  - xorg-libice=1.0.10
+  - xorg-libsm=1.2.3
+  - xorg-libx11=1.7.2
+  - xorg-libxau=1.0.9
+  - xorg-libxdmcp=1.1.3
+  - xorg-libxt=1.2.1
+  - xorg-xproto=7.0.31
+  - xyzservices=2023.2.0
+  - xz=5.2.6
+  - yaml=0.2.5
+  - yq=3.1.1
+  - zeromq=4.3.4
+  - zipp=3.15.0
+  - zlib=1.2.13
+  - zstandard=0.19.0
+  - zstd=1.5.2