Public Health Bioinformatics v1.2.0 Release Notes

This minor release introduces three new workflows and resolves various bugs.

New workflows:

TheiaMeta_Illumina_PE_PHB
This workflow offers a versatile approach to de novo metagenomic assembly, providing the option to use either reference-based or reference-independent metagenomic assembly. Taxonomic characterization is also performed with Kraken2.
CZGenEpi_Prep_PHB
The CZGenEpi_Prep workflow formats metadata and assembly files for seamless integration with the Chan Zuckerberg GEN EPI platform.
Samples_to_Ref_Tree_PHB
In this workflow, Nextclade is used to rapidly place new samples onto an existing reference phylogenetic tree. Phylogenetic placement is done by comparing the mutations of the query sequence (relative to the reference) with the mutations of every node and tip in the reference tree, and finding the node which has the most similar set of mutations. This operation is repeated for each query sequence, until all of them are placed onto the tree.
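The placement logic described above can be illustrated with a toy sketch. This is not Nextclade's implementation: the distance measure (size of the symmetric difference between mutation sets), the function name `place_on_tree`, and the tree and sample data are all assumptions made for illustration only.

```python
def place_on_tree(query_mutations, tree_nodes):
    """Return the name of the node whose mutation set is closest to the query.

    Assumption: closeness is the number of mutations found in one set but
    not the other (symmetric difference); a smaller count is more similar.
    """
    return min(
        tree_nodes,
        key=lambda name: len(query_mutations ^ tree_nodes[name]),
    )


# Hypothetical reference tree: every node and tip maps to its set of
# mutations relative to the reference sequence.
reference_tree = {
    "root": set(),
    "node_A": {"C241T", "A23403G"},
    "tip_A1": {"C241T", "A23403G", "G28881A"},
    "tip_B1": {"G11083T"},
}

# Hypothetical query samples, also expressed as mutations versus the reference.
queries = {
    "sample_1": {"C241T", "A23403G", "G28882A"},
    "sample_2": {"G11083T", "C14805T"},
}

# Repeat the placement for each query sequence.
placements = {
    name: place_on_tree(mutations, reference_tree)
    for name, mutations in queries.items()
}
print(placements)
```

In this toy data, sample_1 lands on node_A and sample_2 on tip_B1, because those nodes differ from the respective queries by a single mutation.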

Changes in existing workflows

Kraken2_SE_PHB
Kraken2 output files were not being correctly identified by the single-end standalone workflow, causing it to fail unexpectedly. Output files should now populate correctly in the Terra datatable.
KMC
The output type of est_genome_size is now an int so data can be sorted numerically in a Terra datatable when running TheiaProk_ONT. Additionally, this task no longer runs unnecessarily for the TheiaCoV_ONT workflow.
TS_MLST
The database has been updated as of August 2023.

New outputs:
- ts_mlst_docker
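As a brief illustration of why the est_genome_size type change in the KMC task matters for sorting: values stored as text sort character by character, while integers sort by magnitude. The genome sizes below are made-up values, and the assumption that a table treats non-integer values as text is ours, not something stated in these notes.

```python
# Hypothetical estimated genome sizes, first held as text.
sizes_as_text = ["5000000", "12000000", "900000"]

# Text sorting compares character by character, so "12000000" sorts first.
sorted_as_text = sorted(sizes_as_text)

# Integer sorting compares by magnitude, which is the expected order.
sorted_as_int = sorted(int(size) for size in sizes_as_text)

print(sorted_as_text)
print(sorted_as_int)
```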

Mycobacterium tuberculosis changes

TBProfiler
The default variant caller has been adjusted to FreeBayes to accurately identify resistance-conferring deletions and multi-nucleotide variants (MNVs).
tbp-parser
A TBProfiler parsing module has been added to apply variant interpretation logic based on recommendations by the WHO, CDC, and CDPH to produce antitubercular drug resistance calls. Additionally, a set of machine- and human-interpretable files is produced to facilitate data sharing and interpretation. Find the source code here.

New inputs:
- tbprofiler_output_seq_method_type (default="WGS")
- tbprofiler_operator (default="")
- tbp_parser_min_depth (default=10)
- tbp_parser_coverage_threshold (default=100)
- tbp_parser_debug (default=false)
- tbp_parser_docker_image (default="us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.0.1")
New outputs:
- tbprofiler_lims_report_csv
- tbprofiler_looker_csv
- tbprofiler_laboratorian_report_csv
- tbprofiler_resistance_genes_percent_coverage
- tbp_parser_genome_percent_coverage
- tbp_parser_version
- tbp_parser_docker
Clockwork
The clockwork module has been added to decontaminate read files by removing sequencing reads that may come from nontuberculous mycobacteria (NTM) or the human genome.

New outputs:
- clockwork_decontaminated_read1
- clockwork_decontaminated_read2
TBDB
The TBProfiler module uses a database called TBDB. We have modified the code to allow for custom databases to be used in place of the default TBDB. Additionally, we have created a custom database including mutations from TBDB, the WHO catalog, and a list of mutations included in the CDC's MTB pipeline Varpipe.

By default, TBProfiler runs with the default database. If the Boolean input tbprofiler_run_custom_db is set to true and no database is provided by the user, TBProfiler will use a database containing both TBProfiler's TBDB and CDC Varpipe's collection of resistance-conferring mutations. In this database, duplicate entries have been manually curated by removing the TBDB entry in favor of Varpipe's mutation annotation.

New inputs:
- tbprofiler_run_custom_db (default=false)
- tbprofiler_custom_db (default="gs://theiagen-public-files/terra/theiaprok-files/tbdb_varpipe_combined.tar.gz")
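The database selection described above can be sketched as follows. This is an illustrative model, not the workflow's code: the function `resolve_tbprofiler_db` and the "TBDB (default)" label are hypothetical, and the sketch assumes a user-supplied database is ignored when tbprofiler_run_custom_db is false, which these notes do not state explicitly.

```python
# Default value of tbprofiler_custom_db, as listed in the new inputs.
COMBINED_DB = (
    "gs://theiagen-public-files/terra/theiaprok-files/"
    "tbdb_varpipe_combined.tar.gz"
)


def resolve_tbprofiler_db(run_custom_db=False, custom_db=None):
    """Return which database TBProfiler would use under this sketch."""
    if not run_custom_db:
        # Default behavior: TBProfiler's own TBDB (label is hypothetical).
        return "TBDB (default)"
    # Custom mode: use the user's database if one was given, otherwise
    # fall back to the combined TBDB + Varpipe database.
    return custom_db or COMBINED_DB


print(resolve_tbprofiler_db())
print(resolve_tbprofiler_db(run_custom_db=True))
print(resolve_tbprofiler_db(run_custom_db=True, custom_db="gs://my-bucket/my_db.tar.gz"))
```

The bucket path in the last call is a placeholder for a user-provided database.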

Bug Fixes

In the KMC task, the -n flag has been added to the echo command to avoid trailing newline characters.
An optional snippy_core_bed file input has been added to the Snippy_Tree workflow to enable site masking; this optional input is also exposed in the Snippy_Streamline workflow.
The memory input for quast has been adjusted to match the style guide in the TheiaEuk_Illumina_PE_PHB workflow.
The version_capture task now uses a Docker image hosted on Theiagen's Google Artifact Registry (GAR) instead of DockerHub; we also exposed docker as an optional input for this task.
The plasmidfinder output parsing was overambitious when removing duplicates and removed every instance of a duplicate, instead of just one. This has been resolved.
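The difference between the two deduplication behaviors in the plasmidfinder fix can be sketched as below. The function names and the replicon list are hypothetical and serve only to contrast "remove every instance of a duplicate" with "keep one copy"; the actual parsing in the task may be implemented differently.

```python
from collections import Counter


def drop_all_duplicated(items):
    """Overly aggressive: any value seen more than once is removed entirely."""
    counts = Counter(items)
    return [item for item in items if counts[item] == 1]


def dedupe_keep_first(items):
    """Intended behavior: keep one copy of each value, in first-seen order."""
    return list(dict.fromkeys(items))


# Hypothetical plasmid replicon hits with one duplicated entry.
hits = ["IncFIB", "IncFII", "IncFIB", "Col156"]

print(drop_all_duplicated(hits))  # the duplicated hit disappears completely
print(dedupe_keep_first(hits))    # the duplicated hit is reported once
```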

What's Changed

Create issue templates by @sage-wright in #175
Add preemptibles, shorter version string by @aofarrel in #185
Fix kraken2_standalone for SE data by @cimendes in #178
Patch theiaprok ont - change est_genome_size to Int by @cimendes in #179
plasmidfinder task bugfix and updates by @kapsakcj in #191
TheiaMeta: Viral Metagenomics workflow by @cimendes in #64
adding bed file input by @jrotieno in #190
Jro mpxv global tree by @jrotieno in #160
Adding tbp_parser and clockwork to TheiaProk by @frankambrosio3 in #192
KMC on TheiaProk_ONT and TheiaCoV_ONT by @cimendes in #193
CZGenEpi_Prep_PHB workflow by @sage-wright in #161
update ts mlst docker (August 2023) by @cimendes in #195
TBDB with varpipe by @cimendes in #197
Smw tbprofiler continuing dev by @sage-wright in #199
adjusted call block for quast in theiaeuk_illumina_pe_PHB workflow: m… by @kapsakcj in #200
add -n to echo command in kmc to avoid new line by @frankambrosio3 in #201
switch default docker image for version_capture to GAR-hosted image; CI change to micromamba by @kapsakcj in #198
update version by @sage-wright in #204
revert ncbi scrub changes to commid id 4e0fa54 by @cimendes in #205

Full Changelog: v1.1.0...v1.2.0

View our documentation here!
