v1.2.0
Public Health Bioinformatics v1.2.0 Release Notes
This minor release introduces three new workflows and resolves various bugs.
New workflows:
- TheiaMeta_Illumina_PE_PHB
  This workflow offers a versatile approach to de novo metagenomic assembly, providing the option to use either reference-based or reference-independent metagenomic assembly. Taxonomic characterization is also performed with Kraken2.
- CZGenEpi_Prep_PHB
  The CZGenEpi_Prep workflow formats metadata and assembly files for seamless integration with the Chan Zuckerberg GEN EPI platform.
- Samples_to_Ref_Tree_PHB
  In this workflow, Nextclade is used to rapidly place new samples onto an existing reference phylogenetic tree. Phylogenetic placement is done by comparing the mutations of the query sequence (relative to the reference) with the mutations of every node and tip in the reference tree, and finding the node which has the most similar set of mutations. This operation is repeated for each query sequence, until all of them are placed onto the tree.
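The placement idea described for Samples_to_Ref_Tree_PHB can be illustrated with a small sketch. This is not Nextclade's implementation: the function name `place_query`, the toy tree, the mutation labels, and the use of the symmetric set difference as the similarity measure are all assumptions made for illustration.

```python
from typing import Dict, Set


def place_query(query_mutations: Set[str], tree_nodes: Dict[str, Set[str]]) -> str:
    """Return the node whose mutation set differs least from the query's.

    Distance is the size of the symmetric difference between the two
    mutation sets (an assumed stand-in for the real similarity measure).
    Ties are broken alphabetically by node name so the result is stable.
    """
    def distance(node_mutations: Set[str]) -> int:
        return len(query_mutations ^ node_mutations)

    return min(tree_nodes, key=lambda name: (distance(tree_nodes[name]), name))


# Hypothetical reference tree: every node and tip carries its mutations
# relative to the reference sequence.
reference_tree = {
    "root": set(),
    "node_A": {"C241T"},
    "tip_A1": {"C241T", "A23403G"},
    "tip_B1": {"G28881A"},
}

# Hypothetical query samples, also expressed as mutations relative to the reference.
queries = {
    "sample_1": {"C241T", "A23403G", "T100C"},
    "sample_2": {"G28881A"},
}

# Repeat the placement for each query sequence.
placements = {name: place_query(muts, reference_tree) for name, muts in queries.items()}
```

Each query is compared against every node and tip independently, so the work grows with the number of queries times the number of nodes in the reference tree.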
Changes in existing workflows
- Kraken2_SE_PHB
  Kraken2 output files were not being correctly identified by the single-end standalone workflow, causing it to fail unexpectedly. Output files should now populate correctly in the Terra datatable.
- KMC
  The output type of est_genome_size is now an int so data can be sorted numerically in a Terra datatable when running TheiaProk_ONT. Additionally, this task no longer runs unnecessarily for the TheiaCoV_ONT workflow.
- TS_MLST
  The database has been updated as of August 2023.
  New outputs:
  - ts_mlst_docker
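The KMC change above matters because values stored as text sort character by character rather than by magnitude. The following sketch uses made-up genome sizes to show the difference; it does not reproduce how Terra sorts its columns.

```python
# Hypothetical estimated genome sizes, as they would appear if stored as text.
sizes_as_strings = ["5000000", "12000000", "900000"]

# Text sorting compares character by character, so "12000000" comes first.
sorted_as_text = sorted(sizes_as_strings)

# Converting to integers gives the expected numeric ordering.
sorted_as_numbers = sorted(int(size) for size in sizes_as_strings)
```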
Mycobacterium tuberculosis changes
- TBProfiler
  The default variant caller has been changed to FreeBayes to accurately identify resistance-conferring deletions and multi-nucleotide variants (MNVs).
- tbp-parser
  A TBProfiler parsing module has been added to apply variant interpretation logic based on recommendations by the WHO, CDC and CDPH to produce antitubercular drug resistance calls. Additionally, a set of machine- and human-interpretable files are produced to facilitate data sharing and interpretation. Find the source code here.
  New inputs:
  - tbprofiler_output_seq_method_type (default="WGS")
  - tbprofiler_operator (default="")
  - tbp_parser_min_depth (default=10)
  - tbp_parser_coverage_threshold (default=100)
  - tbp_parser_debug (default=false)
  - tbp_parser_docker_image (default="us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.0.1")
  New outputs:
  - tbprofiler_lims_report_csv
  - tbprofiler_looker_csv
  - tbprofiler_laboratorian_report_csv
  - tbprofiler_resistance_genes_percent_coverage
  - tbp_parser_genome_percent_coverage
  - tbp_parser_version
  - tbp_parser_docker
- Clockwork
  The clockwork module has been added to decontaminate read files by removing sequencing data that may come from a nontuberculous mycobacteria (NTM) or human genome.
  New outputs:
  - clockwork_decontaminated_read1
  - clockwork_decontaminated_read2
- TBDB
  The TBProfiler module uses a database called TBDB. We have modified the code to allow for custom databases to be used in place of the default TBDB. Additionally, we have created a custom database including mutations from TBDB, the WHO catalog, and a list of mutations included in the CDC's MTB pipeline Varpipe.
  By default, TBProfiler runs with the default database. If the Boolean input tbprofiler_run_custom_db is set to true and no database is provided by the user, a database containing both TBProfiler's TBDB and CDC Varpipe's collection of resistance-conferring mutations will be used by TBProfiler. In this database, the duplicate entries have been manually curated by removing the TBDB entry in favor of Varpipe's mutation annotation.
  New inputs:
  - tbprofiler_run_custom_db (default=false)
  - tbprofiler_custom_db (default="gs://theiagen-public-files/terra/theiaprok-files/tbdb_varpipe_combined.tar.gz")
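The database selection described for TBDB can be summarized as a small decision function. This is a sketch of the logic as stated in these notes, not the workflow's own code: the function name `select_database` and the `"tbdb"` label for the built-in database are hypothetical, while the combined database path is the documented default of tbprofiler_custom_db.

```python
from typing import Optional

# Documented default value of the tbprofiler_custom_db input.
COMBINED_DB = "gs://theiagen-public-files/terra/theiaprok-files/tbdb_varpipe_combined.tar.gz"

# Hypothetical label standing in for TBProfiler's built-in TBDB.
BUILT_IN_DB = "tbdb"


def select_database(run_custom_db: bool = False, custom_db: Optional[str] = None) -> str:
    """Pick the database TBProfiler would run with.

    - run_custom_db is false: use the built-in TBDB.
    - run_custom_db is true and no database is supplied: use the combined
      TBDB + Varpipe database.
    - run_custom_db is true and a database is supplied: use that database.
    """
    if not run_custom_db:
        return BUILT_IN_DB
    return custom_db or COMBINED_DB


default_choice = select_database()
combined_choice = select_database(run_custom_db=True)
user_choice = select_database(run_custom_db=True, custom_db="gs://my-bucket/my_db.tar.gz")
```

The `gs://my-bucket/my_db.tar.gz` path is a placeholder for a user-supplied database.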
Bug Fixes
- In the KMC task, the -n flag has been added to the echo command to avoid newline characters.
- An optional snippy_core_bed file input has been added to the Snippy_Tree workflow to enable site masking; this optional input is also exposed in the Snippy_Streamline workflow.
- The memory input for quast has been adjusted to match the style guide in the TheiaEuk_Illumina_PE_PHB workflow.
- The version_capture task now uses a Docker image hosted on Theiagen's Google Artifact Registry (GAR) instead of DockerHub; we also exposed docker as an optional input for this task.
- The plasmidfinder output parsing was overambitious when removing duplicates and removed every instance of a duplicate, instead of just one. This has been resolved.
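The plasmidfinder fix above can be illustrated by contrasting the two deduplication behaviors. This is a hedged sketch and not the task's parsing code: the function names and the sample replicon list are made up, and it assumes the intended behavior is to keep one copy of each repeated entry.

```python
from typing import List


def drop_every_duplicated_entry(items: List[str]) -> List[str]:
    """Overly aggressive: any entry seen more than once disappears entirely."""
    return [item for item in items if items.count(item) == 1]


def keep_one_copy(items: List[str]) -> List[str]:
    """Assumed intended behavior: keep the first copy of each entry, in order."""
    return list(dict.fromkeys(items))


# Hypothetical list of replicon hits with one repeated entry.
hits = ["IncFIB", "IncFII", "IncFIB", "Col156"]

too_aggressive = drop_every_duplicated_entry(hits)
deduplicated = keep_one_copy(hits)
```

With the aggressive version, the repeated hit is lost from the report altogether; the second version still reports it once.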
What's Changed
- Create issue templates by @sage-wright in #175
- Add preemptibles, shorter version string by @aofarrel in #185
- Fix kraken2_standalone for SE data by @cimendes in #178
- Patch theiaprok ont - change est_genome_size to Int by @cimendes in #179
- plasmidfinder task bugfix and updates by @kapsakcj in #191
- TheiaMeta: Viral Metagenomics workflow by @cimendes in #64
- adding bed file input by @jrotieno in #190
- Jro mpxv global tree by @jrotieno in #160
- Adding tbp_parser and clockwork to TheiaProk by @frankambrosio3 in #192
- KMC on TheiaProk_ONT and TheiaCoV_ONT by @cimendes in #193
- CZGenEpi_Prep_PHB workflow by @sage-wright in #161
- update ts mlst docker (August 2023) by @cimendes in #195
- TBDB with varpipe by @cimendes in #197
- Smw tbprofiler continuing dev by @sage-wright in #199
- adjusted call block for quast in theiaeuk_illumina_pe_PHB workflow: m… by @kapsakcj in #200
- add -n to echo command in kmc to avoid new line by @frankambrosio3 in #201
- switch default docker image for version_capture to GAR-hosted image; CI change to micromamba by @kapsakcj in #198
- update version by @sage-wright in #204
- revert ncbi scrub changes to commid id 4e0fa54 by @cimendes in #205
Full Changelog: v1.1.0...v1.2.0