v2.2.0
Public Health Bioinformatics v2.2.0 Minor Release Notes
This minor release adds two new workflows, Create_Terra_Table_PHB and Snippy_Streamline_FASTA_PHB, and makes significant improvements to the TheiaProk, TheiaCoV, TheiaMeta, and Freyja workflow series. Additionally, several bug fixes have been made.
Full release notes can be found here!
Find our documentation here!
🆕 New workflows:
- Create_Terra_Table_PHB
  - Creating Terra tables manually can be tedious and error-prone. This workflow automatically creates a Terra data table when given the location of the files. It can import assemblies, paired-end (Illumina) reads, and single-end (Illumina and Oxford Nanopore) reads.
  - Import the workflow from Dockstore.
- Snippy_Streamline_FASTA_PHB
  - Since Snippy_Variants_PHB now accepts assembled sequences in FASTA format as input, we have developed Snippy_Streamline_FASTA, an all-in-one approach to generating a reference-based phylogeny using the Snippy tools, mirroring the Snippy_Streamline_PHB workflow. By default, it runs Snippy_Variants and Snippy_Tree, and it will optionally run Assembly_Fetch if a reference genome is not provided.
  - Import the workflow from Dockstore.
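To illustrate the kind of table that Create_Terra_Table_PHB automates, here is a minimal Python sketch that groups paired-end FASTQ paths by sample name and writes a tab-separated load file. It is not the workflow's implementation: the function name, the `read1`/`read2` column names, and the default file suffixes are assumptions chosen for the example. Only the `entity:<table>_id` header follows Terra's load-file convention.

```python
import csv
import io
import posixpath


def build_terra_table(paths, table_name="sample",
                      read1_suffix="_R1.fastq.gz",
                      read2_suffix="_R2.fastq.gz"):
    """Return TSV text with one row per sample (hypothetical helper)."""
    samples = {}
    for path in paths:
        name = posixpath.basename(path)
        if name.endswith(read1_suffix):
            sample_id = name[: -len(read1_suffix)]
            samples.setdefault(sample_id, {})["read1"] = path
        elif name.endswith(read2_suffix):
            sample_id = name[: -len(read2_suffix)]
            samples.setdefault(sample_id, {})["read2"] = path
        # Files matching neither suffix are ignored in this sketch.

    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Terra load files name the first column "entity:<table>_id".
    writer.writerow([f"entity:{table_name}_id", "read1", "read2"])
    for sample_id in sorted(samples):
        files = samples[sample_id]
        writer.writerow([sample_id,
                         files.get("read1", ""),
                         files.get("read2", "")])
    return buffer.getvalue()


if __name__ == "__main__":
    example_paths = [
        "gs://bucket/run1/s1_R1.fastq.gz",
        "gs://bucket/run1/s1_R2.fastq.gz",
        "gs://bucket/run1/s2_R1.fastq.gz",
    ]
    print(build_terra_table(example_paths), end="")
```

Passing different `read1_suffix` and `read2_suffix` values covers files that end in `.fq` or use other naming schemes.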
🚀 Changes to existing workflows:
- All TheiaProk Workflows
  - Genomic characterization with `emmtyper` is now enabled for Streptococcus pyogenes. (Thanks, @sam-baird!)
  - When `call_ani` is `true`, failures will no longer occur if multiple hits have the same score.
  - Support for Vibrio parahaemolyticus, Vibrio vulnificus, and Enterobacter asburiae was added to the AMRFinderPlus task.
  - VirulenceFinder now runs on Shigella sonnei samples.
  - The Docker containers for AMRFinderPlus, tbp-parser, and mlst have been updated:
    - AMRFinderPlus: `3.12.8-2024-07-22.1`
    - tbp-parser: `tbp-parser:1.6.0`
    - mlst: `2.23.0-2024-08-01`
  - Genomic characterization can now be skipped by setting the new optional input `perform_characterization` to `false`.
  - The GAMBIT prokaryotic database has been updated to `v2.0.0-20240628`.
  - Optional inputs are now available for all tasks within the `merlin_magic` subworkflow.
- All TheiaCoV Workflows
  - GenoFLU has been added for H5N1 influenza typing.
  - Additional VADR output files have been exposed:
    - `File? vadr_feature_tbl_pass`
    - `File? vadr_feature_tbl_fail`
    - `File? vadr_classification_summary_file`
    - `File? vadr_all_outputs_tar_gz`
  - Aligned FASTQs no longer contain supplemental/secondary alignments.
- TheiaCoV_Illumina_PE_PHB and TheiaCoV_ONT_PHB
  - The workflow will no longer fail if an assembly cannot be produced. The `assembly_fasta` column will say "Assembly could not be generated".
- TheiaEuk_Illumina_PE_PHB
  - TheiaEuk no longer fails abruptly if GAMBIT detects an organism outside of the expected list of taxa.
  - All optional inputs and Docker containers for taxa-specific sub-modules have been exposed.
- All ONT workflows (TheiaProk and TheiaCoV)
  - KMC is no longer used for genome-size prediction. Instead, for TheiaProk, the expected genome length is now set to 5 Mb, which is around 0.7 Mb larger than the average bacterial genome length. For TheiaCoV, species have default genome lengths associated with their organism tag.
- TheiaCoV and TheiaMeta workflows
  - The human read removal tool (HRRT) has been updated to `v2.2.1`. For paired-end data, reads are first interleaved to guarantee that no mates are orphaned by this tool.
- All Freyja Workflows
  - Freyja has been updated for all workflows to version `1.5.1`.
  - The SARS-CoV-2 UShER barcodes file is now a .feather file.
  - Freyja_FASTQ_PHB is now compatible with Illumina paired-end, Illumina single-end, and Oxford Nanopore data. A new input, `ont`, has been added to control workflow behavior.
  - The UShER barcodes and lineage files used are now exposed as outputs in Freyja_FASTQ_PHB.
- Snippy_Variants_PHB
  - In addition to paired-end and single-end reads, assemblies are now accepted as input. To use Illumina sequencing data, pass the forward reads with the `read1` input and, optionally, the reverse reads with the `read2` input. To use assembled genomes, provide the `assembly_fasta` input and omit `read1` and `read2`.
- SRA_Fetch_PHB
  - SRA-Lite files are now detected when the retrieved file is low-quality.
- Augur_PHB
  - Mpox mutation context has been added to the `auspice_input_json` output, which displays the fraction of G->A or C->T mutations.
- GAMBIT_Query_PHB
  - The GAMBIT prokaryotic database has been updated to `v2.0.0-20240628`.
- Mercury_Prep_N_Batch_PHB
  - Mercury has been moved to its own repository at https://github.com/theiagen/mercury.
  - Mercury now processes BioSample and SRA metadata for flu.
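The HRRT item above notes that paired-end reads are interleaved before scrubbing so that mates stay together. As a rough illustration of that idea only, and not the code used by the workflow or by HRRT itself, the following Python sketch interleaves two FASTQ streams record by record. The function names are hypothetical, and the sketch assumes plain four-line FASTQ records in matching order.

```python
from itertools import islice, zip_longest


def read_fastq(lines):
    """Yield FASTQ records as 4-line tuples from an iterable of lines."""
    iterator = iter(lines)
    while True:
        record = tuple(islice(iterator, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(read1_lines, read2_lines):
    """Yield lines alternating each read1 record with its read2 mate."""
    pairs = zip_longest(read_fastq(read1_lines), read_fastq(read2_lines))
    for forward, reverse in pairs:
        # A missing mate on either side means the inputs are out of sync.
        if forward is None or reverse is None:
            raise ValueError("read1 and read2 have different record counts")
        yield from forward
        yield from reverse


if __name__ == "__main__":
    r1 = ["@a/1", "ACGT", "+", "IIII"]
    r2 = ["@a/2", "TTTT", "+", "IIII"]
    for line in interleave(r1, r2):
        print(line)
```

Because each forward record is immediately followed by its mate, a downstream filter that treats a pair as a unit can drop or keep both reads together, and the interleaved stream can be split back into two files afterwards.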
What's Changed
- [TheiaProk] Add emmtyper task for Streptococcus pyogenes by @sam-baird in #524
- [SRA-Fetch] Detect SRA-Lite when it's low quality file by @cimendes in #512
- Adding the Create_Terra_Table_PHB workflow by @sage-wright in #533
- [Create_Terra_Table] recognize fastq files that end in .fq by @sage-wright in #535
- [TheiaProk - ANI] prevent failures when multiple top hits have the same score by @sage-wright in #532
- [TheiaCoV] Flu: Prevent workflow failures when assembly cannot be produced; generate NanoPlot outputs regardless of assembly success by @sage-wright in #530
- [theiaprok] amrfinderplus: add support for Vibrio parahaemolyticus, Vibrio vulnificus, Enterobacter asburiae. Fix C diff bug by @kapsakcj in #542
- [TheiaCoV] Add GenoFLU for flu whole-genome genotyping by @sage-wright in #540
- [TheiaProk] Merlin_magic subwf bugfix: run virulencefinder on Shigella sonnei by @kapsakcj in #543
- [TheiaCoV and TheiaMeta] Update hrrt (ncbi-scrub) to version 2.2.1 and optimise task by @cimendes in #527
- [TheiaCoV and TheiaMeta - HRRT] Patch bug by removing unneeded awk verification by @cimendes in #550
- Create CODEOWNERS by @AndrewLangvt in #554
- [TheiaProk] Add additional input enabling characterization by @sage-wright in #547
- Updating templates & broken links in the readme by @sage-wright in #555
- [TheiaEuk] Fix bug where String outputs were being passed as File for Snippy_variants by @cimendes in #574
- [TheiaProk] update tbp-parser to latest version by @sage-wright in #576
- [Create_Terra_Table] fix bug, and enable ability for users to provide their own file ending suffixes by @sage-wright in #575
- [theiacov] Add additional vadr output files & tarball; upgrade VADR docker by @kapsakcj in #556
- [ONT] Remove KMC by @sage-wright in #578
- [Create_Terra_Table] fix sample name identification bug by @sage-wright in #581
- [Freyja] - Update to version v1.5.1 and make it compatible with ILMN PE, ILMN SE and ONT by @cimendes in #548
- [Merlin_Magic] Expose optional inputs and all docker images used in Merlin to the user by @sage-wright in #562
- Add MPXV mutation context to Augur_PHB by @jrotieno in #526
- [theiacov_illumina wfs] Output aligned FASTQs without supplemental/secondary alignments by @kapsakcj in #582
- [TheiaProk - GAMBIT and GAMBIT_Query] Update to latest db v2.0.0-20240628 by @cimendes in #539
- [Mercury_Prep_N_Batch] Enable flu compatibility and move Mercury into its own GitHub repository by @sage-wright in #506
- Updating amrfinderplus docker to 3.12.8-2024-07-22.1 by @jrotieno in #586
- [Snippy_Variants and Snippy_Streamline_FASTA] Accept assembly_fasta as input by @cimendes in #569
- Updating mlst docker image to 2.23.0-2024-08-01 by @jrotieno in #587
- Upgrade PHB to version 2.2.0 by @jrotieno in #591
- [Mercury] Resolve bug when skipping NCBI submission by @sage-wright in #593
- [TheiaCoV wf defaults] update defaults: pangolin and some nextclade_dataset_tags to latest as of 2024-08-26 by @kapsakcj in #594
- [TheiaCoV_Illumina_SE] Fix kraken2 raw input and expose kraken2 dehosted outputs by @cimendes in #597
Full Changelog: v2.1.0...v2.2.0