v1.3.0
Public Health Bioinformatics v1.3.0 Release Notes
This minor release introduces two new workflows, improves several existing workflows, and resolves various bugs.
Full release notes can be found here.
🆕 New workflows:
- TheiaCoV_FASTA_Batch_PHB
  - This workflow runs TheiaCoV_FASTA on many SARS-CoV-2 samples at once.
  - This is a set-level workflow that writes its results to a sample-level data table in Terra.bio.
  - Currently, this workflow only runs Pangolin4 and Nextclade.
  - Import the workflow from Dockstore
- Rename_FASTQ_PHB
  - This workflow is a utility to quickly and easily rename a set of FASTQ files, either paired-end or single-end.
  - Import the workflow from Dockstore
🚀 Changes to existing workflows:
- TheiaCoV_ONT_PHB
  - Influenza is now supported. Use "flu" for the optional organism input String parameter. The "sars-cov-2" and "HIV" tracks are unchanged.
- TheiaProk Workflow Series
  - If the user-input taxon (expected_taxon) or the taxon predicted by Gambit belongs to the Shigella genus, the Extensively Drug-Resistant phenotype is predicted using the new resfinder pointfinder database.
  - If the user-input taxon (expected_taxon) or the taxon predicted by Gambit is the Mycobacterium tuberculosis species, bcftools indexes and merges all potential VCF files created by TbProfiler (both .bcf and .gz files).
  - Kraken2 has been added as an optional module (except for TheiaProk_ONT_PHB). If call_kraken is true, a database must be provided through kraken_db.
  - Two new optional inputs were added to control ANIm behaviour: ani_threshold (default 85.00) and percent_bases_aligned_threshold (default 70.00).
- TheiaCoV_FASTA_PHB
  - The list of allowed values for the organism input now includes "sars-cov-2" (default), "rsv_a", "rsv_b", "WNV", "MPXV" and "flu".
- TheiaCoV_Illumina_PE_PHB
  - If organism is set to "flu", the workflow searches for antiviral resistance mutations in the HA, NA, PA, PB1 and PB2 assembly segments, targeting the following 10 antivirals: A_315675, compound_367, Favipiravir, Fludase, L_742_001, Laninamivir, Peramivir, Pimodivir, Xofluza and Zanamivir.
- All Illumina SE and PE Workflows
  - A new optional input, read_qc, allows the user to choose between fastq_scan and fastqc for the evaluation of read quality. The affected workflows are: TheiaCoV_Illumina_PE_PHB, TheiaCoV_Illumina_SE_PHB, TheiaProk_Illumina_SE_PHB, TheiaProk_Illumina_PE_PHB, TheiaMeta_Illumina_PE_PHB and Freyja_FASTQ_PHB.
- CZGenEpi_Prep_PHB
  - The sample_is_private_column_name and gisaid_id_column_name columns are no longer extracted. They are now generated by the program from already-provided inputs and from the new is_private Boolean variable, which sets the value for all samples in the set. The field "GISAID ID (Public ID) - Optional" will now reflect the GISAID syntax for Virus Name.
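Several of the changes above add or constrain workflow inputs. As a rough illustration only, the following Python sketch checks a set of inputs against the rules stated in these notes before a run is submitted. The flat dictionary layout and the check_inputs helper are hypothetical and are not part of PHB or Terra. The inputs shown belong to different workflows and are combined here purely for brevity.

```python
# Hypothetical pre-submission check. Input names come from the notes above,
# but the flat dict layout is an assumption, not the Terra input schema.

# Allowed values listed for the TheiaCoV_FASTA_PHB organism input.
ALLOWED_ORGANISMS = {"sars-cov-2", "rsv_a", "rsv_b", "WNV", "MPXV", "flu"}

# Choices listed for the read_qc input of the Illumina SE and PE workflows.
ALLOWED_READ_QC = {"fastq_scan", "fastqc"}


def check_inputs(inputs):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []

    # organism defaults to "sars-cov-2" according to the notes.
    organism = inputs.get("organism", "sars-cov-2")
    if organism not in ALLOWED_ORGANISMS:
        problems.append(f"organism {organism!r} is not in the allowed list")

    # Kraken2 is optional, but enabling it requires a database.
    if inputs.get("call_kraken", False) and not inputs.get("kraken_db"):
        problems.append("call_kraken is true but kraken_db is missing")

    # read_qc is optional; only validate it when the user supplied a value.
    read_qc = inputs.get("read_qc")
    if read_qc is not None and read_qc not in ALLOWED_READ_QC:
        problems.append(f"read_qc {read_qc!r} must be fastq_scan or fastqc")

    return problems


if __name__ == "__main__":
    for problem in check_inputs({"organism": "flu", "call_kraken": True}):
        print(problem)
```

A check like this only mirrors the wording of the release notes. The workflows themselves remain the authority on which values are accepted.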
Docker container updates:
- AMRFinderPlus has been updated to version v3.11.20 and database 2023-09-26.1
- tbp-parser has been updated to version 1.2.0
- Freyja has been updated to version 1.4.8
- ts_mlst database has been updated as of January 2024
- Gambit has been updated to version 1.3.0, including its database files
- Pangolin4 has been updated to version 4.3.1-pdata-1.23.1
- IRMA has been updated to version 1.1.3
Tag updates:
- SARS-CoV-2 Nextclade Dataset Tag has been updated to 2023-12-03T12:00:00Z
🐛 Bug fixes and small improvements:
- kSNP3_PHB: The ksnp3_core_vcf output has been renamed to ksnp3_vcf_ref_genome for readability. Additionally, two new outputs are provided: ksnp3_vcf_snps_not_in_ref and ksnp3_vcf_ref_samplename.
- TheiaProk Workflow Series: The MIDAS task was adjusted to reduce logging, and therefore the size of the log file, aiding debugging and reducing storage costs.
- TheiaMeta_Illumina_PE_PHB: A new Krona task was added for the visualization of the Kraken2 reports.
- Mercury_Prep_N_Batch: The excluded_samples.tsv is now printed to the execution log file, aiding debugging.
- TheiaCoV Workflow Series: The nextclade_lineage output now populates correctly for SARS-CoV-2. Additionally, the nextclade_qc field is now exposed as an output.
- Augur_PHB: The AUGUR refine input clock_filter_iqd has been reverted to the previous default value of 4.
- Kraken Standalone Workflows: A new Krona task was added for the visualization of the Kraken2 reports.
- TheiaValidate_PHB: TheiaValidate now outputs a table with validation-criteria failures only. Additionally, a new input was added that can translate different column names between tables to enable comparison.
- TheiaCoV_ONT_PHB: If a sample fails the read-screening quality check, the workflow no longer fails. Instead, it finishes with an appropriate message.
- Samples_To_Ref_Tree_PHB: The organism input has been renamed to nextclade_dataset_name for better clarity.
- Various workflows: Call caching was disabled in the following workflows: BaseSpace_Fetch_PHB, Transfer_Column_Content_PHB, Assembly_Fetch_PHB, Snippy_Streamline_PHB and TheiaValidate_PHB.
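Two of the fixes above rename an existing output or input, so saved configurations or downstream scripts that use the old names may need updating. The sketch below is one possible way to remap such names in a plain Python dictionary. The rename_keys helper and the example record are hypothetical, and only the old and new names themselves are taken from these notes.

```python
# Names renamed in this release, as stated in the notes above.
RENAMED_OUTPUTS = {"ksnp3_core_vcf": "ksnp3_vcf_ref_genome"}      # kSNP3_PHB
RENAMED_INPUTS = {"organism": "nextclade_dataset_name"}           # Samples_To_Ref_Tree_PHB only


def rename_keys(record, renames):
    """Return a copy of record with old key names replaced by new ones.

    Keys that do not appear in renames are kept unchanged.
    """
    return {renames.get(key, key): value for key, value in record.items()}


if __name__ == "__main__":
    # Hypothetical saved record that still uses the pre-1.3.0 output name.
    old_record = {"ksnp3_core_vcf": "example.vcf", "sample_count": 12}
    print(rename_keys(old_record, RENAMED_OUTPUTS))
```

The organism mapping applies to Samples_To_Ref_Tree_PHB only. Other workflows in this release still use an input called organism, so it should not be renamed globally.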
What's Changed
- updated VCF output file renaming in kSNP3 task by @kapsakcj in #207
- reduce unnecessary logging in MIDAS task by @kapsakcj in #210
- update default amrfinderplus docker image to v3.11.20 and db 2023-09-26.1 by @kapsakcj in #229
- TheiaCoV_ONT_PHB Influenza Track by @jrotieno in #233
- TheiaCoV_FASTA_Batch: TheiaCoV_FASTA, for many samples at once by @sage-wright in #238
- Add krona task to TheiaMeta_Illumina_PE by @cimendes in #213
- added 2 QC thresholds to ANI task to reduce false positives by @kapsakcj in #168
- Resfinder improvements, added support for Shigella spp., added XDR Shigella prediction by @kapsakcj in #159
- disable call caching for various workflows by @kapsakcj in #251
- Mercury_Prep_N_Batch: print the excluded_samples.tsv and update Docker to avoid Google SDK warning by @sage-wright in #220
- Nextclade Output Added by @DOH-HNH0303 in #239
- TheiaCoV_FASTA: Adding five new organisms by @jrotieno in #194
- Update task_augur_refine iqd back to 4 by @jrotieno in #268
- TheiaCoV Illumina PE: Identify Influenza Antiviral Resistance Mutations in Assemblies by @jrotieno in #252
- [New Utility] Workflow to rename FASTQ files (non-destructive) by @cimendes in #267
- [TheiaCoV_Fasta_Batch] Substitute FASTA concatenating task to ensure proper sample_id propagation by @cimendes in #274
- Kraken2 Standalone: add krona visualisation by @cimendes in #225
- TheiaValidate_PHB: new features and new Docker image from TheiaValidate repository by @sage-wright in #255
- TheiaProk TB: new VCF output and modification to the coverage report by @sage-wright in #245
- TheiaCoV_ONT: prevent failure by coercing files into strings by @sage-wright in #288
- update default freyja docker image to 1.4.8 for multiple tasks by @kapsakcj in #289
- FastQC added as an optional module in all Illumina_PE and Illumina_SE workflows by @sage-wright in #260
- update docker to version tag 2.23.0-2024-01 by @cimendes in #293
- [TheiaProk Workflows] Add Kraken2 as optional module by @cimendes in #286
- CZGenEpi_Prep_PHB: implementing user-requested changes by @sage-wright in #244
- Update Gambit database files to version 1.3.0 by @kevinlibuit in #292
- [PHB Release 1.3.0] update version and docker tags (nexclade sc2, pangolin, tbp-parser 1.1.7) by @cimendes in #296
- [PR Template Update] Updating template per identified dev process improvements by @kelseykropp in #300
- [TheiaProk suite] Patch fix: change type of kraken2_report to be string in taxon_table task by @cimendes in #297
- Samples_To_Ref_Tree_PHB: changed "organism" input to "nextclade_dataset_name" by @jrotieno in #303
- theiacov_fasta wf logic change for flu by @kapsakcj in #305
- restore vadr_num_alerts string output to theiacov_fasta workflow by @kapsakcj in #307
New Contributors
- @DOH-HNH0303 made their first contribution in #239
- @kelseykropp made their first contribution in #300
Full Changelog: v1.2.1...v1.3.0