-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TheiaCoV] Split database from Kraken2_TheiaCoV task #670
base: main
Are you sure you want to change the base?
Conversation
…ments on krakren2 command call; improvememnts on target organism parse to report proportion of sc2 only if target org is sc2
@@ -20,7 +20,8 @@ workflow dehost_pe { | |||
input: | |||
samplename = samplename, | |||
read1 = ncbi_scrub_pe.read1_dehosted, | |||
read2 = ncbi_scrub_pe.read2_dehosted | |||
read2 = ncbi_scrub_pe.read2_dehosted, | |||
target_organism = "Severe acute respiratory syndrome coronavirus 2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For NCBI scrub workflows can we set this to be a workflow input and not hard coded on the task level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ncbi scrubber, please pass in workflow level variable and do not hard code on task level.
@@ -17,7 +17,8 @@ workflow dehost_se { | |||
call kraken.kraken2_theiacov as kraken2 { | |||
input: | |||
samplename = samplename, | |||
read1 = ncbi_scrub_se.read1_dehosted | |||
read1 = ncbi_scrub_se.read1_dehosted, | |||
target_organism = "Severe acute respiratory syndrome coronavirus 2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here for workflow level input
Testing illumina PE here for WNV, MPox, Flu, Sars-Cov-2. |
Everything looks good in the test and the output is properly identified by kraken from what we would expect. One of the flu samples returned nothing from kraken, but I ran it against PHBv2.2.1 and it was the same thing so something with that sample is just slightly off. Run with isolated sample here |
This PR closes #563, closes #520
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
This PR addresses several issues that came up when the RSV target organism was set differently for RSV-A and RSV-B, i.e.
rsv_a_kraken_target_organism = "Respiratory syncytial virus"
andrsv_b_kraken_target_organism = "Human orthopneumovirus"
due to the use of an old database. As the existing docker image had an embedded database, it meant that each time the Kraken database would need to be updated, it would have to be done inside the docker container, which would not be efficient.This PR simply updates the existing
kraken2_theiacov
task to split the database from the container. The new database is custom built with human + refseq viral sequences as of 2024-08-28.No inputs or outputs have been altered other than what is strictly required to make the new code changes functional.
Behaviour change:
⚡ Impacted Workflows/Tasks
Impacted tasks
Impacted workflows
This PR may lead to different results in pre-existing outputs: Yes
This PR uses an element that could cause duplicate runs to have different results: No
🛠️ Changes
⚙️ Algorithm
As explained above, the only change in algorithm is that the percentage of sc2 is now only reported if the target organism is SC2, otherwise this column is blank. To accomplish this the data type has been changed from Float to String. Impacts column sorting as it is now alpha-numeric.
➡️ Inputs
freyja_fastq new input:
⬅️ Outputs
Percent sc2 is now a string output, rather than a float output
🧪 Testing
Suggested Scenarios for Reviewer to Test
🔬 Final Developer Checklist
🎯 Reviewer Checklist