[Dorado] New Dorado Basecalling Workflow Terra #659
base: main
Outputs working and documentation updated; see https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/CDPH_Bioinformatics_Development/job_history/79417a5e-da8c-4fdc-aa61-f28de8490bba
…e used at runtime; improved logging of dorado STDERR to a file; parsed the explicit model name from the STDERR file or accepted a user input string; added dorado_log task output file
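The model-name parsing described in the commit message above could look roughly like the following. This is a minimal sketch, not the code in this PR: the regex assumes the usual Dorado model naming shape (for example `dna_r10.4.1_e8.2_400bps_sup@v5.0.0`), and the function name `resolve_model_name` and the sample log line in the usage note are hypothetical.

```python
import re
from typing import Optional

# Assumed shape of a Dorado model name: an analyte prefix (dna/rna), a
# chemistry description, a speed tier, then "@v" and a dotted version number.
MODEL_RE = re.compile(r"\b((?:dna|rna)[\w.]*_(?:fast|hac|sup)@v\d+(?:\.\d+)*)")


def resolve_model_name(stderr_text: str, user_model: str) -> Optional[str]:
    """Return the explicit model name found in the log, else the user input.

    stderr_text is the content of the captured dorado STDERR file;
    user_model is whatever string the user passed to the workflow.
    """
    match = MODEL_RE.search(stderr_text)
    if match:
        return match.group(1)
    # Fall back to the user-supplied string; None if that is empty as well.
    return user_model or None
```

For instance, `resolve_model_name("[info] - downloading dna_r10.4.1_e8.2_400bps_sup@v5.0.0 with httplib", "sup")` would return the full model name, while a log with no recognizable model name falls back to `"sup"`. The log line here is illustrative only and has not been checked against real Dorado output.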
I will test 3 different workflows and report back:
EDIT: all of these workflows were run AFTER making the commit below.
TheiaProk_ONT ran successfully on the FASTQs produced by my test above with the SUP dorado model 👍 https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/CDPH_Bioinformatics_Development/job_history/238d0f1f-fe13-4823-8846-b0774fb75e0c
More confirmation that the FASTQs produced by this workflow are valid for downstream processing.
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
A new Dorado Basecalling Workflow, a GPU-accelerated pipeline for basecalling Oxford Nanopore POD5 files. The workflow includes optional automatic model selection, SAM-to-BAM conversion, and demultiplexing into per-barcode FASTQ files, with outputs uploaded to a new user-defined Terra table for downstream analysis.
⚡ Impacted Workflows/Tasks
This is a new workflow that does not impact any other workflows
This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No
🛠️ Changes
This PR introduces the following changes:
use_auto_model flag for automatic model selection.
(sup, hac, fast).
⚙️ Algorithm
POD5 files to SAM using GPU acceleration. Uses a new Dorado StaPH-B Docker image, v0.8.0: https://github.com/StaPH-B/docker-builds/tree/master/dorado/0.8.0
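The basecall, SAM-to-BAM, and demultiplex steps could be sketched as the command lines below. This is an illustrative outline under stated assumptions, not the task code in this PR: the function name `build_commands`, the file names `calls.sam`, `calls.bam`, and `demux`, and the interpretation of use_auto_model (a speed keyword when enabled, an explicit model path otherwise) are all assumptions. The function only builds argument lists; it does not run anything.

```python
from typing import List

# Speed tiers that dorado accepts in place of an explicit model path.
SPEED_TIERS = ("sup", "hac", "fast")


def build_commands(pod5_dir: str, kit_name: str,
                   use_auto_model: bool = True,
                   dorado_model: str = "sup") -> List[List[str]]:
    """Build the three command lines for basecalling, conversion, and demux."""
    if use_auto_model and dorado_model not in SPEED_TIERS:
        raise ValueError(
            f"auto model selection expects one of {SPEED_TIERS}, "
            f"got {dorado_model!r}"
        )

    # 1. Basecall POD5 to SAM. dorado writes to stdout, so a real task would
    #    redirect it to calls.sam and capture stderr in a log file.
    basecall = ["dorado", "basecaller", dorado_model, pod5_dir,
                "--kit-name", kit_name, "--emit-sam"]

    # 2. Convert SAM to BAM with samtools.
    sam_to_bam = ["samtools", "view", "-b", "-o", "calls.bam", "calls.sam"]

    # 3. Split reads by barcode into FASTQ files. --no-classify reuses the
    #    barcode assignments made during basecalling.
    demux = ["dorado", "demux", "--output-dir", "demux",
             "--no-classify", "--emit-fastq", "calls.bam"]

    return [basecall, sam_to_bam, demux]
```

Building the commands as lists keeps the model check separate from execution, so an invalid speed tier fails before any GPU time is spent.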
➡️ Inputs
(sup, hac, fast).
⬅️ Outputs
🧪 Testing
POD5 inputs and GPU resources.
Test 1. 9 rabies pod5 files from 2 barcodes (manual model)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/889322c2-19f0-4092-ac7f-4863e676b28a
Test 2. 24 pod5 files from 2 barcodes (manual model)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/9bef28ea-82ba-4406-8545-f32de7e07e02
Test 3. 24 pod5 files from 2 barcodes (auto model)
https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_FCombe_sandbox/job_history/cead789e-c737-4541-a6ed-d9b907493ee1
Output Terra table example
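As a rough illustration of what a row-per-barcode Terra table could contain, the sketch below builds a Terra load file (a TSV whose first header is `entity:<table>_id`). The table name, the `read1` column, and the bucket paths in the usage note are hypothetical and are not taken from this PR.

```python
import csv
import io
from typing import Dict


def terra_table_tsv(table_name: str, fastqs: Dict[str, str]) -> str:
    """Build a Terra load-file TSV with one row per barcode.

    fastqs maps a barcode label (used as the row ID) to the path of its
    demultiplexed FASTQ file.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Terra identifies the target table from the "entity:<name>_id" header.
    writer.writerow([f"entity:{table_name}_id", "read1"])
    for barcode in sorted(fastqs):
        writer.writerow([barcode, fastqs[barcode]])
    return buf.getvalue()
```

Calling `terra_table_tsv("dorado_demux", {"barcode01": "gs://example-bucket/barcode01.fastq.gz"})` would yield a header line followed by one tab-separated row for `barcode01`.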
Suggested Scenarios for Reviewer to Test
use_auto_model flag enabled.
dorado_model path and confirm outputs.
(kit_name) to confirm error handling.
🔬 Final Developer Checklist
🎯 Reviewer Checklist