Input Streaming #578
Comments
Thanks Min, I'm aware of the streaming capabilities of DRAGEN. See https://github.com/umccr/cwl-ica/blob/main/schemas/fastq-list-row/1.0.0/fastq-list-row__1.0.0.yaml#L26-L42, which allows the read_1 and read_2 attributes to be strings (such as presigned URLs) that are then inserted into the fastq list CSV. Unfortunately, streaming depends on both the workflow engine and the orchestration engine. While cwltool provides the 'streamable' option, it's up to the orchestration engine to implement it. ICAv2 downloads all the inputs into a local 'scratch' space first and then streams the inputs from there. See the following blockers regarding streaming data on ICAv2 that I've raised:
Furthermore, we have run some tests using presigned URLs and streaming, and it's not that efficient (if at all). The local FSx instance downloads data quickly and has fast I/O, so it's potentially faster than streaming.
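To make the read_1/read_2 point concrete, here is a minimal sketch of how rows carrying presigned URLs as plain strings could be written into a fastq list CSV. The column names are assumed to follow the usual DRAGEN fastq list layout, the row keys are assumed to mirror the schema attributes, and the URLs are placeholders; the helper `build_fastq_list_csv` is hypothetical and not part of cwl-ica.

```python
import csv
import io

# Assumed header, following the usual DRAGEN fastq list CSV layout.
FASTQ_LIST_COLUMNS = ["RGID", "RGSM", "RGLB", "Lane", "Read1File", "Read2File"]


def build_fastq_list_csv(rows):
    """Render fastq-list-row style dicts as fastq list CSV text.

    read_1 and read_2 are treated as opaque strings, so a presigned URL
    is written out exactly like a local path would be.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FASTQ_LIST_COLUMNS)
    for row in rows:
        writer.writerow([
            row["rgid"],
            row["rgsm"],
            row["rglb"],
            row["lane"],
            row["read_1"],
            row["read_2"],
        ])
    return buffer.getvalue()


# Placeholder row: the URLs and signature values are illustrative only.
example_rows = [
    {
        "rgid": "sample1.L001",
        "rgsm": "sample1",
        "rglb": "lib1",
        "lane": 1,
        "read_1": "https://example.com/sample1_R1.fastq.gz?X-Amz-Signature=PLACEHOLDER",
        "read_2": "https://example.com/sample1_R2.fastq.gz?X-Amz-Signature=PLACEHOLDER",
    }
]

fastq_list_csv = build_fastq_list_csv(example_rows)
print(fastq_list_csv)
```

Whether the tool then actually streams from those URLs is still decided by the engine running it, as described above.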
Thanks Alexis. I have learnt something. Is the orchestration engine from ICAv2, not cwltool? I don't have access rights to the umccr-illumina/ica_v2 repo (404).
Ah okay, I will arrange to fix the 404 error. Yes, it is cwltool, but they have altered it to be DRAGEN-compatible, and it runs tasks through a Kubernetes wrapper rather than through Docker. It also runs a bunch of non-CWL pre-steps to configure the analysis runtime. In my experience, streaming is no faster unless you're only taking a chunk of the file.
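For the "only taking a chunk of the file" case, a rough sketch of what a partial read against a presigned URL could look like is an HTTP Range request. This assumes the storage backend honours Range headers; the helper name `build_range_request` and the URL are hypothetical, and nothing here reflects how ICAv2 or DRAGEN fetch data internally.

```python
import urllib.request


def build_range_request(url, start, length):
    """Build a request asking for `length` bytes starting at offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # HTTP byte ranges are inclusive on both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


# Placeholder presigned URL; ask for the first 1 KiB only.
request = build_range_request(
    "https://example.com/sample1_R1.fastq.gz?X-Amz-Signature=PLACEHOLDER",
    start=0,
    length=1024,
)
print(request.get_header("Range"))

# Sending it would look like this (not executed here, the URL is not real):
# with urllib.request.urlopen(request) as response:
#     chunk = response.read()
```

Fetching a small range like this avoids downloading the whole object, which is the one situation where streaming is likely to beat a full download to local scratch.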
You probably already know this. Just put a note here as something to consider in the future.
ref