Multi-threading approach #55

Teklu67 · 2022-03-28T17:55:44Z

Hi,
This is a very useful program, but it is taking a long time to sub-sample from a large fastq file. I am running it on a server and would like to run it with multi-threading, but I am a novice at programming and not sure how to do that. Any help, please?
Thanks,

mbhall88 · 2022-03-29T01:01:27Z

Hi @Teklu67. When you say "a long time", how long are we talking? And how large is your file?

Teklu67 · 2022-03-29T02:43:53Z

Thanks so much for the quick response. It finished sampling 30x from a fq of 690 Gb (60x coverage) in 2 days. Because I have the resources to run on several threads, I thought it would finish much faster if there were an option for multi-threading. Thanks!

mbhall88 · 2022-03-29T23:47:43Z

Wow, that's a very big fastq file! Is it compressed (e.g., gzip)?

How did you install rasusa?

Teklu67 · 2022-04-01T05:25:02Z

Yes, it is tetraploid wheat data, compressed in .gz format. I installed it through conda.

mbhall88 · 2022-04-02T00:05:25Z

Is your data Illumina?

Sorry, there's not really much I can offer in the way of speeding rasusa up.

At some point I will look into whether multi-threading the IO is possible (i.e. batching reads).

I'll leave this open and add it to my list of things to investigate in the coming months. Sorry I can't get to it sooner, but I have a lot of other research projects I am trying to juggle.

However, if you (or anyone else) would like to have a go at it, I would be very happy to receive a pull request.

Teklu67 · 2022-04-05T15:48:28Z

It is ONT data. That is OK, thank you for your time.

mbhall88 · 2022-04-06T00:01:43Z

In the meantime, I would suggest trying to split the file up into subsets and then randomly subsampling each subset.
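A minimal sketch of the splitting step, in case it helps. This is not part of rasusa; the function names and the `prefix.<i>.fq.gz` naming are made up for illustration, and it assumes every FASTQ record is exactly four lines (no wrapped sequences). It deals records to the shards round-robin so that each shard is a roughly even slice of the whole run:

```python
import gzip
import itertools


def split_fastq_round_robin(in_handle, out_handles):
    """Deal 4-line FASTQ records from in_handle to out_handles in turn.

    Assumes each record is exactly four lines. Returns the number of
    records written to each output handle.
    """
    counts = [0] * len(out_handles)
    targets = itertools.cycle(range(len(out_handles)))
    while True:
        # Pull the next four lines (one record) from the input.
        record = list(itertools.islice(in_handle, 4))
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        i = next(targets)
        out_handles[i].writelines(record)
        counts[i] += 1
    return counts


def split_fastq_file(path, n_shards, prefix):
    """Split a gzipped FASTQ into n_shards gzipped files.

    Output names follow the (hypothetical) pattern prefix.<i>.fq.gz.
    """
    names = [f"{prefix}.{i}.fq.gz" for i in range(n_shards)]
    outs = [gzip.open(name, "wt") for name in names]
    try:
        with gzip.open(path, "rt") as fh:
            split_fastq_round_robin(fh, outs)
    finally:
        for out in outs:
            out.close()
    return names
```

Each shard can then be subsampled as a separate job. Because coverage adds up when the outputs are concatenated, each shard would need a target of roughly the overall coverage divided by the number of shards (for example, 30x over 4 shards is about 7.5x each). Note that the split itself is still a single serial pass over the compressed file.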

mbhall88 · 2024-11-26T11:49:31Z

Another suggestion: I suspect most of the runtime is spent (de)compressing the data. Switching from gzip to zstd should drastically reduce the time spent on decompression.

mbhall88 added the "enhancement" (New feature or request) and "help wanted" (Extra attention is needed) labels · Apr 2, 2022
