-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add rasusa wrapper #152
add rasusa wrapper #152
Conversation
This looks great! We will want to update for any possible changes after discussion in #153. For example, if we are able to build the container from the environment.yaml, you might not need to provide it. It's also not clear if we would want |
Just wondering if this PR is required for the container stuff anymore? I think you had a different way of testing this out didn't you @vsoch ? If not, I'll remove the container stuff and contribute it as a wrapper for |
I had a PR in to snakemake (which would be required here) but it looks like it's gone stale. I'm not sure if this work is still a goal for snakemake, or some other strategy is going to be used. Probably we should ask @johanneskoester. |
But @mbhall88 you can probably remove the container stuff and contribute as a wrapper without it! Don't let our slowness / this feature development hold you back! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recipe looks good. Just two questions:
- There's a lot of formatting going on in
test.py
. Are you sure you are using the latestblack
? Just want to avoid unnecessary changes, in case this will be reverted with the next best merge... - Just out of curiosity: What can rasusa do for me, that
samtools view -s
cannot.
I'm using
The difference is that samtools/seqtk only subsample based on the number of reads, Whereas rasusa subsamples to a given (theoretical) coverage; it makes no assumptions about the lengths of your reads. Which is sort of fine if you have Illumina data and you know all your reads are the same length, and you can be bothered to calculate how many you would need to get to a certain coverage. But this breaks down when working with long reads where they are never all the same length. I'm hoping to write this up as an application note in Bioinformatics or something similar soon. |
The GitHub Action does run a And thanks for the elaboration on rasusa, this does sound good. It could even be used to take peaks out of crazily over-covered regions while leaving everything else alone. This could avoid slowing down downstream tools with excessive local coverage. Definitely worth a write-up! |
It won't really do this. For that, you would need an alignment file and to use something like VariantBam (see here for more info). |
This PR will eventually try and add a
rasusa
wrapper (it currently works). But it is not to be merged yet as it relates to work in snakemake/snakemake#532 and #150Note: I also reordered some of the tests to try and maintain alphabetical ordering. Not sure if this is being adhered to anymore?