Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add rasusa wrapper #152

Merged
merged 5 commits into from
Nov 16, 2020
Merged

add rasusa wrapper #152

merged 5 commits into from
Nov 16, 2020

Conversation

mbhall88
Copy link
Member

@mbhall88 mbhall88 commented Aug 6, 2020

This PR will eventually try and add a rasusa wrapper (it currently works). But it is not to be merged yet as it relates to work in snakemake/snakemake#532 and #150

Note: I also reordered some of the tests to try and maintain alphabetical ordering. Not sure if this is being adhered to anymore?

@vsoch
Copy link

vsoch commented Aug 6, 2020

This looks great! We will want to update for any possible changes after discussion in #153. For example, if we are able to build the container from the environment.yaml, you might not need to provide it. It's also not clear if we would want container: Dockerfile, leaving it out entirely, or something else.

@mbhall88
Copy link
Member Author

mbhall88 commented Nov 13, 2020

Just wondering if this PR is required for the container stuff anymore? I think you had a different way of testing this out didn't you @vsoch ?

If not, I'll remove the container stuff and contribute it as a wrapper for rasusa as one doesn't exist yet.

@vsoch
Copy link

vsoch commented Nov 13, 2020

I had a PR in to snakemake (which would be required here) but it looks like it's gone stale. I'm not sure if this work is still a goal for snakemake, or some other strategy is going to be used. Probably we should ask @johanneskoester.

@vsoch
Copy link

vsoch commented Nov 13, 2020

But @mbhall88 you can probably remove the container stuff and contribute as a wrapper without it! Don't let our slowness / this feature development hold you back! :)

@mbhall88 mbhall88 changed the title [WIP] add rasusa wrapper add rasusa wrapper Nov 13, 2020
@mbhall88 mbhall88 requested review from dlaehnemann and johanneskoester and removed request for dlaehnemann November 13, 2020 05:49
Copy link
Contributor

@dlaehnemann dlaehnemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recipe looks good. Just two questions:

  1. There's a lot of formatting going on in test.py. Are you sure you are using the latest black? Just want to avoid unnecessary changes, in case this will be reverted with the next best merge...
  2. Just out of curiosity: What can rasusa do for me, that samtools view -s cannot.

@mbhall88
Copy link
Member Author

mbhall88 commented Nov 15, 2020

The recipe looks good. Just two questions:

1. There's a lot of formatting going on in `test.py`. Are you sure you are using the latest `black`? Just want to avoid unnecessary changes, in case this will be reverted with the next best merge...

I'm using black, version 20.8b1 which seems to be the latest? Anyway, would the CI checks have detected if the formatting was incorrect?

2. Just out of curiosity: What can rasusa do for me, that [`samtools view -s`](https://www.htslib.org/doc/samtools-view.html) cannot.

The difference is that samtools/seqtk only subsample based on the number of reads, Whereas rasusa subsamples to a given (theoretical) coverage; it makes no assumptions about the lengths of your reads. Which is sort of fine if you have Illumina data and you know all your reads are the same length, and you can be bothered to calculate how many you would need to get to a certain coverage. But this breaks down when working with long reads where they are never all the same length.

I'm hoping to write this up as an application note in Bioinformatics or something similar soon.
I talk about this a little in the benchmark section.

@dlaehnemann
Copy link
Contributor

The GitHub Action does run a black check (not sure which version, would have to check the output of the GitHub Action run on this branch, I guess). And this PR passes, so it should all be fine and I'll merge.

And thanks for the elaboration on rasusa, this does sound good. It could even be used to take peaks out of crazily over-covered regions while leaving everything else alone. This could avoid slowing down downstream tools with excessive local coverage. Definitely worth a write-up!

@dlaehnemann dlaehnemann merged commit d8ac1e0 into snakemake:master Nov 16, 2020
@mbhall88
Copy link
Member Author

It could even be used to take peaks out of crazily over-covered regions while leaving everything else alone.

It won't really do this. For that, you would need an alignment file and to use something like VariantBam (see here for more info).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants