ENSEMBL-SEQUENCE does not work for all species #3070
Comments
Also, I noticed the following in the wrapper:

    try:
        shell("curl -sSf {url} > /dev/null 2> /dev/null")
    except sp.CalledProcessError:
        continue

    shell("(curl -L {url} | gzip -d >> {snakemake.output[0]}) {log}")

If I understand this correctly, the file will be downloaded twice, is that right? Also, it is always decompressed automatically, which might lead to confusion if the specified output file is actually specified as
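To make the concern concrete, here is a minimal sketch of a single-download variant. It is not the wrapper's code: `fetch_once` and its `opener` argument are hypothetical names, and skipping decompression when the output path ends in `.gz` is only an assumption about what a user might expect.

```python
import gzip
import shutil
import urllib.error
import urllib.request


def fetch_once(url, out_path, opener=urllib.request.urlopen):
    """Download url a single time and append the payload to out_path.

    Returns False when the URL cannot be opened, so the caller can skip it
    (the role played by the separate probe request in the snippet above).
    Assumption: the payload is gzip-compressed and is decompressed unless
    out_path itself ends in ".gz".
    """
    try:
        response = opener(url)
    except urllib.error.URLError:  # also covers HTTPError (404 and similar)
        return False
    with response, open(out_path, "ab") as out:
        if out_path.endswith(".gz"):
            # Keep the compressed bytes; concatenated gzip members stay valid.
            shutil.copyfileobj(response, out)
        else:
            # Stream-decompress while writing, without a second request.
            with gzip.GzipFile(fileobj=response, mode="rb") as unzipped:
                shutil.copyfileobj(unzipped, out)
    return True
```

Opening the file in append mode mirrors the `>>` in the shell command, so several per-chromosome downloads could still be collected into one output.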
Another related issue, which I hope is okay to report here since it has the same underlying cause, affects the variation wrapper "v3.13.6/bio/reference/ensembl-variation". There, the path that I need to specify is
See https://plants.ensembl.org/info/data/ftp/index.html for the table where this URL is from. However, for the wrapper to correctly assemble the URL, I need to leave out the
Furthermore, the species name is automatically capitalized in the wrapper, which would also lead to an error here.
More updates, hope that's okay. As my current workaround, instead of trying to assemble the URL within the wrapper, I just offer an alternative rule that uses a user-provided URL directly for the download. That might be an easy solution here as well, by offering an optional param
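A rough sketch of how such an override could look inside a wrapper is shown here. The param name `url` and the helper names `resolve_url` and `assemble_url` are hypothetical, chosen only for illustration, and the example URLs are placeholders rather than real Ensembl paths.

```python
def resolve_url(params, assemble_url):
    """Prefer a user-supplied URL; otherwise fall back to the assembled one.

    params is a plain dict here; "url" is a hypothetical optional param name.
    assemble_url stands in for whatever logic currently builds the path.
    """
    custom = params.get("url")
    if custom:
        return custom
    return assemble_url(params)


# Placeholder builder, not the wrapper's real path logic.
def placeholder_builder(params):
    return "https://example.org/{species}/{release}".format(**params)


print(resolve_url({"species": "a", "release": "1"}, placeholder_builder))
print(resolve_url({"url": "https://example.org/custom.fa.gz"}, placeholder_builder))
```

With this shape, existing rules that never set the optional param would keep their current behavior.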
Can you make a PR with your suggested changes?
Snakemake version
Snakemake: 8.15.2
Wrapper: "v3.13.6/bio/reference/ensembl-sequence"
Describe the bug
The path for downloading has a hard-coded structure in the wrapper:
This uses a hard check for `> 75`. However, for some species the path structure differs. For instance, A. thaliana is currently in plants release 59, but does not have the above hard-coded extra `release` number in the `spec` part of the filename. The correct file name is
but instead the wrapper is only checking for
which has the additional `59` that should not be there. Hence, the download fails. I think a simple fix is to avoid the hard-coded `75`, and instead check both variants of the path.
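As a sketch of what checking both variants could look like, the helpers below try the name without the release number first and then the one with it. The function names are hypothetical, the `exists` callback stands in for whatever availability probe the wrapper uses, and `TAIR10` is only an illustrative build name.

```python
def candidate_specs(build, release):
    """Return both naming variants of the spec part, build-only first."""
    return [build, f"{build}.{release}"]


def first_available(candidates, exists):
    """Return the first candidate accepted by exists(), or None if none is."""
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None


# Illustration with a fake "server" that only knows the build-only name.
available = {"TAIR10"}
chosen = first_available(candidate_specs("TAIR10", 59), lambda s: s in available)
print(chosen)
```

This avoids any release threshold: whichever variant the server actually provides gets used, and a `None` result can be turned into a clear error message.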