UniprotFinder: running chewBBACA without internet access #121

Open
fedex88 opened this issue Mar 9, 2022 · 1 comment
Labels: Status: In Progress; Type: Question

fedex88 commented Mar 9, 2022

Hello @rfm-targa,

While running chewBBACA 2.8.5 (installed via pip install --user) on a cluster whose compute nodes have no internet access, I got the following error:

chewBBACA version: 2.8.5
Authors: Mickael Silva, Pedro Cerqueira, Rafael Mamede
Github: https://github.com/B-UMMI/chewBBACA
Wiki: https://github.com/B-UMMI/chewBBACA/wiki
Tutorial: https://github.com/B-UMMI/chewBBACA_tutorial
Contacts: [email protected]

=============================
  chewBBACA - UniprotFinder
=============================
Started at: 2022-03-09T09:59:47

Schema: scheme_104/schema_seed
Number of loci: 17686
Translating representative sequences...done.
Downloading list of reference proteomes...<urlopen error ftp error: TimeoutError(110, 'Connection timed out')>
<urlopen error ftp error: TimeoutError(110, 'Connection timed out')>
<urlopen error ftp error: TimeoutError(110, 'Connection timed out')>
<urlopen error ftp error: TimeoutError(110, 'Connection timed out')>
done.

Would it be possible to use local files instead of downloaded ones (maybe through a specific option that points to them)?
Also, which list of reference proteomes from UniProt is required?

Thank you so much for your help

Best,
Federica Palma

rfm-targa self-assigned this Mar 9, 2022
rfm-targa added the Status: In Progress and Type: Question labels Mar 9, 2022
@rfm-targa
Contributor

Hello @fedex88,

The UniprotFinder process needs internet access to download the reference proteomes for the taxon or taxa passed to the --taxa parameter. It downloads the README file at ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/ and selects the reference proteomes whose Species Name contains any of the terms passed to the --taxa parameter.

We can add an option to run it in offline mode, in which it accepts local files to annotate the loci. However, the process also sends requests to UniProt's SPARQL endpoint to get annotation terms based on exact protein matches. The offline mode would have to skip both the annotation based on reference proteomes and the exact matches through the SPARQL endpoint, and would be a simple BLAST against a set of local FASTA files (or GenBank files). It is not difficult to implement and would be important for use cases such as yours. I will add it to the list of features that we have to implement, but for now the process can only be run with internet access. I will update this issue when it has been implemented. Any suggestions on how the offline mode should work are welcome.
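To make the selection step described above concrete, here is a minimal sketch of filtering a proteome table by species name. It is not chewBBACA's actual code. It assumes the table has already been extracted from the README as tab-separated text with a header row that includes a "Species Name" column. The function name select_proteomes, the sample rows, and the case-insensitive matching are all assumptions made for illustration.

```python
import csv
import io


def select_proteomes(table_text, taxa_terms, species_column="Species Name"):
    """Return the table rows whose species name contains any of the terms.

    Assumes `table_text` is tab-separated with a header row; the real
    README layout may differ and would need its own parsing.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    # Case-insensitive substring matching is an assumption of this sketch.
    terms = [term.lower() for term in taxa_terms]
    selected = []
    for row in reader:
        species = (row.get(species_column) or "").lower()
        if any(term in species for term in terms):
            selected.append(row)
    return selected


# Placeholder rows, not real UniProt proteome identifiers.
SAMPLE_TABLE = (
    "Proteome_ID\tTax_ID\tSpecies Name\n"
    "EXAMPLE0001\t0\tStreptococcus agalactiae (placeholder strain)\n"
    "EXAMPLE0002\t0\tEscherichia coli (placeholder strain)\n"
)

if __name__ == "__main__":
    for row in select_proteomes(SAMPLE_TABLE, ["Streptococcus agalactiae"]):
        print(row["Proteome_ID"], row["Species Name"])
```

Since this selection only needs the table text, an offline mode could in principle read the same table from a local copy of the README instead of the FTP server, although the proteome files themselves would also have to be available locally.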

Best,

Rafael
