Running abstractor against uniprot with -m all errors #11

Open
CFGrote opened this issue Jun 17, 2021 · 5 comments

CFGrote commented Jun 17, 2021

Hi,
I was trying to get an abstraction for uniprot and ran

abstractor -s https://sparql.uniprot.org/sparql -o uniprot_abstraction.all.ttl -m all -vvvvv

The log reads:

DEBUG:root:Get entities and relation
DEBUG:root:
SELECT DISTINCT ?source_entity ?relation ?target_entity ?mother_source ?mother_target
WHERE {
    # Get entities
    ?instance_of_source a ?source_entity .
    ?instance_of_target a ?target_entity .
    # Relations
    ?instance_of_source ?relation ?instance_of_target .

    OPTIONAL {{
        ?source_entity rdfs:subClassOf ?mother_source .
    }}
    OPTIONAL {{
        ?target_entity rdfs:subClassOf ?mother_target .
    }}
}

After some time (~45 min == uniprot query timeout limit) it finally errors out with

Traceback (most recent call last):
  File "/home/grotec/.conda/envs/default/bin/abstractor", line 118, in <module>
    Abstractor().main()
  File "/home/grotec/.conda/envs/default/bin/abstractor", line 62, in main
    rdf.add_entities_and_relations(sparql.process_query(library.entities_and_relations))
  File "/home/grotec/.conda/envs/default/lib/python3.7/site-packages/libabstractor/SparqlQuery.py", line 182, in process_query
    return self.parse_sparql_results(self.execute_sparql_query(query))
  File "/home/grotec/.conda/envs/default/lib/python3.7/site-packages/libabstractor/SparqlQuery.py", line 99, in execute_sparql_query
    return endpoint.query().convert()
  File "/home/grotec/.conda/envs/default/lib/python3.7/site-packages/SPARQLWrapper/Wrapper.py", line 1107, in query
    return QueryResult(self._query())
  File "/home/grotec/.conda/envs/default/lib/python3.7/site-packages/SPARQLWrapper/Wrapper.py", line 1087, in _query
    raise e
  File "/home/grotec/.conda/envs/default/lib/python3.7/site-packages/SPARQLWrapper/Wrapper.py", line 1073, in _query
    response = urlopener(request)
  File "/home/grotec/.conda/envs/default/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/grotec/.conda/envs/default/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/grotec/.conda/envs/default/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/grotec/.conda/envs/default/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/grotec/.conda/envs/default/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/home/grotec/.conda/envs/default/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 502: Proxy Error

Is this behaviour to be expected? If so, users should be made aware of it. If not, can it be fixed?

Thanks!
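
One way to make users aware of this failure mode is to translate the raw `urllib.error.HTTPError` from the traceback above into a clearer message. The following is only a minimal sketch: `run_with_clear_errors` and `EndpointTimeoutError` are hypothetical names, not part of abstractor or SPARQLWrapper, and `execute` stands in for whatever callable actually sends the query.

```python
import urllib.error


class EndpointTimeoutError(RuntimeError):
    """Hypothetical error type: the endpoint or its proxy gave up on the query."""


def run_with_clear_errors(execute, query):
    """Call execute(query) and translate gateway errors into a readable message.

    `execute` is any callable that sends the query and returns its results
    (an assumption; abstractor's real call chain is shown in the traceback).
    """
    try:
        return execute(query)
    except urllib.error.HTTPError as err:
        # 502/504 usually come from a proxy in front of the endpoint,
        # typically once a server-side time limit has been reached.
        if err.code in (502, 504):
            raise EndpointTimeoutError(
                f"Endpoint returned HTTP {err.code} ({err.reason}); the query "
                "is probably too expensive. Try a smaller query or another mode."
            ) from err
        # Any other HTTP error is passed through unchanged.
        raise
```

The original exception stays attached through `from err`, so the full traceback is still available when running with verbose logging.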

Contributor

mboudet commented Jun 22, 2021

Hello,
Sorry for the delay, for some reason I'm not getting notifications from this repository...
Like you said, it seems the query is too computation-intensive and gets cut off by the proxy.
Could you try using the "-m batch" option? It should break the query into smaller subqueries (one for each entity).

Hopefully that will work. I have been meaning to implement LIMIT and OFFSET for these cases, but have not had the time yet.
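
For illustration, LIMIT/OFFSET paging could be sketched roughly as below. This is not abstractor's code: `paginate`, `execute`, and `page_size` are hypothetical names, and `execute` is assumed to take a query string and return a list of rows. Note that without an ORDER BY clause SPARQL does not guarantee a stable row order across pages, and paging does not necessarily make an expensive DISTINCT query cheaper on the server.

```python
def paginate(query, execute, page_size=10000):
    """Yield result rows page by page using LIMIT and OFFSET.

    `execute` is a callable taking a query string and returning a list of
    rows (an assumption made for this sketch).
    """
    offset = 0
    while True:
        # Append the paging clauses to the end of the query text.
        page_query = f"{query.rstrip()}\nLIMIT {page_size}\nOFFSET {offset}"
        rows = execute(page_query)
        yield from rows
        # A short (or empty) page means there are no more rows to fetch.
        if len(rows) < page_size:
            break
        offset += page_size
```

When the total number of rows is an exact multiple of `page_size`, the loop issues one extra request that returns an empty page before stopping.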

Author

CFGrote commented Jun 22, 2021 via email

Contributor

mboudet commented Jun 23, 2021

So, this is interesting. As far as I can see, this is not an issue with abstractor itself.

It seems this specific request

SELECT DISTINCT ?entity
WHERE {
    ?instance a ?entity .
}

behaves oddly on the uniprot endpoint. Instead of returning the results, it returns a request, and then returns the results in another call.
(You can see it by directly typing the request on the webpage)

If I modify the request (like removing DISTINCT, or adding an empty column on the side), it works.

So, I'm kinda stumped. I contacted uniprot for more information (it could be a bug, it could be an anti-bot feature...)
I could 'fix' it in abstractor by forcing a different request when the target is the uniprot endpoint, but it's frankly a dirty fix.

I'll wait for an answer from uniprot, and meanwhile see if I can generate an abstraction for you using some local changes.
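
To make the endpoint-specific workaround described above concrete, here is a rough sketch. It is an illustration only: `entity_query_for` and the `?unused` variable are hypothetical, and projecting a never-bound variable is just one assumed way of "adding an empty column"; the exact modification tested in this thread is not shown.

```python
from urllib.parse import urlparse

# The query from this comment, unchanged.
ENTITY_QUERY = """SELECT DISTINCT ?entity
WHERE {
    ?instance a ?entity .
}"""

# Assumption: an extra, never-bound variable in the SELECT clause acts as
# the "empty column"; the real change used in the thread is not given.
ENTITY_QUERY_WORKAROUND = ENTITY_QUERY.replace(
    "SELECT DISTINCT ?entity", "SELECT DISTINCT ?entity ?unused"
)


def entity_query_for(endpoint_url):
    """Return the query variant to send to the given endpoint URL."""
    host = urlparse(endpoint_url).hostname or ""
    # Only the uniprot endpoint gets the modified request.
    if host == "sparql.uniprot.org":
        return ENTITY_QUERY_WORKAROUND
    return ENTITY_QUERY
```

Special-casing a hostname like this is exactly the kind of "dirty fix" mentioned above, since it hard-codes knowledge about one endpoint into the tool.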

Contributor

mboudet commented Jun 24, 2021

I got an answer from UniProt. Since this query is costly, they handle it differently (by redirecting to another query).
However, that mechanism is currently broken (it returns no results) because a file is missing.

According to them, it should be fixed in ~ 2 weeks.

Author

CFGrote commented Jun 24, 2021 via email
