Blank node behavior when using SPARQL #109

Open
hsolbrig opened this issue Nov 12, 2020 · 3 comments

Comments

@hsolbrig
Contributor

At the moment, different ShEx implementations exhibit different behaviors when crossing BNodes in SPARQL.

PyShEx has three options:

  1. Throw an error when attempting to submit a SPARQL query with a BNode subject or object
  2. Assume that the SPARQL endpoint maintains persistent BNodes (which may cause a hang / timeout if not true)
  3. Take advantage of the GraphDB specific solution

ShEx.js implements only option 2.

I'm not sure about other implementations.

Do we want to specify a consistent behavior across interpreters? If so, what should that be?

@gkellogg
Contributor

I've always voiced the opinion that it should be illegal to use a blank node as a ShEx starting point: in RDF, there is no expectation that a blank node used in a serialization will be maintained within a datastore. This is the use case skolem IDs were created for, although I'm not a great fan of those, either.

Better to use a query to identify a starting node, where the query would result in the desired node.

@ericprud
Contributor

I think that's a separate issue though; this is about how you practically re-visit a bnode you got in response to a previous query.
This is an issue for remote faceted browsing, ShEx validation, and anyone else iteratively querying a SPARQL endpoint.

@ericprud
Contributor

ericprud commented Nov 20, 2020

I'm currently adding both arrival path and disambiguation code in the ShEx.js SPARQL interface. This allows it to:

  1. remember how it got to any bnode
  2. distinguish all of the visited bnodes from each other.

Wikidata (augmented) example:

wd:Q313093 <P999> _:a .
_:a
  # works
  <P2860> _:a ; # apparently, a bare blank node stands for unknown value
  # advisors
  <P184> wd:Q123 , _:1e_____ , _:xe_____ , _:ye_____ , _:1cd__2g , _:1cd__2f , _:1cdef2g , _:1cdef2f .

# advisors (mostly bnodes to exercise disambiguator)
wd:Q123                                                         <P735> "a" , "b" .
_:1e_____ <P000> wd:Qe                                        ; <P735> "abc" .
_:xe_____ <P000> wd:Qe                                        ; <P735> "abc" .
_:ye_____ <P000> wd:Qe                                        ; <P735> "abc" .
_:1cd__2g <P000> wd:Qc , wd:Qd                 ; <P001> wd:Qg ; <P735> "abc" .
_:1cd__2f <P000> wd:Qc , wd:Qd                 ; <P001> wd:Qf ; <P735> "abc" .
_:1cdef2g <P000> wd:Qc , wd:Qd , wd:Qe , wd:Qf ; <P001> wd:Qg ; <P735> "abc" .
_:1cdef2f <P000> wd:Qc , wd:Qd , wd:Qe , wd:Qf ; <P001> wd:Qf ; <P735> "abc" .

The data structure (JSON liberalized to include RDF terms) that identifies e.g. _:1cdef2g is

{ start: wd:Q313093, path: [
  {p:<P999>}, # no ambiguity
  {p:<P184>, unique: {
    <P000>: [wd:Qc, wd:Qd],
    <P001>: [wd:Qg]
  } }
] }

which allows you to select for _:1cdef2g ?p ?o like:

SELECT ?1 ?p ?o WHERE {
  wd:Q313093 <P999> ?0 . # no ambiguity
  ?0 <P184> ?1 .
  ?1 <P000> wd:Qc , wd:Qd . ?1 <P001> wd:Qg . # disambiguate
  FILTER NOT EXISTS { ?1 <P000> ?2 FILTER (?2 NOT IN (wd:Qc, wd:Qd)) }
  ?1 ?p ?o
}
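
The mapping from the arrival-path structure to the query can be sketched in a few lines. This is a hypothetical Python illustration, not the ShEx.js code: the names `Step` and `path_to_query` and the variable naming (`?n0`, `?x2`, ...) are invented here, and it assumes every predicate listed under `unique` should be treated as closed (the node has exactly the listed values), whereas the query above applies that filter to <P000> only.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One hop of the arrival path: the predicate followed, plus the
    property values that single out the bnode reached by that hop."""
    p: str
    unique: dict[str, list[str]] = field(default_factory=dict)


def path_to_query(start: str, path: list[Step]) -> str:
    """Build a SELECT that re-finds the node at the end of `path`."""
    if not path:
        raise ValueError("path must contain at least one step")
    lines = []
    subject = start
    extra = len(path)  # numbering for helper variables used inside filters
    for i, step in enumerate(path):
        var = f"?n{i}"
        lines.append(f"  {subject} {step.p} {var} .")
        for pred, values in step.unique.items():
            joined = ", ".join(values)
            # The node must have every listed value ...
            lines.append(f"  {var} {pred} {joined} .")
            # ... and (assumption) no other value for that predicate.
            lines.append(
                f"  FILTER NOT EXISTS {{ {var} {pred} ?x{extra} "
                f"FILTER (?x{extra} NOT IN ({joined})) }}"
            )
            extra += 1
        subject = var
    lines.append(f"  {subject} ?p ?o .")
    return f"SELECT {subject} ?p ?o WHERE {{\n" + "\n".join(lines) + "\n}"


# Usage with the path shown above (terms are passed through as plain strings).
query = path_to_query(
    "wd:Q313093",
    [
        Step("<P999>"),  # no ambiguity
        Step("<P184>", {"<P000>": ["wd:Qc", "wd:Qd"], "<P001>": ["wd:Qg"]}),
    ],
)
print(query)
```

Because the path is replayed from a named start node, the query never mentions a blank node label, so it does not depend on the endpoint keeping bnode identifiers stable between requests.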

_:1e_____, _:xe_____, and _:ye_____ are provably interchangeable, so the data structure for the first needs to indicate that it's serving for all three:

{ start: wd:Q313093, path: [
  {p:<P999>}, # no ambiguity
  {p:<P184>, unique: {
      <P000>: [wd:Qe]
    }, proxies: [ _:xe_____ , _:ye_____ ]
  }
] }

_:xe_____ and _:ye_____ then simply execute the query for _:1e_____.
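
One way to find such proxies, sketched here in Python under simplifying assumptions (this is not the ShEx.js disambiguator): treat two sibling bnodes as interchangeable when their sets of outgoing (predicate, object) pairs are identical, keep the first one seen as the representative, and record the rest as its proxies. The function name `group_interchangeable` and the input shape are hypothetical, and the sketch looks only at outgoing arcs whose objects are not themselves bnodes.

```python
from collections import defaultdict


def group_interchangeable(arcs):
    """Group bnodes whose outgoing arcs are identical.

    arcs: dict mapping a bnode label to an iterable of (predicate, object)
          pairs, all given as strings.
    Returns a dict mapping each representative label to its list of proxies.
    """
    by_signature = defaultdict(list)
    for node, pairs in arcs.items():
        # Nodes with the same set of arcs land in the same bucket.
        by_signature[frozenset(pairs)].append(node)
    # The first node seen with a given signature stands in for the others.
    return {nodes[0]: nodes[1:] for nodes in by_signature.values()}


# A subset of the advisor bnodes from the example data.
same = [("<P000>", "wd:Qe"), ("<P735>", '"abc"')]
advisors = {
    "_:1e_____": same,
    "_:xe_____": same,
    "_:ye_____": same,
    "_:1cd__2g": [
        ("<P000>", "wd:Qc"), ("<P000>", "wd:Qd"),
        ("<P001>", "wd:Qg"), ("<P735>", '"abc"'),
    ],
}
print(group_interchangeable(advisors))
```

Comparing full arc sets is coarser than computing a minimal `unique` signature, but it is enough to decide which nodes can share one query.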

I haven't tested for corefs, which would be another way to disambiguate and might prove that 1e, xe, and ye aren't all interchangeable, but we'd have to do those tests only if the schema included inverse arcs in the right places.
