union of targets should be DISTINCT #143

Closed
VladimirAlexiev opened this issue Jan 7, 2022 · 5 comments

@VladimirAlexiev

(Thread: https://lists.w3.org/Archives/Public/public-shacl/2022Jan/)

https://www.w3.org/TR/shacl/#targets says: "union of terms produced by the individual targets that are declared by the shape".

Say I have a shape with the following targeting:

sh:targetClass :Foo;
sh:targetSubjectsOf :bar, :baz;
sh:targetObjectsOf :blor;

Say a node matches all of these conditions: will it be selected for validation once and not 4 times?

I.e., is the "union of terms" supposed to be DISTINCT? (UNION in mathematics is distinct, but not in SPARQL)
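To make the question concrete, here is a minimal sketch of the set reading of "union of terms". It assumes a toy in-memory graph of plain tuples rather than a real RDF store, the helper names are hypothetical, and `target_class` ignores `rdfs:subClassOf` for brevity, so it is an illustration and not how any SHACL engine is implemented.

```python
# Toy data graph as a set of (subject, predicate, object) triples.
# :Foo, :bar, :baz and :blor are the placeholders from the shape above;
# the nodes :x, :y, :a, :b, :c are invented for this illustration.
RDF_TYPE = "rdf:type"
triples = {
    (":x", RDF_TYPE, ":Foo"),
    (":x", ":bar", ":a"),
    (":x", ":baz", ":b"),
    (":c", ":blor", ":x"),
    (":y", RDF_TYPE, ":Foo"),
}

def target_class(cls):
    # Simplification: direct rdf:type only, no subclass reasoning.
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}

def target_subjects_of(pred):
    return {s for s, p, o in triples if p == pred}

def target_objects_of(pred):
    return {o for s, p, o in triples if p == pred}

# One set of terms per target declaration on the shape.
individual_targets = [
    target_class(":Foo"),
    target_subjects_of(":bar"),
    target_subjects_of(":baz"),
    target_objects_of(":blor"),
]

# Set union: a node reached through several targets appears once.
focus_nodes = set().union(*individual_targets)

# How many of the four declarations select :x.
matches = sum(":x" in t for t in individual_targets)
```

Here `:x` is selected by all four declarations (`matches` is 4), yet `focus_nodes` contains it only once alongside `:y`.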

@HolgerKnublauch> (TQ API) uses a Set, which means each target node is validated only once even if it appears in multiple targets of the same shape.
I believe this is following the intention of the spec. Does any implementer here disagree?

Vladimir: Agreed. But still, the spec should mention DISTINCT.
I'll post this here as an "SHACL Erratum", as per #103


Ashley Sommer> PySHACL does the same. The final collection of targets is a Set object, which deduplicates any identical nodes that are added.


Irene Polikoff> To me, this sounds more like an implementation question, rather than a standards question.

Vladimir: The number of Validation Results will be different (unless targets are distinct, there will be duplicate results).
Even if one stored Validation Results in a repo, they would not be deduplicated, since Results are unlikely to get deterministic URLs (as opposed to blank nodes or UUID URNs).

The impact on performance will be a linear slowdown.
If that shape causes a lot of other shapes to be invoked, that can be very significant.

@HolgerKnublauch
Contributor

The spec clearly states that we are talking about sets of terms, and sets are mathematical constructs where the union is another set. "The target of a target declaration is the set of RDF terms".

There would be no harm in making this clearer by adding a word here, but I am not sure why people refer to SPARQL's UNION keyword here. SPARQL is about bindings, not sets of terms.

@afs

afs commented Jan 7, 2022

I agree it is clear at the moment. A union of things that are sets is a set hence unique terms.

Each of the target definitions says "set" as well, except sh:targetNode which is a singleton.

@afs

afs commented Jan 7, 2022

SPARQL evaluation works with multi-sets -- set + cardinality of each element. "union" of multi-sets sums the cardinality of elements.
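The difference between the two kinds of union can be sketched with Python's `collections.Counter` as a stand-in for a multi-set. This is only an analogy: SPARQL multi-sets hold solution mappings rather than bare terms, and the node names are invented for the example.

```python
from collections import Counter

# Two multi-sets, each mapping an element to its cardinality.
left = Counter({":x": 1, ":y": 1})
right = Counter({":x": 1})

# Multi-set union: cardinalities are summed, so :x now counts twice.
bag_union = left + right

# Set union over the same elements: :x is present exactly once.
set_union = set(left) | set(right)
```

Under the multi-set reading `bag_union[":x"]` is 2, while `set_union` holds `:x` a single time, which is the reading the spec's "set of RDF terms" wording implies.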

@VladimirAlexiev
Author

@hmottestad

At the moment rdf4j ShaclSail splits shapes by both target and constraint. This means that we will produce two validation results if a node matches two target declarations. The validation is run in parallel, so performance may not be adversely affected. It could also be that the simplification of only having to use one target declaration makes things faster than having to consider multiple target declarations at once.

He posted eclipse-rdf4j/rdf4j#3584
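The effect on the number of results can be sketched as follows. This is not rdf4j code: the target sets, the `violates` check, and both strategies are hypothetical stand-ins that only contrast validating per target declaration with validating the deduplicated union.

```python
# Hypothetical target sets for one shape with two target declarations.
targets = {
    "sh:targetClass :Foo": {":x", ":y"},
    "sh:targetSubjectsOf :bar": {":x"},
}

def violates(node):
    # Hypothetical constraint check: pretend only :x fails.
    return node == ":x"

# Strategy A: validate each target declaration independently.
# A node selected by two declarations is checked, and reported, twice.
per_target_results = [
    (decl, node)
    for decl, nodes in targets.items()
    for node in sorted(nodes)
    if violates(node)
]

# Strategy B: take the set union first, then validate each focus node once.
focus_nodes = set().union(*targets.values())
deduplicated_results = [node for node in sorted(focus_nodes) if violates(node)]
```

With these inputs, strategy A reports `:x` twice (once per declaration) and strategy B reports it once.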

@HolgerKnublauch
Contributor

In preparation for a potential future SHACL WG I would like to close GitHub issues that were mainly just questions. Please reopen if you disagree.
