-
Notifications
You must be signed in to change notification settings - Fork 8
Shape Validation Algorithm
Mapping Triples to TripleConstraints
The specification defines satisfies but does not provide an efficient way to test that.
The algorithm here relies on regular bag expressions (explained below).
It is necessarily complex as it accepts arbitrarily complex expressions such as (:foo . | :bar .)+
or foaf:name . | foaf:givenName . + ; foaf:familyName
.
Given an intentionally pathological shape
<S1> {
(
<p1> . ; # TC1
<p2> . # TC2
| <p3> . ; # TC3
<p2> .+ # TC4
) ;
<p2> [1] # TC5
}
and an input neighborhood
<n1> <p2> 1, #T1
2, #T2
3 ; #T3
<p3> 4 . #T4
, we can ignore the value restrictions and create a mapping from TC -> T based on predicate alone, a la:
TC | [T] |
---|---|
1 | ∅ |
2 | T1, T2, T3 |
3 | T4 |
4 | T1, T2, T3 |
5 | T1, T2, T3 |
It's actually more efficient to record the mapping from T -> TC:
T | [TC] |
---|---|
1 | TC2, TC4, TC5 |
2 | TC2, TC4, TC5 |
3 | TC2, TC4, TC5 |
4 | TC3 |
and pick exactly one assignment of T -> TC from the above list.
Possible assignments of T to TC can be restricted to those which satisfy the single-occurrence regular bag expression (SORBE) (12|34+)5
.
A regular bag expression can be satisfied by taking scanning the input for any matching symbol (vs. "regular" regular expression where the state machine accepts only the next token).
Theorem 8.3 in http://researchers.lille.inria.fr/~staworko/papers/staworko-arxiv-shex.pdf shows a PTIME algorithm to evaluate SORBEs.
Now we "count" our way through the assignments and try to satisfy the SORBE:
T | TC |
---|---|
1 | TC2 |
2 | TC2 |
3 | TC2 |
4 | TC3 |
produces the ordered) series: 2223
, which doesn't match the SORBE.
T | TC |
---|---|
1 | TC4 |
2 | TC2 |
3 | TC2 |
4 | TC3 |
produces the ordered) series: 4223
, which doesn't match the SORBE.
...
T | TC |
---|---|
1 | TC4 |
2 | TC4 |
3 | TC5 |
4 | TC3 |
produces the ordered) series: 3445
, which matches the SORBE. A candidate solution is
<S1> {
(
<p1> . ; # TC1 -> ∅
<p2> . # TC2 -> ∅
| <p3> . ; # TC3 -> T4
<p2> .+ # TC4 -> T1, T2
) ;
<p2> [1] # TC5 -> T3
}
But wait, the values don't match. Reject that and keep going until we get to:
T | TC |
---|---|
1 | TC5 |
2 | TC4 |
3 | TC4 |
4 | TC3 |
which produces also produces the (ordered) series: 3445
, which matches the SORBE. The solution:
<S1> {
(
<p1> . ; # TC1 -> ∅
<p2> . # TC2 -> ∅
| <p3> . ; # TC3 -> T4
<p2> .+ # TC4 -> T2, T3
) ;
<p2> [1] # TC5 -> T1
}
satisfies all value expressions.