Improve validation rules for XSD integer datatypes #142
Comments
Thinking out loud, I expect it would be hard for SHACL 1.1 to break SHACL 1.0 semantics. So for this to work without breaking the existing logic, I guess SHACL 1.1 could introduce a second, optional flag for sh:DatatypeConstraintComponent to also accept, e.g., xsd:int when xsd:integer was requested. So maybe:

sh:datatype xsd:integer ;
sh:datatypeNormalization true ;

Furthermore, triple stores could indicate whether they can round-trip datatypes correctly and, if not, the SHACL engine can set the above flag to true as the default.
I don't necessarily agree that it's breaking 1.0 semantics, for you cannot break what's already broken. I think what stores do makes sense from the perspective of XSD and the robustness principle. All types derived from Thus, I conclude that the current behaviour of SHACL is flawed and deserves a minor version fix.
I don't think this is practical. A web app which dereferenced the resource and shape has no idea whatsoever about the backing triplestores.
Useful table. A second flag to modify the constraint is better IMO. As this does work today in many places, if not all, and some systems may currently wish to validate the exact datatype used ( The majority user feedback on TDB1 was to keep the datatype. Normalizing the lexical part was just accepted.
I think I will stand by "broken" since the base semantics of datatypes is defined by XSD spec. The stores will turn derived literal with XSD types to Like I said, if I create a graph and it does not validate after round tripping from my database then this is a bug, not a feature. But I hear you, if the concern is breaking users depending on the current spec, then I'm probably proposing a change for Ah, and you make a good point about comparing the lexical form too. We could consider more holistic improvements, for example also to
@tpluscode well, if the round-tripping does not work then it's a problem that the graph stores need to solve, not languages like SHACL downstream. I guess other systems like OWL (with its xsd:nonNegativeIntegers everywhere) would have similar issues. Fix it at the lowest possible level. I like the suggested sh:datatypeByValue property.
I'm surprised no-one else flagged this, but I think your example is backwards. That is, even were subtype inferences active, it would be correct for On the other hand, subtype inferencing might enable
@TallTed I see your point but it can also be seen as a use case for validation by value. F&O does not treat derived types of integers specially and any operation returns xsd:integers,
I think stores that do not return exactly the datatypes written to them are "broken". Surely there are some scenarios (IoT, constrained sensors) where infinite length integers are not implemented, and you want to talk to them in terms of bytes or ints. @tpluscode argues about byte and int being sub-datatypes of integer. But that itself is a sub-datatype of decimal. So why don't the same stores further "normalize" all integers to decimals? XSD had good reasons to define all these varieties of numbers. For a much more unpleasant alternative, look at JSON and schema.org, which have just one badly underspecified "number". So I think that stores that collapse part of the XSD hierarchy on save are doing the wrong thing.
What
With SHACL 1.0 the sh:datatype constraint mandates an exact match. That is, given a property shape with sh:datatype xsd:int, it will fail the check against the value "10"^^xsd:integer even though the type xsd:int is a subtype of xsd:integer.
As a reminder, here's the XSD types hierarchy from https://www.w3.org/TR/xmlschema-2/
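The exact-match behaviour described above can be sketched in a few lines of Python. This is only a toy model, not a SHACL engine: the `Literal` class and the `conforms_to_datatype` function are hypothetical names made up for illustration, and the sketch ignores the check for ill-formed lexical forms that a real processor also performs.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    """A typed RDF literal: lexical form plus datatype IRI (hypothetical model)."""
    lexical: str
    datatype: str


def conforms_to_datatype(value: Literal, required: str) -> bool:
    """Exact-match check: the literal's datatype IRI must equal the required one.

    No subtype reasoning is applied, which is the behaviour this issue discusses.
    """
    return value.datatype == required


shape_datatype = XSD + "int"  # stands in for: sh:datatype xsd:int

# Same lexical form, different datatype IRI: only the first literal conforms.
print(conforms_to_datatype(Literal("10", XSD + "int"), shape_datatype))      # True
print(conforms_to_datatype(Literal("10", XSD + "integer"), shape_datatype))  # False
```

The comparison is on datatype IRIs only, so the position of the two types in the XSD hierarchy plays no role in the outcome.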
Why
This is problematic because multiple triplestore implementations proactively "normalise" integer types on insert. As above, inserting "10"^^xsd:int will in fact store "10"^^xsd:integer in the database. I tried Fuseki (TDB1), Stardog and AllegroGraph. It seems that all integer and derived datatypes are treated this way as long as their lexical form matches the given datatype: "-10"^^xsd:negativeInteger becomes "-10"^^xsd:integer, etc.

This makes it impossible to round-trip values created by a shape, because when the resource comes back from the store, the type of schema:age will no longer match that of the property shape.

Proposal
I propose to prescribe that SHACL processors should also "normalise" the property shapes by deconstructing XSD integer datatypes into xsd:integer and the appropriate min/max as defined by XML Schema (unless explicitly provided by the shape itself).

| sh:datatype | sh:minInclusive | sh:maxInclusive |
| --- | --- | --- |
| xsd:long | -9223372036854775808 | 9223372036854775807 |
| xsd:int | -2147483648 | 2147483647 |
| xsd:short | -32768 | 32767 |
| xsd:byte | -128 | 127 |
| xsd:nonPositiveInteger | | 0 |
| xsd:negativeInteger | | -1 |
| xsd:nonNegativeInteger | 0 | |
| xsd:positiveInteger | 1 | |
| xsd:unsignedLong | 0 | 18446744073709551615 |
| xsd:unsignedInt | 0 | 4294967295 |
| xsd:unsignedShort | 0 | 65535 |
| xsd:unsignedByte | 0 | 255 |
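To make the proposed rewriting concrete, here is a minimal Python sketch of it. Everything in it is an assumption for illustration: shapes are modelled as plain dictionaries keyed by prefixed names, and `INTEGER_BOUNDS`, `normalise_shape` and `validate` are hypothetical helpers, not part of SHACL or of any library. The bounds are the XML Schema value ranges of the derived integer types.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# (minInclusive, maxInclusive) per derived integer datatype; None means unbounded.
INTEGER_BOUNDS = {
    "long": (-2**63, 2**63 - 1),
    "int": (-2**31, 2**31 - 1),
    "short": (-2**15, 2**15 - 1),
    "byte": (-2**7, 2**7 - 1),
    "nonPositiveInteger": (None, 0),
    "negativeInteger": (None, -1),
    "nonNegativeInteger": (0, None),
    "positiveInteger": (1, None),
    "unsignedLong": (0, 2**64 - 1),
    "unsignedInt": (0, 2**32 - 1),
    "unsignedShort": (0, 2**16 - 1),
    "unsignedByte": (0, 2**8 - 1),
}


def normalise_shape(shape: dict) -> dict:
    """Rewrite a derived integer sh:datatype to xsd:integer plus min/max bounds.

    Bounds already present on the shape are kept, as the proposal suggests.
    Shapes with any other datatype are returned as an unchanged copy.
    """
    datatype = shape.get("sh:datatype", "")
    local = datatype[len(XSD):] if datatype.startswith(XSD) else None
    if local not in INTEGER_BOUNDS:
        return dict(shape)
    lo, hi = INTEGER_BOUNDS[local]
    result = dict(shape)
    result["sh:datatype"] = XSD + "integer"
    if lo is not None:
        result.setdefault("sh:minInclusive", lo)
    if hi is not None:
        result.setdefault("sh:maxInclusive", hi)
    return result


def validate(lexical: str, datatype: str, shape: dict) -> bool:
    """Check a literal against the exact sh:datatype and the inclusive bounds."""
    if datatype != shape.get("sh:datatype"):
        return False
    value = int(lexical)
    lo = shape.get("sh:minInclusive")
    hi = shape.get("sh:maxInclusive")
    return (lo is None or value >= lo) and (hi is None or value <= hi)


age_shape = normalise_shape({"sh:path": "schema:age", "sh:datatype": XSD + "int"})

# A store-normalised value now validates, while an out-of-range one does not.
print(validate("10", XSD + "integer", age_shape))          # True
print(validate("2147483648", XSD + "integer", age_shape))  # False
```

Note that in this sketch a literal still typed as xsd:int would fail against the rewritten shape, because the datatype comparison stays exact. It only covers the case where the store has already collapsed the value to xsd:integer.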