Include zeros in SHACL regex bounds
Previously, we copied the regex pattern verbatim from the input. With
this patch, we parse the pattern and re-render it in the canonical form
so that many more regex engines can work with it.

For example, the input may omit the minimum bound 0 (*e.g.*,
``{,4}``), which breaks the Java regex engine underlying the SHACL
validator. The pattern is now rendered with an explicit 0
(``{0,4}``).

Discovered in [aas-core-meta issue 342].

[aas-core-meta issue 342]: aas-core-works/aas-core-meta#342
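
To illustrate the bound described above, here is a minimal sketch. The helper `add_explicit_zero` is hypothetical and is not part of aas-core-codegen: the patch parses the pattern into a tree and re-renders it, whereas this sketch only rewrites the text and ignores escaping and character classes. It shows that Python's `re` treats ``{,4}`` and ``{0,4}`` as the same repetition, so adding the explicit zero does not change what the pattern matches in Python.

```python
import re

# Hypothetical helper, for illustration only. The actual patch parses the
# pattern and re-renders it instead of rewriting the text like this.
_OMITTED_MIN = re.compile(r"\{,(\d+)\}")


def add_explicit_zero(pattern: str) -> str:
    """Rewrite ``{,n}`` as ``{0,n}``.

    This naive textual rewrite ignores escaping and character classes.
    """
    return _OMITTED_MIN.sub(r"{0,\1}", pattern)


original = "[a-z]{,4}"
canonical = add_explicit_zero(original)

# Python's ``re`` accepts both spellings and matches the same strings.
for text in ["", "ab", "abcd", "abcde"]:
    assert bool(re.fullmatch(original, text)) == bool(
        re.fullmatch(canonical, text)
    )

print(canonical)
```

A textual rewrite like this is fragile, which is presumably why the patch goes through a parser and a renderer instead.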
mristin committed Aug 22, 2024
1 parent bd310fc commit c8bf07d
Showing 2 changed files with 62 additions and 35 deletions.
29 changes: 28 additions & 1 deletion aas_core_codegen/rdf_shacl/shacl.py
@@ -7,6 +7,7 @@

 from aas_core_codegen import intermediate, specific_implementations, infer_for_schema
 from aas_core_codegen.common import Stripped, Error, assert_never, Identifier
+from aas_core_codegen.parse import retree as parse_retree
 from aas_core_codegen.rdf_shacl import (
     naming as rdf_shacl_naming,
     common as rdf_shacl_common,
@@ -214,7 +215,33 @@ def _define_property_shape(
     # region Define patterns

     for pattern_constraint in pattern_constraints:
-        pattern_literal = rdf_shacl_common.string_literal(pattern_constraint.pattern)
+        # NOTE (mristin):
+        # We need to render the regular expression so that the pattern appears in
+        # the canonical form. The original pattern in the specification might be written
+        # in Python dialect, which does not translate directly to many Regex Engines.
+        #
+        # For example, repetition bounds can be given with 0 omitted (*e.g.*, ``{,4}``),
+        # while SHACL and Java need an explicit zero (``{0, 4}``). Our standard renderer
+        # puts an explicit zero.
+
+        regex, parse_error = parse_retree.parse([pattern_constraint.pattern])
+        if parse_error is not None:
+            return None, Error(
+                prop.parsed.node,
+                f"(mristin): "
+                f"The pattern could not be parsed: {pattern_constraint.pattern}",
+            )
+        assert regex is not None
+
+        rendered_pattern = parse_retree.render(regex)
+
+        assert len(rendered_pattern) == 1 and isinstance(rendered_pattern[0], str), (
+            "Expected strictly only a string output from rendering "
+            f"a string pattern (*e.g.*, no FormattedValues), "
+            f"but got: {rendered_pattern}"
+        )
+
+        pattern_literal = rdf_shacl_common.string_literal(rendered_pattern[0])

         stmts.append(Stripped(f"sh:pattern {pattern_literal} ;"))
