There is no notion of order in RDF mappings, and this has implications for the RDF output #135

Open
bjdmeest opened this issue Jul 3, 2024 · 3 comments

@bjdmeest
Member

bjdmeest commented Jul 3, 2024

raised by @chrdebru

E.g., when two triples maps write to the same collection, it might be necessary to specify the order of execution.

@bjdmeest
Member Author

bjdmeest commented Jul 3, 2024

Christophe will make a good example

@chrdebru
Contributor

chrdebru commented Jul 3, 2024

The title is misleading: it should be "There is no notion of order in RDF mappings, and this has implications for the RDF output."

data1.json
data2.json
mapping.ttl.txt
mapping-rev.ttl.txt

The mapping files have the extension .txt because I cannot upload .ttl files.

We cannot rely on the order in which triples maps are declared in a mapping, and we can encounter situations in which a mapping yields a different result. The files mapping.ttl and mapping-rev.ttl contain the same triples maps, but declared in swapped order. BURP relies on Apache Jena, and I presume Jena uses hashes to store the nodes in memory. You can see that the execution of both mappings yields the same result, demonstrating we cannot rely on the order of the triples maps.

$ java -jar .\burp.jar -m mapping.ttl
SLF4J(W): No SLF4J providers were found.
SLF4J(W): Defaulting to no-operation (NOP) logger implementation
SLF4J(W): See https://www.slf4j.org/codes.html#noProviders for further details.
<http://example.com/c/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_4> "2" .
<http://example.com/c/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_3> "1" .
<http://example.com/c/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2> "b" .
<http://example.com/c/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> "a" .
<http://example.com/c/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag> .
<http://example.com/e/a> <http://example.com/ns#with> <http://example.com/c/a> .
System exiting with code: 0

$ java -jar .\burp.jar -m mapping-rev.ttl
SLF4J(W): No SLF4J providers were found.
SLF4J(W): Defaulting to no-operation (NOP) logger implementation
SLF4J(W): See https://www.slf4j.org/codes.html#noProviders for further details.
<http://example.com/c/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_4> "b" .
<http://example.com/c/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_3> "a" .
<http://example.com/c/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2> "2" .
<http://example.com/c/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> "1" .
<http://example.com/c/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag> .
<http://example.com/e/a> <http://example.com/ns#with> <http://example.com/c/a> .
System exiting with code: 0

Change the IRI of one of the triples maps (e.g., <#TM1> into <#TMxxx>), and you get a different result ("1, 2, a, b").
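
To make the effect concrete, here is a minimal sketch (not BURP's or Jena's actual code) that simulates a store whose iteration order depends only on the identifiers of the triples maps. The SHA-1 ordering, the mapping IRIs, and the assignment of values to `TM1` and `TM2` are assumptions made for illustration; the attached mapping files may differ.

```python
import hashlib

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def simulated_store_order(triples_maps):
    """Return triples maps in an order that depends only on their IRIs.

    ASSUMPTION: a SHA-1 digest of the IRI stands in for whatever hashing
    a real RDF library performs; this is not Jena's actual algorithm.
    """
    return sorted(
        triples_maps,
        key=lambda tm: hashlib.sha1(tm[0].encode("utf-8")).hexdigest(),
    )


def fill_bag(bag_iri, triples_maps):
    """Append every value to one rdf:Bag, numbering members rdf:_1, rdf:_2, ..."""
    triples = []
    index = 1
    for _iri, values in simulated_store_order(triples_maps):
        for value in values:
            triples.append((bag_iri, f"{RDF}_{index}", value))
            index += 1
    return triples


# Hypothetical triples maps: one yields "a", "b" and the other "1", "2".
tm1 = ("http://example.com/mapping#TM1", ["a", "b"])
tm2 = ("http://example.com/mapping#TM2", ["1", "2"])
bag = "http://example.com/c/a"

declared = fill_bag(bag, [tm1, tm2])
reversed_declaration = fill_bag(bag, [tm2, tm1])
# Renaming a triples map changes its digest and may change its position.
renamed = fill_bag(bag, [("http://example.com/mapping#TMxxx", ["a", "b"]), tm2])

if __name__ == "__main__":
    for name, result in [("declared", declared), ("renamed", renamed)]:
        print(name, [value for _s, _p, value in result])
```

In this sketch `declared` and `reversed_declaration` are always identical, because the declaration order never reaches the numbering step. Whether `renamed` differs depends on the digests, which is the point: member positions follow the identifiers, not the order in the file.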

There is no "increment function", but we can easily create one where you use functions to generate predicates p1, p2, p3, ... and end up with different outputs. I can look into creating one.
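
A rough sketch of what such an increment function could look like, and why it makes the output depend on execution order. The function names, the `http://example.com/ns#p` prefix, and the two value lists are hypothetical; this is not an existing RML-FNML function.

```python
import itertools


def make_increment_function():
    """Return a stateful function that yields p1, p2, p3, ... on each call."""
    counter = itertools.count(1)

    def next_predicate():
        return f"http://example.com/ns#p{next(counter)}"

    return next_predicate


def run_mapping(triples_maps_in_execution_order, subject="http://example.com/e/a"):
    """Generate one triple per value; the shared counter picks the predicate."""
    next_predicate = make_increment_function()
    triples = set()
    for values in triples_maps_in_execution_order:
        for value in values:
            triples.add((subject, next_predicate(), value))
    return triples


letters = ["a", "b"]  # values produced by one hypothetical triples map
digits = ["1", "2"]   # values produced by another

letters_first = run_mapping([letters, digits])
digits_first = run_mapping([digits, letters])

if __name__ == "__main__":
    print(sorted(letters_first))
    print(sorted(digits_first))
```

Both runs produce four triples with predicates p1 to p4, but the pairing of predicate and value differs, so the two results are different RDF graphs even though the mapping content is the same.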

@dachafra changed the title from "Order of execution of triplesmaps has implications on the RDF output" to "There is no notion of order in RDF mappings, and this has implications for the RDF output" on Jul 3, 2024
@chrdebru
Contributor

Here is the proposed paragraph for clarification:

The RDF data model represents knowledge as a graph of triples. The order in which these triples appear in a file or data stream is insignificant in the underlying model. [*] As such, RML Processors MUST NOT assume any implicit ordering of triples maps[, predicate-object maps, or term maps] within an RDF graph. This may lead to variations in output across different RML Processors, particularly when using RML-CC to combine collections and containers from different triples maps [or term maps], or when using RML-FNML functions that rely on state shared by triples maps [or term maps], such as a function generating incremental IDs.

[*] When loading an RDF graph, libraries might rename blank nodes for internal consistency or to avoid clashes, potentially impacting the original order in which they were stored.

  1. The [*] is optional but stresses that declared blank node identifiers cannot be relied on. As this is implied by the previous sentence, it can be omitted.
  2. As this is also the case for term maps, I propose including those, hence the other square brackets.
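
The distinction the paragraph draws can be illustrated with a small sketch: the order of lines in a serialization does not matter, but the assignment of values to rdf:_1, rdf:_2, ... does. The helper names below are hypothetical, and the line-based comparison assumes N-Triples output without blank nodes; graphs with blank nodes would need a proper isomorphism check from an RDF library.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BAG = "<http://example.com/c/a>"


def as_triple_set(ntriples_text):
    """Treat each non-empty line as one triple; line order is discarded.

    ASSUMPTION: no blank nodes and consistent whitespace, so textual
    equality of lines is enough to compare triples.
    """
    return {line.strip() for line in ntriples_text.splitlines() if line.strip()}


def bag_members(values):
    """Serialize values as rdf:_1, rdf:_2, ... members of the same bag."""
    return "\n".join(
        f'{BAG} <{RDF}_{i}> "{v}" .' for i, v in enumerate(values, start=1)
    )


run_1 = bag_members(["a", "b", "1", "2"])
run_2 = bag_members(["1", "2", "a", "b"])
# Same triples as run_1, written out in reverse line order.
run_1_shuffled = "\n".join(reversed(run_1.splitlines()))

same_graph = as_triple_set(run_1) == as_triple_set(run_1_shuffled)
different_graph = as_triple_set(run_1) != as_triple_set(run_2)

if __name__ == "__main__":
    print("shuffled lines, same graph:", same_graph)
    print("different member numbering, different graph:", different_graph)
```

Reordering the lines of one output leaves the graph unchanged, whereas numbering the container members in a different sequence yields a different graph. This is why the processing order of triples maps becomes observable once containers or stateful functions are involved.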
