Consistent graph replication - RDF Dataset Canonicalization #51

sandervd · 2023-12-20T15:20:05Z

When a client requires hard guarantees on consistency, the logic described in RDF Dataset Canonicalization could be used to provide hashes of the state that should be reached after applying a fragment or, even better, a transaction.
This becomes relevant in cases where LDES is used as a replication protocol for named graphs (the client should have an exact copy of the named graph the publisher intended). For instance, consistency could be lost if a client is offline longer than allowed by the retention period, which could result in missed delete operations (tombstone events). If a checksum mismatch is detected, the client must restart replication from the start of the log to arrive at a consistent state.

Reference: https://www.w3.org/TR/rdf-canon/
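
To make the mismatch check concrete, here is a minimal sketch of what a client could do. It is not an RDFC-1.0 implementation: it assumes the graph contains no blank nodes, so that sorting the serialized N-Quads statements is enough to get an order-independent hash. The function names and the example.org IRIs are made up for illustration.

```python
import hashlib


def state_hash(nquads):
    """Hash a set of N-Quads statements independently of their order.

    Assumption: no blank nodes. With blank nodes, a real RDFC-1.0
    implementation is needed to label them deterministically first.
    """
    canonical = "\n".join(sorted({line.strip() for line in nquads}))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_full_resync(local_nquads, published_hash):
    """Return True when the local replica differs from the published state."""
    return state_hash(local_nquads) != published_hash


# Hypothetical state of the named graph on the publisher side.
publisher = [
    '<http://example.org/B> <http://example.org/value> "Some value" <http://example.org/g> .',
    '<http://example.org/A> <http://example.org/state> "State 2" <http://example.org/g> .',
]

# Same statements in a different order: the replica is consistent.
in_sync = list(reversed(publisher))

# A replica that missed a delete still holds a stale statement.
stale = publisher + [
    '<http://example.org/C> <http://example.org/state> "Deleted" <http://example.org/g> .'
]

expected = state_hash(publisher)
print(needs_full_resync(in_sync, expected))  # False
print(needs_full_resync(stale, expected))    # True
```

When `needs_full_resync` returns True, the client would discard its copy and replay the log from the start, as described above.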

xdxxxdx · 2024-02-08T08:31:30Z

I think this can be applied generically to TREE (tree client)?

sandervd · 2024-02-09T12:36:34Z

Hmm, I was thinking more of including a hash on each member (version object) that would represent the state of the full represented graph after applying the change.
For instance, if we have a collection {(1, A, State 1), (2, B, Some value), (3, A, State 2)},
then after applying the third member we would have the graph
{(A: State 2), (B: Some value)}. The hash should in this case be the hash of the state of the full graph, if that makes sense 😄
This way we can give much stronger guarantees of consistency.

Of course, the hashes would only be valid in the tail of the log, because retention deletes objects that have newer state further in the log.
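
A toy sketch of the example above, to show what I mean by a hash per member. The state is modelled as a plain dictionary and hashed as sorted JSON, which is only a stand-in for canonicalizing and hashing the real RDF graph; `replay` and `state_hash` are hypothetical names.

```python
import hashlib
import json


def state_hash(state):
    """Hash the full state; sorted keys make the result order-independent."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replay(members):
    """Apply members in sequence order and record the hash after each one."""
    state = {}
    hashes = []
    for _seq, entity, value in sorted(members):
        state[entity] = value  # a newer version replaces the older one
        hashes.append(state_hash(state))
    return state, hashes


members = [(1, "A", "State 1"), (2, "B", "Some value"), (3, "A", "State 2")]
state, hashes = replay(members)
print(state)  # {'A': 'State 2', 'B': 'Some value'}

# Retention removed member 1 because member 3 holds newer state for A.
tail_state, tail_hashes = replay(members[1:])
print(tail_hashes[0] == hashes[1])    # False: A is missing at this point
print(tail_hashes[-1] == hashes[-1])  # True: the tail converges again
```

The second replay shows the limitation: once retention has dropped the older version of A, the hash on member 2 can no longer be reproduced, but the hash on the last member still matches.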

pietercolpaert · 2024-02-26T10:17:03Z

I actually use that over here, to transform data dumps into an LDES feed: https://github.com/pietercolpaert/DCAT-AP-Dumps-To-Feeds/blob/main/index.ts#L59

I’m not sure, however, what the influence on the LDES spec itself would be. Do you expect this hash to be present in the member? Do you want a path to point to that property?

sandervd · 2024-03-27T17:45:36Z

Yes, I would see it as metadata of an event, similar to its timestamp. The hash would indicate the state of the graph after applying the member (or members, in case of a transaction). This way we can assure graph integrity over time: the client can validate that it holds an exact replica of the graph the publisher intended.
I see this as an important guarantee in cases like the base registries etc.
