Distributed transactions in LDES #46
Comments
I would avoid splitting up streams, and with it the need for distributed transactions, as much as possible. In this case I would probably use auto-increments generated by the server to indicate the position of a member in the log (here it is rather natural to see the LDES as the binlog of the database system), as the position in the WAL needs to be exact. If the timestamp has sufficient granularity we could use it, but I would prefer a clean cut: in the case of a WAL it is strictly the server that determines the position in the log. Because the spec leaves it open to use instance data (a timestamp generated elsewhere) for ordering, I feel that relying on the timestamp would send the wrong message.
When the split between metadata and data is clear (by using named graph tree members), adding the transaction semantics to the event shouldn't be much of a hassle. It is also important that we can communicate in which part of the log the transactional boundaries are respected. For instance, when state + time retention (see #36) is enabled, consistency can only be guaranteed in the last part of the log (where time-based retention prevents member removal); in the first part, members could be missing because retention cleaned them up mid-transaction.
To come back to the multiple streams: the split could be achieved later (for clients that don't need transactionality) through a fragmentation or by deriving multiple streams, knowing that each stream or fragmentation breaks transaction boundaries and integrity guarantees (integrity can only be guaranteed in the mixed stream).
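To make the server-assigned position and the retention caveat concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `LogEntry`, `ServerLog`, `transaction_size` and `first_consistent_position` are hypothetical and not part of the LDES spec, and transactions are appended contiguously rather than interleaved.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import count
from typing import List, Optional


@dataclass(frozen=True)
class LogEntry:
    position: int          # assigned by the server, never by the producer
    transaction_id: str    # hypothetical transaction metadata on the member
    transaction_size: int  # number of members the transaction contained
    payload: str


class ServerLog:
    """Append-only log in which only the server hands out positions."""

    def __init__(self) -> None:
        self._next_position = count(1)
        self.entries: List[LogEntry] = []

    def append_transaction(self, transaction_id: str, payloads: List[str]) -> List[LogEntry]:
        size = len(payloads)
        new_entries = [
            LogEntry(next(self._next_position), transaction_id, size, payload)
            for payload in payloads
        ]
        self.entries.extend(new_entries)
        return new_entries


def first_consistent_position(entries: List[LogEntry]) -> Optional[int]:
    """Return the first position after which every remaining transaction is complete.

    A transaction is incomplete when fewer members remain than its declared
    size, for example because retention removed some of them.
    """
    remaining = Counter(entry.transaction_id for entry in entries)
    boundary = 0
    for entry in entries:
        if remaining[entry.transaction_id] < entry.transaction_size:
            boundary = max(boundary, entry.position)
    later = [entry.position for entry in entries if entry.position > boundary]
    return min(later) if later else None
```

If retention drops the member at position 1 of a two-member transaction, the sketch reports position 3 as the start of the consistent part of the log, which is the kind of boundary a server would need to communicate to clients.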
How do we indicate that we have a consistent knowledge graph across LDESes?
For instance, what if we split Linked OpenStreetMap into 3 LDESes: one for nodes, one for ways and one for relations. Then, for each LDES, multiple objects are added into one transaction. Just processing one member would not be very valuable, as you would get an inconsistent replication: the osm:Way would for example not yet have the necessary osm:Nodes to point to.
A solution would be to have a transaction system that indicates whether a transaction is still in progress and defines its bounds (probably based on a timestamp?). The set of members can then only be processed into a derived view or service once the transaction is marked as completed.
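A rough sketch of what this could look like on the client side, in Python. This is only an illustration under assumptions: the `Member` and `TransactionBuffer` names, the `transaction_id` field, and the explicit `complete` signal are hypothetical and not defined by the LDES spec.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Member:
    member_id: str
    transaction_id: str  # hypothetical: which transaction the member belongs to


class TransactionBuffer:
    """Holds members back until their transaction is marked as completed."""

    def __init__(self) -> None:
        self._pending: Dict[str, List[Member]] = defaultdict(list)

    def receive(self, member: Member) -> None:
        # Members may arrive from any of the LDESes (nodes, ways, relations).
        self._pending[member.transaction_id].append(member)

    def complete(self, transaction_id: str) -> List[Member]:
        # Release the whole set at once so the derived view never sees a
        # way without its nodes. Unknown transactions release nothing.
        return self._pending.pop(transaction_id, [])

    def pending_count(self) -> int:
        return sum(len(members) for members in self._pending.values())
```

A consumer would call `receive` for every member it reads and only write to the derived view with the list returned by `complete`. How the completion signal is published (a marker member, a separate stream, or a timestamp bound) is left open here, as it is in the question above.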
Any ideas on this?
A way to circumvent transactions in LDES is to just post members at exactly the same datetime. However, we might want to introduce parts of the knowledge graph changes at different times.
Related work: