An obvious repercussion of being able to replicate documents about the place is that sometimes you might edit them in more than one place at the same time. When the databases containing these concurrent edits are replicated, there needs to be some way to bring these divergent documents back together. Cloudant's MVCC data model is used to do this. This page describes how it works.
The Sync datastore participates in master-less replication with Cloudant or Apache CouchDB. This means that there is no canonical copy of the documents in each database. As a result, changes may happen to a document in many different places concurrently, and when these changes are replicated between previously disconnected databases, conflicts arise. Cloudant and Cloudant Sync provide ways to both access and resolve these conflicts from within your application.
It's important to understand that this data model is in place to make sure that:
- The user loses no data -- we keep all versions of a document that haven't been superseded, that is, all leaf nodes of the tree.
- The application has as much information as possible to resolve the conflicts, as it's able to examine all of the leaf nodes of the tree before resolving a conflict.
Cloudant Sync's MVCC data layer is key to the conflict resolution process, and can be visualised as a tree structure.
A document is really a tree of the document and its history. This is neat because it allows us to store multiple versions of a document. In the main, there's a single, linear tree -- just a single branch -- running from the creation of the document to the current revision. This is the usual case, and looks like this, with the revisions represented by their revision IDs:
```
1-x --- 2-x --- 3-x --- 4-x
                         ^
                         |
                "winning" revision
```
The fact that the document is a tree implies that it's possible, however, to create further branches in the tree.
When a document has been replicated to more than one place, it's possible to edit it concurrently in two places. When the datastores storing the document then replicate with each other again, they each add their changes to the document's tree. This causes an extra branch to be added to the tree for each concurrent set of changes. When this happens, the document is said to be conflicted. This creates multiple current revisions of the document, one for each of the concurrent changes.
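To make "more than one non-deleted leaf" concrete, here is a minimal sketch. It assumes a hypothetical in-memory representation (a map from revision ID to its parent revision ID and a `deleted` flag) that is only an illustration, not Cloudant Sync's storage format. A leaf is any revision that no other revision names as its parent.

```javascript
// Hypothetical in-memory revision tree: revision ID -> { parent, deleted }.
// This only illustrates the idea; it is not Cloudant Sync's storage format.

// A leaf is a revision that no other revision names as its parent.
function leafRevisions(tree) {
  var parents = new Set();
  Object.keys(tree).forEach(function (revId) {
    parents.add(tree[revId].parent);
  });
  return Object.keys(tree).filter(function (revId) {
    return !parents.has(revId);
  });
}

// Conflicted means more than one leaf that hasn't been deleted.
function isConflicted(tree) {
  var liveLeaves = leafRevisions(tree).filter(function (revId) {
    return !tree[revId].deleted;
  });
  return liveLeaves.length > 1;
}

// The single-branch history from the first diagram: 1-x -> 2-x -> 3-x -> 4-x.
var linear = {
  '1-x': { parent: null, deleted: false },
  '2-x': { parent: '1-x', deleted: false },
  '3-x': { parent: '2-x', deleted: false },
  '4-x': { parent: '3-x', deleted: false }
};

// The same history plus a concurrent branch, 3-y, off 2-x.
var branched = Object.assign({}, linear, {
  '3-y': { parent: '2-x', deleted: false }
});

console.log(isConflicted(linear));   // false
console.log(isConflicted(branched)); // true
```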
Say we last replicated the document above at the `2-x` revision. We make two changes locally (`3-x` and `4-x`) and the remote datastore has a single change made to it (`3-y`). On replicating back from the remote, the local datastore ends up with a document like this:
```
        replicated from remote
                   |
                   v
           ------ 3-y
          /
1-x --- 2-x --- 3-x --- 4-x
                         ^
                         |
                "winning" revision
```
We now have two non-deleted leaf nodes: the document is conflicted.
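Replication doesn't overwrite one history with the other; each side adds the revisions it is missing, which is how the second branch appears. The sketch below models that as a union keyed by revision ID, again using a hypothetical map from revision ID to parent revision ID rather than the real storage format or replication protocol.

```javascript
// Hypothetical representation: revision ID -> parent revision ID (null for the root).
// Merge two histories by adding every revision the local side is missing.
function mergeHistories(local, remote) {
  var merged = Object.assign({}, local);
  Object.keys(remote).forEach(function (revId) {
    // Revisions both sides already share (1-x, 2-x) are kept only once.
    if (!(revId in merged)) {
      merged[revId] = remote[revId];
    }
  });
  return merged;
}

// Leaves are revisions that are nobody's parent.
function leaves(tree) {
  var parents = new Set(Object.values(tree));
  return Object.keys(tree).filter(function (revId) {
    return !parents.has(revId);
  });
}

// Both sides share 1-x and 2-x; local then made 3-x and 4-x, remote made 3-y.
var local = { '1-x': null, '2-x': '1-x', '3-x': '2-x', '4-x': '3-x' };
var remote = { '1-x': null, '2-x': '1-x', '3-y': '2-x' };

var merged = mergeHistories(local, remote);
console.log(leaves(merged)); // [ '4-x', '3-y' ]
```

Two leaves after the merge is exactly the conflicted state in the diagram. Because the merge is a union, replicating the same revisions again changes nothing.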
To make things easier, calling `Datastore#getDocument(...)` returns one of the leaf nodes of the branches of the conflicted document. It selects the node to return in an arbitrary but deterministic way, which means that all replicas of the database holding the same revisions will return the same revision for the document. In the case of conflicts, however, the other copies of the document are still there, waiting to be merged, as shown below.
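This page doesn't spell out the exact rule Cloudant Sync uses. As an illustration of an "arbitrary but deterministic" choice, here is a sketch of the rule Apache CouchDB documents: among non-deleted leaves, the highest generation (the number before the dash) wins, and ties are broken by comparing the revision IDs, highest first. Treat it as an example of the idea, not as Cloudant Sync's implementation.

```javascript
// Split a revision ID such as "4-x" into its generation number and suffix.
function parseRevId(revId) {
  var dash = revId.indexOf('-');
  return {
    generation: parseInt(revId.slice(0, dash), 10),
    suffix: revId.slice(dash + 1)
  };
}

// Pick a winner among non-deleted leaf revision IDs. Any replica with the
// same set of leaves picks the same one, whatever order they arrive in.
function pickWinner(leafRevIds) {
  return leafRevIds.slice().sort(function (a, b) {
    var ra = parseRevId(a);
    var rb = parseRevId(b);
    if (ra.generation !== rb.generation) {
      return rb.generation - ra.generation; // higher generation first
    }
    // Same generation: the lexically greater suffix comes first.
    return ra.suffix < rb.suffix ? 1 : ra.suffix > rb.suffix ? -1 : 0;
  })[0];
}

console.log(pickWinner(['3-y', '4-x'])); // 4-x
console.log(pickWinner(['4-x', '3-y'])); // 4-x
```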
See the javadocs for `DocumentRevisionTree` for more information on document trees.
When a document has been changed in more than one place concurrently, it becomes conflicted. This means that there are a number of active, alternative versions of the document. Applications -- whether on device or a web app communicating with the Cloudant or CouchDB HTTP interfaces -- must resolve the conflicts by creating a merged version of the active versions of the document, updating the document with this merged version, and deleting the now-obsolete leaf nodes.
Fortunately, Cloudant Sync has helper methods to simplify this: one returns the IDs of all documents in a conflicted state, and another streamlines the process of resolving conflicts.
To find the conflicted documents, there's a function on the `Datastore` prototype:

```javascript
datastore.getConflictedDocumentIds();
```
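Conceptually, finding conflicted documents means checking each document's tree for more than one non-deleted leaf. The sketch below is a hypothetical in-memory stand-in: `findConflictedIds`, `createInMemoryStore` and the data layout are invented for illustration. It returns a promise to match how `getConflictedDocumentIds()` is consumed with `.then(...)` later on this page.

```javascript
// Hypothetical data layout, for illustration only: `documents` maps each
// document ID to its revision tree, and each tree maps a revision ID to
// { parent, deleted }.
function findConflictedIds(documents) {
  return Object.keys(documents).filter(function (docId) {
    var tree = documents[docId];
    var parents = new Set(Object.keys(tree).map(function (revId) {
      return tree[revId].parent;
    }));
    // Non-deleted leaves: revisions that are nobody's parent and not deleted.
    var liveLeaves = Object.keys(tree).filter(function (revId) {
      return !parents.has(revId) && !tree[revId].deleted;
    });
    return liveLeaves.length > 1;
  });
}

// Hypothetical store exposing a promise-returning lookup.
function createInMemoryStore(documents) {
  return {
    getConflictedDocumentIds: function () {
      return Promise.resolve(findConflictedIds(documents));
    }
  };
}

var documents = {
  // Conflicted: two live leaves, 4-x and 3-y.
  doc1: {
    '1-x': { parent: null, deleted: false },
    '2-x': { parent: '1-x', deleted: false },
    '3-x': { parent: '2-x', deleted: false },
    '4-x': { parent: '3-x', deleted: false },
    '3-y': { parent: '2-x', deleted: false }
  },
  // Not conflicted: a single branch.
  doc2: {
    '1-a': { parent: null, deleted: false },
    '2-a': { parent: '1-a', deleted: false }
  }
};

var store = createInMemoryStore(documents);
store.getConflictedDocumentIds().then(function (ids) {
  console.log(ids); // [ 'doc1' ]
});
```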
Once you've found the list of documents, you need to resolve them. This is done one document at a time, passing a document ID and a function able to resolve conflicts to the `resolveConflictsForDocument(String, Datastore~resolveConflictsCallback)` function of the `Datastore` prototype.
The `Datastore~resolveConflictsCallback` function has two parameters, the document ID and an array of conflicting document revisions, and should return the document revision to be used to resolve the conflicts. All remaining document revisions will automatically be marked as deleted. A rather simplistic implementation would be:
```javascript
function pickFirst(documentId, documentRevisions) {
  return documentRevisions[0];
}
```
Clearly, in the general case this will discard the user's data(!), but it'll do for this example.
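If your documents record when they were last edited, one step up from `pickFirst` is to keep the most recently edited revision. This sketch assumes each revision has a hypothetical numeric `updatedAt` field that your application sets on every write; it is not something Cloudant Sync adds for you. It still discards the losing edits.

```javascript
// Keep the revision with the largest `updatedAt` (a hypothetical,
// application-maintained timestamp in milliseconds). Revisions without the
// field are treated as oldest; on a tie, the earlier array entry is kept.
function pickMostRecent(documentId, documentRevisions) {
  return documentRevisions.reduce(function (best, candidate) {
    var bestTime = typeof best.updatedAt === 'number' ? best.updatedAt : -Infinity;
    var candidateTime = typeof candidate.updatedAt === 'number' ? candidate.updatedAt : -Infinity;
    return candidateTime > bestTime ? candidate : best;
  });
}

var winner = pickMostRecent('doc1', [
  { name: 'local edit', updatedAt: 1000 },
  { name: 'remote edit', updatedAt: 2000 }
]);
console.log(winner.name); // remote edit
```

Device clocks can disagree, so this is a heuristic rather than a guarantee that the "latest" edit wins.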
It is also possible to return an updated document from the `Datastore~resolveConflictsCallback`, perhaps by merging data from the conflicts:
```javascript
function mergeResolver(documentId, documentRevisions) {
  var docRev = documentRevisions[0];
  // ...update the body, perhaps with data from the other conflicts.
  // (`my_new_field` and its value are placeholders.)
  docRev.my_new_field = 'merged value';
  // ...you can also create/update/delete attachments via docRev._attachments.
  return docRev;
}
```
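As a slightly more concrete version of `mergeResolver`, the sketch below copies every field that appears in any conflicting revision onto the first one, keeping the first revision's value when two revisions disagree. That field-level policy is an assumption chosen for illustration, and underscore-prefixed keys such as `_attachments` are deliberately skipped on the assumption you'd handle them separately.

```javascript
// Merge the fields of all conflicting revisions into the first one.
// Policy (an illustrative assumption): a field already on the first revision
// wins; fields that only exist on other revisions are copied over.
// Underscore-prefixed keys such as `_attachments` are left untouched.
function mergeAllFields(documentId, documentRevisions) {
  var docRev = documentRevisions[0];
  documentRevisions.slice(1).forEach(function (other) {
    Object.keys(other).forEach(function (key) {
      if (key.charAt(0) !== '_' && !(key in docRev)) {
        docRev[key] = other[key];
      }
    });
  });
  return docRev;
}

var merged = mergeAllFields('doc1', [
  { title: 'Shopping', milk: true },
  { title: 'Groceries', eggs: 12, _attachments: {} }
]);
console.log(merged); // { title: 'Shopping', milk: true, eggs: 12 }
```

Unlike `pickFirst`, fields added on either side survive; only the conflicting value of `title` from the second revision is lost.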
Conceptually, the `resolveConflictsForDocument` method does the following:

1. Get all the non-deleted leaf node revisions for the document:

   ```
              ------ 3-y
             /
   1-x --- 2-x --- 3-x --- 4-x
   ```

   That's `3-y` and `4-x` here.
2. Call the `Datastore~resolveConflictsCallback` function you passed in with the list of revisions from step 1.
3. Take the returned revision and update the current winning revision (`4-x`) with it.
4. Delete the other non-deleted leaf nodes (`3-y` in this case) of the document tree.
The tree ends up looking like this:
```
           ------ 3-y --- 4-deleted
          /
1-x --- 2-x --- 3-x --- 4-x --- 5-x
                                 ^
                                 |
                        "winning" revision
```
The winning revision is now the only non-deleted leaf node, so the document is no longer conflicted.
All this happens inside a transaction, ensuring consistency.
This resolution can be replicated to the remote document store, bringing the two databases into a consistent state.
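The four resolution steps can be modelled on a hypothetical in-memory tree (revision ID mapped to `{ parent, deleted, body }`). This is a sketch of the behaviour described above, not the library's code: the real method runs inside a transaction and generates real revision IDs, whereas `resolveInMemory` and the IDs it builds (`5-x`, `4-deleted-y`) are made up to mirror the diagram.

```javascript
// Hypothetical in-memory revision tree: revision ID -> { parent, deleted, body }.
function generation(revId) {
  return parseInt(revId.split('-')[0], 10);
}

function suffix(revId) {
  return revId.split('-')[1];
}

function liveLeaves(tree) {
  var parents = new Set(Object.keys(tree).map(function (id) {
    return tree[id].parent;
  }));
  return Object.keys(tree).filter(function (id) {
    return !parents.has(id) && !tree[id].deleted;
  });
}

function resolveInMemory(documentId, tree, winnerRevId, resolve) {
  // 1. Get all the non-deleted leaf revisions.
  var leaves = liveLeaves(tree);
  // 2. Call the resolver with their bodies, in the callback's (id, revisions) shape.
  var resolvedBody = resolve(documentId, leaves.map(function (id) {
    return tree[id].body;
  }));
  // 3. Store the result as a new child of the current winning revision.
  var newRevId = (generation(winnerRevId) + 1) + '-' + suffix(winnerRevId);
  tree[newRevId] = { parent: winnerRevId, deleted: false, body: resolvedBody };
  // 4. Delete every other non-deleted leaf by giving it a deleted child.
  leaves.forEach(function (id) {
    if (id !== winnerRevId) {
      tree[(generation(id) + 1) + '-deleted-' + suffix(id)] = {
        parent: id, deleted: true, body: null
      };
    }
  });
  return newRevId;
}

function pickFirst(documentId, documentRevisions) {
  return documentRevisions[0];
}

var tree = {
  '1-x': { parent: null, deleted: false, body: { v: 1 } },
  '2-x': { parent: '1-x', deleted: false, body: { v: 2 } },
  '3-x': { parent: '2-x', deleted: false, body: { v: 3 } },
  '4-x': { parent: '3-x', deleted: false, body: { v: 4 } },
  '3-y': { parent: '2-x', deleted: false, body: { v: 30 } }
};

var newRev = resolveInMemory('doc1', tree, '4-x', pickFirst);
console.log(newRev);           // 5-x
console.log(liveLeaves(tree)); // [ '5-x' ]
```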
You could imagine an application running the following method via a timer to periodically fix up any conflicts:
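```javascript
function resolveConflicts(datastore) {
  function pickFirst(documentId, documentRevisions) {
    return documentRevisions[0];
  }

  // Return the promise so callers can tell when this pass has finished.
  return datastore.getConflictedDocumentIds()
    .then(function (docIds) {
      // Resolve each conflicted document. Promise.all also accepts plain
      // values, so this works whatever resolveConflictsForDocument returns.
      return Promise.all(docIds.map(function (docId) {
        return datastore.resolveConflictsForDocument(docId, pickFirst);
      }));
    });
}
```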
How often this should run depends on your application, but you'd probably
want to consider:
- Running every few minutes.
- Running when a pull replication completes.
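One way to combine both triggers is sketched below. `scheduleConflictResolution` is a hypothetical helper, not part of Cloudant Sync: it wraps any function that returns a promise, calls it every `intervalMs` milliseconds with the standard `setInterval`, and shares the in-progress run instead of starting an overlapping one. Its `runNow` method can also be called directly, for example when a pull replication completes.

```javascript
// Hypothetical helper: run `task` (a function returning a promise or a value)
// every `intervalMs` milliseconds, never letting two runs overlap.
function scheduleConflictResolution(task, intervalMs) {
  var inFlight = null;

  function runNow() {
    if (inFlight) {
      return inFlight; // a run is already in progress; share its promise
    }
    inFlight = Promise.resolve()
      .then(task)
      .catch(function (err) {
        console.error('Conflict resolution failed:', err);
      })
      .then(function () {
        inFlight = null;
      });
    return inFlight;
  }

  var timer = setInterval(runNow, intervalMs);
  // In Node.js, don't keep the process alive just for this timer.
  if (timer && typeof timer.unref === 'function') {
    timer.unref();
  }

  return {
    runNow: runNow,
    stop: function () {
      clearInterval(timer);
    }
  };
}
```

For example, `scheduleConflictResolution(function () { return resolveConflicts(datastore); }, 5 * 60 * 1000)` would run every five minutes, and calling `runNow()` on the returned object after a pull replication covers the second case.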
We're always looking at ways to improve the experience around conflicts,
so be sure to file an issue if you have suggestions or problems.