On disk file format
This is a wiki page to draft the new GeoCouch on disk file format.
The general file layout is the same as for couchstore: https://github.com/couchbaselabs/couchstore/wiki/Format
The file header is prefixed with a 32 bit length and a hash, similarly to other data chunks, but here the length does include the length of the hash.
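The length rule above can be sketched as follows. This is only an illustration under assumptions that this page does not state: the hash is taken to be a 4-byte CRC32 and integers are taken to be big-endian, as in couchstore. The names `wrap_header` and `unwrap_header` are hypothetical.

```python
import struct
import zlib

HASH_SIZE = 4  # assumption: the hash is a 4-byte CRC32, as in couchstore


def wrap_header(body: bytes) -> bytes:
    """Prefix a header body with its 32 bit length and its hash."""
    crc = zlib.crc32(body) & 0xFFFFFFFF
    # Unlike other data chunks, the length covers the hash as well as the body.
    length = HASH_SIZE + len(body)
    return struct.pack(">II", length, crc) + body  # assumption: big-endian


def unwrap_header(chunk: bytes) -> bytes:
    """Return the header body after checking the length and the hash."""
    length, crc = struct.unpack_from(">II", chunk, 0)
    # The body starts after the 4-byte length and the 4-byte hash.
    body = chunk[8:4 + length]
    if len(body) != length - HASH_SIZE:
        raise ValueError("truncated header")
    if zlib.crc32(body) & 0xFFFFFFFF != crc:
        raise ValueError("hash mismatch")
    return body
```

A reader would call `unwrap_header` first and then interpret the returned body as the header values.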
Values in the file header
- 8 bits -- File format version (Currently 3)
- 48 bits -- Update sequence number. This is the sequence number new updates should start at.
- 48 bits -- Purge sequence number
- 16 bits -- Size of by_id B-tree root
- The by_id B-tree root
- List of indexes (one Design Document can have several indexes). For every index:
  - 48 bits -- Update sequence number
  - 48 bits -- Purge sequence number
  - 16 bits -- Size of R-tree root
  - The R-tree root
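The header values listed above could be read along these lines. This is a minimal sketch, not a reference implementation: it assumes big-endian integers (as in couchstore) and assumes that the index entries simply repeat until the end of the header body, since this page does not say how the number of indexes is stored. The roots are kept as opaque bytes, and all names (`FileHeader`, `IndexHeader`, `parse_header`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexHeader:
    update_seq: int
    purge_seq: int
    root: bytes  # raw R-tree root, left undecoded here


@dataclass
class FileHeader:
    version: int
    update_seq: int
    purge_seq: int
    by_id_root: bytes  # raw by_id B-tree root, left undecoded here
    indexes: List[IndexHeader]


def _uint(buf: bytes, pos: int, nbytes: int) -> int:
    """Read an unsigned big-endian integer (assumed byte order)."""
    if pos + nbytes > len(buf):
        raise ValueError("truncated header")
    return int.from_bytes(buf[pos:pos + nbytes], "big")


def _blob(buf: bytes, pos: int, size: int) -> bytes:
    if pos + size > len(buf):
        raise ValueError("truncated header")
    return buf[pos:pos + size]


def parse_header(body: bytes) -> FileHeader:
    version = _uint(body, 0, 1)      # 8 bits
    update_seq = _uint(body, 1, 6)   # 48 bits
    purge_seq = _uint(body, 7, 6)    # 48 bits
    root_size = _uint(body, 13, 2)   # 16 bits
    pos = 15
    by_id_root = _blob(body, pos, root_size)
    pos += root_size

    indexes = []
    # Assumption: one entry per index until the header body is exhausted.
    while pos < len(body):
        idx_update = _uint(body, pos, 6)
        idx_purge = _uint(body, pos + 6, 6)
        size = _uint(body, pos + 12, 2)
        pos += 14
        indexes.append(IndexHeader(idx_update, idx_purge, _blob(body, pos, size)))
        pos += size
    return FileHeader(version, update_seq, purge_seq, by_id_root, indexes)
```

The fixed part of the header is 15 bytes (1 + 6 + 6 + 2), and each index entry adds 14 bytes plus the size of its R-tree root.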
The B-tree root is a node pointer as described in the "B-trees" section of the couchstore file format.
The R-tree root is similar to the node pointers as described in the "B-trees" section of the couchstore file format. As for B-tree roots, it doesn't store the reduce value size, as it can be inferred from the lengths in the header.
- 48 bits -- Node position in file
- 48 bits -- Subtree size, i.e. the disk size of the R-tree data below this node. For pointers to leaf nodes this is the on-disk size of the pointed-to node; otherwise it is the sum of the size of the pointed-to node and the values of this field on all pointers in that node.
- Reduce value -- Used for memoizing the intermediate values of a reduce function.
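As a rough illustration of the three fields above, an R-tree root could be decoded like this. Big-endian integers are an assumption carried over from couchstore, the reduce value is kept as opaque bytes, and the names `RTreeRoot`, `parse_rtree_root` and `subtree_size` are hypothetical.

```python
from typing import Iterable, NamedTuple


class RTreeRoot(NamedTuple):
    position: int        # node position in file
    subtree_size: int    # disk size of the R-tree data below this node
    reduce_value: bytes  # everything after the two 48 bit fields


def parse_rtree_root(raw: bytes) -> RTreeRoot:
    """Decode a root whose total size is known from the header."""
    if len(raw) < 12:
        raise ValueError("R-tree root too short")
    position = int.from_bytes(raw[0:6], "big")
    subtree = int.from_bytes(raw[6:12], "big")
    # No reduce value size is stored: the rest of the root is the reduce value.
    return RTreeRoot(position, subtree, raw[12:])


def subtree_size(node_size: int, child_subtree_sizes: Iterable[int] = ()) -> int:
    """Size of the pointed-to node plus the subtree sizes of its pointers.

    For a pointer to a leaf node there are no child pointers, so the
    result is just the on-disk size of that node.
    """
    return node_size + sum(child_subtree_sizes)
```

For example, an inner node of 100 bytes whose pointers carry subtree sizes 300 and 250 would itself be pointed to with a subtree size of 650.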
The K/V and K/P nodes have the same format as in the B-trees (see https://github.com/couchbaselabs/couchstore/wiki/Format for more). The values of the leaf nodes are as described below.
The keys in this R-tree are raw blobs containing whatever comes from Erlang: a list of keys, one for every dimension.
The values are:
- 12 bits -- Size of the document ID
- 28 bits -- Size of the document data (bytes stored on disk including the geometry)
- 48 bits -- Position of the geometry content on disk
- 48 bits -- Position of the document content on disk
- Document ID
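The value layout above has a fixed part of 17 bytes (12 + 28 + 48 + 48 bits) followed by the document ID. A minimal sketch of packing and unpacking it is shown below. It assumes big-endian integers and assumes that the 12 bit ID size occupies the high bits of the first 40 bit field; neither is stated on this page. The function names are hypothetical.

```python
from typing import Tuple


def pack_value(doc_id: bytes, data_size: int, geom_pos: int, doc_pos: int) -> bytes:
    """Encode a leaf value: sizes, two file positions, then the document ID."""
    if len(doc_id) >= 1 << 12:
        raise ValueError("document ID does not fit in 12 bits")
    if not 0 <= data_size < 1 << 28:
        raise ValueError("document size does not fit in 28 bits")
    # Assumption: the ID size sits in the high 12 bits of a 40 bit field.
    sizes = (len(doc_id) << 28) | data_size
    return (
        sizes.to_bytes(5, "big")
        + geom_pos.to_bytes(6, "big")
        + doc_pos.to_bytes(6, "big")
        + doc_id
    )


def unpack_value(raw: bytes) -> Tuple[bytes, int, int, int]:
    """Decode a leaf value into (doc_id, data_size, geom_pos, doc_pos)."""
    if len(raw) < 17:
        raise ValueError("value too short")
    sizes = int.from_bytes(raw[0:5], "big")
    id_size = sizes >> 28
    data_size = sizes & ((1 << 28) - 1)
    geom_pos = int.from_bytes(raw[5:11], "big")
    doc_pos = int.from_bytes(raw[11:17], "big")
    doc_id = raw[17:17 + id_size]
    if len(doc_id) != id_size:
        raise ValueError("truncated document ID")
    return doc_id, data_size, geom_pos, doc_pos
```

With 12 bits for the ID size, document IDs are limited to 4095 bytes, and with 28 bits the document data is limited to just under 256 MiB.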
The content type will always be JSON, as we emitted JSON from the emit() function.
The revision metadata is not needed.
The reduce value in this R-tree will be the bitmask for the superstar index and the value of the reduce function.
The format of the geometry still needs to be decided. As calculations need to be made on it, Well-Known Binary (see Section 3.3 of the official standard or in more readable HTML) might make sense, as it is supported by every major geo library.
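To give an idea of what Well-Known Binary looks like, here is a small sketch that encodes and decodes a 2D point: one byte for the byte order, a 32 bit geometry type (1 for Point), then two 64 bit floats. It is only an illustration of the format, not a decision on what GeoCouch will store; the function names are hypothetical.

```python
import struct
from typing import Tuple

WKB_BIG_ENDIAN = 0     # XDR
WKB_LITTLE_ENDIAN = 1  # NDR
WKB_POINT = 1


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def point_from_wkb(raw: bytes) -> Tuple[float, float]:
    """Decode a 2D WKB point in either byte order."""
    byte_order = raw[0]
    if byte_order not in (WKB_BIG_ENDIAN, WKB_LITTLE_ENDIAN):
        raise ValueError("unknown WKB byte order")
    fmt = "<" if byte_order == WKB_LITTLE_ENDIAN else ">"
    geom_type, x, y = struct.unpack_from(fmt + "Idd", raw, 1)
    if geom_type != WKB_POINT:
        raise ValueError("not a WKB point")
    return x, y
```

A point therefore takes 21 bytes on disk, which would count towards the 28 bit document data size if the geometry were stored this way.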