reference implementation for CIP4

celestiaorg · Feb 5, 2024 · a4feb21 · a4feb21
1 parent c934d87
commit a4feb21
Showing 1 changed file with 35 additions and 4 deletions.
diff --git a/cips/cip-4.md b/cips/cip-4.md
@@ -2,7 +2,7 @@
 cip: 4
 title: Standardize data expiry time for pruned nodes
 description: Standardize default data expiry time for pruned nodes to 30 days.
-author: Mustafa Al-Bassam (@musalbas), Rene Lubov (@renaynay)
+author: Mustafa Al-Bassam (@musalbas), Rene Lubov (@renaynay), Ramin Keene (@ramin)
 discussions-to: https://forum.celestia.org/t/cip-standardize-data-expiry-time-for-pruned-nodes/1326
 status: Last Call
 last-call-deadline: 2024-02-07
@@ -19,7 +19,7 @@ This CIP standardizes the default expiry time of historical blocks for pruned (n
 
 The purpose of data availability layers such as Celestia is to ensure that block data is provably published to the Internet, so that applications and rollups can know what the state of their chain is, and store that data. Once the data is published, data availability layers [do not inherently guarantee that historical data will be permanently stored and remain retrievable](https://notes.ethereum.org/@vbuterin/proto_danksharding_faq#If-data-is-deleted-after-30-days-how-would-users-access-older-blobs). This task is left to block archival nodes on the network, which may be ran by professional service providers.
 
-Block archival nodes are nodes that store a full copy of the historical chain, whereas pruned nodes store only the latest blocks. Consensus nodes running Tendermint are able to prune blocks by specifying a `min-retain-blocks` parameter in their configuration. Data availability nodes running celestia-node will also [soon have the ability to prune blocks](https://github.com/celestiaorg/celestia-node/pull/2738).
+Block archival nodes are nodes that store a full copy of the historical chain, whereas pruned nodes store only the latest blocks. Consensus nodes running Tendermint are able to prune blocks by specifying a `min-retain-blocks` parameter in their configuration. Data availability nodes running celestia-node will also [soon have the ability to prune blocks](https://github.com/celestiaorg/celestia-node/pull/3150/).
 
 It is useful to standardize a default expiry time for blocks for pruned nodes, so that:
 * Rollups and applications have an expectation of how long data will be retrievable from pruned nodes before it can only be retrieved from block archival nodes.
@@ -45,12 +45,43 @@ Data availability sampling light nodes SHOULD sample blocks created in the last
 
 ## Backwards Compatibility
 
-The implementation of pruned nodes will break backwards compatibility in a few ways: 
+The implementation of pruned nodes will break backwards compatibility in a few ways:
 
-1. Light nodes running on older software (without the sampling window) will not be able to sample historical data (blocks older than 30 days) as nodes advertising on the `full` tag will no longer be expected to provide historical blocks. 
+1. Light nodes running on older software (without the sampling window) will not be able to sample historical data (blocks older than 30 days) as nodes advertising on the `full` tag will no longer be expected to provide historical blocks.
 2. Similarly, full nodes running on older software will not be able to sync historical blocks without discovering non-pruned nodes on the `archival` tag.
 3. Requesting blobs from historical blocks via a light node or full node will not be possible without discovering non-pruned nodes on the `archival` tag.
 
+## Reference Implementation
+
+### Data Availability Sampling Window (light nodes)
+
+Implementation for light nodes can be quite simple, where a satisfactory implementation merely behaves in that the choice to sample headers should not occur for headers whose timestamp is outside the given sampling window.
+
+Given a hypothetical "sample" function that performs data availability sampling of incoming extended headers from the network, the decision to sample or not should be taken by inspecting the header's timestamp, and ignoring it in any sampling operation if the duration between the header's timestamp and the current time exceeds the duration of the sampling window. For example
+
+```go
+
+const windowSize = time.Second * 86400 * 30 // 30 days
+
+func sample(header Header) error{
+    if time.Since(header.Time()) > windowSize {
+        return nil // do not perform any sampling
+    }
+
+    // continue with rest of sampling operation
+}
+```
+
+Example implementation by [celestia node](https://github.com/celestiaorg/celestia-node/pull/2991)
+
+### Storage Pruning
+
+Pruning of data outside the availability window will be highly implementation specific and dependent on how data storage is engineered.
+
+A satisfactory implementation would be where any node implementing storage pruning may, if NOT advertising oneself to peers as an archival node on the 'full' topic, discard stored data outside the 30 day availability window. A variety of options exist for how any implementation might schedule pruning of data, and there are no requirements around how this is implemented. The only requirement is merely that the time guarantees around data within the availability window are properly respected, and that data availability nodes correctly advertise themselves to peers.
+
+An example implementation of storage pruning (WIP at time of writing) in [celestia node](https://github.com/celestiaorg/celestia-node/pull/3150/files)
+
 ## Security Considerations
 
 As discussed in Rationale, data availability sampling light nodes need to at least sample data within the Tendermint weak subjectivity period of 21 days in order to independently verify the data availability of the chain. 30 days exceeds this.