Range Selector #902
Picking up on this a month later: we have a plan for some of the primitives required to do set reconciliation, specifically a protocol for requesting whole collections, sub-collections, and sub-sequences of bytes within blobs. We haven't documented this properly yet, but Rüdiger from our team goes into it in detail here: https://youtu.be/bK9KDJxCfzI?t=271

This isn't set reconciliation itself; we'd need to build that on top. We have an early proof of concept that takes the first steps toward implementing CAR Mirror for the WNFS project: https://github.com/n0-computer/appa/blob/c5afcecbed0bab4b515c9e171e6a5d937ab4f440/src/main.rs#L272-L301

@AIDXNZ, the appa project is an example of "Custom Requests", an abstraction on top of the Iroh data transfer protocol we're designing that would let you tackle exactly these kinds of explorations, such as latency-based optimization, as an extension.
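(As a minimal sketch of what those three request granularities could look like as a Rust type. The `RangeRequest` enum, `Hash` alias, and field names below are invented for this discussion; they are not the actual Iroh protocol or the appa code.)

```rust
use std::ops::Range;

/// Stand-in for a content identifier; in Iroh this would be a BLAKE3 hash.
type Hash = [u8; 32];

/// Hypothetical request shapes, for illustration only.
enum RangeRequest {
    /// Ask for every blob in a collection.
    WholeCollection { root: Hash },
    /// Ask for a named subset of a collection's entries.
    SubCollection { root: Hash, names: Vec<String> },
    /// Ask for a sub-sequence of bytes within a single blob.
    BlobBytes { blob: Hash, bytes: Range<u64> },
}
```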
-
This pertains to the Range-Based Set Reconciliation protocol.
TL;DR: using small ranges for light nodes, embedded devices, and low-bandwidth areas.
This is less of a feature and more of a discussion, as I am just spitballing.
Here are some questions I have
Guidelines for selecting an initial range size (a rough sketch combining these follows the list):
Set size: for smaller sets (files), a large range size may be more appropriate (a lossy-ish approach), while larger sets (files) may benefit from smaller range sizes.
Number of differences: Sets with a higher number of differences may benefit from smaller range sizes to reduce the number of missing elements that need to be transmitted.
Available bandwidth: If bandwidth is limited, smaller range sizes may be more suitable, as they prevent duplicated chunks from being sent over the wire.
Computational power: Larger range sizes may require more computational power to compute the hash values of the ranges, so the available computational resources should be considered when selecting the range size.
Number of provider sets: For example, if you're asking for a file from one node, just use a large range or split the sets in half recursively. But if there are 3+ nodes providing the file, it would make more sense to start with a small range, and maybe use different ranges per provider, in my head at least lol.
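To make those trade-offs concrete, here is a rough Rust sketch that folds the factors above into one heuristic. Everything in it (the `Conditions` struct, the thresholds, the divisors) is made up for illustration; real values would have to come from measurement.

```rust
/// Rough link/set conditions; every field and threshold here is hypothetical.
struct Conditions {
    set_size: u64,         // number of elements (blobs) in the set
    estimated_diff: u64,   // rough guess at how many elements differ
    bandwidth_kbps: u64,   // available bandwidth
    provider_count: usize, // how many nodes can serve the data
}

/// Pick an initial range size (elements per range) from the conditions.
fn initial_range_size(c: &Conditions) -> u64 {
    // One provider: a large range (or plain recursive halving) is fine.
    if c.provider_count <= 1 {
        return c.set_size.max(1);
    }
    // Several providers: start from a fraction of the set so each provider
    // can be asked about different ranges, then shrink further when the
    // link is constrained or many differences are expected.
    let mut size = (c.set_size / 16).max(1);
    if c.bandwidth_kbps < 256 {
        size = (size / 4).max(1); // low bandwidth: keep duplicates off the wire
    }
    if c.estimated_diff > c.set_size / 2 {
        size = (size / 2).max(1); // many differences: narrower ranges
    }
    size
}
```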
Example
Let's imagine a low-resource node requesting a large set of blobs. A possible approach to finding the optimal range size would be to start with a small range size and gradually increase it while measuring the performance (i.e., the number of transmissions required and the duplicates) for each range size. Once the performance begins to degrade or level off, that range size could be considered the optimal range size for the set reconciliation. The node would terminate when the optimal range size is found, when the range size exceeds a predefined maximum value, or when the transfer is complete.
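A minimal sketch of that probing loop, assuming a hypothetical `sync_round` callback that runs one reconciliation pass at a given range size and reports rough cost metrics back:

```rust
/// What one reconciliation pass reports back; the fields are placeholders
/// for whatever metrics a real implementation can measure.
struct RoundStats {
    round_trips: u64,
    duplicate_bytes: u64,
    done: bool,
}

/// Probe for a good range size: start small, double it while the measured
/// cost keeps improving, and stop once it degrades or levels off, a
/// predefined maximum is reached, or the transfer completes.
fn tune_range_size(
    mut sync_round: impl FnMut(u64) -> RoundStats,
    max_range_size: u64,
) -> u64 {
    let mut range_size: u64 = 16; // start with a small range
    let mut best_cost = u64::MAX;
    let mut best_size = range_size;
    loop {
        let stats = sync_round(range_size);
        let cost = stats.round_trips + stats.duplicate_bytes;
        if cost < best_cost {
            best_cost = cost;
            best_size = range_size;
        } else if !stats.done {
            return best_size; // performance degraded or levelled off
        }
        if stats.done || range_size >= max_range_size {
            return best_size;
        }
        range_size *= 2; // gradually increase and measure again
    }
}
```

Doubling keeps the search short (logarithmic in the maximum range size), which matters on a low-resource node.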
Note that the specific implementation details and performance metrics may vary depending on the application, but you could always fall back to the recursive version by default. :)
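For completeness, a toy sketch of that recursive fallback, i.e. classic split-in-half range reconciliation: compare a fingerprint of a range on both sides, and if it differs, split the range and recurse until it is small enough to just exchange the elements. The fingerprint and exchange callbacks are placeholders for the real protocol messages.

```rust
fn reconcile(
    lo: u64,
    hi: u64,
    local_fp: &impl Fn(u64, u64) -> u64,
    remote_fp: &impl Fn(u64, u64) -> u64,
    exchange: &mut impl FnMut(u64, u64),
) {
    if lo >= hi || local_fp(lo, hi) == remote_fp(lo, hi) {
        return; // empty or matching range: nothing to transfer
    }
    if hi - lo <= 16 {
        exchange(lo, hi); // small enough: just send the differing elements
        return;
    }
    let mid = lo + (hi - lo) / 2;
    reconcile(lo, mid, local_fp, remote_fp, exchange);
    reconcile(mid, hi, local_fp, remote_fp, exchange);
}
```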