Dealing with Recent Headers #336
Comments
Looks like storing all 8192 of the most recent headers is about 5 MB.
Seems we could support range queries by having a separate content key that:
In a network situation where clients are expected to store most or all of the recent headers, this could be used to quickly acquire all of the most recent headers.
One way to do the content-id for recent headers might be to have the derived If my intuition is correct, this would result in two blocks that are close to each other in height also being close to each other in the network, making it somewhat easy to navigate the network to grab sequential blocks from the recent set.
If we said that all nodes had to store all headers, it would introduce the first real baseline "requirement" for nodes in our network, meaning that they'd be required both to store ~5 MB of data and to continually acquire new headers as they entered the network. My initial gut says that I don't like introducing this as a requirement and that maybe I'd rather it be optional. Here are some ideas.
The use cases I'm thinking we want to support are:
Are we going to store in db the recent 8192
We are already doing this by storing all bootstraps and
Is this opt-in? Is it assumed that you can request any of these from any node on the network? Are nodes on the network expected to sync the last 27 hours of these when they come online? I think this is what I mean by making it optional. It's very different for a client to choose to store the last 27 hours of these vs a client to be expected to have the last 27 hours of these and for functionality in the network to be based on the assumption that they can request the last 27 hours of these from any node on the network.
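For reference, the "27 hours" and "~5mb" figures used in this thread follow from simple arithmetic. The sketch below assumes the 12 second mainnet slot time and reads "5mb" as 5 MiB; the constant and function names are only illustrative.

```python
SECONDS_PER_SLOT = 12                    # Ethereum mainnet slot time
RECENT_SLOTS = 8192                      # size of the recent window discussed here
TOTAL_BYTES_ESTIMATE = 5 * 1024 * 1024   # the "~5mb" estimate, read as 5 MiB (assumption)


def window_hours(slots=RECENT_SLOTS, seconds_per_slot=SECONDS_PER_SLOT):
    """Wall-clock span covered by the recent window, in hours."""
    return slots * seconds_per_slot / 3600


def bytes_per_header(total=TOTAL_BYTES_ESTIMATE, slots=RECENT_SLOTS):
    """Average header size implied by the total storage estimate."""
    return total / slots


print(round(window_hours(), 1))  # 27.3
print(bytes_per_header())        # 640.0
```

So the window is a bit over 27 hours, and the storage estimate implies roughly 640 bytes per header on average.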
For Regarding So I think everything depends on what kind of flexibility we want to provide for the end user for choosing their starting point to sync the light client.
Doesn't this idea introduce a roving hot spot in the network (on the assumption that recent chain history is the most popular) centered around the nodes "nearest" to the current 13 most significant bits of the contentIDs? I get conceptually why it's convenient for purposes of retrieval, but it feels like it could end up being a DDoS vector once we get an uptick in the "very light" clients you mentioned, which regularly drop in and out and end up hammering the same set of nodes looking for the head of the chain.
Good catch on the hot spots. Possibly we could eliminate the hotspot by expecting all nodes to store the latest 256 headers, and then stripe the rest around the network. Grabbing 256 headers in one request from a close neighbor should be pretty trivial in terms of bandwidth costs. Still needs to be decided if the striping approach actually fixes a real problem and is necessary...
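A rough sketch of what such a storage rule could look like, to make the idea concrete. Everything here is hypothetical: the function and constant names are made up, and the thread does not settle how the content id for a recent header would be derived, so it is simply passed in as an integer. The only assumption carried over from the existing networks is the XOR distance check against the node's radius.

```python
ALWAYS_STORED = 256    # newest headers every node would keep (suggestion above)
RECENT_WINDOW = 8192   # total size of the recent window


def xor_distance(a: int, b: int) -> int:
    """XOR distance between two ids, both given as integers."""
    return a ^ b


def should_store(node_id, radius, content_id, block_number, head_number):
    """Hypothetical rule: keep the newest headers everywhere, stripe the rest."""
    age = head_number - block_number
    if age < 0 or age >= RECENT_WINDOW:
        return False   # not part of the recent window at all
    if age < ALWAYS_STORED:
        return True    # newest headers: stored by every node
    # Older recent headers: stored only when the content falls within the radius.
    return xor_distance(node_id, content_id) <= radius
```

With a rule like this, requests for the head of the chain can go to any neighbor, and only the older part of the window depends on where the content id lands.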
If we decide that nodes SHOULD store 8192 recent headers then I think the simple solution that is currently done for the
Currently in Beacon network
One But not all nodes will have the full 4 month range, it depends on when it was bootstrapped. Instead of neighborhood requests, random requests are done -> some might fail, that should be fine however. The content key is by
Storage and pruning:
Within Fluffy this is stored with the
Apply the same to history recent headers
This does not mean that every node that comes online needs to store all this data immediately. As long as "ephemeral" nodes are not a massive majority, this should be fine I think. (Even in the case of localized storage/access this could/would still be an issue, but to a lesser extent.)
The idea of @pipermerriam of adding in
When a node stores all ~5 MB of recent headers, it could store the headers in a specialized table so that it can easily retrieve them via block number and via block hash (e.g. primary key hash, index on number, or similar ways). The protocol could then provide two different content keys, but they access the same data on the nodes. And the content keys could/would support ranges.
Pruning could work by dropping block numbers older than x. (There is a slight discrepancy with block number vs slot, as not each slot necessarily has a block, so perhaps it is a little more complex, but then again it doesn't need to be exactly the last 8192 slots I think.)
Now, if for some reason this is not sufficient and too flawed, causing issues retrieving this data, then yes, we will have to resort to something more complicated in terms of content id derivation and localized access to the data (as has been mentioned in the comments above). I'm however questioning if this will be needed.
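To illustrate the specialized table idea, here is a minimal sketch using Python's sqlite3 with an in-memory database. The table, column, and function names are assumptions made up for the example, not taken from Fluffy or any other client; headers are treated as opaque bytes.

```python
import sqlite3


def open_store():
    """Create a recent-headers table: primary key on hash, index on number."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE recent_headers ("
        "hash BLOB PRIMARY KEY, number INTEGER NOT NULL, header BLOB NOT NULL)"
    )
    db.execute("CREATE INDEX recent_headers_number ON recent_headers(number)")
    return db


def put(db, block_hash, number, header):
    db.execute(
        "INSERT OR REPLACE INTO recent_headers VALUES (?, ?, ?)",
        (block_hash, number, header),
    )


def get_by_hash(db, block_hash):
    """Lookup for a by-hash content key."""
    row = db.execute(
        "SELECT header FROM recent_headers WHERE hash = ?", (block_hash,)
    ).fetchone()
    return row[0] if row else None


def get_range(db, start, count):
    """Lookup for a by-number content key that supports ranges."""
    rows = db.execute(
        "SELECT header FROM recent_headers "
        "WHERE number >= ? AND number < ? ORDER BY number",
        (start, start + count),
    ).fetchall()
    return [r[0] for r in rows]


def prune(db, head_number, keep=8192):
    """Drop headers whose block number is older than the kept window."""
    db.execute(
        "DELETE FROM recent_headers WHERE number <= ?", (head_number - keep,)
    )
```

Both lookups hit the same rows, which is the point made above: two content keys, one set of data. Pruning by block number rather than slot means the window is only approximately 8192 slots.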
I think all the necessary fields to convert to an EL
Yes, but I would say the In the last Portal meetup I actually mentioned that I would be pro to moving the
Implications of this are:
Recent BlockHeaders / Ephemeral BlockHeaders
EL BlockHeaders that are still in the current period cannot have a proof against historical_summaries as they are not part of the used state.block_roots for the latest HistoricalSummary.
These can only be injected into the network with their proof once a period transition has occurred.
Before this, they could get injected without a proof, as is currently done.
PR #292 was/is a simple take on how to verify these.
However, the actual scenarios of how these headers get stored will probably be different.
A Portal node that is running the Portal beacon network and has a beacon light client synced will also have access to these headers as they are part of the LightClientHeader since Capella (albeit in a different serialization): https://github.com/ethereum/consensus-specs/blob/7cacee6ad64483357a7332be6a11784de1242428/specs/capella/light-client/sync-protocol.md?plain=1#L52
Currently these recent / ephemeral (proof-less) BlockHeaders fall under the same content key as headers with a proof.
It has been suggested in the past to move the BlockHeader without a proof into a separate type in the history network. I think that is a good idea as they are conceptually different from headers with a proof:
The effect of this is that:
All this will simply require different storage and access features.
Some example scenarios:
Portal node that is beacon LC synced:
Portal node that is not yet beacon LC synced:
Client with no Portal beacon network running (e.g. full node with Portal integrated)
Effect of changing to a new content type
The None option in the current Union becomes invalid. However, removing the None would make all current data invalid. So if we want to clean this up properly, we need a migration path.
Storing and accessing the data
Storage would be different than the current Content databases as it requires pruning and dealing with re-orgs.
It will thus more likely end up in a separate table / persistent cache but this is up to the implementation.
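As an illustration of why this differs from the regular content database, here is a minimal in-memory sketch of a recent-header cache that prunes and handles re-orgs. The class and field names are hypothetical, and the re-org rule is deliberately simple: when a different header arrives for an already stored height, that height and everything above it is dropped.

```python
class RecentHeaderCache:
    """Hypothetical bounded cache of recent headers, keyed by block number."""

    def __init__(self, capacity=8192):
        self.capacity = capacity
        self.by_number = {}  # block number -> (block hash, parent hash)

    def head(self):
        """Highest stored block number, or None when the cache is empty."""
        return max(self.by_number) if self.by_number else None

    def add(self, number, block_hash, parent_hash):
        existing = self.by_number.get(number)
        if existing is not None and existing[0] != block_hash:
            # Re-org: a competing header for a stored height invalidates
            # that height and every stored descendant above it.
            for n in [n for n in self.by_number if n >= number]:
                del self.by_number[n]
        self.by_number[number] = (block_hash, parent_hash)
        # Pruning: keep only the last `capacity` block numbers below the head.
        cutoff = self.head() - self.capacity
        for n in [n for n in self.by_number if n <= cutoff]:
            del self.by_number[n]
```

A real implementation would likely also check parent hashes against the stored chain and persist the data, but the two operations shown (prune and replace on re-org) are the ones the regular content database does not need.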
Access could work as it does now, i.e. neighborhood-based look-ups, but with the option for a node to store more than its radius covers, so that nodes could also try requesting these headers from any node.
Or, we could make this explicit and say that each node MUST store all.
Additionally, to "backfill" faster we could add in this implicit version a range request (similar to what we do now in the beacon network for LightClientUpdates).