Implement PeerDAS #14129

Draft · wants to merge 91 commits into base: develop

Conversation

@nalepae (Contributor) commented Jun 21, 2024

Remaining tasks:

  • Initial sync: Backfill data columns when starting from a checkpoint.
  • Initial sync: Do not request more than 64 data columns: request 64 data columns, then reconstruct the rest. (See the sketch after this list.)
  • Implement validator custody.
  • Peer sampling with trailing fork choice: Integrate the output of sampling into fork choice?
  • Check why we cannot run initial sync when starting a node during epoch 0, in slots [1, 31]. (Starting a node from genesis works; starting a node from epoch 1 or later works.) Note: This issue may already exist with blobs (without data columns).
  • Introduce a "nasty peer" flag that withholds and/or alters (both are important) data columns in RPC responses.
  • Add unit tests where they are not yet added.
  • Replace the cKZG library with the goKZG one (when available) - In progress.
  • ⚠️ Implement number of blobs as a CL parameter. ⚠️
  • Decouple network subnets from Core methods as referenced.
  • Peer sampling: Find more peers if a column has no corresponding peers. (Can do as in internalBroadcastDataColumn.)
  • Add PeerDAS specific metrics
  • Clarify usage of --minimum-peers-per-subnet=<n>
  • Should we downscore a peer on sampling failure? (Maybe the peer simply has not seen the block.)
  • Should we stop using a peer after it fails on sampling? (To avoid DDoS.)
  • Check metadataV3 interop with other clients if before EIP-7594
  • Refactor code where GetValidCustodyPeers is used. ==> We need to fetch data from peers that are not super nodes.
  • prysmctl: Implement byRoot and byRange data columns requests
  • Using subnet sampling only: When using data columns by root requests (and also by range, during initial sync), should the node request only custody columns, or custody columns plus the extra subnet sampling columns? If requesting only custody columns, a strange situation arises: when receiving a block with an extra subnet sampling column missing, the node will ignore the block. A few seconds later, when requesting the blocks and the associated columns, the node won't need the extra subnet sampling columns to import the block.
  • Find why sometimes ENR records are not correctly updated via discovery.
  • Find why, with Grandine in the mix, discv5 iterator.Next is so slow.
  • Test with only one bootnode (and not all nodes)
  • By range (and by root) requests: Run verifications for finalized blocks?
  • ByRoot requests: Choose all peers, as in ByRange requests, not only super nodes.
  • ENR: Are we sure the sequence number cannot go backward after a reboot? (This triggers downscoring on Prysm.)
  • subscribeWithParameters: Should we subscribe to subnets / find peers 1 epoch in advance for getSubnetsToSubscribe and getSubnetsToFindPeersOnly?
  • Subnets unsubscribing: Handle dynamic (fork epoch) and non-dynamic (fork epoch + 1) unsubscriptions the same way.
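
Below is a minimal sketch of the "request 64 data columns, then reconstruct the rest" gating logic referenced in the initial-sync item above. The constant, types, and helper are illustrative assumptions, not the actual Prysm implementation; the real reconstruction would call into the KZG library.

```go
package sketch

// numberOfColumns is the total number of data columns per block in PeerDAS.
const numberOfColumns = 128

// missingColumns returns the column indices we still need to fetch for a block, plus a
// flag telling whether we already hold enough columns (at least half of them) to
// reconstruct the remaining ones locally via erasure-coding recovery instead of
// requesting them over the network.
func missingColumns(have map[uint64]struct{}) (toRequest []uint64, canReconstruct bool) {
	for i := uint64(0); i < numberOfColumns; i++ {
		if _, ok := have[i]; !ok {
			toRequest = append(toRequest, i)
		}
	}
	return toRequest, len(have) >= numberOfColumns/2
}
```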

nalepae and others added 11 commits November 27, 2024 10:11
* Add Support For Discovery Of Column Subnets

* Lint for SubnetsPerNode

* Manu's Review

* Change to a better name
* Add Data Column Subscriber

* Add Data Column Validator

* Wire all Handlers In

* Fix Build

* Fix Test

* Fix IP in Test

* Fix IP in Test
* Add RPC Handler

* Add Column Requests

* Update beacon-chain/db/filesystem/blob.go

Co-authored-by: Manu NALEPA <[email protected]>

* Update beacon-chain/p2p/rpc_topic_mappings.go

Co-authored-by: Manu NALEPA <[email protected]>

* Manu's Review

* Manu's Review

* Interface Fixes

* mock manager

---------

Co-authored-by: Manu NALEPA <[email protected]>
* Bump `c-kzg-4844` lib to the `das` branch.

* Implement `MerkleProofKZGCommitments`.

* Implement `das-core.md`.

* Use `peerdas.CustodyColumnSubnets` and `peerdas.CustodyColumns`.

* `CustodyColumnSubnets`: Include `i` in the for loop.

* Remove `computeSubscribedColumnSubnet`.

* Move `peerdas.CustodyColumns` out of the for loop.
* Remove capital letter from error messages.

* `[4]byte` => `[fieldparams.VersionLength]byte`.

* Prometheus: Remove extra `committee`.

They are probably due to a bad copy/paste.

Note: The name of the probe itself is unchanged, to ensure backward compatibility.

* Implement Proposer RPC for data columns.

* Fix TestProposer_ProposeBlock_OK test.

* Remove default peerDAS activation.

* `validateDataColumn`: Workaround to return a `VerifiedRODataColumn`
* Add new DA check

* Exit early in the event no commitments exist.

* Gazelle

* Fix Mock Broadcaster

* Fix Test Setup

* Update beacon-chain/blockchain/process_block.go

Co-authored-by: Manu NALEPA <[email protected]>

* Manu's Review

* Fix Build

---------

Co-authored-by: Manu NALEPA <[email protected]>
* Update `consensus_spec_version` to `v1.5.0-alpha.1`.

* `CustodyColumns`: Fix and implement spec tests.

* Make deepsource happy.

* `^uint64(0)` => `math.MaxUint64`.

* Fix `TestLoadConfigFile` test.
nisdas and others added 18 commits November 27, 2024 10:30
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope.

* `FindPeersWithSubnet`: Improve logging.

* `listenForNewNodes`: Avoid infinite loop in a small subnet.

* Address Nishant's comment.

* Fix Nishant's comment.
* `sendBatchRootRequest`: Refactor and add comments.

* `sendBatchRootRequest`: Only send requests to peers that custody a superset of our columns.

Before this commit, we sent "data columns by root" requests for data columns that peers do not custody.

* Data columns: Use subnet sampling only.

(Instead of peer sampling.)

* `areDataColumnsAvailable`: Improve logs.

* `GetBeaconBlock`: Improve logs.

Rationale: A `begin` log should always be followed by a `success` log or a `failure` log.
* `validateDataColumn`: Refactor logging.

* `dataColumnSidecarByRootRPCHandler`: Improve logging.

* `isDataAvailable`: Improve logging.

* Add hidden debug flag: `--data-columns-reject-slot-multiple`.

* Add more logs about peer disconnection.

* `validPeersExist` --> `enoughPeersAreConnected`

* `beaconBlocksByRangeRPCHandler`: Add remote Peer ID in logs.

* Stop calling `writeErrorResponseToStream` twice in case of rate limiting.
* `scheduleReconstructedDataColumnsBroadcast`: Really minor refactor.

* `receivedDataColumnsFromRootLock` -> `dataColumnsFromRootLock`

* `reconstructDataColumns`: Stop looking into the DB to know if we have some columns.

Before this commit:
Each time we receive a column, we look into the filesystem for all columns we store.
==> For 128 columns, that amounts to 1 + 2 + 3 + ... + 128 = 128(128+1)/2 = 8,256 file lookups.

Also, as soon as a column is saved into the filesystem, if we look at the filesystem right after, we assume the column will be available (strict consistency). This turns out not to always be true.

==> Sometimes we reconstruct and reseed columns more than once, because of this lack of filesystem strict consistency.

After this commit:
We use a (strictly consistent) cache to determine whether we have received a column or not. (See the sketch after this commit message.)
==> No more consistency issues, and less stress on the filesystem.

* `dataColumnSidecarByRootRPCHandler`: Improve logging.

Before this commit, logged values assumed that all requested columns correspond to
the same block root, which is not always the case.

After this commit, we know which columns are requested for which root.

* Add a log when broadcasting a data column.

This is useful to debug "lost data columns" in devnet.

* Address Nishant's comment
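
As an aside, here is a minimal sketch of the kind of strictly consistent, in-memory cache described in the `reconstructDataColumns` item above: a mutex-guarded set of received column indices per block root. The types and names are illustrative only, not the actual Prysm data structure.

```go
package sketch

import "sync"

// columnCache tracks which data column indices have been received for each block root,
// so reconstruction decisions do not have to go through the filesystem.
type columnCache struct {
	mu      sync.Mutex
	columns map[[32]byte]map[uint64]bool // block root -> set of received column indices
}

func newColumnCache() *columnCache {
	return &columnCache{columns: make(map[[32]byte]map[uint64]bool)}
}

// add records a received column and returns how many distinct columns we now hold
// for this block root.
func (c *columnCache) add(root [32]byte, index uint64) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	set, ok := c.columns[root]
	if !ok {
		set = make(map[uint64]bool)
		c.columns[root] = set
	}
	set[index] = true
	return len(set)
}
```
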
* `columnErrBuilder`: Use `Wrap` instead of `Join`.

Reason: `Join` inserts a newline between the joined errors, which makes the log quite unreadable. (See the sketch after this commit message.)

* `validateDataColumn`: Improve log.

* `areDataColumnsAvailable`: Improve log.

* `SendDataColumnSidecarByRoot` ==> `SendDataColumnSidecarsByRootRequest`.

* `handleDA`: Refactor error message.

* `sendRecentBeaconBlocksRequest` ==> `sendBeaconBlocksRequest`.

Reason: There is no notion at all of "recent" in the function.

If the caller decides to call this function only with "recent" blocks, that's fine.
However, the function itself knows nothing about the "recentness" of these blocks.

* `sendBatchRootRequest`: Improve comments.

* `sendBeaconBlocksRequest`: Avoid `else` usage and use map of bool instead of `struct{}`.

* `wrapAndReportValidation`: Remove `agent` from log.

Reason: It prevents the log from fitting on one line, and it is not really useful for debugging.

* `validateAggregateAndProof`: Add comments.

* `GetValidCustodyPeers`: Fix typo.

* `GetValidCustodyPeers` ==> `DataColumnsAdmissibleCustodyPeers`.

* `CustodyHandler` ==> `DataColumnsHandler`.

* `CustodyCountFromRemotePeer` ==> `DataColumnsCustodyCountFromRemotePeer`.

* Implement `DataColumnsAdmissibleSubnetSamplingPeers`.

* Use `SubnetSamplingSize` instead of `CustodySubnetCount` where needed.

* Revert "`wrapAndReportValidation`: Remove `agent` from log."

This reverts commit 55db351.
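
A small, self-contained illustration of the `Join` vs `Wrap` difference mentioned in the `columnErrBuilder` item above: the standard library's `errors.Join` separates the joined error messages with newlines, while `github.com/pkg/errors.Wrap` yields a single-line "context: cause" message. The error texts here are made up for the example.

```go
package main

import (
	"errors"
	"fmt"

	pkgerrors "github.com/pkg/errors"
)

func main() {
	cause := errors.New("column 42 unavailable")

	// errors.Join puts each error message on its own line, which spreads a single
	// log entry over several lines.
	joined := errors.Join(errors.New("data column verification failed"), cause)
	fmt.Printf("%q\n", joined.Error()) // "data column verification failed\ncolumn 42 unavailable"

	// pkg/errors.Wrap keeps everything on one line: "context: cause".
	wrapped := pkgerrors.Wrap(cause, "data column verification failed")
	fmt.Printf("%q\n", wrapped.Error()) // "data column verification failed: column 42 unavailable"
}
```
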
* `retrieveMissingDataColumnsFromPeers`: Improve logging.

* `dataColumnSidecarByRootRPCHandler`: Stop decreasing peer's score if asking for a column we do not custody.

* `dataColumnSidecarByRootRPCHandler`: If a data column is unavailable, stop waiting for it.

This behaviour was useful for peer sampling.
Now, just return the data column if we store it.
If we don't, skip.

* Dirty code comment.

* `retrieveMissingDataColumnsFromPeers`: Improve logs.

* `SendDataColumnsByRangeRequest`: Improve logs.

* `dataColumnSidecarsByRangeRPCHandler`: Improve logs.
* `BestFinalized`: Refactor (no functional change).

* `BestNonFinalized`: Refactor (no functional change).

* `beaconBlocksByRangeRPCHandler`: Remove useless log.

The same log is already printed at the start of the function.

* `calculateHeadAndTargetEpochs`: Avoid `else`.

* `ConvertPeerIDToNodeID`: Improve error.

* Stop printing noisy "peer should be banned" logs.

* Initial sync: Request data columns from peers which:
- custody a superset of columns we need, and
- have a head slot >= our target slot.

* `requestDataColumnsFromPeers`: Shuffle peers before requesting.

Before this commit, we always requested peers in the same order until one of them responded.
Without shuffling, we always requested data columns from the same peer. (See the sketch after this commit message.)

* `requestDataColumnsFromPeers`: If error from a peer, just log the error and skip the peer.

* Improve logging.

* Fix tests.
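
A minimal sketch of the peer selection described in this commit message: keep only the peers whose custody set covers the columns we need, then shuffle them so successive requests do not always hit the same peer first. The types and helper are illustrative assumptions, not the actual Prysm code.

```go
package sketch

import "math/rand"

// peerInfo is a placeholder for a connected peer and the column indices it custodies.
type peerInfo struct {
	id             string
	custodyColumns map[uint64]bool
}

// selectPeers keeps peers that custody every column we need, then shuffles them so that
// requests are spread across peers instead of always targeting the same one.
func selectPeers(peers []peerInfo, needed []uint64) []peerInfo {
	eligible := make([]peerInfo, 0, len(peers))
	for _, p := range peers {
		covers := true
		for _, col := range needed {
			if !p.custodyColumns[col] {
				covers = false
				break
			}
		}
		if covers {
			eligible = append(eligible, p)
		}
	}
	rand.Shuffle(len(eligible), func(i, j int) {
		eligible[i], eligible[j] = eligible[j], eligible[i]
	})
	return eligible
}
```
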
* Improve logging.

* `retrieveMissingDataColumnsFromPeers`: Limit to `512` items per request.

* `retrieveMissingDataColumnsFromPeers`: Allow `nil` peers.

Before this commit:
If, when this function is called, we are not yet connected to enough peers, then `peers` may not be satisfactory, and if new peers connect later, we will never see them.

After this commit:
If `peers` is `nil`, then we regularly check for all connected peers.
If `peers` is not `nil`, then we use them.
* `retrieveMissingDataColumnsFromPeers`: Search only for needed peers.

* Improve logging.
* Fix Commitments Check

* `highestFinalizedEpoch`: Refactor (no functional change).

* `retrieveMissingDataColumnsFromPeers`: Fix logs.

* `VerifyDataColumnSidecarKZGProofs`: Optimise with capacity.

* Save data columns when initial syncing.

* `dataColumnSidecarsByRangeRPCHandler`: Add logs when a request enters.

* Improve logging.

* Improve logging.

* `peersWithDataColumns`: Do not filter on peer head slot any more.

* Fix Nishant's comment.

---------

Co-authored-by: Manu NALEPA <[email protected]>
…llow syncing from full nodes. (#14532)

* `validateDataColumnsByRange`: `current` ==> `currentSlot`.

* `validateRequest`: Extract `remotePeer` variable.

* `dataColumnSidecarsByRangeRPCHandler`: Small non functional refactor.

* `streamDataColumnBatch`: Fix major bug.

Before this commit, the node was unable to respond with a data column index higher than the count of stored data columns.
For example, if there are 8 data columns stored for a given block, the node was able to respond for data column indices 1, 3, and 5, but not for 10, 16, or 127. (See the sketch after this commit message.)

The issue was visible only for full nodes, since super nodes always store all 128 data columns.

* Initial sync: Fetch data columns from all peers.
(Not only from supernodes.)

* Nishant's comment: Fix `lastSlot` and `endSlot` duplication.

* Address Nishant's comment.
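
A simplified, hypothetical illustration of the bug class described in the `streamDataColumnBatch` item above (not the actual Prysm code): iterating up to the count of stored columns misses any stored column whose index exceeds that count, whereas answering each requested index directly against storage handles a sparse custody set.

```go
package sketch

// storedColumns maps a column index to its stored sidecar bytes; a full node only
// holds a sparse subset of the 128 possible indices.
type storedColumns map[uint64][]byte

// buggySelect mirrors the pre-fix behaviour: it only ever looks at indices
// 0..len(stored)-1, so a stored column with index 127 is never returned when
// only 8 columns are stored.
func buggySelect(stored storedColumns) [][]byte {
	var out [][]byte
	for i := uint64(0); i < uint64(len(stored)); i++ {
		if col, ok := stored[i]; ok {
			out = append(out, col)
		}
	}
	return out
}

// fixedSelect answers each requested index directly against storage, regardless of
// how many columns are stored in total.
func fixedSelect(stored storedColumns, requested []uint64) [][]byte {
	var out [][]byte
	for _, idx := range requested {
		if col, ok := stored[idx]; ok {
			out = append(out, col)
		}
	}
	return out
}
```
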
* `ColumnAlignsWithBlock`: Split lines.

* Data columns verifications: Batch

* Remove `DataColumnBatchVerifier` completely.

Only `DataColumnsVerifier` (with an `s`) remains.
It is the responsibility of the function which receives the data column
(either by gossip, by range request, or by root request) to verify the
data column against the corresponding checks.

* Fix Nishant's comment.
Labels: Blocked (Blocked by research or external factors), peerDAS