-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetch peers p2p #559
Fetch peers p2p #559
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my only real comment of value for you is the logger.Fatal
comment. Everything else is more of an opportunity for you to inform me 😄
ip netip.AddrPort | ||
} | ||
|
||
func (c *PeerConfig) Validate() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to have a constructor newPeerConfig
that does the validation before building the config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do validate at construction time in the top-level config constructor: https://github.com/ava-labs/awm-relayer/blob/main/relayer/config/viper.go#L21
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the cleanest pattern, but we have to unmarshal the config and validate it in separate steps regardless, it's just a question of code organization.
// Maximum amount of time to spend waiting (in addition to network round trip time per attempt) | ||
// during relayer signature query routine | ||
signatureRequestRetryWaitPeriodMs = 10_000 | ||
signatureRequestRetryWaitPeriodMs = 20_000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's in the description, but why were these values doubled? (Just for my own knowledge)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might not be necessary anymore, but I increased this to account for the increased warm up time when 1) tracking a new subnet or 2) adding/removing validators. With ValidatorManager
churn tracking, the latter case is likely rare, but the former will be common on the signature aggregator side. The current implementation eliminates this warm up latency for the new subnet case, so it's probably safe to reduce this back to the original value.
warpMessage = getWarpMessageFromLog(ctx, receipt, subnetBInfo) | ||
|
||
reqBody = api.AggregateSignatureRequest{ | ||
Message: "0x" + hex.EncodeToString(warpMessage.Bytes()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a helper for prefixing with a 0x
!?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some helpers here and there in various dependencies, but nothing immediately available that I'm able to find. We only do this a couple times and then only in tests, so I'm not too concerned.
peers/app_request_network.go
Outdated
for _, peer := range peers { | ||
vdrs, err := pClient.GetCurrentValidators(context.Background(), constants.PrimaryNetworkID, []ids.NodeID{peer.ID}) | ||
if err != nil { | ||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably return an error here to be consistent with the rest of this function
peers/app_request_network.go
Outdated
// Only track the peer if it is a primary network validator | ||
if len(vdrs) == 0 { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we're assuming that if the peer knows of any p-chain validators, it itself is a primary network validator? Shouldn't subnet-only validators know of some p-chain validators because they have to sync the p-chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is just a confirmation that the peer returned by the infoAPI.Peers
is indeed a primary network validator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is just a confirmation that the peer returned by the infoAPI.Peers is indeed a primary network validator
Yes, this is the case here. We're using pClient.GetCurrentValidators
to filter the peers returned by info.Peers
. We could definitely do this in one rpc call though. Making individual rpc calls for each peer is pretty inefficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this to make one call to GetCurrentValidators
peers/app_request_network.go
Outdated
// Only track the peer if it is a primary network validator | ||
if len(vdrs) == 0 { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is just a confirmation that the peer returned by the infoAPI.Peers
is indeed a primary network validator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, the only outstanding comment I think is @geoff-vball 's about the comment being misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left 3 comments that are all related to a bug you had in one commit but then subsequently fixed. You're welcome to take the code organization suggestion or just merge as is :)
Why this should be merged
Connects to peers over p2p, rather than via the info API. Post Etna, info API nodes are not expected to track peers for all subnets, so we instead need to fetch peers directly from primary network validators, which do track all peers across all L1s. We do this by bootstrapping the network with connections to primary network validators (which are still fetched from the info API), and discovering peers via peer gossip. Peer gossip only tracks peers that correspond to known validators, so we also need to maintain an up-to-date view of the validators of the L1s we wish to connect to (including the primary network). This change leverages ava-labs/icm-contracts#641 to confirm that we can connect to L1s with disjoint validator sets from the primary network.
Note: This PR uses a modified version of Avalanchego v1.12.0-fuji that relaxes the outbound peer request handling. Once this change is formalized and included in an Avalanchego release, we will circle back and update the dependency. This workaround is safe because it only impacts outbound messages from the signature aggregator peer.
How this works
How this was tested
CI
How is this documented
N/A