Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

astrid observed block wiggles prometheus metric #12308

Open
taratorio opened this issue Oct 14, 2024 · 0 comments · May be fixed by #12888
Open

astrid observed block wiggles prometheus metric #12308

taratorio opened this issue Oct 14, 2024 · 0 comments · May be fixed by #12888
Assignees
Labels
astrid imp2 Medium importance polygon

Comments

@taratorio
Copy link
Member

taratorio commented Oct 14, 2024

We would like to record "observed wiggles" duration metric in a Prometheus dashboard when the Astrid Block Consumer receives blocks from p2p at the chain tip. Probably using a summary metric like in UpdateBlockConsumerHeaderDownloadDelay works.

Docs explaining what backup block producer wiggle times are in Bor - https://docs.polygon.technology/pos/architecture/bor/introduction/#wiggle-time-delay-before-backup-production

Example of how to calculate wiggle (this one is done for logging purposes in a different part of the flow but it is what we want to have a metric for) - https://github.com/erigontech/erigon/blob/main/polygon/bor/bor.go#L1203

wiggleSeconds = succession(signer(header)) * backupMultiplier

Calculating the wiggle is based on the signer (synonym for block producer/proposer) succession number.

For this, might be best to add a new HeaderWiggleCalculator (analogous to the existing HeaderDifficultyCalculator) with a function Wiggle(header) (uint64, error). Make sure to use the same signer's signaturesCache for efficiency. This may mean doing a small refactor to pass the same signaturesCache to the CanonicalChainBuilderFactory and to the HeaderWiggleCalculator.

A good place (option 1) to record this observation might be inside the Sync.applyNewBlockOnTip function. After connecting a new block using the canonical chain builder we get back a slice of newConnectedHeaders. We can loop over these to calculate the wiggle using the wiggle calculator and update the metric. The newConnectedHeaders slice should in most cases be quite short (bounded by the average milestone length in the worst case - distance from root of canonical chain build to the tip which is usually ~30 blocks on average on mainnet) so there should not be much of a performance impact of doing this on the hot path of chain tip processing. If concerned about this performance penalty then this metric update can be done async by spawning another goroutine or using a channel to some consumer that async will update the metric given the blocks. If taking the async approach make sure that there is a guarantee that all blocks will have their wiggles recorded in the same sequence as the headers were seen.

This model of ordered async processing of blocks already exists in the form of the polygon/sync package executionClientStore which is responsible for storing the blocks. Maybe this metric can be updated as part of its insertBlocks function (option 2). Another benefit of this is that the metric will then also be recorded if we hit the handleMilestoneTipMismatch flow and DownloadBlocksUsingMilestone. Also this metric will then be recorded for the blocks part of the syncToTip initial sync mode. Inside insertBlocks we can spawn a new goroutine for looping over all of the blocks and updating the wiggles metric using the wiggle calculator. To ensure ordering is correct id wait on that goroutine to finish at the very end just before exiting insertBlocks function. With this approach the metric will be updated in parallel while we are inserting blocks via the execution engine meaning there won't be any performance impact.

Another place (option 3) for this may be inside canonicalChainBuilder.Connect after all validation checks are done and just before adding a new forkTreeNode. The downside of this approach is that it introduces unnecessary metric logic to the canonical chain builder which we want to keep as slim and simple as possible and as business logic agnostic as possible.

Considering all options above, I'd recommend option 2.

@taratorio taratorio added this to the astrid-m5-cleanup milestone Oct 14, 2024
@taratorio taratorio added the imp2 Medium importance label Oct 14, 2024
@eastorski eastorski self-assigned this Nov 26, 2024
@eastorski eastorski linked a pull request Nov 27, 2024 that will close this issue
@eastorski eastorski linked a pull request Nov 27, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
astrid imp2 Medium importance polygon
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants