-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PeerDAS metrics: add data column, kzg, custody metrics #14
base: master
Are you sure you want to change the base?
Changes from all commits
7976fc5
85262be
024a544
f748c61
4cdc1e1
0384ab4
54ea9ae
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,10 +30,30 @@ The following are the minimal metrics agreed to be conformed by the various clie | |
| `beacon_processed_deposits_total` | Gauge | Total number of deposits processed | On epoch transition | | ||
|
||
\* All `*_root` values are converted to signed 64-bit integers utilizing the last 8 bytes interpreted as little-endian (`int.from_bytes(root[24:32], byteorder='little', signed=True)`). | ||
|
||
### PeerDAS Metrics | ||
|
||
The following metrics are proposed to be added to clients for PeerDAS monitoring. This list is open for discussion. Each client has the opportunity to contribute to it by suggesting additions or disputing existing metrics. | ||
|
||
#### Data column, kzg, custody metrics | ||
|
||
| Name | Metric type | Usage | Sample collection event | | ||
|--------------------------------------------|-------------|-------------------------------------------------------------|----------------------| | ||
| `beacon_data_column_sidecar_processing_requests_total` | Counter | Number of data column sidecars submitted for processing | On data column sidecar gossip verification | | ||
| `beacon_data_column_sidecar_processing_successes_total` | Counter | Number of data column sidecars verified for gossip | On data column sidecar gossip verification | | ||
| `beacon_data_column_sidecar_gossip_verification_milliseconds` | Histogram | Full runtime of data column sidecars gossip verification | On data column sidecar gossip verification | | ||
| `beacon_data_availability_reconstructed_columns_total` | Counter | Total count of reconstructed columns | On data column kzg verification | | ||
| `beacon_data_availability_reconstruction_time_milliseconds` | Histogram | Time taken to reconstruct columns | On data column kzg verification | | ||
| `beacon_data_column_sidecar_computation_seconds` | Histogram | Time taken to compute data column sidecar, including cells, proofs and inclusion proof | On data column sidecar computation | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it worth suggesting a historgram bucket? e.g. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We've also implemented a |
||
| `beacon_data_column_sidecar_inclusion_proof_verification_milliseconds` | Histogram | Time taken to verify data column sidecar inclusion proof | On data column sidecar inclusion proof verification | | ||
| `beacon_kzg_verification_data_column_single_milliseconds` | Histogram | Runtime of single data column kzg verification | On single data column kzg verification | | ||
| `beacon_kzg_verification_data_column_batch_milliseconds` | Histogram | Runtime of batched data column kzg verification | On batched data column kzg verification | | ||
| `beacon_custody_columns_count_total` | Counter | Total count of columns in custody within the data availability boundary | On custody collecting and verification | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there another metric for tracking the custody count? |
||
|
||
### Additional Metrics | ||
|
||
The following are proposed metrics to be added to clients. This list is _not_ stable and is subject to drastic changes, deletions, and additions. The additional metric list is being | ||
discussed, we are yet to reach consensus. Ideally we would also discuss which of these values need to be counters, guages or histograms. | ||
discussed, we are yet to reach consensus. Ideally we would also discuss which of these values need to be counters, guages or histograms. | ||
|
||
| Name | Metric type | Usage | Sample collection event | | ||
|----------------------------------------------|-------------|--------------------------------------------------------------------------------------|---------------------| | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name and description may be a bit confusing here, as "processing" could mean different things in different clients. The description makes sense in Lighthouse because we have a task scheduling component called
BeaconProcessor
, and all the tasks are "submitted" to this scheduler for processing - so you see this terminology used quite frequently in our metrics.It's more of a convenient metric rather than a must-have in Lighthouse, as we can get the same data with
count(beacon_data_column_sidecar_gossip_verification_seconds)
. IMO we should probably minimise the number of standardise metrics, so we don't force all clients to implement metrics that aren't necessary for them. It comes with maintenance cost (once introduced, renaming / removing would be a breaking change) as well as extra prometheus storage cost.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may still make sense to have this metric though.
For a few of our metrics we discard the timer if the operations fails, so it may not reflect the actual attemtpt count.
Is it also worth mentioning whether we should discard a timer metric if the operation fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah right, I see this implemented in Teku already, if it make sense feel free to ignore the above comment!