-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: metrics dumps & extended metrics #2519
Conversation
ci is very sad |
iroh-net/src/relay/server.rs
Outdated
match new_conn { | ||
true => 1, | ||
false => 0, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_conn as u64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally dislike that as I never feel I can trust that implicit conversion from bool to int in regards to compiler/platforms.
@@ -41,6 +41,10 @@ pub(crate) struct Cli { | |||
/// Port to serve metrics on. Disabled by default. | |||
#[clap(long)] | |||
pub(crate) metrics_port: Option<MetricsPort>, | |||
|
|||
/// If set, metrics will be dumped in CSV format to the specified path at regular intervals (100ms). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't 100ms very frequent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's intentional for us to have very high resolution in this setup, you can use the regular scripts and scrape to grafana if you just want general stats over time. This is meant as a high res debug tool.
iroh-metrics/src/service.rs
Outdated
if let Err(e) = m { | ||
error!("Failed to encode metrics: {e:#}"); | ||
} else { | ||
let m = m.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use match statement to avoid unwrap
?
d94a592
to
53bb3bd
Compare
## Description Extends several pieces: - introduces a "metric dumper" which is just a way of saying you can sample the internal metrics and write them to a CSV file (which should come in handy for 3 pieces that should come down the line; 1) CI using these to validate behavior, 2) debug dumps from 3rd parties, 3) local debugging) - along with the dumper the `doctor plot` has been extended to be able to read those dumps and also just generally improved some rough edges so it's less error prone. - node counts are here for relays. You now have a derived metric which simply counts unique daily node connections. Sample usage for the metrics dumper: `cargo run --bin iroh --all-features -- --metrics-dump-path test.metrics.csv start` Sample of the plotter: `cargo run --bin iroh --all-features -- doctor plot --timeframe 30 --interval 10 --file metrics.dump.csv magicsock_actor_tick_main_total,magicsock_actor_tick_msg_total,magicsock_actor_tick_endpoint_heartbeat_total,magicsock_actor_tick_endpoints_update_receiver_total,magicsock_actor_tick_re_stun_total` ## Breaking Changes <!-- Optional, if there are any breaking changes document them, including how to migrate older code. --> ## Notes & open questions This is merging into #2464 as part of the larger metrics refactor. ## Change checklist - [ ] Self-review. - [ ] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant. - [ ] Tests if relevant. - [ ] All breaking changes documented.
Description
Extends several pieces:
doctor plot
has been extended to be able to read those dumps and also just generally improved some rough edges so it's less error prone.Sample usage for the metrics dumper:
cargo run --bin iroh --all-features -- --metrics-dump-path test.metrics.csv start
Sample of the plotter:
cargo run --bin iroh --all-features -- doctor plot --timeframe 30 --interval 10 --file metrics.dump.csv magicsock_actor_tick_main_total,magicsock_actor_tick_msg_total,magicsock_actor_tick_endpoint_heartbeat_total,magicsock_actor_tick_endpoints_update_receiver_total,magicsock_actor_tick_re_stun_total
Breaking Changes
Notes & open questions
This is merging into #2464 as part of the larger metrics refactor.
Change checklist