Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: metrics dumps & extended metrics #2519

Merged
merged 2 commits into from
Jul 22, 2024
Merged

Conversation

Arqu
Copy link
Collaborator

@Arqu Arqu commented Jul 17, 2024

Description

Extends several pieces:

  • introduces a "metric dumper" which is just a way of saying you can sample the internal metrics and write them to a CSV file (which should come in handy for 3 pieces that should come down the line; 1) CI using these to validate behavior, 2) debug dumps from 3rd parties, 3) local debugging)
  • along with the dumper the doctor plot has been extended to be able to read those dumps and also just generally improved some rough edges so it's less error prone.
  • node counts are here for relays. You now have a derived metric which simply counts unique daily node connections.

Sample usage for the metrics dumper:
cargo run --bin iroh --all-features -- --metrics-dump-path test.metrics.csv start

Sample of the plotter:
cargo run --bin iroh --all-features -- doctor plot --timeframe 30 --interval 10 --file metrics.dump.csv magicsock_actor_tick_main_total,magicsock_actor_tick_msg_total,magicsock_actor_tick_endpoint_heartbeat_total,magicsock_actor_tick_endpoints_update_receiver_total,magicsock_actor_tick_re_stun_total

Breaking Changes

Notes & open questions

This is merging into #2464 as part of the larger metrics refactor.

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.

@Arqu Arqu requested review from flub and dignifiedquire July 17, 2024 21:44
@Arqu Arqu self-assigned this Jul 17, 2024
@Arqu Arqu added the metrics extracting quantified mesurements from iroh label Jul 17, 2024
@dignifiedquire
Copy link
Contributor

ci is very sad

match new_conn {
true => 1,
false => 0,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_conn as u64

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally dislike that as I never feel I can trust that implicit conversion from bool to int in regards to compiler/platforms.

@@ -41,6 +41,10 @@ pub(crate) struct Cli {
/// Port to serve metrics on. Disabled by default.
#[clap(long)]
pub(crate) metrics_port: Option<MetricsPort>,

/// If set, metrics will be dumped in CSV format to the specified path at regular intervals (100ms).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't 100ms very frequent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's intentional for us to have very high resolution in this setup, you can use the regular scripts and scrape to grafana if you just want general stats over time. This is meant as a high res debug tool.

iroh-cli/src/commands/start.rs Outdated Show resolved Hide resolved
iroh-metrics/src/service.rs Outdated Show resolved Hide resolved
@Arqu Arqu requested review from flub and dignifiedquire July 18, 2024 22:29
if let Err(e) = m {
error!("Failed to encode metrics: {e:#}");
} else {
let m = m.unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use match statement to avoid unwrap?

@Arqu Arqu force-pushed the arqu/fresh_counters branch from d94a592 to 53bb3bd Compare July 22, 2024 14:41
@Arqu Arqu force-pushed the arqu/metrics_vol2 branch from 31602a4 to bcb4b2a Compare July 22, 2024 14:47
@Arqu Arqu force-pushed the arqu/metrics_vol2 branch from 851f3c6 to 8684fe0 Compare July 22, 2024 15:14
@Arqu Arqu requested a review from dignifiedquire July 22, 2024 15:23
@Arqu Arqu merged commit 7f7792a into arqu/fresh_counters Jul 22, 2024
24 of 25 checks passed
@Arqu Arqu deleted the arqu/metrics_vol2 branch July 22, 2024 15:36
Arqu added a commit that referenced this pull request Jul 22, 2024
## Description

Extends several pieces:
- introduces a "metric dumper" which is just a way of saying you can
sample the internal metrics and write them to a CSV file (which should
come in handy for 3 pieces that should come down the line; 1) CI using
these to validate behavior, 2) debug dumps from 3rd parties, 3) local
debugging)
- along with the dumper the `doctor plot` has been extended to be able
to read those dumps and also just generally improved some rough edges so
it's less error prone.
- node counts are here for relays. You now have a derived metric which
simply counts unique daily node connections.


Sample usage for the metrics dumper:
`cargo run --bin iroh --all-features -- --metrics-dump-path
test.metrics.csv start`

Sample of the plotter:
`cargo run --bin iroh --all-features -- doctor plot --timeframe 30
--interval 10 --file metrics.dump.csv
magicsock_actor_tick_main_total,magicsock_actor_tick_msg_total,magicsock_actor_tick_endpoint_heartbeat_total,magicsock_actor_tick_endpoints_update_receiver_total,magicsock_actor_tick_re_stun_total`

## Breaking Changes

<!-- Optional, if there are any breaking changes document them,
including how to migrate older code. -->

## Notes & open questions

This is merging into #2464 as
part of the larger metrics refactor.

## Change checklist

- [ ] Self-review.
- [ ] Documentation updates following the [style
guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text),
if relevant.
- [ ] Tests if relevant.
- [ ] All breaking changes documented.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
metrics extracting quantified mesurements from iroh
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants