
Scale built in Measure units #311

Closed
epompeii opened this issue Feb 3, 2024 · 17 comments

Comments

@epompeii (Member) commented Feb 3, 2024

Currently, the built-in units use the most exact possible units. For example, Latency uses nanoseconds.
This can lead to some ridiculously large Metrics (ex 13,000,000,000,000 nanoseconds).
In the Perf Plots:

  1. Create a way to scale these units
  2. Auto-select the best scale based on the current dataset
@epompeii (Member Author) commented May 3, 2024

This will be pretty trivial to implement for the built-in Measure units.
However, I don't want the built-in Measures to be magical or treated any differently than custom Measures.

With that said, I think that Measures should be able to supply a scale_units function that implements a set (TBD) interface.

For security and approachability reasons, I think that AssemblyScript would be a good choice here, with the scale_units functions run in WASM.
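As a sketch, the TBD interface might look something like this in TypeScript (AssemblyScript uses TypeScript syntax). All names here (`ScaledUnits`, the `scale_units` signature, the thresholds) are assumptions, since the interface has not been designed yet:

```typescript
// Hypothetical interface for a Measure-supplied scaling function.
// All names and thresholds are assumptions; the actual interface is TBD.
interface ScaledUnits {
  value: number; // the scaled metric value
  units: string; // the display units, e.g. "milliseconds (ms)"
}

// A Measure could export a function like this, compiled to WASM,
// that maps a raw metric (in base units) to a display-friendly scale.
function scale_units(raw: number): ScaledUnits {
  if (raw >= 1e9) return { value: raw / 1e9, units: "seconds (s)" };
  if (raw >= 1e6) return { value: raw / 1e6, units: "milliseconds (ms)" };
  if (raw >= 1e3) return { value: raw / 1e3, units: "microseconds (µs)" };
  return { value: raw, units: "nanoseconds (ns)" };
}
```

For example, `scale_units(20_000_000)` would return `{ value: 20, units: "milliseconds (ms)" }`.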

@epompeii (Member Author)

This should also work in the benchmark results table, so the solution here should be persisted across all views. That is, this should not just be for the Perf Plot UI.

@thomaseizinger

I want to suggest not coupling this to the metric itself. In fact, even the unit itself seems to be unnecessarily coupled to the metric?

We have some benchmarks that measure throughput. For those particular ones, the metric is bits / s in the MBit range. But for other benchmarks, the same metric could mean something else in a different range, right?

So perhaps this needs to be separate from the metrics and instead associated with a benchmark to make the plots look nice.

@epompeii (Member Author)

I think what you are advocating for is the way things work now.

  • Benchmarks are named performance regression tests. (ex my_benchmark)
  • Metrics are a single, point-in-time performance regression test result. (ex 42.0)
  • Measures are the unit of measurement for a Metric. (ex Foo as foos / second)

For those particular ones, the metric is bits / s in the MBit range

So the Measure here would be Throughput as bits / s.

When this issue is completed, Bencher would have a way to scale bits / s to MB / s for your results (Metrics) in the MBit range.

Does that make sense? Let me know if I'm missing anything for your use case though.

@thomaseizinger

For those particular ones, the metric is bits / s in the MBit range

So the Measure here would be Throughput as bits / s.

When this issue is completed, Bencher would have a way to scale bits / s to MB / s for your results (Metrics) in the MBit range.

Does that make sense? Let me know if I'm missing anything for your use case though.

Currently, a unit is associated with a measure, like bits / s with throughput.

What if I have multiple benchmarks in my project that all have a form of throughput but one of them is in bits / s and the other in syscalls / s?

Would you expect to define more specific throughputs then? It feels odd that a measure is something you define as a separate entity to be reused across benchmarks, even though it might not apply to all benchmarks in a project. Adding scaling to this complicates this further: What if two benchmarks in the same project use bits / s but one is in the GB/s range and the other in KB/s. Do I need to make two different measures for this too?

@epompeii (Member Author)

Currently, a unit is associated with a measure, like bits / s with throughput.

Correct!

What if I have multiple benchmarks in my project that all have a form of throughput but one of them is in bits / s and the other in syscalls / s?

Would you expect to define more specific throughputs then?

I would recommend creating two Measures:

  1. Bandwidth as bits / second
  2. Syscalls as syscalls / second

It feels odd that a measure is something you define as a separate entity to be reused across benchmarks, even though it might not apply to all benchmarks in a project.

Can you help me understand why this feels odd?

Bencher needs to be able to support benchmarks that have 1 to many different measurements. We can't assume that all of these measurements will always be used by all benchmarks though.

For example:

  • Benchmark A you collect wall clock time
  • Benchmark B you collect wall clock time and four different instruction counts
  • Benchmark C you collect four different instruction counts and heap allocations

Adding scaling to this complicates this further: What if two benchmarks in the same project use bits / s but one is in the GB/s range and the other in KB/s. Do I need to make two different measures for this too?

The current thought is that you would use one Measure for both of these cases using the lowest, indivisible units as the "base". For your example this would be bits / s. When this issue is completed, the UI will scale the units as necessary. The Measure would provide an interface for this scaling.

So when viewing the benchmarks that are in the KB / s range, then the results will be plotted in KB / s and the benchmarks in the bits / s range would be plotted in bits / s.
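As an illustrative sketch of that idea (names and factors are mine, not Bencher's actual code), per-view scaling based on the smallest visible value might look like:

```typescript
// Hypothetical helper: pick a display scale for a dataset of raw
// bits/s values, based on the dataset's minimum value.
// Names and factors are illustrative only, not Bencher's actual code.
const BIT_SCALES: [number, string][] = [
  [1e9, "Gb/s"],
  [1e6, "Mb/s"],
  [1e3, "Kb/s"],
  [1, "bits/s"],
];

function scaleDataset(values: number[]): { values: number[]; units: string } {
  const min = Math.min(...values);
  // Choose the largest scale that keeps the minimum value >= 1.
  const [factor, units] = BIT_SCALES.find(([f]) => min >= f) ?? [1, "bits/s"];
  return { values: values.map((v) => v / factor), units };
}
```

So a dataset in the kilobit range, e.g. `scaleDataset([2_000, 5_000])`, would be plotted as `[2, 5]` in `Kb/s`, while a dataset in the single-digit bits/s range would stay in `bits/s`.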

@thomaseizinger

What if I have multiple benchmarks in my project that all have a form of throughput but one of them is in bits / s and the other in syscalls / s?
Would you expect to define more specific throughputs then?

I would recommend creating two Measures:

1. `Bandwidth` as `bits / second`
2. `Syscalls` as `syscalls / second`

Okay, this makes more sense! I think I got confused because the Throughput measure was pre-created and said something generic like "operations / s", so I thought it was meant to be used for all kinds of throughputs. Maybe that could be made more specific so it is clear that creating new measures is something one is expected to do?

@epompeii (Member Author)

I think I got confused because the Throughput measure was pre-created and said something generic like "operations / s" so I thought it was meant to be used for all kinds of throughputs.

Ah, yeah. The built-in Throughput is there because some of the general purpose benchmarking harnesses report throughput. The units they all use for the numerator are likewise rather generic, so I just went with operations to standardize across them.

Maybe that could be made more specific so it is clear that creating new measures is something one is expected to do?

Definitely! The only nuance is that this is only expected when creating a custom benchmarking harness.
I have some verbiage to this effect in the how to track custom benchmarks docs. However, I specifically chose to use the built-in Latency Measure for the example to keep things simple. It may be worth adding a second, custom Measure example there as well.

@thomaseizinger commented Jul 25, 2024

Maybe "Throughput" could be named "dummy-throughput" instead? Or, "mock-throughput"?

nvm, just read the thing about custom harnesses. Wouldn't it always make sense for users to create a custom measure? Perhaps the measure for built-in adapters needs to be configurable via an env var so users can set it?

@epompeii (Member Author)

Wouldn't it always make sense for users to create a custom measure?

It would not always make sense.

  • If one is using a built-in adapter, they will use its built-in Measure(s).
  • If one is using a custom adapter, they may want to:
    • Use built-in Measure(s) (ex a custom harness that still measures Latency as nanoseconds)
    • Use custom Measure(s)

Perhaps the measure for built-in adapters needs to be configurable via an env var so users can set it?

This is up to the benchmarking harness itself to support. I have yet to see this in the wild though.

@epompeii (Member Author)

Completed: https://bencher.dev/docs/reference/changelog/#v0431

The implementation is very simple: only the name of the Measure units is checked. If the name of the units matches one of the top-level entries in the list below, then there is built-in "smart" auto-scaling to all of the units nested beneath it:

  • nanoseconds (ns)
    • microseconds (µs)
    • milliseconds (ms)
    • seconds (s)
    • minutes (m)
    • hours (h)
  • seconds (s)
    • minutes (m)
    • hours (h)
  • bytes (B)
    • kilobytes (KB)
    • megabytes (MB)
    • gigabytes (GB)
    • terabytes (TB)
    • petabytes (PB)

For all other Measure units, auto-scaling simply uses exponential notation with the existing units name:

  • 1e3 x units
  • 1e6 x units
  • 1e9 x units
  • 1e12 x units
  • 1e15 x units

For Perf Plots and Perf Images, the auto-scaling is based on the minimum value for ALL visible elements.

For Reports and PR comments, the benchmark results table auto-scaling is based on the minimum value for EACH Measure. That is, the auto-scaling is independent for each Measure. For the Alerts table, the auto-scaling is independent for each row.

For example, if there is just a single Metric with a value of 20,000,000 for a Latency Measure with the units nanoseconds (ns) then that value would be "smart" auto-scaled to 20 with the units milliseconds (ms).

Note that, since this is based solely on the name of the units, a custom Measure can still benefit from "smart" auto-scaling if its units are named nanoseconds (ns). However, if the custom Measure used the custom units Flops, then the above example would be auto-scaled to 20 with the units 1e6 x Flops.
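The behavior described above can be sketched as follows (illustrative TypeScript, not Bencher's actual source; only the nanoseconds table is shown here, the seconds and bytes tables are analogous):

```typescript
// Sketch of the "smart" auto-scaling described above. Units are matched
// by name; unknown units fall back to exponential (1eN x units) notation.
// Illustrative only, not Bencher's actual code.
const NANOS_SCALES: [number, string][] = [
  [1e9 * 3600, "hours (h)"],
  [1e9 * 60, "minutes (m)"],
  [1e9, "seconds (s)"],
  [1e6, "milliseconds (ms)"],
  [1e3, "microseconds (µs)"],
  [1, "nanoseconds (ns)"],
];

function autoScale(
  minValue: number,
  unitsName: string,
): { factor: number; units: string } {
  if (unitsName === "nanoseconds (ns)") {
    // "Smart" scaling: largest named unit where the minimum value stays >= 1.
    const [factor, units] =
      NANOS_SCALES.find(([f]) => minValue >= f) ?? [1, "nanoseconds (ns)"];
    return { factor, units };
  }
  // Catch-all: exponential notation with the existing units name.
  for (const exp of [15, 12, 9, 6, 3]) {
    if (minValue >= 10 ** exp) {
      return { factor: 10 ** exp, units: `1e${exp} x ${unitsName}` };
    }
  }
  return { factor: 1, units: unitsName };
}
```

For the example above, `autoScale(20_000_000, "nanoseconds (ns)")` selects milliseconds (ms), and dividing the raw value by the returned factor yields 20; the same value with custom units `Flops` falls into the `1e6 x Flops` bucket.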

@mrazauskas

@epompeii This is a great improvement. Since all of my benchmarks read in seconds, I can now understand them much better.

One detail: would it be possible to apply the same logic to the hover message? Here is an example:

[Screenshot 2024-12-23: Perf Plot hover message]

@epompeii (Member Author)

@mrazauskas I'm glad that the auto-scaling is helpful. It is definitely possible to use the auto-scaled values in the hover message. I opted to keep the raw value for the time being to see what users preferred. I'll add a +1 to the scaled tally. 😃

@thomaseizinger

Is there also auto-scaling for throughput?

@epompeii (Member Author)

is there also auto-scaling for throughput?

@thomaseizinger There is; however, it is currently handled by the catch-all. For example, 100,000,000 with the units of ops/sec would scale to 100 with the units of 1e6 x ops/sec.

Because throughput is operations divided by time, I would have to actually scale down the time units in order to "smart" auto-scale. So the example above would scale to 100 with the units of ops/microsecond. Is that what you want or something else?
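The denominator-scaling idea described above could be sketched like this (`scaleThroughput` is a hypothetical helper, not Bencher's API; names and thresholds are illustrative):

```typescript
// Sketch of "smart" throughput scaling: instead of scaling up the
// operation count (1e6 x ops/sec), scale DOWN the time denominator.
// Illustrative only, not Bencher's actual implementation.
const TIME_DENOMS: [number, string][] = [
  [1e9, "ops/nanosecond"],
  [1e6, "ops/microsecond"],
  [1e3, "ops/millisecond"],
  [1, "ops/sec"],
];

function scaleThroughput(opsPerSec: number): { value: number; units: string } {
  // Pick the smallest time denominator that keeps the value >= 1.
  const [factor, units] =
    TIME_DENOMS.find(([f]) => opsPerSec >= f) ?? [1, "ops/sec"];
  return { value: opsPerSec / factor, units };
}
```

With this approach, `scaleThroughput(100_000_000)` would yield 100 with the units ops/microsecond, matching the example above.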

@thomaseizinger

In our case, throughput is effectively in the hundreds of megabits per second. It would be nice to see that in the correct scale instead of 1e6 x bits / sec (although that is pretty good already too).

@epompeii (Member Author)

It would be nice to see that in the correct scale instead of 1e6 x bits / sec (although that is pretty good already too).

Thank you for the feedback, @thomaseizinger. I have added a tracking issue: #544
