
Scale built in Measure units #311

Closed
epompeii opened this issue Feb 3, 2024 · 17 comments

Comments

@epompeii (Member) commented Feb 3, 2024

Currently, the built-in units use the most exact possible units. For example, Latency uses nanoseconds.
This can lead to some ridiculously large Metrics (ex 13,000,000,000,000 nanoseconds).
In the Perf Plots:

  1. Create a way to scale these units
  2. Auto-select the best scale based on the current dataset
@epompeii (Member Author) commented May 3, 2024

This will be pretty trivial to implement for the built-in Measure units.
However, I don't want the built-in Measures to be magical or treated any differently than custom Measures.

With that said, I think that Measures should be able to supply a scale_units function that implements a set (TBD) interface.

For security and approachability reasons, I think that AssemblyScript would be a good choice here, with the scale_units functions run in WASM.
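As a sketch, the TBD interface might look something like this in TypeScript (AssemblyScript uses TypeScript syntax). All names here (`ScaledUnits`, the `scale_units` signature, the thresholds) are assumptions, since the interface has not been designed yet:

```typescript
// Hypothetical interface for a Measure-supplied scaling function.
// All names and thresholds are assumptions; the actual interface is TBD.
interface ScaledUnits {
  value: number; // the scaled metric value
  units: string; // the display units, e.g. "milliseconds (ms)"
}

// A Measure could export a function like this, compiled to WASM,
// that maps a raw metric (in base units) to a display-friendly scale.
function scale_units(raw: number): ScaledUnits {
  if (raw >= 1e9) return { value: raw / 1e9, units: "seconds (s)" };
  if (raw >= 1e6) return { value: raw / 1e6, units: "milliseconds (ms)" };
  if (raw >= 1e3) return { value: raw / 1e3, units: "microseconds (µs)" };
  return { value: raw, units: "nanoseconds (ns)" };
}
```

For example, `scale_units(20_000_000)` would return `{ value: 20, units: "milliseconds (ms)" }`.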

@epompeii (Member Author)

This should also work in the benchmark results table, so the solution here should be persisted across all views. That is, this should not just be for the Perf Plot UI.

@thomaseizinger

I want to suggest not coupling this to the metric itself. In fact, even the unit itself seems to be unnecessarily coupled to the metric?

We have some benchmarks that measure throughput. For those particular ones, the metric is bits / s in the MBit range. But for other benchmarks, the same metric could mean something else in a different range, right?

So perhaps this needs to be separate from the metrics and instead associated with a benchmark to make the plots look nice.

@epompeii (Member Author)

I think what you are advocating for is the way things work now.

  • Benchmarks are named performance regression tests. (ex my_benchmark)
  • Metrics are a single, point-in-time performance regression test result. (ex 42.0)
  • Measures are the unit of measurement for a Metric. (ex Foo as foos / second)

For those particular ones, the metric is bits / s in the MBit range

So the Measure here would be Throughput as bits / s.

When this issue is completed, Bencher would have a way to scale bits / s to MB / s for your results (Metrics) in the MBit range.

Does that make sense? Let me know if I'm missing anything for your use case though.

@thomaseizinger

For those particular ones, the metric is bits / s in the MBit range

So the Measure here would be Throughput as bits / s.

When this issue is completed, Bencher would have a way to scale bits / s to MB / s for your results (Metrics) in the MBit range.

Does that make sense? Let me know if I'm missing anything for your use case though.

Currently, a unit is associated with a measure, like bits / s with throughput.

What if I have multiple benchmarks in my project that all have a form of throughput but one of them is in bits / s and the other in syscalls / s?

Would you expect to define more specific throughputs then? It feels odd that a measure is something you define as a separate entity to be reused across benchmarks, even though it might not apply to all benchmarks in a project. Adding scaling to this complicates this further: What if two benchmarks in the same project use bits / s but one is in the GB/s range and the other in KB/s. Do I need to make two different measures for this too?

@epompeii (Member Author)

Currently, a unit is associated with a measure, like bits / s with throughput.

Correct!

What if I have multiple benchmarks in my project that all have a form of throughput but one of them is in bits / s and the other in syscalls / s?

Would you expect to define more specific throughputs then?

I would recommend creating two Measures:

  1. Bandwidth as bits / second
  2. Syscalls as syscalls / second

It feels odd that a measure is something you define as a separate entity to be reused across benchmarks, even though it might not apply to all benchmarks in a project.

Can you help me understand why this feels odd?

Bencher needs to be able to support benchmarks that have 1 to many different measurements. We can't assume that all of these measurements will always be used by all benchmarks though.

For example:

  • Benchmark A you collect wall clock time
  • Benchmark B you collect wall clock time and four different instruction counts
  • Benchmark C you collect four different instruction counts and heap allocations

Adding scaling to this complicates this further: What if two benchmarks in the same project use bits / s but one is in the GB/s range and the other in KB/s. Do I need to make two different measures for this too?

The current thought is that you would use one Measure for both of these cases using the lowest, indivisible units as the "base". For your example this would be bits / s. When this issue is completed, the UI will scale the units as necessary. The Measure would provide an interface for this scaling.

So when viewing the benchmarks that are in the KB / s range, then the results will be plotted in KB / s and the benchmarks in the bits / s range would be plotted in bits / s.
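As an illustrative sketch of that idea (names and factors are mine, not Bencher's actual code), per-view scaling based on the smallest visible value might look like:

```typescript
// Hypothetical helper: pick a display scale for a dataset of raw
// bits/s values, based on the dataset's minimum value.
// Names and factors are illustrative only, not Bencher's actual code.
const BIT_SCALES: [number, string][] = [
  [1e9, "Gb/s"],
  [1e6, "Mb/s"],
  [1e3, "Kb/s"],
  [1, "bits/s"],
];

function scaleDataset(values: number[]): { values: number[]; units: string } {
  const min = Math.min(...values);
  // Choose the largest scale that keeps the minimum value >= 1.
  const [factor, units] = BIT_SCALES.find(([f]) => min >= f) ?? [1, "bits/s"];
  return { values: values.map((v) => v / factor), units };
}
```

So a dataset in the kilobit range, e.g. `scaleDataset([2_000, 5_000])`, would be plotted as `[2, 5]` in `Kb/s`, while a dataset in the single-digit bits/s range would stay in `bits/s`.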

@thomaseizinger

What if I have multiple benchmarks in my project that all have a form of throughput but one of them is in bits / s and the other in syscalls / s?
Would you expect to define more specific throughputs then?

I would recommend creating two Measures:

1. `Bandwidth` as `bits / second`
2. `Syscalls` as `syscalls / second`

Okay, this makes more sense! I think I got confused because the Throughput measure was pre-created and said something generic like "operations / s", so I thought it was meant to be used for all kinds of throughputs. Maybe that could be made more specific so it is clear that creating new measures is something one is expected to do?

@epompeii (Member Author)

I think I got confused because the Throughput measure was pre-created and said something generic like "operations / s" so I thought it was meant to be used for all kinds of throughputs.

Ah, yeah. The built-in Throughput is there because some of the general purpose benchmarking harnesses report throughput. The units they all use for the numerator are likewise rather generic, so I just went with operations to standardize across them.

Maybe that could be made more specific so it is clear that creating new measures is something one is expected to do?

Definitely! The only nuance is that this is only expected when creating a custom benchmarking harness.
I have some verbiage to this effect in the how to track custom benchmarks docs. However, I specifically chose to use the built-in Latency Measure for the example to keep things simple. It may be worth adding a second, custom Measure example there as well.

@thomaseizinger commented Jul 25, 2024

Maybe "Throughput" could be named "dummy-throughput" instead? Or, "mock-throughput"?

nvm, just read the thing about custom harnesses. Wouldn't it always make sense for users to create a custom measure? Perhaps the measure for built-in adapters needs to be configurable via an env var so users can set it?

@epompeii (Member Author)

Wouldn't it always make sense for users to create a custom measure?

It would not always make sense.

  • If one is using a built-in adapter, they will use its built-in Measure(s).
  • If one is using a custom adapter, they may want to:
    • Use built-in Measure(s) (ex a custom harness that still measures Latency as nanoseconds)
    • Use custom Measure(s)

Perhaps the measure for built-in adapters needs to be configurable via an env var so users can set it?

This is up to the benchmarking harness itself to support. I have yet to see this in the wild though.

@epompeii (Member Author)

Completed: https://bencher.dev/docs/reference/changelog/#v0431

The implementation is very simple: only the name of the Measure units is checked. If the name of the units matches one of the top-level entries in the list below, then there is built-in "smart" auto-scaling to all of the units nested beneath it:

  • nanoseconds (ns)
    • microseconds (µs)
    • milliseconds (ms)
    • seconds (s)
    • minutes (m)
    • hours (h)
  • seconds (s)
    • minutes (m)
    • hours (h)
  • bytes (B)
    • kilobytes (KB)
    • megabytes (MB)
    • gigabytes (GB)
    • terabytes (TB)
    • petabytes (PB)

For all other Measure units, auto-scaling simply uses exponential notation with the existing units name:

  • 1e3 x units
  • 1e6 x units
  • 1e9 x units
  • 1e12 x units
  • 1e15 x units

For Perf Plots and Perf Images, the auto-scaling is based on the minimum value for ALL visible elements.

For Reports and PR comments, the benchmark results table auto-scaling is based on the minimum value for EACH Measure. That is, the auto-scaling is independent for each Measure. For the Alerts table, the auto-scaling is independent for each row.

For example, if there is just a single Metric with a value of 20,000,000 for a Latency Measure with the units nanoseconds (ns) then that value would be "smart" auto-scaled to 20 with the units milliseconds (ms).

Note that, since this is based solely on the name of the units, a custom Measure can still benefit from "smart" auto-scaling if its units are named nanoseconds (ns). However, if the custom Measure used the custom units Flops, then the above example would be auto-scaled to 20 with the units 1e6 x Flops.
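The behavior described above can be sketched as follows (illustrative TypeScript, not Bencher's actual source; only the nanoseconds table is shown here, the seconds and bytes tables are analogous):

```typescript
// Sketch of the "smart" auto-scaling described above. Units are matched
// by name; unknown units fall back to exponential (1eN x units) notation.
// Illustrative only, not Bencher's actual code.
const NANOS_SCALES: [number, string][] = [
  [1e9 * 3600, "hours (h)"],
  [1e9 * 60, "minutes (m)"],
  [1e9, "seconds (s)"],
  [1e6, "milliseconds (ms)"],
  [1e3, "microseconds (µs)"],
  [1, "nanoseconds (ns)"],
];

function autoScale(
  minValue: number,
  unitsName: string,
): { factor: number; units: string } {
  if (unitsName === "nanoseconds (ns)") {
    // "Smart" scaling: largest named unit where the minimum value stays >= 1.
    const [factor, units] =
      NANOS_SCALES.find(([f]) => minValue >= f) ?? [1, "nanoseconds (ns)"];
    return { factor, units };
  }
  // Catch-all: exponential notation with the existing units name.
  for (const exp of [15, 12, 9, 6, 3]) {
    if (minValue >= 10 ** exp) {
      return { factor: 10 ** exp, units: `1e${exp} x ${unitsName}` };
    }
  }
  return { factor: 1, units: unitsName };
}
```

For the example above, `autoScale(20_000_000, "nanoseconds (ns)")` selects milliseconds (ms), and dividing the raw value by the returned factor yields 20; the same value with custom units `Flops` falls into the `1e6 x Flops` bucket.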

@mrazauskas

@epompeii This is a great improvement. Since all of my benchmarks read in seconds, I can now understand them much better.

One detail: would it be possible to apply the same logic to the hover message? Here is an example:

[Screenshot 2024-12-23: Perf Plot hover message]

@epompeii (Member Author)

@mrazauskas I'm glad that the auto-scaling is helpful. It is definitely possible to use the auto-scaled values in the hover message. I opted to keep the raw value for the time being to see what users preferred. I'll add a +1 to the scaled tally. 😃

@thomaseizinger

Is there also auto-scaling for throughput?

@epompeii (Member Author)

is there also auto-scaling for throughput?

@thomaseizinger There is; however, it is currently handled by the catch-all. For example, 100,000,000 with the units of ops/sec would scale to 100 with the units of 1e6 x ops/sec.

Because throughput is operations divided by time, I would have to actually scale down the time units in order to "smart" auto-scale. So the example above would scale to 100 with the units of ops/microsecond. Is that what you want or something else?
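The denominator-scaling idea described above could be sketched like this (`scaleThroughput` is a hypothetical helper, not Bencher's API; names and thresholds are illustrative):

```typescript
// Sketch of "smart" throughput scaling: instead of scaling up the
// operation count (1e6 x ops/sec), scale DOWN the time denominator.
// Illustrative only, not Bencher's actual implementation.
const TIME_DENOMS: [number, string][] = [
  [1e9, "ops/nanosecond"],
  [1e6, "ops/microsecond"],
  [1e3, "ops/millisecond"],
  [1, "ops/sec"],
];

function scaleThroughput(opsPerSec: number): { value: number; units: string } {
  // Pick the smallest time denominator that keeps the value >= 1.
  const [factor, units] =
    TIME_DENOMS.find(([f]) => opsPerSec >= f) ?? [1, "ops/sec"];
  return { value: opsPerSec / factor, units };
}
```

With this approach, `scaleThroughput(100_000_000)` would yield 100 with the units ops/microsecond, matching the example above.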

@thomaseizinger

In our case, throughput is effectively in the hundreds of megabits per second. It would be nice to see that in the correct scale instead of 1e6 x bits / sec (although that is pretty good already too).

@epompeii (Member Author)

It would be nice to see that in the correct scale instead of 1e6 x bits / sec (although that is pretty good already too).

Thank you for the feedback, @thomaseizinger. I have added a tracking issue: #544
