
OPTE Benchmarks

OPTE maintains two sets of benchmarks: userland microbenchmarks, and kernel module benchmarks. Userland benchmarks can be run on most development machines, while the kernel module benchmarks require a full Helios install and, depending on which benchmarks you want to run, additional lab setup.

Benchmark outputs are located in opte/target/criterion, and any flamegraphs built during kmod benchmarks are placed into opte/target/xde-bench.
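As a hedged pointer: when criterion's HTML report generation is enabled (commonly the default), a browsable summary is also written under the same directory, which you can open in any browser:

opte/target/criterion/report/index.html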

Userland Benchmarks

We use criterion to measure and profile individual packet processing times for slow-/fast-path traffic as well as generated hairpin packets.

These can be run using cargo ubench, or cargo bench --package opte-bench --bench userland -- <options>. This benchmark runner uses the standard criterion CLI. To see a deduplicated list of the available benchmarks, run cargo ubench --list 2> /dev/null | sort | uniq.
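Because the standard criterion CLI is available, its stock baseline flags can be used to compare runs before and after a change. A minimal sketch using plain criterion flags, nothing OPTE-specific:

$ cargo ubench -- --save-baseline before
# ... make changes, rebuild ...
$ cargo ubench -- --baseline before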

Benchmarks are split into several categories, any of which can be used as a filter (see the example after this list):

  • Metric: wallclock, alloc_ct, alloc_sz.

  • Action: parse, process.

  • Packet family.
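criterion selects benchmarks by matching the filter against benchmark IDs, so (assuming the IDs embed these category names) a category can serve directly as a filter:

$ cargo ubench -- parse        # only the parse-action benchmarks
$ cargo ubench -- wallclock    # only the wallclock-metric benchmarks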

Kernel Module Benchmarks

The kernel module benchmarks can be run using cargo kbench, or cargo bench --package opte-bench --bench xde -- <options>. They require that:

  • you are running on an up-to-date Helios instance.

  • the XDE kernel module and opteadm are installed, either via IPS or the cargo xtask install command.

  • you have installed the IPS packages flamegraph, demangle, iperf, and sparse (an install sketch follows this list).
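As an install sketch, assuming the packages are available from your configured IPS publisher:

$ pfexec pkg install flamegraph demangle iperf sparse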

They implement zone-to-zone iperf traffic in two scenarios:

  • cargo kbench local on one machine. This uses a test setup identical to xde-tests/loopback: two sparse zones are created on the current machine, with simnet links used as the underlay network. This is lower fidelity than the two-node setup below.

  • cargo kbench server and cargo kbench remote <SERVER_IP> on two separate machines. One zone is created on each machine (running an iperf server and client respectively), using the shared lab/home network to exchange link-local addresses. A sample invocation follows this list.
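A sample two-node invocation, using the server address from the diagram below (substitute your own lab addresses):

# on the server machine
$ cargo kbench server

# on the client machine
$ cargo kbench remote 10.0.125.173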

Below you can find a lab setup which suffices for the second option. Currently, link-local addresses must be created with the name syntax <nic>/ll; this can be done using, e.g., pfexec ipadm create-addr igb0/ll -T addrconf. The benchmark defaults to using the NICs igb0 and igb1, which can be overridden to match your setup using the --underlay-nics option: e.g., when testing over a Chelsio NIC, --underlay-nics cxgbe0 cxgbe1 will select those devices and use the link-local addresses cxgbe0/ll and cxgbe1/ll. Additionally, the MTU of each physical underlay link should be set to 9000.

fe80::a236:9fff:fe0c:2586            fe80::a236:9fff:fe0c:25b6
fe80::a236:9fff:fe0c:2587            fe80::a236:9fff:fe0c:25b7
            ┌─────────────────────────────────────┐
            │                                     │
            │         ┌─────────────────┐         │
            │         │                 │         │
       igb0┌┴┐       ┌┴┐igb1       igb1┌┴┐       ┌┴┐igb0
         ╔═╩═╩═══════╩═╩═╗           ╔═╩═╩═══════╩═╩═╗
         ║ cargo kbench  ║░          ║ cargo kbench  ║░
         ║    remote     ║░          ║    server     ║░
         ║ 10.0.125.173  ║░          ║               ║░
         ╚══════╦═╦══════╝░          ╚══════╦═╦══════╝░
          ░░░░░░░│░░░░░░░░░           ░░░░░░░│░░░░░░░░░
  10.0.147.187/8 │                           │ 10.0.125.173/8
                 │      ┌ ─ ─ ─ ─ ─ ┐        │
                          Lab/Home
                 └ ─ ─ ▶│  Network  │◀ ─ ─ ─ ┘
                         ─ ─ ─ ─ ─ ─

Connecting igb0↔igb0, etc., is not a requirement, as NDP tables are inspected when inserting underlay network routes.
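Putting the setup steps together, per-node preparation might look like the following sketch (igb0/igb1 assumed, per the defaults above; adjust to your hardware):

# jumbo frames on the physical underlay links
$ pfexec dladm set-linkprop -p mtu=9000 igb0
$ pfexec dladm set-linkprop -p mtu=9000 igb1

# link-local addresses with the expected <nic>/ll naming
$ pfexec ipadm create-addr -T addrconf igb0/ll
$ pfexec ipadm create-addr -T addrconf igb1/ll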

In both scenarios, the benchmark harness will run iperf in client-to-server and server-to-client modes, and will record periodic stack information and timings using dtrace. These are converted into flamegraphs and timing data for further analysis by criterion.

In-situ measurement

The kernel module benchmark harness can be moved onto a gimlet or other development system for measurement. The path to the binary can be found using the command:

cargo bench --package opte-bench \
  --no-run --message-format json-render-diagnostics \
  | jq -r -s "map( \
      select(.reason==\"compiler-artifact\") \
      | select( \
          .target.kind\
          | map_values(.==\"bench\") \
          | any \
      ) \
      | select(.target.name==\"xde\") \
  ) | map(.executable)"
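The command prints a JSON array holding the benchmark binary's path. A hedged follow-up for moving that binary into a target's global zone (the host name and destination path here are placeholders):

$ scp <path-printed-above> root@target-gz:/root/xde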

Once the binary is moved onto the global zone of a target machine, measurements can be taken using xde in-situ. On a gimlet, we add the -d flag, as we do not have access to flamegraph there; this places the captured stacks into the xde-bench folder.

$ ./xde in-situ expt-name -d
# ...
exit

$ ls -R xde-bench
xde-bench:
expt-name

xde-bench/expt-name:
histos.out  raw.stacks

Measured data in xde-bench can be moved and processed into flamegraphs and histograms on any development machine using the command ./xde in-situ expt-name -c none.
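For instance, to pull results captured on a gimlet back to a development machine and post-process them there (the host name is a placeholder):

$ scp -r root@gimlet-gz:xde-bench .
$ ./xde in-situ expt-name -c none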