
OPTE Benchmarks

OPTE maintains two sets of benchmarks: userland microbenchmarks, and kernel module benchmarks. Userland benchmarks can be run on most development machines, while the kernel module benchmarks require a full Helios install and, depending on which benchmarks you want to run, additional lab setup.

Benchmark outputs are located in opte/target/criterion, and any flamegraphs built during kmod benchmarks are placed into opte/target/xde-bench.
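As a hedged pointer: when criterion's HTML report generation is enabled (commonly the default), a browsable summary is also written under the same directory, which you can open in any browser:

opte/target/criterion/report/index.html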

Userland Benchmarks

We use criterion to measure and profile individual packet processing times for slow-/fast-path traffic as well as generated hairpin packets.

These can be run using cargo ubench, or cargo bench --package opte-bench --bench userland -- <options>. This benchmark runner uses the standard criterion CLI. To see a deduplicated list of the available benchmarks, run cargo ubench --list 2> /dev/null | sort | uniq.
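Because the standard criterion CLI is available, its stock baseline flags can be used to compare runs before and after a change. A minimal sketch using plain criterion flags, nothing OPTE-specific:

$ cargo ubench -- --save-baseline before
# ... make changes, rebuild ...
$ cargo ubench -- --baseline before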

Benchmarks are split into several categories, any of which can be used as a filter (see the example after this list):

  • Metric: wallclock, alloc_ct, alloc_sz.

  • Action: parse, process.

  • Packet family.
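criterion selects benchmarks by matching the filter against benchmark IDs, so (assuming the IDs embed these category names) a category can serve directly as a filter:

$ cargo ubench -- parse        # only the parse-action benchmarks
$ cargo ubench -- wallclock    # only the wallclock-metric benchmarks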

Kernel Module Benchmarks

The kernel module benchmarks can be run using cargo kbench, or cargo bench --package opte-bench --bench xde -- <options>. They require that:

  • you are running on an up-to-date Helios instance.

  • the XDE kernel module and opteadm are installed, either via IPS or the cargo xtask install command.

  • you have installed the IPS packages flamegraph, demangle, iperf, and sparse (an install sketch follows this list).
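As an install sketch, assuming the packages are available from your configured IPS publisher:

$ pfexec pkg install flamegraph demangle iperf sparse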

They implement zone-to-zone iperf traffic in two scenarios:

  • cargo kbench local on one machine. This uses a test setup identical to xde-tests/loopback: two sparse zones are created on the current machine, with simnet links used as the underlay network. This is lower fidelity than the two-node setup below.

  • cargo kbench server and cargo kbench remote <SERVER_IP> on two separate machines. One zone is created on each machine (running an iperf server and client respectively), using the shared lab/home network to exchange link-local addresses. A sample invocation follows this list.
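A sample two-node invocation, using the server address from the diagram below (substitute your own lab addresses):

# on the server machine
$ cargo kbench server

# on the client machine
$ cargo kbench remote 10.0.125.173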

Below you can find a lab setup which suffices for the second option. Currently, link-local addresses must be created with the name syntax <nic>/ll; this can be done using, e.g., pfexec ipadm create-addr igb0/ll -T addrconf. The benchmark defaults to using the NICs igb0 and igb1, which can be overridden to match your setup using the --underlay-nics option: e.g., when testing over a Chelsio NIC, --underlay-nics cxgbe0 cxgbe1 will select those devices and use the link-local addresses cxgbe0/ll and cxgbe1/ll. Additionally, the MTU of each physical underlay link should be set to 9000.

fe80::a236:9fff:fe0c:2586            fe80::a236:9fff:fe0c:25b6
fe80::a236:9fff:fe0c:2587            fe80::a236:9fff:fe0c:25b7
            ┌─────────────────────────────────────┐
            │                                     │
            │         ┌─────────────────┐         │
            │         │                 │         │
       igb0┌┴┐       ┌┴┐igb1       igb1┌┴┐       ┌┴┐igb0
         ╔═╩═╩═══════╩═╩═╗           ╔═╩═╩═══════╩═╩═╗
         ║ cargo kbench  ║░          ║ cargo kbench  ║░
         ║    remote     ║░          ║    server     ║░
         ║ 10.0.125.173  ║░          ║               ║░
         ╚══════╦═╦══════╝░          ╚══════╦═╦══════╝░
          ░░░░░░░│░░░░░░░░░           ░░░░░░░│░░░░░░░░░
  10.0.147.187/8 │                           │ 10.0.125.173/8
                 │      ┌ ─ ─ ─ ─ ─ ┐        │
                          Lab/Home
                 └ ─ ─ ▶│  Network  │◀ ─ ─ ─ ┘
                         ─ ─ ─ ─ ─ ─

Connecting igb0↔igb0, etc., is not a requirement, as NDP tables are inspected when inserting underlay network routes.
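Putting the setup steps together, per-node preparation might look like the following sketch (igb0/igb1 assumed, per the defaults above; adjust to your hardware):

# jumbo frames on the physical underlay links
$ pfexec dladm set-linkprop -p mtu=9000 igb0
$ pfexec dladm set-linkprop -p mtu=9000 igb1

# link-local addresses with the expected <nic>/ll naming
$ pfexec ipadm create-addr -T addrconf igb0/ll
$ pfexec ipadm create-addr -T addrconf igb1/ll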

In both scenarios, the benchmark harness will run iperf in client-to-server and server-to-client modes, and will record periodic stack information and timings using dtrace. These are converted into flamegraphs and timing data for further analysis by criterion.

In-situ measurement

The kernel module benchmark harness can be moved onto a gimlet or other development system for measurement. The path to the binary can be found using the command:

cargo bench --package opte-bench \
  --no-run --message-format json-render-diagnostics \
  | jq -r -s "map( \
      select(.reason==\"compiler-artifact\") \
      | select( \
          .target.kind\
          | map_values(.==\"bench\") \
          | any \
      ) \
      | select(.target.name==\"xde\") \
  ) | map(.executable)"
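The command prints a JSON array holding the benchmark binary's path. A hedged follow-up for moving that binary into a target's global zone (the host name and destination path here are placeholders):

$ scp <path-printed-above> root@target-gz:/root/xde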

Once the binary is moved onto the global zone of a target machine, measurements can be taken using xde in-situ. On a gimlet, we add the -d flag, as we do not have access to flamegraph there; this places the captured stacks into the xde-bench folder.

$ ./xde in-situ expt-name -d
# ...
exit

$ ls -R xde-bench
xde-bench:
expt-name

xde-bench/expt-name:
histos.out  raw.stacks

Measured data in xde-bench can be moved and processed into flamegraphs and histograms on any development machine using the command ./xde in-situ expt-name -c none.
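For instance, to pull results captured on a gimlet back to a development machine and post-process them there (the host name is a placeholder):

$ scp -r root@gimlet-gz:xde-bench .
$ ./xde in-situ expt-name -c none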