This is a success story for Nix because in 2022 I ran our test and benchmark regression suite in a software universe from ~2016. This is valuable because we can ensure backwards compatibility with older systems in new Snabb versions, and differentiate regressions caused by newer versions of external dependencies.
The world hasn’t stood still, however, and I think it’s time to update snabblab-nixos to fit the Snabb community’s current needs. Some things I have in mind:
Focus on semi-manual build/test/benchmark-matrix/report-generation runs
We do not currently have a Hydra instance, and I am not too keen on administering one. I suspect we do not really need one: we could get by sharing a handful of self-contained NixOS lab machines among Snabb developers and using those to run benchmark matrices and tests locally.
Replace murren and lugano server classes with some new gear
It would be nice to set up new lab machines and adapt snabblab-nixos accordingly.
We have to think about which hardware we can realistically get into them. Off the top of my head, it would be nice to have the following NICs for driver development and testing:
Mellanox ConnectX 100G
Intel ??? 100G (can we get an Intel 100G NIC to test whether the AVF driver supports it well?)
Intel 82599 10G
Intel X(..)710 10-40G
Intel i350 1G
It would also be useful to have at least two different systems:
Recent Intel Xeon
Recent AMD EPYC
Throw out obsolete benchmarks
Do we still need to test with many DPDK and QEMU versions? I feel like neither lwAFTR nor the Snabb applications at SWITCH make use of vhost_user. Unless someone expresses immediate interest in actively developing the vhost_user driver, I could imagine retiring the whole DPDK / QEMU / SnabbNFV test complex. It makes up a fairly big part of the current snabblab-nixos complexity and (as far as I can tell) does not match the applications we are currently working on.
I would like to consider dropping the DPDK / QEMU matrix and scaling back SnabbNFV testing to a software-loopback setup. Keeping the SnabbNFV tests running to spot JIT regressions is low effort and high value. Running DPDK / QEMU compatibility tests is fairly involved, and maybe we can get away with a QEMU test between Snabb’s vhost_user and virtio apps for the time being. Once we restart actively hacking on vhost_user we can step up testing with QEMU and DPDK again.
Add new benchmarks
I would like to add new benchmarks that track the performance of applications like lwAFTR and ipfix (once it is upstreamed). This way upstream testing serves its users and helps them communicate tradeoffs.
I think those benchmarks should be software-only, i.e., not depend on specific NICs; I would like to test NIC driver performance separately. I think we can do this because we are confident enough that performance regressions in one app do not “bleed over” into another. With Snabb’s architecture, RaptorJIT, and per-app JIT zones we have sufficient isolation between drivers and apps to make this feasible. So I suggest pushing full-stack integration testing out to the individual applications, and keeping software benchmarks and driver benchmarks upstream.
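To make the idea concrete, here is a minimal sketch of what such a software-only benchmark could look like, using Snabb’s app network API with synthetic Source/Sink apps standing in for NICs. The commented-out “dut” app is a hypothetical placeholder for the app under test (lwAFTR, ipfix, …); its module path and configuration are assumptions, not actual names:

```lua
-- Minimal software-only benchmark skeleton; a sketch, intended to be run
-- with `snabb snsh`. Synthetic Source/Sink apps stand in for NICs so that
-- only the app under test and the JIT are exercised.
local config     = require("core.config")
local engine     = require("core.app")
local basic_apps = require("apps.basic.basic_apps")

local c = config.new()
config.app(c, "source", basic_apps.Source)    -- generates synthetic packets
config.app(c, "sink",   basic_apps.Sink)      -- discards packets

-- Hypothetical placeholder: splice the app under test (e.g. lwAFTR, ipfix)
-- between source and sink; module path and configuration are app-specific.
--   config.app(c, "dut", require("apps.example.example").Example, {...})
--   config.link(c, "source.output -> dut.input")
--   config.link(c, "dut.output -> sink.input")
config.link(c, "source.output -> sink.input") -- direct loop until a DUT is spliced in

engine.configure(c)
engine.main({duration = 10, report = {showlinks = true}})
```

Because nothing here touches hardware, a benchmark of this shape could run on any of the shared lab machines and be swept across Snabb versions to spot JIT regressions.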
I also want to add benchmarks for individual NIC drivers. I think this was less necessary in the 10G age because we easily got line rate with a single queue and any hardware feature enabled if the driver was not majorly borked. I.e., those benchmarks would have been fairly boring.
With 100G NICs, however, we see much more nuanced performance profiles and have tighter cycle budgets as well. The tech seems new enough that benchmarks for those drivers are still “not boring”. I’ve started one such benchmark for our ConnectX driver in #1469, where we already observed big differences between CPU vendors and driver configurations.
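For the driver side, a rough sketch of what a per-driver benchmark could look like follows; it pairs the driver with the same synthetic Source/Sink apps so app performance does not enter the measurement. The module path (apps.mellanox.connectx), the ConnectX/IO app split, the parameter names (pciaddress, queues, queue), and the PCI address are assumptions from memory and should be checked against the driver before use; measuring the receive path additionally needs a peer port or external traffic generator:

```lua
-- Rough per-driver benchmark sketch; a sketch only, to be run with
-- `snabb snsh` on a lab machine with the NIC installed.
-- NOTE: module path, app names, and parameter names below are assumptions
-- and must be checked against apps/mellanox/connectx.lua.
local config     = require("core.config")
local engine     = require("core.app")
local basic_apps = require("apps.basic.basic_apps")
local connectx   = require("apps.mellanox.connectx")

local pci = "0000:01:00.0"                    -- assumed PCI address of the NIC

local c = config.new()
-- Control app brings up the card and defines queues; the IO app moves packets.
config.app(c, "nic", connectx.ConnectX, { pciaddress = pci, queues = {{ id = "q1" }} })
config.app(c, "io",  connectx.IO,       { pciaddress = pci, queue = "q1" })
config.app(c, "source", basic_apps.Source)    -- synthetic transmit load
config.app(c, "sink",   basic_apps.Sink)      -- counts/discards received packets
config.link(c, "source.output -> io.input")   -- transmit path under test
config.link(c, "io.output -> sink.input")     -- receive path (needs a peer/loopback)

engine.configure(c)
engine.main({duration = 10, report = {showlinks = true}})
```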
With a few bug fixes I was able to run snabblab-nixos and use it to do QA for the Octarina release on one of the lugano servers. The branch with my changes currently lives here: https://github.com/eugeneia/snabblab-nixos/commits/fix2022