Xen-related performance problems #7404
I can confirm that this problem is clearly noticeable. My Fedora AppVM, with up to 8 GB of memory and all cores assigned to it, runs much slower than a native Fedora install on bare metal. Compiling code takes roughly twice as long. I first blamed the lack of hyperthreading, but passing `smt=on` to Xen made no noticeable difference.
Watching 1080p@60fps YouTube videos on Qubes OS boils my laptop and doesn't feel like 60fps. Native Fedora handles this much better, despite using only CPU-based video decoders[1]. Startup latency of apps on Qubes OS is much worse.

[1] I'm not 100% sure it's CPU-only. But according to htop it hits all my cores really hard, while still being smoother and much cooler than Qubes OS.
This is definitely a problem and we're working on it. It won't ever be as fast as on bare hardware, but our goal is to go from …
I suggest reverting this, as it is not security supported upstream. The fact that it did not help your benchmarks indicates that it is not likely to be the culprit.
Yeah, that's not good. For clarification: if native Fedora takes time X to build C++ projects, does this mean Qubes OS takes X / (1 − 0.58) time? If you could post the raw benchmark data, that would be very helpful.
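(For reference, that works out to X / 0.42 ≈ 2.38 X, i.e. builds taking roughly 2.4 times as long, assuming the 58% figure was measured relative to the Qubes time.)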
I can't exactly reproduce this. More information on the workload is needed, I think.

In my test, I used Linpack Xtreme. On Qubes, I have SMT enabled, 12 vCPUs, and 12 GB of RAM assigned to the VM (although Linpack only uses about 2 GB). The VM is PVH with memory balancing enabled; everything else is default for Xen and kernel-latest. The benchmark VM is the only VM running while testing. CPU frequency started at about 3.5 GHz and went down to ~2.1 GHz over the duration of the test. My result was 100 GFlops.

Then I booted a Fedora 36 ISO on bare metal. There, CPU frequency started at 3.9 GHz and went down to about 2.4 or 2.5. My result was 112 GFlops.

Perhaps your CPU is not boosting correctly, the way mine does?
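For anyone comparing numbers: CPU frequency during a run can be watched from dom0, e.g. (a sketch; the exact output format varies by hardware and cpufreq driver):

```
# dom0: show per-CPU frequency/P-state information
xenpm get-cpufreq-para

# or sample the kernel's reported frequency once per second
watch -n1 "grep 'cpu MHz' /proc/cpuinfo"
```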
I turned off memory balancing via Qubes Manager and assigned a fixed 10 GB of memory to the qube. No improvement 😕
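For reference, the CLI equivalent of that Qubes Manager change would be something like this (assuming R4.1 semantics, where `maxmem 0` disables dynamic balancing):

```
# dom0: pin the qube at a fixed 10 GiB with no memory balancing
qvm-prefs <vm-name> maxmem 0
qvm-prefs <vm-name> memory 10240
```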
Yes! I don't have the raw benchmark data anymore, but I'm pretty sure it's the same problem that causes the sysbench deviations.
The result of sysbench inside an AppVM heavily depends on how many other AppVMs are running. Running only a single AppVM improves the results substantially. I tried running the memory benchmark in dom0, where it is significantly faster. Here are the results: …
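(The exact invocation was elided above; a typical sysbench memory run, in sysbench 1.0 syntax with illustrative parameters, looks like this:)

```
# Write 100 GiB in 1 KiB blocks and report throughput
sysbench memory --memory-block-size=1K --memory-total-size=100G run
```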
I'm not sure if there is anything special about my system. My entire Xen and kernel setup is pretty vanilla. The only deviation is …
@AlxHnr Can you try …?
Is the domU a PVH (the default on Qubes, if no PCI devices)? What CPU is that?
Yes. All my domUs are PVH (the default), except sys-usb and sys-net.
Giving the PV VM more cores and memory makes no difference. PV VMs are slow and laggy to the point of being unusable.
i7-3632QM. It supports VT-d, but my motherboard/BIOS/whatever does not. I hope this problem is not specific to my setup. My goal here is to get to a point where others can reproduce these problems. I don't have much time and care less about temporary fixes for myself; I care more about achieving sane defaults that work for everybody.
I'm seeing about an 8x slowdown in `sysbench memory run` on a domU PVH vs. dom0 on my ancient quad-core Sandy Bridge. B
dom0 is itself a PV VM, so that is strange. @andyhhp: Do you know what could cause such a huge difference between PV dom0 and PV domU? Are superpages only allowed to be used by dom0?
Me too.
PV guests cannot use superpages at all. dom0 doesn't get them either.
Numbers this bad are usually PV-L1TF, and IvyBridge is affected, but Qubes has SHADOW compiled out so it's not that. Do you have …?
Makes sense; I see that superpage support on PV got ripped out in 2017. Not surprising in retrospect, considering that at least two of the fatal flaws in PV were due to it.
@AlxHnr Can you provide `xl dmesg` output?
Just as an aside, under R4.0 on Sandy Bridge `xl dmesg` says:

```
PV L1TF shadowing: Dom0 disabled, DomU enabled
```

Just checked R4.1 on an i7-8850H: same result. B
That's normal. The L1TF mitigation code enables shadow paging if the hypervisor was built with that, or calls domain_crash() if shadow support was compiled out.
@fepitre can you provide an `xl dmesg` as well?
i7-8750H here, about the same result. `xl dmesg`: …
Thanks! Would you mind posting sysbench results?
Here are the important bits from `xl dmesg`:

…
Hmm, sadly nothing helpful there. Not terribly surprising as it's a release hypervisor, but there's no guarantee that a debug Xen would be any more helpful either. As an unrelated observation, @marmarek, you can work around:

…

by backporting xen-project/xen@e44d986084760 and xen-project/xen@e5046fc6e99db, which will silence the spurious warning.
I've tried Xen with …
When calling …
It surprises me too. @andyhhp do you have suggestions for debugging this? Is there a way to get stats on TLB misses? I wonder if CPU pinning would help. |
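One possibility (a sketch, not something verified in this thread): perf exposes TLB-miss counters, though inside a Xen guest the hardware PMU is only available when the hypervisor is booted with vpmu enabled, which has its own security implications:

```
# Xen command line (dom0 grub): expose the PMU to guests
#   vpmu=on

# Inside the VM: count TLB misses during the benchmark
perf stat -e dTLB-load-misses,dTLB-store-misses,iTLB-load-misses \
    sysbench memory run
```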
[Summary: sysbench's event timing interacts poorly with the high-overhead xen clocksource in PV and some PVH VMs.]

I think we may be seeing a mirage, or rather, a side effect of other system calls being made in parallel with the memory ones. I played around a bit with `strace -f sysbench ...` and noticed that under domU PV, but not under dom0 PV, I saw an additional 75K lines in the strace output with this pattern:

…
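(The pattern itself was elided above. Assuming the extra lines were time-related syscalls bypassing the vDSO, they can be counted like this:)

```
# Summarize time-related syscalls made during the benchmark;
# with a vDSO-capable clocksource these mostly disappear.
strace -c -f -e trace=clock_gettime,gettimeofday \
    sysbench memory run
```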
After some additional experimenting and googling, I found that I can get "terrible sysbench results" from PV dom0 by performing the following (as root):
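(The commands were elided here; from context they switch dom0's clocksource to xen, presumably via the standard sysfs interface:)

```
# dom0, as root: force the high-overhead xen clocksource
echo xen > /sys/devices/system/clocksource/clocksource0/current_clocksource
```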
And I can then "restore good sysbench results" from PV dom0 by performing the following (as root):
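(And correspondingly, switching back to tsc:)

```
# dom0, as root: restore the tsc clocksource
echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
```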
Here's where it gets even stranger (caveat: testing on two different pieces of hardware). Under R4.0 (Xen 4.8), PVH domU uses "xen" as the clocksource by default, but it does not have as severe an impact, with performance closer to dom0.

Even more fun: …

To reiterate: I don't think this is a memory performance problem. B
It is disabled by default in Qubes OS, but can be enabled with some grub parameters mentioned earlier in this discussion. I've tried them with no impact on performance.
At least no impact that I've noticed. But when my benchmark VM is the only VM running, enabling SMT gives me a reliable extra ~13%. This is consistent across multiple reboots and test runs, but far from the 2x I hoped for. (Tested with …)

I also found a way to maintain the same performance even while running a lot of other VMs. This involved tweaking scheduler weights and rate limits. But as my benchmarks get faster, everything else becomes less responsive; even moving the mouse stutters hard. Dom0's CPU graph goes up to 50% during those runs, even though it is not doing anything. I assume this is just the overhead of virtualization. Still, the fastest runs I could achieve took 25% longer than native Fedora.
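For completeness, the grub parameters in question are enabled in dom0 roughly like this (a sketch; the grub.cfg path differs between BIOS and EFI installs):

```
# /etc/default/grub in dom0: append to the Xen command line, e.g.
#   GRUB_CMDLINE_XEN_DEFAULT="... smt=on sched-gran=core"
# then regenerate the config:
grub2-mkconfig -o /boot/grub2/grub.cfg            # legacy BIOS
# grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg  # EFI
```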
@AlxHnr How does performance under Qubes compare to that in a KVM VM?
I just tested on Fedora 35 (host) and used gnome-boxes to spin up another Fedora 35 (guest) live ISO image. Out of the box, my benchmarks showed the same 25% slowdown that I saw with my fastest Xen runs, but with the difference that the host was still perfectly usable and responsive. I then started 4 additional VMs from the same live ISO image and only saw a negligible slowdown of <1%.
@AlxHnr Is this with core scheduling on the KVM host? For a fairer comparison you might want to turn off SMT in your firmware, or turn it off in both KVM and Xen.
I don't know if it uses core scheduling by default. But disabling SMT only made my tests take 16% longer, both on the host and in the KVM guest. The latter is only slightly behind my best SMT Xen run, while the former still beats it.
Can you share the specific tweaks you used?
I'm getting my best results with these values. Commands for benchmarking in my `development` qube:

```
# Dom0 + 7 other VMs are idling in the background.
# Grub setting: smt=on sched-gran=core

# Heavily favor the benchmark VM; the value impacts the balance of
# VM performance vs. host responsiveness.
xl sched-credit2 -d development -w 65535

# Disable the credit2 scheduler's context-switch rate limiting.
xl sched-credit2 -s -r 0

# Set the cap of every other domain to 0 (i.e. uncapped).
xl sched-credit2 | grep -vE 'Name|Domain|development|pool' | grep -oE '^\S+' | xargs -I {} xl sched-credit2 -d {} -c 0
```

I also tried decreasing the weight of all other VMs instead of setting `development` to 65535, …
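To verify that the changes took effect, the per-domain parameters can be listed (this is the same listing the loop above parses):

```
# Show each domain's weight and cap under the credit2 scheduler
xl sched-credit2
```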
Automated announcement from builder-github: The component …
Automated announcement from builder-github: The component …

Or update dom0 via Qubes Manager.
I've summarized the workarounds that have been mentioned in this thread. They are listed in order from the (supposedly) least invasive to the most invasive:

…

Let me know if I missed anything. Thank you all!
Currently, libxl neither pauses nor suspends a stubdomain when suspending the domain it serves. Qubes OS has an out-of-tree patch that just pauses the stubdomain, but that is also insufficient: sys-net (an HVM with an attached PCI device) does not properly resume from suspend on some systems, and the stubdomain considers the TSC clocksource to be unstable after resume.

This patch properly suspends the stubdomain. Doing so requires creating a nested libxl__domain_suspend_state structure and freeing it when necessary. Additionally, a new callback function is added that runs when the stubdomain has been suspended; libxl__qmp_suspend_save() is called by this new callback.

Saving the state doesn't work on Qubes for two reasons:
- save/restore consoles are not enabled (as that would require qemu in dom0)
- Qubes avoids using QMP

Link: QubesOS/qubes-issues#7404
Co-authored-by: Marek Marczykowski-Górecki <[email protected]>
Signed-off-by: Marek Marczykowski-Górecki <[email protected]>
Signed-off-by: Demi Marie Obenour <[email protected]>
Certainly not fixed: Xen still shatters superpages a lot, so there will be a lot of TLB misses. The only fix I know of is to change Xen to reliably not shatter superpages, so that the second-stage TLB is mostly 2M pages.
Qubes OS release
R4.1
Brief summary
The Xen hypervisor has performance problems on certain compute-intensive workloads
Steps to reproduce
See @fepitre for details
Expected behavior
Same (or almost the same) performance as on bare hardware
Actual behavior
Lower performance than on bare hardware