Qubes OS doesn't reboot after S3 suspend (NV41PZ) #1097
Comments
I think I know what the problem is... I might have been observing it on my MSI desktop too. So, after the suspend, the CPU features seem not to be programmed equally on all cores. Then after a reboot, the register with the mismatched features (IA32_FEATURE_CONTROL IIRC), when programmed, causes an exception in coreboot, which halts the boot process. Not sure if this is still relevant, but it's worth paying attention to if somebody starts tackling this issue.
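One way to check the "not programmed equally on all cores" suspicion on native Linux is to read the MSR from every CPU and group the CPUs by value. This is only a minimal sketch: it assumes the `msr` kernel module is loaded and the script runs as root, and the helper names (`read_msr`, `read_all_cpus`, `group_by_value`, `is_consistent`) are made up for illustration. Under Xen the values seen from dom0 may not reflect the real hardware state.

```python
import glob
import os
import struct
from collections import defaultdict

IA32_FEATURE_CONTROL = 0x3A  # MSR address per the Intel SDM


def read_msr(path, msr=IA32_FEATURE_CONTROL):
    """Read one 64-bit MSR from a Linux msr device node (offset = MSR address)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def read_all_cpus(msr=IA32_FEATURE_CONTROL):
    """Return {cpu_number: value} for every /dev/cpu/N/msr node found."""
    values = {}
    for path in glob.glob("/dev/cpu/[0-9]*/msr"):
        cpu = int(path.split("/")[3])
        values[cpu] = read_msr(path, msr)
    return values


def group_by_value(values):
    """Group CPU numbers by the MSR value they report."""
    groups = defaultdict(list)
    for cpu, value in sorted(values.items()):
        groups[value].append(cpu)
    return dict(groups)


def is_consistent(values):
    """True when every CPU reports the same value."""
    return len(set(values.values())) <= 1
```

Running `group_by_value(read_all_cpus())` before suspend and again after resume would show whether some cores come back with a different value than others.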
CC: @marmarek
For IA32_FEATURE_CONTROL I do see a difference: before suspend 0x5, after suspend 0x7. Can this cause That said, my guess is the above difference is not the cause of the crash on reboot.
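To make the 0x5 vs 0x7 difference easier to read, here is a small sketch that decodes the three low bits of IA32_FEATURE_CONTROL. The bit layout (bit 0 lock, bit 1 VMX in SMX, bit 2 VMX outside SMX) follows the Intel SDM; the function and flag names are just illustrative.

```python
# Low bits of IA32_FEATURE_CONTROL (MSR 0x3A), per the Intel SDM.
LOCK = 1 << 0             # once set, the MSR cannot be written until reset
VMX_IN_SMX = 1 << 1       # "Enable VMX in SMX operation"
VMX_OUTSIDE_SMX = 1 << 2  # "Enable VMX outside SMX operation"

FLAG_NAMES = {
    LOCK: "lock",
    VMX_IN_SMX: "vmx_in_smx",
    VMX_OUTSIDE_SMX: "vmx_outside_smx",
}


def decode_feature_control(value):
    """Return the names of the known flags set in value."""
    return [name for bit, name in FLAG_NAMES.items() if value & bit]


def changed_bits(before, after):
    """Return the names of the known flags that differ between two values."""
    return decode_feature_control(before ^ after)


print(decode_feature_control(0x5))  # ['lock', 'vmx_outside_smx']
print(decode_feature_control(0x7))  # ['lock', 'vmx_in_smx', 'vmx_outside_smx']
print(changed_bits(0x5, 0x7))       # ['vmx_in_smx']
```

So the only difference between the two readings is the "VMX in SMX" bit (0x2); the lock bit is set in both.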
Copying from the other issue message collected on MSI:
That |
Does it maybe try to disable "Enable VMX in SMX operation" (0x2) when the lock bit is already set?
Values read from that MSR would be on stack just below |
My point is: maybe the firmware sees 0x2 enabled and tries to disable it (even when it's already locked), which can't work.
Which MSI and which release is that output from?
Dasharo (coreboot+UEFI) v0.9.1 on PRO Z790-P WIFI (MS-7E06)
I can't see code like that in coreboot. UEFI and Xen would most likely run in 64-bit mode, which leaves FSP. I wasn't able to find a byte string exactly like that, but there are some similar ones in FSP-M; unfortunately the license doesn't allow disassembling, so I can't check more. Does anyone know if this can be reproduced outside of Qubes OS/Xen?
Native Linux seems to set this MSR to 0x5 on resume.
BTW, native Linux on resume complains:
Is there an option in Dasharo to enable the "Enable VMX in SMX operation" bit too? I can't find it in the menu... SMX should be available on NV41, no? And CPUID on MSI claims it's there too.
FSP has only one VmxEnable parameter; I have no idea whether this applies to SMX as well. It is possible that it depends on one of the other settings, or some state that differs between cold boot and resume from S3.
Anyway, firmware writing to the feature control MSR when it's already locked is clearly a bug. Also, enabling (and locking) it during boot but not during resume sounds suspicious. If it's done during boot, it should also be done during resume.
Everything is just as I wrote in my first comment. Unfortunately FSP and coreboot cannot agree who should initialize the MSR. VMX in SMX should not be enabled by either coreboot or FSP on MSI (there is no need to, because the chipset does not support TXT).
Does it mean this issue can be solved on NV4x specifically by "simply" doing a new Dasharo release?
Yeah, but also, it shouldn't try to disable it when the lock bit is set... If |
Maybe, maybe not. Regular boot works well. The S3 resume path is problematic here and should be investigated because, clearly, something is not happening as it should, compared to normal boot.
No, it definitely isn't.
On resume from S3 it is...
coreboot or FSP won't let you leave IA32_FEATURE_CONTROL unlocked on either normal boot or S3 resume, sorry. So if Xen required IA32_FEATURE_CONTROL to be unlocked on S3 resume, but not on the normal boot path, then something is wrong. A locked IA32_FEATURE_CONTROL is also a prerequisite for TXT initialization.
Yeah, coreboot is well aware of that and won't attempt to do so: https://github.com/coreboot/coreboot/blob/main/src/cpu/intel/common/common_init.c#L40 However, FSP is not that smart... It always blindly initializes the MSR as if it were the first entity touching it. Here is the code used by FSP to program CPU features: Although there is a condition to write the feature bit before the lock is set, it is only used to sort the operations in the proper order, not to prevent any writes if the lock is already set. So each registered feature calls the appropriate support function and initialize function. Then the macro It is a huge PITA. I lost weeks' worth of time figuring it out... Then there is also the TME stuff, which isn't properly programmed at S3 resume, as you already noticed. Another PITA I haven't looked into yet.
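The difference between the two behaviours described above can be sketched with a toy model. This is not coreboot or FSP code: `FakeFeatureControl`, `program_blindly` and `program_guarded` are hypothetical names, and the only hardware fact modelled is that a write to IA32_FEATURE_CONTROL faults (#GP) once the lock bit is set.

```python
class GeneralProtectionFault(Exception):
    """Stands in for the #GP raised by WRMSR to a locked MSR."""


class FakeFeatureControl:
    """Toy model of IA32_FEATURE_CONTROL: once locked, any write faults."""

    LOCK = 1 << 0

    def __init__(self, value=0):
        self.value = value

    def write(self, new_value):
        if self.value & self.LOCK:
            raise GeneralProtectionFault("write to locked IA32_FEATURE_CONTROL")
        self.value = new_value


def program_blindly(msr, wanted):
    """Write as if nobody touched the MSR before (the behaviour attributed to FSP)."""
    msr.write(wanted)


def program_guarded(msr, wanted):
    """Skip the write when the lock bit is already set; return True if written."""
    if msr.value & msr.LOCK:
        return False
    msr.write(wanted)
    return True


# MSR already locked at 0x7: the guarded path leaves it alone...
locked = FakeFeatureControl(0x7)
print(program_guarded(locked, 0x5), hex(locked.value))  # False 0x7

# ...while the blind path faults.
try:
    program_blindly(locked, 0x5)
except GeneralProtectionFault as exc:
    print("fault:", exc)
```

In this model an unhandled fault during early boot corresponds to the hang on reboot discussed in this thread; whether that is what actually happens on the NV4x is still an open question here.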
I understand what you say. You talk about theory. I added a debug print on the resume path in Xen and clearly saw that the lock bit was not set there. I talk about practice.
Ohh, so that's what happens... Interesting... That would mean FSP is not programming it on S3 resume...
@marmarek CPU feature programming on S3 resume was disabled by default until 4 months ago... That explains what you see (and why FSP didn't program it)... 😮💨
Component
Dasharo firmware, EC firmware
Device
NovaCustom NV4x 12th Gen
Dasharo version
v1.7.2
Dasharo Tools Suite version
No response
Test case ID
No response
Brief summary
Qubes OS doesn't reboot after S3 suspend-to-RAM was triggered and needs a forced reboot.
How reproducible
100% reproducible.
How to reproduce
Expected behavior
The laptop reboots.
Actual behavior
The screen remains black and the laptop doesn't restart.
The laptop needs a forced restart: hold the power button until it switches off, then start it again by pressing the power button normally.
Screenshots
no-reboot-after-S3.mp4
Additional context
The issue was submitted to Qubes OS as well: QubesOS/qubes-issues#9511
The issue doesn't seem to happen on Ubuntu and Fedora.
Solutions you've tried
Marek tried the reboot=acpi GRUB option without luck.