Avoid recent MTL regression #9314
Turns out zephyrproject-rtos/zephyr#76046 has already been negatively tested by #9309.
This reverts commit d629e52. First of the 3 commits, that need to be reverted. Signed-off-by: Guennadi Liakhovetski <[email protected]>
This reverts commit ffce2cb. Second of the 3 commits, that need to be reverted. Signed-off-by: Guennadi Liakhovetski <[email protected]>
…rams" This reverts commit eda6029. Third of the 3 commits, that need to be reverted. Signed-off-by: Guennadi Liakhovetski <[email protected]>
This reverts commit 8847de0. CONFIG_MODULES should work again on MTL. Signed-off-by: Guennadi Liakhovetski <[email protected]>
As long as MTL still boots, this is fine by me! Note that it must be made clear that these reverted commits PASSED CI in #9238! They passed in isolation, though; now they are combined with other changes, including a big Zephyr update. This complexity and the very complex issue #9268 are the reasons why I didn't try bisecting: I didn't expect it would be that "easy". Not testing LLEXT on MTL is bad, but not testing MTL at all (because... it does not boot!) is much worse. There's also a very unusual failure on ADL, but it does not seem related: https://sof-ci.01.org/sofpr/PR9314/build6597/devicetest/index.html EDIT: I used ssh to check the logs, and that ADL system either crashed silently or was power-cycled: the logs stop abruptly. In the same run, a TGL system was rebooted remotely for no obvious reason. Weird.
Actually, both this PR and disabling LLEXT feel like a game of whack-a-mole. Who knows what other, also totally unrelated feature will trigger this crash again? I just verified that disabling IMR resume also avoids this crash. Unlike disabling LLEXT or "device posture", that would seem like an actually reliable way to temporarily avoid this crash, rather than something that would let it randomly reappear.
Superseded by revert:
Yesterday, a recent regression on MTL, documented in #9308, was addressed in #9313 by disabling CONFIG_MODULES on MTL. That approach has multiple drawbacks: (1) it is too broad: even if one wanted to disable LLEXT modules on MTL, it would have been enough to make DRC, the only component built as a module, built-in again; (2) PR #9116, which was a suspect in this regression, passed the daily tests after merging, and PR testing on the day after its merge was passing too; (3) LLEXT has a high maintenance cost without CI testing, so removing it from CI returns it to a state where it has to be checked manually for regressions. This PR presents an alternative "fix" for the problem: reverting the latest commits that triggered the breakage, as identified by bisection.