-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] performance test is failing following timestamp sync change sof#8480 #1153
Comments
passed: https://sof-ci.ostc.intel.com/#/result/planresultdetail/36559 Current performance testing are requesting skip trace until find trace timestamp between [0, 1] as the start So once Rander aligned kernel and DSP timestamp, DSP timestamp only restart(from 0) on DSP power off/on, this means with current option, until mtrace logging end, there may no timestamp between [0,1], so raise this exception and cause performance test failed. Current solution is remove this option and python script will directly try to find logs include "comp_copy" as valid module performance logging, then collect and calculate its performance data. This is also working with previous mtrace logs, no other negative impact observed so far. |
PR submitted for review and discuss: |
Thanks for collecting logs. Next time please avoid screenshots: https://docs.zephyrproject.org/latest/develop/getting_started/index.html#use-copy-paste |
Thanks for collecting these logs and describing the change, this is great progress.
The initial timestamp sync message ( I investigated this a bit and I think I found the key fact and key difference that you have been constantly missing so far: booting from IMR.
Very interesting, so you're saying that the |
The several lines of trace at the beginning in a test belongs to the previous test, that's why we(or maybe I) want to skip them. Before Rander's patch, there's a clear division you can tell start from where, the traces belong to current test, but after Rander's patch, you can't. Line 969 in bc3ced5
|
Thanks @aiChaoSONG , this is useful. So you added the I had a look at these but I couldn't tell:
I think I finally found why. Too bad I had to analyze this myself. |
Actually, we could/should probably use something like this This would help not just with the logs, it would also ensure the FW itself is measured in a "clean" state. I mean in the future, we obviously need a shorter-term solution now. |
@marc-hb If test case is run by developer manually, before Rander's patch, timestamp is always start from 0, so with/without this option won't make any difference. But after Rander's patch, it will fail because the timestamp may not start from 0, and due to the mechanism we use to skip trace, no trace will left for analysis in the end. So removing the use of |
That's the impression I got, thanks for confirming. But the main question is still: how much pollution will this typically cause? If too much / too often, is there some trick we could use to reduce it? Another option could be to disable IMR boot when testing performance, I think it shows the zero timestamp again for a few lines. Any idea how hard disabling IMR would that be? |
Not that much. Normally the residue traces are related to trigger stop IPC, dma put and component reset (as you can see). There could be 10 ~ 20 lines, but they won't affect performance measurement, because we focus on trace lines that are within the component log context and printed by comp_copy function in performance measurement. So, even with the traces from previous test, it is all good.
We have a kernel option sof_debug, one of its bit is used to control boot from IMR or not. But we have to set it only for performance test in deployment, which is more complex. I think remove the use of --skip-to-first-trace is still the fastest and simplest way to workaround the current dilemma with barely negative impact. |
The --skip-to-first-trace option was added as a precaution to exclude log pollution from before the start of a test but how much pollution it excludes was never assessed; we just don't know. Hopefully not much? Replace, see below. mtrace timestamps used to return to zero on every DSP boot, which --skip-to-first-trace depends on. After SOF commit 76b0979c6453 ("sync time between host and fw for logging"), mtrace timestamps still return to zero (briefly) but only on "cold" DSP boots, not on when booting from IMR. Booting from IMR has now been enabled by default in most configurations. That combination breaks --skip-to-first-trace. The only practical short-term solution is to turn off --skip-to-first-trace and take the risk of old logs polluting the test results, which is what this commit does. If results pollution proves to be a problem, we could in the longer term try to disable IMR boot just before running this test using sof_debug as discussed in thesofproject#1153 This would also measure from a possibly cleaner DSP state. New, DSP "debug ops" being proposed in thesofproject/linux#4726 might also help. Fixes: thesofproject#1153 Signed-off-by: Baofeng Tian <[email protected]> Signed-off-by: Chao Song <[email protected]>
The --skip-to-first-trace option was added as a precaution to exclude log pollution from before the start of a test, normally the residue traces are related to trigger stop IPC, dma put and component reset (as you can see). There could be 10 ~ 20 lines. mtrace timestamps used to return to zero on every DSP boot, which --skip-to-first-trace depends on. After SOF commit 76b0979c6453 ("sync time between host and fw for logging"), mtrace timestamps still return to zero (briefly) but only on "cold" DSP boots, not on when booting from IMR. Booting from IMR has now been enabled by default in most configurations. That combination breaks --skip-to-first-trace. The only practical short-term solution is to turn off --skip-to-first-trace and take the risk of old logs polluting the test results, which is what this commit does. If results pollution proves to be a problem, we could in the longer term try to disable IMR boot just before running this test using sof_debug as discussed in thesofproject#1153 This would also measure from a possibly cleaner DSP state. New, DSP "debug ops" being proposed in thesofproject/linux#4726 might also help. Fixes: thesofproject#1153 Signed-off-by: Baofeng Tian <[email protected]> Signed-off-by: Chao Song <[email protected]>
https://github.com/orgs/community/discussions/66741 Let's close when the test is passing. |
test has been passing for a while https://sof-ci.ostc.intel.com/#/result/planresultdetail/40027 |
Describe the bug
A clear and concise description of what the bug is.
CI build performance test is failing
To Reproduce
use local mtrace and dmesg file can reproduce as well
Reproduction Rate
100%.
Expected behavior
A clear and concise description of what you expected to happen.
Impact
CI performance build test should pass.
Environment
The text was updated successfully, but these errors were encountered: