-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib.sh: perf_analyze(): drop --skip-to-first-trace #1150
Conversation
tools/sof_perf_analyzer.py
Outdated
trace_prev = None | ||
try: | ||
if args.skip_to_first_trace: | ||
trace_prev = skip_to_first_trace(trace_item_gen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not remove the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's be precise, there is no new logging system introduced.
With new logging system introduced, SOF logging timestamp
are aligned with kernel, so SOF logging timestamp can't be zero
for following case.
Based on this, skip to first trace are not needed anymore,
hence remove it from current script.
Signed-off-by: Baofeng Tian <[email protected]>
The firmware timestamp is in sync with the host side now,
and becomes much more deterministic. The condition to use
--skip-to-first-trace option in CI is eliminated (see the help
info for the reason it is used in CI test).
Based on this, skip-to-first-trace are not needed anymore,
hence remove it from the script.
Link: https://github.com/thesofproject/sof/pull/8480
Signed-off-by: Baofeng Tian <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skip to first trace are not needed anymore, hence remove it from current script.
This is NOT the real reason why you're submitting this PR.
You're submitting this PR because a test started FAILING following some logging change.
Describe the test failure and the issue first.
So far it's not obvious why a minor looking timestamp change broke this and why it requires removing code. sof-test must stay usable on older product versions.
Submit a new GitHub issue if the analysis of the issue is too long for a commit message.
045f844
to
7daa1be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I compared test logs before/after the thesofproject/sof#8480 change (merged on Jan. 8th). Did you?
While the return to zero is gone, the logs from the previous test are STILL THERE and they can still pollute the measurement. That has not changed. Only their timestamp has changed but the content has not changed so they must still be skipped.
Please discuss with @RanderWang to understand the new timestamp behavior and design a solution to skip logs from the previous test that is compatible with both old and new versions.
(BTW you're removing the --skip-to-first-trace
command line option while leaving the function behind. Even if removing the whole thing were right, leaving dead code behind would be sloppy).
7daa1be
to
3947c34
Compare
While the performance logs look different, here are some sample logs in another test:
The timestamp still starts at zero on DSP boot. |
this is not a performance logging, when test some features, DSP will be restarted, and then 0 timestamp will happen. I am changing the patch to only remove this option from lib.sh, and still keep the functionality in python script |
3947c34
to
78c0eeb
Compare
Before DSP boot complete, DSP uses its default timestamp. After it sends boot_ready msg to host, host will send ipc msg to sync time stamp and then DSP will use the same time stamp, in this case the timestamp jumps to 5504.678680 |
"boot" and "power on" mean the same thing in this context. Important questions:
That's a bit better but it's still not good enough because we still don't have a complete and clear explanation of what changed and how. As usual, let's not fix a problem that we don't really understand yet (i.e. "Programming by coincidence"). Please analyze, understand and share a complete explanation of the problem first. Only then we can look at a fix. |
because before Rander's change, each test case will re-start timestamp with 0, after Rander's change, only DSP restart can make timestamp to 0.
check this link, we also observed this before Rander's change, it maybe a SOF bug.
Keqiao did not file bug, just informed Chao and me to fix, the reason I think in commit message already explained and this PR review comments also explained.
@marc-hb ,this is the right way to go, performance test is failing and if we can't provide a fix on time, a lot of people will feel upset for the reason, especially for SQE, if this failing blocks milestone by SQE, I would feel more un-comfortable. if you still have questions, you can offline ping me, you can also went through the scirpt, since you are script expert. |
I think I found it's a bit more complicated, see my latest comment #1153
You're submitting a change in the way timestamps are handled, so of course we need to understand what is happening with timestamps.
Anyone can file a bug. If only one person could file but then the project would obviously not scale.
I can see you tried but a lot of questions had no answer or the wrong answer. But I think we're making some progress.
Maybe it is but if we don't know why it is then we don't know if it is. Stop "programming by coincidence".
Maybe you can just remove the testing and just print "PASS"? This could make everyone "happy and comfortable". We've been quite "successful" with this approach: https://github.com/thesofproject/sof-test/issues?q=+label%3A%22False+Pass+%2F+green+failure%22+ |
I've spent time looking at logs, source code, changes and discussing with @aiChaoSONG (thanks!) in #1153 Please fix the commit message with a summary similar to this and we should be good. Feel free to rephrase in your own words.
|
Actually, we know a little bit. Please replace this with words from @aiChaoSONG in #1153 (comment) |
78c0eeb
to
b928a2a
Compare
b928a2a
to
1ab4988
Compare
The --skip-to-first-trace option was added as a precaution to exclude log pollution from before the start of a test, normally the residue traces are related to trigger stop IPC, dma put and component reset (as you can see). There could be 10 ~ 20 lines. mtrace timestamps used to return to zero on every DSP boot, which --skip-to-first-trace depends on. After SOF commit 76b0979c6453 ("sync time between host and fw for logging"), mtrace timestamps still return to zero (briefly) but only on "cold" DSP boots, not on when booting from IMR. Booting from IMR has now been enabled by default in most configurations. That combination breaks --skip-to-first-trace. The only practical short-term solution is to turn off --skip-to-first-trace and take the risk of old logs polluting the test results, which is what this commit does. If results pollution proves to be a problem, we could in the longer term try to disable IMR boot just before running this test using sof_debug as discussed in thesofproject#1153 This would also measure from a possibly cleaner DSP state. New, DSP "debug ops" being proposed in thesofproject/linux#4726 might also help. Fixes: thesofproject#1153 Signed-off-by: Baofeng Tian <[email protected]> Signed-off-by: Chao Song <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you
I checked all the failures, none is related. |
(very different from the initial description)
The --skip-to-first-trace option was added as a precaution to exclude
log pollution from before the start of a test, normally the residue traces are
related to trigger stop IPC, dma put and component reset (as you can see).
There could be 10 ~ 20 lines.
mtrace timestamps used to return to zero on every DSP boot, which --skip-to-first-trace
depends on. After SOF commit 76b0979c6453 ("sync time between host and fw for logging"),
mtrace timestamps still return to zero (briefly) but only on "cold" DSP boots,
not on when booting from IMR. Booting from IMR has now been enabled by default
in most configurations. That combination breaks --skip-to-first-trace.
The only practical short-term solution is to turn off --skip-to-first-trace and take
the risk of old logs polluting the test results, which is what this commit does.
If results pollution proves to be a problem, we could in the longer term try to disable
IMR boot just before running this test using sof_debug as discussed in #1153
This would also measure from a possibly cleaner DSP state. New, DSP "debug ops"
being proposed in thesofproject/linux#4726 might also help.