Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASoC: SOF: debug: Introduce debug option to test DSP ops #4726

Closed

Conversation

ranj063
Copy link
Collaborator

@ranj063 ranj063 commented Dec 2, 2023

Introduce a new debug option to allow testing DSP operations such a boot firware, set power state etc. Add new new kconfig to allow this option for developers. For now only 2 ops are supported for booting a new firmware and setting the DSP power state to D3.

To allow passing a new firmware file, modify the signature of the load_firmware op to take the firmware file name as an argument. When the kconfig for DSP op testing is enabled, runtime PM is disabled to allow the DSP state to be solely managed from the userspace and no audio card is registered.

Introduce a new debug option to allow testing DSP operations such a boot
firware, set power state etc. Add new new kconfig to allow this option
for developers. For now only 2 ops are supported for booting a new
firmware and setting the DSP power state to D3.

To allow passing a new firmware file, modify the signature of the
load_firmware op to take the firmware file name as an argument. When the
kconfig for DSP op testing is enabled, runtime PM is disabled to allow
the DSP state to be solely managed from the userspace and no audio card
is registered.

Signed-off-by: Ranjani Sridharan <[email protected]>
Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not following the general direction if I am honest.

help
This option is used for testing DSP operations such as load/unload FW,
set power state, enable/disable core etc. from userspace.
Say Y if you want to retain DSP context for FW exceptions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that line looks like a copy-paste from above?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would mention that this will disable the audio support completely

@@ -472,6 +472,7 @@ static int sof_probe_continue(struct snd_sof_dev *sdev)
/* hereafter all FW boot flows are for PM reasons */
sdev->first_boot = false;

#if !IS_ENABLED(CONFIG_SND_SOC_SOF_DEBUG_DSP_OPS_TEST)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use if (!IS_ENABLED) instead of #if


/*
* test firmware boot by passing the firmware file as the argument:
* ex: echo boot_firmware,intel/avs/tgl/community/dsp_basefw.bin > dsp_test_op
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why combine the two parts?

echo intel/avs/tgl/community/dsp_basefw.bin > filename
echo 1 > boot

return ret;

/* resume DMA trace */
return sof_fw_trace_resume(sdev);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like code duplication, I don't follow what it's supposed to do

Do we really want to do this or run something like the python stuff on top of a simplified SOF driver.

We have to be really careful not to open a pandora box of new stuff that we'll have to maintain forever twice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart the idea is to be able to run the python scripts on Linux with minimal effort by using the debugfs option. This is in essence a simplified SOF driver isnt it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this basically an alternative to https://github.com/thesofproject/sof-diagnostic-driver that don't have to load/unload?

Isn't the sof-diagnostic-driver approach more generic/flexible?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the question is really "what is the partitioning between scripts and driver"? Do we want to reuse snd_sof_run_firmware and hard-code it in a sequence used by a script, or do we want the scripts to do something that runs the firmware?
If you really want to use scripts, then the driver part should be rather minimal and probably a pass-through to access registers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but @plbossart if we want to write scripts to do all the register reads/writes, it would essentially be rewriting the SOF driver. Isn't this a saner approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but @plbossart if we want to write scripts to do all the register reads/writes, it would essentially be rewriting the SOF driver. Isn't this a saner approach?

the question is whether
a) you want to reuse the existing Python scripts used by FW validation or
b) you want to develop NEW python scripts based on a TBD API

It's not clear from this PR what the direction is.
a) is complicated because I don't know what the interface requirements are, but
b) doesn't seem very appealing to me since it'd be a 3rd way of interacting with the DSP.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marc-hb this isnt a replacement for the diagnostic driver. This is a way to enable us to run the python scripts for FW testing on Linux with minimal effort.

What other purpose does the sof-diagnostic-driver have?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063, it depends which python script we are talking about... One set I have seen is implementing IPC message crafting, sending (yep, with register poking and then polling for completion).
It might be hard to get it to download the firmware because of the DMA configuring, that is true.

Copy link
Member

@lgirdwood lgirdwood Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart @ujfalusi @andyross @marc-hb this PR is NOT about IPC ABI validation but a feature to support debug and compliance/stress testing of kernel + SOF + Zephyr functional IP programming flows with userspace scripting (not userspace IP programming).

We need to test the kernel functional IP code alongside the associated Zephyr IP code together (with the same timing and sequencing) which is not possible using Python to flip bits via mmap(). We already have a Python mmap() CI tool today that verifies IPC ABI compliance only. i.e. not useful to certify IP programming compliance.

Yes we can do a lot of test with aplay/amixer, but these tools are end user applications not designed to strongly validate/stress the kernel + SOF + Zephyr combination.

e.g. we need to be able to stress (yes, low hanging fruit will come first)

  1. D3 -> D0 and vice versa
  2. DSP core off/on
  3. S0ix entry/exit
  4. FW boot
  5. IMR save/restore
  6. topology load/unload

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will assert that you can do all of the above with sof-test. Just run thousands of cycles of suspend-resume, module load/unload, etc.

int snd_sof_load_firmware_raw(struct snd_sof_dev *sdev);
int snd_sof_load_firmware_memcpy(struct snd_sof_dev *sdev);
int snd_sof_load_firmware_raw(struct snd_sof_dev *sdev, const char *fw_filename);
int snd_sof_load_firmware_memcpy(struct snd_sof_dev *sdev, const char *fw_filename);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this go to a separate patch?

@marc-hb
Copy link
Collaborator

marc-hb commented Dec 6, 2023

https://github.com/zephyrproject-rtos/zephyr/ already has a pure-Python driver from @andyross for testing purposes:

zephyrproject-rtos/zephyr@9073db4

Newer, much simpler firmware load tool

This is a smaller firmware loader with fewer dependencies and much
faster operation:

+ No need for an externally built SOF diag_driver, it gets its DMA
  memory in userspace and works with any unmodified kernel (that has
  hugetlbfs anyway)

+ Does not leak kernel memory on failure (diag_driver was basically a
  front end for kmalloc(), and if the script exited early...)

+ Much smaller: 230 lines of python in one file vs. 1600 in nine.

+ Much faster; no needless operations and sleeping steps.  Completes
  load and launches hello_world in 0.2s of real time.

+ Correctly resets the stream state and can actually recover form the
  "wedged DSP" state that the previous loader would sometimes get
  stuck in.

+ Clearer structure, easier to use as a testbed for driver-side
  interaction.

Signed-off-by: Andy Ross <[email protected]>

This was the first, very old commit, newer version is now here:
https://github.com/zephyrproject-rtos/zephyr/tree/main/soc/xtensa/intel_adsp/tools

@andyross
Copy link

andyross commented Dec 6, 2023

@marc-hb To be fair: cavstool isn't PM-aware, so you generally have to unload the kernel driver in order to use it. But it's been very successful as a paradigm. I even banged out a broad equivalent for bringing up Zephyr on the mt8195 DSP (which has a MUCH simpler host control interface, again to be fair).

I don't personally see any reason not to have this in the kernel, it seems like an obvious thing for the driver to provide. I will say though that I'd prefer to see it start from the perspective of "loading arbitrary firmware to the device and getting output" instead of just assuming everything is an audio blob that handles ALSA streams. The SOF unit testing story for DSP code has always been kinda weak, and Zephyr has had to pick up a lot of that, historically.

Copy link
Collaborator

@ujfalusi ujfalusi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063, imho if this test mode is activated then we should not boot the 'default' firmware on module load but wait for the firmware path to be loaded and the instruction to boot the DSP with it.

If this mode is selected we should be printing a warning that only firmware testing is allowed via some tbd interface and the kconfig should be more 'alarming' to deter users from selecting it. Currently it is 'why not, it should not do much harm' category.

Note: We sort of have an interface to do simple things via the message injector with the default firmware, we need to prevent rpm to keep the DSP up, but we can send arbitrary messages all while we have the sound card..

help
This option is used for testing DSP operations such as load/unload FW,
set power state, enable/disable core etc. from userspace.
Say Y if you want to retain DSP context for FW exceptions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would mention that this will disable the audio support completely

return ret;

/* resume DMA trace */
return sof_fw_trace_resume(sdev);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063, it depends which python script we are talking about... One set I have seen is implementing IPC message crafting, sending (yep, with register poking and then polling for completion).
It might be hard to get it to download the firmware because of the DMA configuring, that is true.

}

/* ops are executed as "op_name,argument1,argument2...". For example, to set the DSP power state
* to D3: echo "load_firmware,<PATH>/sof-tgl.ri" > dsp_test_op"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not separate files for the firmware path, set_power_state, etc?
How one would guess the valid op_names and what they are set at the moment?

dfse->type = SOF_DFSENTRY_TYPE_BUF;
dfse->sdev = sdev;

debugfs_create_file("dsp_test_op", 0222, sdev->debugfs_root, dfse, &sof_dsp_ops_tester_fops);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dsp_test or rather separate files under dsp_test/ directory?

plat_data->fw_filename);
if (!fw_filename)
return -ENOMEM;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, freeing up the passed fw_filename is OK?

@@ -157,7 +159,7 @@ static size_t sof_ipc4_fw_parse_basefw_ext_man(struct snd_sof_dev *sdev)
if (ret)
return ret;
}

#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we skipping this part?

@@ -14,22 +14,23 @@
#include "sof-priv.h"
#include "ops.h"

int snd_sof_load_firmware_raw(struct snd_sof_dev *sdev)
int snd_sof_load_firmware_raw(struct snd_sof_dev *sdev, const char *fw_filename)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should not neglect the snd_sof_load_firmware_memcpy()

@@ -266,6 +266,14 @@ config SND_SOC_SOF_DEBUG_RETAIN_DSP_CONTEXT
Say Y if you want to retain DSP context for FW exceptions.
If unsure, select "N".

config SND_SOC_SOF_DEBUG_DSP_OPS_TEST
bool "SOF enable DSP ops testing"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does constitute a "DSP ops"? Isn't this something like "DSP test mode"? Should we have a sof_debug flag for this to be enabled/disabled instead of re-building the kernel?

@@ -488,6 +489,7 @@ static int sof_probe_continue(struct snd_sof_dev *sdev)
"error: failed to register machine driver %d\n", ret);
goto fw_trace_err;
}
#endif

ret = sof_register_clients(sdev);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we loading clients in DSP test mode? They might interfere with the testing, no?

fw_trace_err:
#endif
sof_fw_trace_free(sdev);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also enable ftrace? That will send messages, again, might interfere with testing?

@marc-hb
Copy link
Collaborator

marc-hb commented Jan 17, 2024

The ability to make the DSP panic after some time parameter would also be useful to test recovery features like these:

btian1 added a commit to btian1/sof-test that referenced this pull request Jan 17, 2024
The --skip-to-first-trace option was added as a precaution to exclude
log pollution from before the start of a test but how much pollution
it excludes was never assessed; we just don't know.
Hopefully not much? Replace, see below.

mtrace timestamps used to return to zero on every DSP boot, which --skip-to-first-trace
depends on. After SOF commit 76b0979c6453 ("sync time between host and fw for logging"),
mtrace timestamps still return to zero (briefly) but only on "cold" DSP boots,
not on when booting from IMR. Booting from IMR has now been enabled by default
in most configurations. That combination breaks --skip-to-first-trace.

The only practical short-term solution is to turn off --skip-to-first-trace and take
the risk of old logs polluting the test results, which is what this commit does.

If results pollution proves to be a problem, we could in the longer term try to disable
IMR boot just before running this test using sof_debug as discussed in thesofproject#1153
This would also measure from a possibly cleaner DSP state. New, DSP "debug ops"
being proposed in thesofproject/linux#4726 might also help.

Fixes: thesofproject#1153

Signed-off-by: Baofeng Tian <[email protected]>
Signed-off-by: Chao Song <[email protected]>
btian1 added a commit to btian1/sof-test that referenced this pull request Jan 17, 2024
The --skip-to-first-trace option was added as a precaution to exclude
log pollution from before the start of a test, normally the residue traces are
related to trigger stop IPC, dma put and component reset (as you can see).
There could be 10 ~ 20 lines.

mtrace timestamps used to return to zero on every DSP boot, which --skip-to-first-trace
depends on. After SOF commit 76b0979c6453 ("sync time between host and fw for logging"),
mtrace timestamps still return to zero (briefly) but only on "cold" DSP boots,
not on when booting from IMR. Booting from IMR has now been enabled by default
in most configurations. That combination breaks --skip-to-first-trace.

The only practical short-term solution is to turn off --skip-to-first-trace and take
the risk of old logs polluting the test results, which is what this commit does.

If results pollution proves to be a problem, we could in the longer term try to disable
IMR boot just before running this test using sof_debug as discussed in thesofproject#1153
This would also measure from a possibly cleaner DSP state. New, DSP "debug ops"
being proposed in thesofproject/linux#4726 might also help.

Fixes: thesofproject#1153

Signed-off-by: Baofeng Tian <[email protected]>
Signed-off-by: Chao Song <[email protected]>
@ujfalusi
Copy link
Collaborator

The ability to make the DSP panic after some time parameter would also be useful to test recovery features like these:

* [Firmware recovery #1675](https://github.com/thesofproject/linux/issues/1675)

* [ASoC: SOF: ipc4-pcm: Workaround for crashed firmware on system suspend #4780](https://github.com/thesofproject/linux/pull/4780)

It is not a scope of this PR.

@marc-hb
Copy link
Collaborator

marc-hb commented Jan 17, 2024

The ability to make the DSP panic

It is not a scope of this PR.

Understood but would it be a useful stepping stone? Looks like one.

@ujfalusi
Copy link
Collaborator

The ability to make the DSP panic

It is not a scope of this PR.

Understood but would it be a useful stepping stone? Looks like one.

To be honest I would rather not do such a thing.

marc-hb pushed a commit to thesofproject/sof-test that referenced this pull request Jan 18, 2024
The --skip-to-first-trace option was added as a precaution to exclude
log pollution from before the start of a test, normally the residue traces are
related to trigger stop IPC, dma put and component reset (as you can see).
There could be 10 ~ 20 lines.

mtrace timestamps used to return to zero on every DSP boot, which --skip-to-first-trace
depends on. After SOF commit 76b0979c6453 ("sync time between host and fw for logging"),
mtrace timestamps still return to zero (briefly) but only on "cold" DSP boots,
not on when booting from IMR. Booting from IMR has now been enabled by default
in most configurations. That combination breaks --skip-to-first-trace.

The only practical short-term solution is to turn off --skip-to-first-trace and take
the risk of old logs polluting the test results, which is what this commit does.

If results pollution proves to be a problem, we could in the longer term try to disable
IMR boot just before running this test using sof_debug as discussed in #1153
This would also measure from a possibly cleaner DSP state. New, DSP "debug ops"
being proposed in thesofproject/linux#4726 might also help.

Fixes: #1153

Signed-off-by: Baofeng Tian <[email protected]>
Signed-off-by: Chao Song <[email protected]>
@plbossart
Copy link
Member

I'll close this for now since it's not an active development. We can always re-open this later.

@plbossart plbossart closed this Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants