ASoC: SOF: debug: Introduce debug option to test DSP ops #4726

ranj063 · 2023-12-02T00:22:53Z

Introduce a new debug option to allow testing DSP operations such a boot firware, set power state etc. Add new new kconfig to allow this option for developers. For now only 2 ops are supported for booting a new firmware and setting the DSP power state to D3.

To allow passing a new firmware file, modify the signature of the load_firmware op to take the firmware file name as an argument. When the kconfig for DSP op testing is enabled, runtime PM is disabled to allow the DSP state to be solely managed from the userspace and no audio card is registered.

Introduce a new debug option to allow testing DSP operations such a boot firware, set power state etc. Add new new kconfig to allow this option for developers. For now only 2 ops are supported for booting a new firmware and setting the DSP power state to D3. To allow passing a new firmware file, modify the signature of the load_firmware op to take the firmware file name as an argument. When the kconfig for DSP op testing is enabled, runtime PM is disabled to allow the DSP state to be solely managed from the userspace and no audio card is registered. Signed-off-by: Ranjani Sridharan <[email protected]>

plbossart

Not following the general direction if I am honest.

plbossart · 2023-12-02T00:38:42Z

sound/soc/sof/Kconfig

+	help
+	  This option is used for testing DSP operations such as load/unload FW,
+	  set power state, enable/disable core etc. from userspace.
+	  Say Y if you want to retain DSP context for FW exceptions.


that line looks like a copy-paste from above?

I would mention that this will disable the audio support completely

plbossart · 2023-12-02T00:40:13Z

sound/soc/sof/core.c

@@ -472,6 +472,7 @@ static int sof_probe_continue(struct snd_sof_dev *sdev)
 	/* hereafter all FW boot flows are for PM reasons */
 	sdev->first_boot = false;

+#if !IS_ENABLED(CONFIG_SND_SOC_SOF_DEBUG_DSP_OPS_TEST)


can we use if (!IS_ENABLED) instead of #if

plbossart · 2023-12-02T00:42:10Z

sound/soc/sof/debug-dsp-ops.c

+
+/*
+ * test firmware boot by passing the firmware file as the argument:
+ * ex: echo boot_firmware,intel/avs/tgl/community/dsp_basefw.bin > dsp_test_op


why combine the two parts?

echo intel/avs/tgl/community/dsp_basefw.bin > filename
echo 1 > boot

plbossart · 2023-12-02T00:43:00Z

sound/soc/sof/debug-dsp-ops.c

+		return ret;
+
+	/* resume DMA trace */
+	return sof_fw_trace_resume(sdev);


this looks like code duplication, I don't follow what it's supposed to do

Do we really want to do this or run something like the python stuff on top of a simplified SOF driver.

We have to be really careful not to open a pandora box of new stuff that we'll have to maintain forever twice.

@plbossart the idea is to be able to run the python scripts on Linux with minimal effort by using the debugfs option. This is in essence a simplified SOF driver isnt it?

So is this basically an alternative to https://github.com/thesofproject/sof-diagnostic-driver that don't have to load/unload?

Isn't the sof-diagnostic-driver approach more generic/flexible?

Well, the question is really "what is the partitioning between scripts and driver"? Do we want to reuse snd_sof_run_firmware and hard-code it in a sequence used by a script, or do we want the scripts to do something that runs the firmware?
If you really want to use scripts, then the driver part should be rather minimal and probably a pass-through to access registers.

but @plbossart if we want to write scripts to do all the register reads/writes, it would essentially be rewriting the SOF driver. Isn't this a saner approach?

but @plbossart if we want to write scripts to do all the register reads/writes, it would essentially be rewriting the SOF driver. Isn't this a saner approach?

the question is whether
a) you want to reuse the existing Python scripts used by FW validation or
b) you want to develop NEW python scripts based on a TBD API

It's not clear from this PR what the direction is.
a) is complicated because I don't know what the interface requirements are, but
b) doesn't seem very appealing to me since it'd be a 3rd way of interacting with the DSP.

@marc-hb this isnt a replacement for the diagnostic driver. This is a way to enable us to run the python scripts for FW testing on Linux with minimal effort.

What other purpose does the sof-diagnostic-driver have?

@ranj063, it depends which python script we are talking about... One set I have seen is implementing IPC message crafting, sending (yep, with register poking and then polling for completion).
It might be hard to get it to download the firmware because of the DMA configuring, that is true.

@plbossart @ujfalusi @andyross @marc-hb this PR is NOT about IPC ABI validation but a feature to support debug and compliance/stress testing of kernel + SOF + Zephyr functional IP programming flows with userspace scripting (not userspace IP programming).

We need to test the kernel functional IP code alongside the associated Zephyr IP code together (with the same timing and sequencing) which is not possible using Python to flip bits via mmap(). We already have a Python mmap() CI tool today that verifies IPC ABI compliance only. i.e. not useful to certify IP programming compliance.

Yes we can do a lot of test with aplay/amixer, but these tools are end user applications not designed to strongly validate/stress the kernel + SOF + Zephyr combination.

e.g. we need to be able to stress (yes, low hanging fruit will come first)

D3 -> D0 and vice versa

DSP core off/on

S0ix entry/exit

FW boot

IMR save/restore

topology load/unload

I will assert that you can do all of the above with sof-test. Just run thousands of cycles of suspend-resume, module load/unload, etc.

plbossart · 2023-12-02T00:44:03Z

sound/soc/sof/sof-priv.h

-int snd_sof_load_firmware_raw(struct snd_sof_dev *sdev);
-int snd_sof_load_firmware_memcpy(struct snd_sof_dev *sdev);
+int snd_sof_load_firmware_raw(struct snd_sof_dev *sdev, const char *fw_filename);
+int snd_sof_load_firmware_memcpy(struct snd_sof_dev *sdev, const char *fw_filename);


can this go to a separate patch?

marc-hb · 2023-12-06T05:38:31Z

https://github.com/zephyrproject-rtos/zephyr/ already has a pure-Python driver from @andyross for testing purposes:

zephyrproject-rtos/zephyr@9073db4

Newer, much simpler firmware load tool

This is a smaller firmware loader with fewer dependencies and much
faster operation:

+ No need for an externally built SOF diag_driver, it gets its DMA
  memory in userspace and works with any unmodified kernel (that has
  hugetlbfs anyway)

+ Does not leak kernel memory on failure (diag_driver was basically a
  front end for kmalloc(), and if the script exited early...)

+ Much smaller: 230 lines of python in one file vs. 1600 in nine.

+ Much faster; no needless operations and sleeping steps.  Completes
  load and launches hello_world in 0.2s of real time.

+ Correctly resets the stream state and can actually recover form the
  "wedged DSP" state that the previous loader would sometimes get
  stuck in.

+ Clearer structure, easier to use as a testbed for driver-side
  interaction.

Signed-off-by: Andy Ross <[email protected]>

This was the first, very old commit, newer version is now here:
https://github.com/zephyrproject-rtos/zephyr/tree/main/soc/xtensa/intel_adsp/tools

andyross · 2023-12-06T15:37:00Z

@marc-hb To be fair: cavstool isn't PM-aware, so you generally have to unload the kernel driver in order to use it. But it's been very successful as a paradigm. I even banged out a broad equivalent for bringing up Zephyr on the mt8195 DSP (which has a MUCH simpler host control interface, again to be fair).

I don't personally see any reason not to have this in the kernel, it seems like an obvious thing for the driver to provide. I will say though that I'd prefer to see it start from the perspective of "loading arbitrary firmware to the device and getting output" instead of just assuming everything is an audio blob that handles ALSA streams. The SOF unit testing story for DSP code has always been kinda weak, and Zephyr has had to pick up a lot of that, historically.

ujfalusi

@ranj063, imho if this test mode is activated then we should not boot the 'default' firmware on module load but wait for the firmware path to be loaded and the instruction to boot the DSP with it.

If this mode is selected we should be printing a warning that only firmware testing is allowed via some tbd interface and the kconfig should be more 'alarming' to deter users from selecting it. Currently it is 'why not, it should not do much harm' category.

Note: We sort of have an interface to do simple things via the message injector with the default firmware, we need to prevent rpm to keep the DSP up, but we can send arbitrary messages all while we have the sound card..

ujfalusi · 2023-12-07T13:18:10Z

sound/soc/sof/Kconfig

+	help
+	  This option is used for testing DSP operations such as load/unload FW,
+	  set power state, enable/disable core etc. from userspace.
+	  Say Y if you want to retain DSP context for FW exceptions.


I would mention that this will disable the audio support completely

ujfalusi · 2023-12-07T13:27:54Z

sound/soc/sof/debug-dsp-ops.c

+		return ret;
+
+	/* resume DMA trace */
+	return sof_fw_trace_resume(sdev);


@ranj063, it depends which python script we are talking about... One set I have seen is implementing IPC message crafting, sending (yep, with register poking and then polling for completion).
It might be hard to get it to download the firmware because of the DMA configuring, that is true.

ujfalusi · 2023-12-07T13:31:40Z

sound/soc/sof/debug-dsp-ops.c

+}
+
+/* ops are executed as "op_name,argument1,argument2...". For example, to set the DSP power state
+ * to D3: echo "load_firmware,<PATH>/sof-tgl.ri" > dsp_test_op"


why not separate files for the firmware path, set_power_state, etc?
How one would guess the valid op_names and what they are set at the moment?

ujfalusi · 2023-12-07T13:32:48Z

sound/soc/sof/debug-dsp-ops.c

+	dfse->type = SOF_DFSENTRY_TYPE_BUF;
+	dfse->sdev = sdev;
+
+	debugfs_create_file("dsp_test_op", 0222, sdev->debugfs_root, dfse, &sof_dsp_ops_tester_fops);


dsp_test or rather separate files under dsp_test/ directory?

ujfalusi · 2023-12-07T13:35:26Z

sound/soc/sof/loader.c

+					plat_data->fw_filename);
+		if (!fw_filename)
+			return -ENOMEM;
+	}


Hm, freeing up the passed fw_filename is OK?

ujfalusi · 2023-12-07T13:39:44Z

sound/soc/sof/ipc4-loader.c

@@ -157,7 +159,7 @@ static size_t sof_ipc4_fw_parse_basefw_ext_man(struct snd_sof_dev *sdev)
 		if (ret)
 			return ret;
 	}
-
+#endif


Why are we skipping this part?

ujfalusi · 2023-12-07T13:47:14Z

sound/soc/sof/loader.c

@@ -14,22 +14,23 @@
 #include "sof-priv.h"
 #include "ops.h"

-int snd_sof_load_firmware_raw(struct snd_sof_dev *sdev)
+int snd_sof_load_firmware_raw(struct snd_sof_dev *sdev, const char *fw_filename)


you should not neglect the snd_sof_load_firmware_memcpy()

ujfalusi · 2023-12-07T13:50:53Z

sound/soc/sof/Kconfig

@@ -266,6 +266,14 @@ config SND_SOC_SOF_DEBUG_RETAIN_DSP_CONTEXT
 	  Say Y if you want to retain DSP context for FW exceptions.
 	  If unsure, select "N".

+config SND_SOC_SOF_DEBUG_DSP_OPS_TEST
+	bool "SOF enable DSP ops testing"


What does constitute a "DSP ops"? Isn't this something like "DSP test mode"? Should we have a sof_debug flag for this to be enabled/disabled instead of re-building the kernel?

ujfalusi · 2023-12-07T13:56:24Z

sound/soc/sof/core.c

@@ -488,6 +489,7 @@ static int sof_probe_continue(struct snd_sof_dev *sdev)
 			"error: failed to register machine driver %d\n", ret);
 		goto fw_trace_err;
 	}
+#endif

 	ret = sof_register_clients(sdev);


Why are we loading clients in DSP test mode? They might interfere with the testing, no?

ujfalusi · 2023-12-07T13:58:32Z

sound/soc/sof/core.c

 fw_trace_err:
+#endif
 	sof_fw_trace_free(sdev);


We also enable ftrace? That will send messages, again, might interfere with testing?

marc-hb · 2024-01-17T02:55:19Z

The ability to make the DSP panic after some time parameter would also be useful to test recovery features like these:

The --skip-to-first-trace option was added as a precaution to exclude log pollution from before the start of a test but how much pollution it excludes was never assessed; we just don't know. Hopefully not much? Replace, see below. mtrace timestamps used to return to zero on every DSP boot, which --skip-to-first-trace depends on. After SOF commit 76b0979c6453 ("sync time between host and fw for logging"), mtrace timestamps still return to zero (briefly) but only on "cold" DSP boots, not on when booting from IMR. Booting from IMR has now been enabled by default in most configurations. That combination breaks --skip-to-first-trace. The only practical short-term solution is to turn off --skip-to-first-trace and take the risk of old logs polluting the test results, which is what this commit does. If results pollution proves to be a problem, we could in the longer term try to disable IMR boot just before running this test using sof_debug as discussed in thesofproject#1153 This would also measure from a possibly cleaner DSP state. New, DSP "debug ops" being proposed in thesofproject/linux#4726 might also help. Fixes: thesofproject#1153 Signed-off-by: Baofeng Tian <[email protected]> Signed-off-by: Chao Song <[email protected]>

The --skip-to-first-trace option was added as a precaution to exclude log pollution from before the start of a test, normally the residue traces are related to trigger stop IPC, dma put and component reset (as you can see). There could be 10 ~ 20 lines. mtrace timestamps used to return to zero on every DSP boot, which --skip-to-first-trace depends on. After SOF commit 76b0979c6453 ("sync time between host and fw for logging"), mtrace timestamps still return to zero (briefly) but only on "cold" DSP boots, not on when booting from IMR. Booting from IMR has now been enabled by default in most configurations. That combination breaks --skip-to-first-trace. The only practical short-term solution is to turn off --skip-to-first-trace and take the risk of old logs polluting the test results, which is what this commit does. If results pollution proves to be a problem, we could in the longer term try to disable IMR boot just before running this test using sof_debug as discussed in thesofproject#1153 This would also measure from a possibly cleaner DSP state. New, DSP "debug ops" being proposed in thesofproject/linux#4726 might also help. Fixes: thesofproject#1153 Signed-off-by: Baofeng Tian <[email protected]> Signed-off-by: Chao Song <[email protected]>

ujfalusi · 2024-01-17T18:04:52Z

The ability to make the DSP panic after some time parameter would also be useful to test recovery features like these:
* [Firmware recovery #1675](https://github.com/thesofproject/linux/issues/1675)

* [ASoC: SOF: ipc4-pcm: Workaround for crashed firmware on system suspend #4780](https://github.com/thesofproject/linux/pull/4780)

It is not a scope of this PR.

marc-hb · 2024-01-17T18:17:59Z

The ability to make the DSP panic

It is not a scope of this PR.

Understood but would it be a useful stepping stone? Looks like one.

ujfalusi · 2024-01-17T18:23:52Z

The ability to make the DSP panic

It is not a scope of this PR.

Understood but would it be a useful stepping stone? Looks like one.

To be honest I would rather not do such a thing.

The --skip-to-first-trace option was added as a precaution to exclude log pollution from before the start of a test, normally the residue traces are related to trigger stop IPC, dma put and component reset (as you can see). There could be 10 ~ 20 lines. mtrace timestamps used to return to zero on every DSP boot, which --skip-to-first-trace depends on. After SOF commit 76b0979c6453 ("sync time between host and fw for logging"), mtrace timestamps still return to zero (briefly) but only on "cold" DSP boots, not on when booting from IMR. Booting from IMR has now been enabled by default in most configurations. That combination breaks --skip-to-first-trace. The only practical short-term solution is to turn off --skip-to-first-trace and take the risk of old logs polluting the test results, which is what this commit does. If results pollution proves to be a problem, we could in the longer term try to disable IMR boot just before running this test using sof_debug as discussed in #1153 This would also measure from a possibly cleaner DSP state. New, DSP "debug ops" being proposed in thesofproject/linux#4726 might also help. Fixes: #1153 Signed-off-by: Baofeng Tian <[email protected]> Signed-off-by: Chao Song <[email protected]>

plbossart · 2024-03-06T19:48:33Z

I'll close this for now since it's not an active development. We can always re-open this later.

plbossart reviewed Dec 2, 2023

View reviewed changes

ujfalusi reviewed Dec 7, 2023

View reviewed changes

This was referenced Jan 17, 2024

[BUG] performance test is failing following timestamp sync change sof#8480 thesofproject/sof-test#1153

Closed

ASoC: SOF: ipc4-pcm: Workaround for crashed firmware on system suspend #4780

Merged

marc-hb mentioned this pull request Jan 17, 2024

lib.sh: perf_analyze(): drop --skip-to-first-trace thesofproject/sof-test#1150

Merged

plbossart closed this Mar 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ASoC: SOF: debug: Introduce debug option to test DSP ops #4726

ASoC: SOF: debug: Introduce debug option to test DSP ops #4726

ranj063 commented Dec 2, 2023

plbossart left a comment

plbossart Dec 2, 2023

ujfalusi Dec 7, 2023

plbossart Dec 2, 2023

plbossart Dec 2, 2023

plbossart Dec 2, 2023

ranj063 Dec 4, 2023

marc-hb Dec 5, 2023

plbossart Dec 5, 2023

ranj063 Dec 5, 2023

plbossart Dec 5, 2023

marc-hb Dec 6, 2023

ujfalusi Dec 7, 2023

lgirdwood Dec 7, 2023 •

edited

Loading

plbossart Dec 7, 2023

plbossart Dec 2, 2023

marc-hb commented Dec 6, 2023

andyross commented Dec 6, 2023

ujfalusi left a comment

ujfalusi Dec 7, 2023

ujfalusi Dec 7, 2023

ujfalusi Dec 7, 2023

ujfalusi Dec 7, 2023

ujfalusi Dec 7, 2023

ujfalusi Dec 7, 2023

ujfalusi Dec 7, 2023

ujfalusi Dec 7, 2023

ujfalusi Dec 7, 2023

ujfalusi Dec 7, 2023

marc-hb commented Jan 17, 2024

ujfalusi commented Jan 17, 2024

marc-hb commented Jan 17, 2024

ujfalusi commented Jan 17, 2024

plbossart commented Mar 6, 2024

ASoC: SOF: debug: Introduce debug option to test DSP ops #4726

ASoC: SOF: debug: Introduce debug option to test DSP ops #4726

Conversation

ranj063 commented Dec 2, 2023

plbossart left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lgirdwood Dec 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marc-hb commented Dec 6, 2023

andyross commented Dec 6, 2023

ujfalusi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marc-hb commented Jan 17, 2024

ujfalusi commented Jan 17, 2024

marc-hb commented Jan 17, 2024

ujfalusi commented Jan 17, 2024

plbossart commented Mar 6, 2024

lgirdwood Dec 7, 2023 •

edited

Loading