Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reverse dai trigger config and disable CONFIG_DMA_DW_SUSPEND_DRAIN for mtl as well #8835

Merged

Conversation

ujfalusi
Copy link
Contributor

@ujfalusi ujfalusi commented Feb 5, 2024

Hi,

In the logs for MTL we see:

[  233.967141] <inf> dai_intel_ssp: dai_ssp_stop: dai_ssp_stopTX stop
[  233.967160] <inf> dai_intel_ssp: dai_ssp_update_bits: dai_ssp_update_bits base 2a000, reg 0, mask 80, value 0
[  233.967166] <inf> dai_intel_ssp: dai_ssp_stop: dai_ssp_stop SSE clear for SSP2
[  233.968515] <err> dma_dw_common: dw_dma_stop: dw_dma_stop: dma 0 channel drain time out

which is caused by the fact that the DW DMA tried to drain it's FIFO when the SSP has been already stopped, thus it will not signal the DMA request, the drain will time out.

The correct config to use for ifdef for reversing the stop trigger order
is CONFIG_COMP_DAI_STOP_TRIGGER_ORDER_REVERSE

Fixes: a6a80ec ("dai-zephyr: Fix the ordering of DAI and DMA triggers")
Signed-off-by: Peter Ujfalusi <[email protected]>
@@ -79,6 +79,9 @@ CONFIG_INTEL_ADSP_IPC=y
CONFIG_WATCHDOG=y
CONFIG_LL_WATCHDOG=y

# CONFIG_DMA_DW_SUSPEND_DRAIN=y needs reversed trigger order
CONFIG_COMP_DAI_STOP_TRIGGER_ORDER_REVERSE=y
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this not needed for cAVS as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only ace15_mtpm selects CONFIG_DMA_DW_SUSPEND_DRAIN, it is only needed in that case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm, what differences existing on the DW/SSP side that would require a different behavior between generations? This looks rather weird to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't know, but the fact is that if the SSP is stopped, it will not trigger the DAM request so DW will not going to drain.
We don't see this issue on cAVS2.5 aqs the SUSPENSE_DRAIN is not enabled but we see it on MTL. LNL do not have it either, but it is not using DW anyways.

I cannot connect the dot why this change was added, what it suppose to fix and why it has not been blamed for this regression.
fwi, zephyr commit 6226f9e6e44f ("dma: dw: fix the return value check") made the drain timeout fatal, which imho should be a warn and we should be proceeding to disable the DMA channel force-ably

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the reverse order config depend on suspend_drain in the kconfig description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't know, but the fact is that if the SSP is stopped, it will not trigger the DAM request so DW will not going to drain. We don't see this issue on cAVS2.5 aqs the SUSPENSE_DRAIN is not enabled but we see it on MTL. LNL do not have it either, but it is not using DW anyways.

I cannot connect the dot why this change was added, what it suppose to fix and why it has not been blamed for this regression. fwi, zephyr commit 6226f9e6e44f ("dma: dw: fix the return value check") made the drain timeout fatal, which imho should be a warn and we should be proceeding to disable the DMA channel force-ably

so this begs the question, why are we enabling CONFIG_DMA_DW_SUSPEND_DRAIN only for MTL? And what issue does enabling this manifest in our CI tests? Would enabling this on previous gens also result in the same issue?

Copy link
Contributor Author

@ujfalusi ujfalusi Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the recommended sequence, but it should have came with the reverse trigger. But, the reverse trigger should be only for DW serviced links?
I'm sure it will have the same effect on cAVS2.5 on SSP at least. If the SSP is stopped, it is not going to trigger the DMA request and the DW drain to work it needs it, there is not 'free-run' bit in DW to just drain afaik.

Copy link
Contributor Author

@ujfalusi ujfalusi Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063, setting only CONFIG_DMA_DW_SUSPEND_DRAIN for TGL and I can reproduce the drain timeout.
The CONFIG_COMP_DAI_STOP_TRIGGER_ORDER_REVERSE do not fix the issue.

This change in Zephyr can recover the system in case of drain timeout:

diff --git a/drivers/dma/dma_dw_common.c b/drivers/dma/dma_dw_common.c
index 22de9a7d9ef8..cd1076c686ed 100644
--- a/drivers/dma/dma_dw_common.c
+++ b/drivers/dma/dma_dw_common.c
@@ -594,10 +597,8 @@ int dw_dma_stop(const struct device *dev, uint32_t channel)
 	/* now we wait for FIFO to be empty */
 	bool fifo_empty = WAIT_FOR(dw_read(dev_cfg->base, DW_CFG_LOW(channel)) & DW_CFGL_FIFO_EMPTY,
 				DW_DMA_TIMEOUT, k_busy_wait(DW_DMA_TIMEOUT/10));
-	if (!fifo_empty) {
-		LOG_ERR("%s: dma %d channel drain time out", __func__, channel);
-		return -ETIMEDOUT;
-	}
+	if (!fifo_empty)
+		LOG_WRN("%s: dma %s channel %d drain time out", __func__, dev->name, channel);
 #endif
 
 	dw_write(dev_cfg->base, DW_DMA_CHAN_EN, DW_CHAN_MASK(channel));
@@ -606,7 +607,7 @@ int dw_dma_stop(const struct device *dev, uint32_t channel)
 	bool is_disabled = WAIT_FOR(!(dw_read(dev_cfg->base, DW_DMA_CHAN_EN) & DW_CHAN(channel)),
 				    DW_DMA_TIMEOUT, k_busy_wait(DW_DMA_TIMEOUT/10));
 	if (!is_disabled) {
-		LOG_ERR("%s: dma %d channel disable timeout", __func__, channel);
+		LOG_ERR("%s: dma %s channel %d disable timeout", __func__, dev->name, channel);
 		return -ETIMEDOUT;
 	}

The strange thing that I'm seeing on TGL:
hw:0,0 works
hw:0,1 is disabled in BIOS and it fails always the playback because the playback is not progressing.

Running this in one terminal:

test="aplay -q -d1 -Dhw:0,0 -fdat /dev/zero"
iteration=0

while :
do
        $test
        ((iteration++))
        echo "Iteration: ${iteration}"
        sleep 1
done

Then run this on another terminal:

aplay -q -d1 -Dhw:0,1 -fdat /dev/zero

We will have drain timeout randomly on hw:0,0 or on hw:0,1 SSP.

@@ -1122,7 +1122,7 @@ static int dai_comp_trigger_internal(struct dai_data *dd, struct comp_dev *dev,
break;
case COMP_TRIGGER_PAUSE:
comp_dbg(dev, "dai_comp_trigger_internal(), PAUSE");
#if CONFIG_COMP_DAI_TRIGGER_ORDER_REVERSE
#if CONFIG_COMP_DAI_STOP_TRIGGER_ORDER_REVERSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ujfalusi is this saying that a6a80ec ("dai-zephyr: Fix the ordering of DAI and DMA triggers") only fixed the start trigger but kept the behavior for stop the same as before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That commit added the Kconfig:
config COMP_DAI_**STOP_**TRIGGER_ORDER_REVERSE

and in code it used:
#if CONFIG_COMP_DAI_TRIGGER_ORDER_REVERSE

This patch corrects that config option use in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063, @plbossart, the drain issue on UPX is not the same, hw:0,1 does not work at all, it always fail, but it allowed me to have a fix for the handling of such.

I still think that this PR will help the MTL issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ujfalusi can you please remind me what this kconfig CONFIG_DMA_DW_SUSPEND_DRAIN is meant for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063, in case you want to stop the DMA without loosing data in it's FIFO (we don't afaik) then the recommended stop sequence is:
SUSPEND+DRAIN and wait for drain to complete
Clear the EN bit to disable the channel

In case of pause you would need only toggle the SUSPEND bit.

There is no case documented when you just want to disable the channel and you don't care about the FIFO, but it is afaik is the clearing of EN (the FIFO will be lost).

Imho the correct fix would be to not enable CONFIG_DMA_DW_SUSPEND_DRAIN for any Intel platform.

The purpose of CONFIG_DMA_DW_SUSPEND_DRAIN is to empty the FIFO before
disabling the channel by draining it.
Since the peripheral is disabled before the DMA
(CONFIG_COMP_DAI_STOP_TRIGGER_ORDER_REVERSE is not selected), the DMA will
not be able to do that causing drain timeout.

The stop is used in two cases:
Stream stop:
the content of the FIFO does not matter as we stop the stream.

Pause/resume:
On pause the DMA is suspended (DW_CFGL_SUSPEND bit set)
On resume the DMA is stopped, re-configured and then started again instead
of resuming

The peripheral is started after the DMA stop and start.

Leftover audio data might cause audio glitch on resume, it is probably
better to disable the draining.

Note: if we want to have draining enabled we need to select the
CONFIG_COMP_DAI_STOP_TRIGGER_ORDER_REVERSE at the same time to force the
DMA to be stopped before the DAI.

Signed-off-by: Peter Ujfalusi <[email protected]>
@ujfalusi ujfalusi force-pushed the peter/pr/mtl_reverse_trigger_01 branch from 91a952d to 83fb77c Compare February 7, 2024 11:59
@ujfalusi
Copy link
Contributor Author

ujfalusi commented Feb 7, 2024

Changes since v1:

  • disable the CONFIG_DMA_DW_SUSPEND_DRAIN for mtl also

@ujfalusi ujfalusi changed the title Fix reverse dai trigger config and enable it for MTL Fix reverse dai trigger config and disable CONFIG_DMA_DW_SUSPEND_DRAIN for mtl as well Feb 7, 2024
@lgirdwood
Copy link
Member

@kv2019i good for you ?

@kv2019i
Copy link
Collaborator

kv2019i commented Feb 9, 2024

@jxstelter We really need to fix the fuzz problem, the CI check now fails all the time since the L3 heap change.

Looks good, merging.

@kv2019i kv2019i merged commit 557e581 into thesofproject:main Feb 9, 2024
40 of 44 checks passed
@jxstelter
Copy link
Contributor

@jxstelter We really need to fix the fuzz problem, the CI check now fails all the time since the L3 heap change.

Looks good, merging.

@kv2019i The fuzz test fails on IPC3 since it uses directly IPC buffer content to set memory capability during rmalloc. I saw some attempts to fix that issue validating input parameters. But k_panic() which I put there could be really too strong. Will change it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants