-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix reverse dai trigger config and disable CONFIG_DMA_DW_SUSPEND_DRAIN for mtl as well #8835
Fix reverse dai trigger config and disable CONFIG_DMA_DW_SUSPEND_DRAIN for mtl as well #8835
Conversation
The correct config to use for ifdef for reversing the stop trigger order is CONFIG_COMP_DAI_STOP_TRIGGER_ORDER_REVERSE Fixes: a6a80ec ("dai-zephyr: Fix the ordering of DAI and DMA triggers") Signed-off-by: Peter Ujfalusi <[email protected]>
@@ -79,6 +79,9 @@ CONFIG_INTEL_ADSP_IPC=y | |||
CONFIG_WATCHDOG=y | |||
CONFIG_LL_WATCHDOG=y | |||
|
|||
# CONFIG_DMA_DW_SUSPEND_DRAIN=y needs reversed trigger order | |||
CONFIG_COMP_DAI_STOP_TRIGGER_ORDER_REVERSE=y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this not needed for cAVS as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only ace15_mtpm selects CONFIG_DMA_DW_SUSPEND_DRAIN, it is only needed in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm, what differences existing on the DW/SSP side that would require a different behavior between generations? This looks rather weird to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't know, but the fact is that if the SSP is stopped, it will not trigger the DAM request so DW will not going to drain.
We don't see this issue on cAVS2.5 aqs the SUSPENSE_DRAIN is not enabled but we see it on MTL. LNL do not have it either, but it is not using DW anyways.
I cannot connect the dot why this change was added, what it suppose to fix and why it has not been blamed for this regression.
fwi, zephyr commit 6226f9e6e44f ("dma: dw: fix the return value check") made the drain timeout fatal, which imho should be a warn and we should be proceeding to disable the DMA channel force-ably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the reverse order config depend on suspend_drain in the kconfig description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't know, but the fact is that if the SSP is stopped, it will not trigger the DAM request so DW will not going to drain. We don't see this issue on cAVS2.5 aqs the SUSPENSE_DRAIN is not enabled but we see it on MTL. LNL do not have it either, but it is not using DW anyways.
I cannot connect the dot why this change was added, what it suppose to fix and why it has not been blamed for this regression. fwi, zephyr commit 6226f9e6e44f ("dma: dw: fix the return value check") made the drain timeout fatal, which imho should be a warn and we should be proceeding to disable the DMA channel force-ably
so this begs the question, why are we enabling CONFIG_DMA_DW_SUSPEND_DRAIN only for MTL? And what issue does enabling this manifest in our CI tests? Would enabling this on previous gens also result in the same issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the recommended sequence, but it should have came with the reverse trigger. But, the reverse trigger should be only for DW serviced links?
I'm sure it will have the same effect on cAVS2.5 on SSP at least. If the SSP is stopped, it is not going to trigger the DMA request and the DW drain to work it needs it, there is not 'free-run' bit in DW to just drain afaik.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ranj063, setting only CONFIG_DMA_DW_SUSPEND_DRAIN
for TGL and I can reproduce the drain timeout.
The CONFIG_COMP_DAI_STOP_TRIGGER_ORDER_REVERSE
do not fix the issue.
This change in Zephyr can recover the system in case of drain timeout:
diff --git a/drivers/dma/dma_dw_common.c b/drivers/dma/dma_dw_common.c
index 22de9a7d9ef8..cd1076c686ed 100644
--- a/drivers/dma/dma_dw_common.c
+++ b/drivers/dma/dma_dw_common.c
@@ -594,10 +597,8 @@ int dw_dma_stop(const struct device *dev, uint32_t channel)
/* now we wait for FIFO to be empty */
bool fifo_empty = WAIT_FOR(dw_read(dev_cfg->base, DW_CFG_LOW(channel)) & DW_CFGL_FIFO_EMPTY,
DW_DMA_TIMEOUT, k_busy_wait(DW_DMA_TIMEOUT/10));
- if (!fifo_empty) {
- LOG_ERR("%s: dma %d channel drain time out", __func__, channel);
- return -ETIMEDOUT;
- }
+ if (!fifo_empty)
+ LOG_WRN("%s: dma %s channel %d drain time out", __func__, dev->name, channel);
#endif
dw_write(dev_cfg->base, DW_DMA_CHAN_EN, DW_CHAN_MASK(channel));
@@ -606,7 +607,7 @@ int dw_dma_stop(const struct device *dev, uint32_t channel)
bool is_disabled = WAIT_FOR(!(dw_read(dev_cfg->base, DW_DMA_CHAN_EN) & DW_CHAN(channel)),
DW_DMA_TIMEOUT, k_busy_wait(DW_DMA_TIMEOUT/10));
if (!is_disabled) {
- LOG_ERR("%s: dma %d channel disable timeout", __func__, channel);
+ LOG_ERR("%s: dma %s channel %d disable timeout", __func__, dev->name, channel);
return -ETIMEDOUT;
}
The strange thing that I'm seeing on TGL:
hw:0,0 works
hw:0,1 is disabled in BIOS and it fails always the playback because the playback is not progressing.
Running this in one terminal:
test="aplay -q -d1 -Dhw:0,0 -fdat /dev/zero"
iteration=0
while :
do
$test
((iteration++))
echo "Iteration: ${iteration}"
sleep 1
done
Then run this on another terminal:
aplay -q -d1 -Dhw:0,1 -fdat /dev/zero
We will have drain timeout randomly on hw:0,0 or on hw:0,1 SSP.
@@ -1122,7 +1122,7 @@ static int dai_comp_trigger_internal(struct dai_data *dd, struct comp_dev *dev, | |||
break; | |||
case COMP_TRIGGER_PAUSE: | |||
comp_dbg(dev, "dai_comp_trigger_internal(), PAUSE"); | |||
#if CONFIG_COMP_DAI_TRIGGER_ORDER_REVERSE | |||
#if CONFIG_COMP_DAI_STOP_TRIGGER_ORDER_REVERSE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That commit added the Kconfig:
config COMP_DAI_**STOP_**TRIGGER_ORDER_REVERSE
and in code it used:
#if CONFIG_COMP_DAI_TRIGGER_ORDER_REVERSE
This patch corrects that config option use in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ranj063, @plbossart, the drain issue on UPX is not the same, hw:0,1 does not work at all, it always fail, but it allowed me to have a fix for the handling of such.
I still think that this PR will help the MTL issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi can you please remind me what this kconfig CONFIG_DMA_DW_SUSPEND_DRAIN is meant for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ranj063, in case you want to stop the DMA without loosing data in it's FIFO (we don't afaik) then the recommended stop sequence is:
SUSPEND+DRAIN and wait for drain to complete
Clear the EN bit to disable the channel
In case of pause you would need only toggle the SUSPEND bit.
There is no case documented when you just want to disable the channel and you don't care about the FIFO, but it is afaik is the clearing of EN (the FIFO will be lost).
Imho the correct fix would be to not enable CONFIG_DMA_DW_SUSPEND_DRAIN
for any Intel platform.
The purpose of CONFIG_DMA_DW_SUSPEND_DRAIN is to empty the FIFO before disabling the channel by draining it. Since the peripheral is disabled before the DMA (CONFIG_COMP_DAI_STOP_TRIGGER_ORDER_REVERSE is not selected), the DMA will not be able to do that causing drain timeout. The stop is used in two cases: Stream stop: the content of the FIFO does not matter as we stop the stream. Pause/resume: On pause the DMA is suspended (DW_CFGL_SUSPEND bit set) On resume the DMA is stopped, re-configured and then started again instead of resuming The peripheral is started after the DMA stop and start. Leftover audio data might cause audio glitch on resume, it is probably better to disable the draining. Note: if we want to have draining enabled we need to select the CONFIG_COMP_DAI_STOP_TRIGGER_ORDER_REVERSE at the same time to force the DMA to be stopped before the DAI. Signed-off-by: Peter Ujfalusi <[email protected]>
91a952d
to
83fb77c
Compare
Changes since v1:
|
@kv2019i good for you ? |
@jxstelter We really need to fix the fuzz problem, the CI check now fails all the time since the L3 heap change. Looks good, merging. |
@kv2019i The fuzz test fails on IPC3 since it uses directly IPC buffer content to set memory capability during rmalloc. I saw some attempts to fix that issue validating input parameters. But k_panic() which I put there could be really too strong. Will change it. |
Hi,
In the logs for MTL we see:
which is caused by the fact that the DW DMA tried to drain it's FIFO when the SSP has been already stopped, thus it will not signal the DMA request, the drain will time out.