Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ipc4 multicore issue fixes #8143

Merged
merged 2 commits into from
Sep 5, 2023

Conversation

RanderWang
Copy link
Collaborator

@RanderWang RanderWang commented Sep 4, 2023

BUG: Various IPC timed out with multi-core run simultaneously #7774.

This issue is very random and can be reproduced after many cycles test. The direct reason is that : there is a self loop
in the ipc message list like: ipc msg -> M1 -> M2 ->..... > Mn ->Mn , this result to a infinite loop with ipc spin lock hold for the logging function. This is one major reason that the issue can't be reproduced with logging disabled.

-static bool is_notification_queued(void)
 {
        struct ipc *ipc = ipc_get();
-       struct list_item *slist;
-       struct ipc_msg *msg;
        k_spinlock_key_t key;
        bool queued = false;
 
        key = k_spin_lock(&ipc->lock);
-       if (list_is_empty(&ipc->msg_list))
-               goto out;
-
-       list_for_item(slist, &ipc->msg_list) {
-               msg = container_of(slist, struct ipc_msg, list);
-
-               if (msg == &msg_notify) {
-                       queued = true;
-                       break;
-               }
-       }
-
-out:
        k_spin_unlock(&ipc->lock, key);
 
        return queued;

Why the single ipc message is double added ? The reason is that In multicore case, IPC message is dispatched from primary core to secondary core which send reply message to host. Primary core will do nothing if IPC_TASK_SECONDARY_CORE is set. But in rare case, the secondary code finish the reply message and clear this flag before the ipc thread in primary core check this flag because the primary core was busy with logging thread and check it after awhile, then primary core also send reply message again. This results to the reply message being inserted two times in ipc message list and result to infinite loop when visiting the list for the logging after awhile.

We can find some hint in kernel log before ipc timeout.

[58167.022730] snd_sof:sof_ipc4_route_setup: sof-audio-pci-intel-mtl 0000:00:1f.3: bind gain.4.1:0 -> dai-copier.SSP.NoCodec-1.playback:0
[58167.022743] snd_sof:: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx      : 0x45050006|0x50004: MOD_BIND
[58167.023200] snd_sof:: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx reply: 0x65000000|0x50004: MOD_BIND
[58167.023374] snd_sof:: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx reply: 0x65000000|0x50004: MOD_BIND
[58167.023417] snd_sof:snd_sof_ipc_reply: 0000:00:1f.3: no reply expected, received 0x65000000, will be ignored

For the case in bug:#7774

[ 2398.695476] : snd_sof:: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx done : 0x40030006|0x2030010: MOD_INIT_INSTANCE [data size: 64]
[ 2398.695496] : sof-audio-pci-intel-mtl ipc tx 0x44030006|0x30000018: MOD_LARGE_CONFIG_SET [data size: 24]
[ 2398.695718] : sof-audio-pci-intel-mtl ipc tx reply: 0x64000000|0x30000018: MOD_LARGE_CONFIG_SET
[ 2398.695995] : sof-audio-pci-intel-mtl ipc tx reply: 0x64000000|0x30000018: MOD_LARGE_CONFIG_SET
[ 2398.696006] : sof-audio-pci-intel-mtl  ipc tx done : 0x44030006|0x30000018: MOD_LARGE_CONFIG_SET [data size: 24]
[ 2398.696018] : snd_sof:snd_sof_ipc_reply: 0000:00:1f.3: no reply expected, received 0x64000000, will be ignored
[ 2398.696043] : snd_sof:sof_widget_setup_unlocked: 0000:00:1f.3: widget gain.6.1 setup complete
[ 2398.696058] : snd_sof:sof_ipc4_widget_setup:  Create widget dai-copier.SSP.NoCodec-2.playback instance 3 - pipe 6 - core 2
[ 2398.696064] : snd_sof:sof_ipc4_log_header: : ipc tx      : 0x40030004|0x203002f: MOD_INIT_INSTANCE [data size: 188]
[ 2399.201595] : sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0x40030004|0x203002f
[ 2399.201612] : sof-audio-pci-intel-mtl 0000:00:1f.3: Attempting to prevent DSP from entering D3 state to preserve context
[ 2399.201618] : sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump start ]------------
[ 2399.201643] : sof-audio-pci-intel-mtl 0000:00:1f.3: Host IPC initiator: 0xc0030004|0x203002f|0x0, target: 0x40030004|0x203002f|0x0, ctl: 0x3
[ 2399.201650] : sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump end ]------------
[ 2399.201655] : sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 2399.201659] : sof-audio-pci-intel-mtl 0000:00:1f.3: IPC timeout

This PR checks whether the msg has been done by secondary core and do nothing for primary core. Also simply the check for logging.

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good catch @RanderWang , the SECONDARY_CORE checking indeed does not seem. One minor code style proposal inline, but otherwise looks good.

src/ipc/ipc4/handler.c Outdated Show resolved Hide resolved
src/ipc/ipc4/handler.c Outdated Show resolved Hide resolved
In multicore case, IPC message is dispatched from primary core to
secondary core which send reply message to host. Primary core will
do nothing if IPC_TASK_SECONDARY_CORE is set. But in rare case, the
secondary code finish the reply message and clear this flag before
the ipc thread in primary core check this flag, then primary core
also send reply message again. This results to the reply message being
inserted two times in ipc message list and infinite loop when visiting
the list.

This patch will check msg_reply state and do nothing if it is prepared
. We don't need to init reply message since it is initialized after
deleting from the ipc list.

Signed-off-by: Rander Wang <[email protected]>
Use list_is_empty to check the message is queued or not. The notify
message is initialized to empty after deleting from the ipc msg list.
We use the same idea in ipc_msg_send.

Signed-off-by: Rander Wang <[email protected]>
@RanderWang
Copy link
Collaborator Author

@kv2019i @lyakh Update for your comments. Thanks!

@RanderWang
Copy link
Collaborator Author

Validated this PR with 3 different MTL RVPs, all passed a 3 hours multicore stress test.

Copy link
Collaborator

@softwarecki softwarecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is good, but there's an issue with the long inline comment.

src/ipc/ipc4/handler.c Show resolved Hide resolved
@lgirdwood lgirdwood merged commit 6bd69e6 into thesofproject:main Sep 5, 2023
40 of 41 checks passed
@lgirdwood
Copy link
Member

Lets get this tested in daily test tomorrow. @mwasko fyi.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants