Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cross core fixes for mtl-006-drop-stable #8194

Conversation

serhiy-katsyuba-intel
Copy link
Contributor

No description provided.

src/ipc/ipc4/helper.c Show resolved Hide resolved
* For cross-core connection such problems result in weird bugs: corrupted
* memory due to wrong invalidate size.
*
* Note, this workaround will not work for 8 bit audio!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WA is good for now - soon comp_buffers won't be used in cross core connections at all

/* Just ignore full sinks and copy data to other connected mixouts */
full_sinks[i] = (frames == 0);
if (frames == 0) {
///if (sink_c->sink && sink_c->sink->state == COMP_STATE_ACTIVE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style, pls fix even in 006

Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as it stands we have a few locations, e.g. in pipeline scheduling core, that aren't cross-core safe. You can find multiple "FIXME" and other comments around the code... I wouldn't be surprised if attempts to run cross-core with these changes result in crashes.

@@ -46,24 +46,6 @@ DECLARE_TR_CTX(mixout_tr, SOF_UUID(mixout_uuid), LOG_LEVEL_INFO);
#define MIXIN_MAX_SINKS IPC4_MIXIN_MODULE_MAX_OUTPUT_QUEUES
#define MIXOUT_MAX_SOURCES IPC4_MIXOUT_MODULE_MAX_INPUT_QUEUES

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to the commit description: does this mean, that we now always would have a buffer between a mixin and a mixout? Only to support those probably rather rare cases where we have to connect cross-cores? I think it would be a pity to lose that optimisation. No way to have both? Maybe even a separate component, but hopefully that isn't needed. But I'd make a clear separation between the two cases - maybe add a new file for that.

*/
avail_frames = input_buffers[0].size; /// audio_stream_get_avail_frames(input_buffers[0].data);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment style, and that's probably not a comment, you wanted to have upstream?

full_sinks[i] = (frames == 0);
if (frames == 0) {
///if (sink_c->sink && sink_c->sink->state == COMP_STATE_ACTIVE)
/// return 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left-over

@@ -433,13 +429,12 @@ int ipc_comp_connect(struct ipc *ipc, ipc_pipe_comp_connect *_connect)
source->direction_set = true;
}

irq_local_enable(flags);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you wrote that it cannot be preempted - how about the scheduler? Cannot a timer interrupt preempt it and then run the LL-scheduler and access these buffers in an inconsistent state? Doesn't seem safe to me

Copy link
Contributor

@marcinszkudlinski marcinszkudlinski Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with @lyakh
@lyakh problem is that we cannot proceed IDC when interrupts are locked, but the bigger problem is that we DO need to have connected LL pipelines this week on branch 006, there's no time to do it better in the way we discussed before

@serhiy-katsyuba-intel maybe we can try to locate all modules structures in shared memory and execute bind on primary core? Just a suggestion without deep analysis

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you wrote that it cannot be preempted - how about the scheduler? Cannot a timer interrupt preempt it and then run the LL-scheduler and access these buffers in an inconsistent state? Doesn't seem safe to me

I assume driver should send bind IPC only when pipelines (with binding components) are not running. If this assumption is true -- bind operation can only be preempted to run some other pipelines but not the pipelines containing binding components.

Second assumption, is that IPC has no bugs (hope this does not sound too naive :) ). So while bind IPC is processing, IPC register has BUSY flag set indicating to driver FW is not ready to receive next IPC. After bind IPC processing finished, FW sends IPC reply that clears BUSY flag and only after that driver could send another IPC, like start the pipelines.

If these two assumptions are correct, when we a safe.

The problem with irq_local_disable() (actually, the name is bad -- it disables interrupts globally on all cores) is that after interrupts are disabled IDC stops working, so thing like process_on_core(..., BLOCKING) or comp_bind_remote() will dead lock.

@serhiy-katsyuba-intel maybe we can try to locate all modules structures in shared memory and execute bind on primary core? Just a suggestion without deep analysis.

No, I guess component expects that all its ops are called from same core. It might allocate some data in cached memory in .create() and expects it can always access it without any multi-core tricks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serhiy-katsyuba-intel so you suggest that the fix:

prevent the IPC task getting preempted which could result in buffers being only half connected when a pipeline task gets executed.

may be actually caused by driver or IPC driver problem - races that lead to binding modules on running pipeline?
the issue in question: #6653

I checked that the pipeline state is not verified before bind operation, if the above is true it is worth to reject bind when the pipeline is in incorrect state

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that the pipeline state is not verified before bind operation, if the above is true it is worth to reject bind when the pipeline is in incorrect state

Yes, that might be the good option.

Another option is to block preempting without blocking interrupts. Zephyr might have some API for this, if, not, than it could be done on LL scheduler level -- there are zephyr_ll_lock()/unlock(), however, they are implemented now just by calling irq_local_disable()/enable() :(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume driver should send bind IPC only when pipelines (with binding components) are not running.

@serhiy-katsyuba-intel @marcinszkudlinski the problem is with forked streams: when a stream is running and then you bind or unbind another stream to or from it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh, added k_sched_lock() in a separate PR #8205 to guard bind/unbind.

However, if bind allowed to be called on running pipeline then we have more problems: .params() and .prepare() will not be called after bind for component on a running pipeline. Buffer stream params (like format, channel number) will be not initialized if capture pipeline is binded to some other running pipeline -- comp_verify_params() only initializes source buffers on capture pipeline.
Skipped .prepare() call may also create problems, e.g. mixin and mixout use .prepare() to select proper gain and mix functions depending of format.
So we need to address those problems if we allow bind to be called for running pipeline.

@serhiy-katsyuba-intel serhiy-katsyuba-intel force-pushed the cross-core-fixes-006 branch 3 times, most recently from ee9feab to 6a9018e Compare September 14, 2023 12:50
Heavily reimplemented version of mixin and mixout to support mixin on
one core to be connected to mixout(s) of another core.

This version adds buffer between mixin and mixout and so is slightly
slower then previous version.

Signed-off-by: Serhiy Katsyuba <[email protected]>
There are multiple problems with audio_stream_avail_frames_aligned()
implementation:

(1) default alignment initialization in audio_stream_init() does not
work as frame format is set later,

(2) if alignment was not explicitly initialized,
audio_stream_avail_frames_aligned() returns wrong (higher) number
of frames,

(3) audio_stream_avail_frames_aligned() returns wrong result for formats
with frame size not equal to power of 2, e.g. for streams with 3, 5 and
7 channels.

For cross-core connection such problems result in weird bugs: corrupted
memory due to wrong invalidate size.

Signed-off-by: Serhiy Katsyuba <[email protected]>
There are multiple problems with audio_stream_avail_frames_aligned()
implementation:

(1) default alignment initialization in audio_stream_init() does not
work as frame format is set later,

(2) if alignment was not explicitly initialized,
audio_stream_avail_frames_aligned() returns wrong (higher) number
of frames,

(3) audio_stream_avail_frames_aligned() returns wrong result for formats
with frame size not equal to power of 2, e.g. for streams with 3, 5 and
7 channels.

For cross-core connection such problems result in weird bugs: corrupted
memory due to wrong invalidate size.

Signed-off-by: Serhiy Katsyuba <[email protected]>
Bind for components located on different cores is processed on core 0,
so buffer is created on core 0, not on its source component core.

Signed-off-by: Serhiy Katsyuba <[email protected]>
comp_bind() may call comp_bind_remote() which uses IDC and so interrupts
should be enabled, otherwise dead lock will happen. Same for
comp_unbind().

Those irq_local_disable() and irq_local_enable() are actually global.

irq_local_disable()/irq_local_enable() are not really needed as
bind/unbind cannot be preempted anyway because of IPC synchronisation:
only after bind/unbind IPC reply next IPC (e.g. pipeline start)
could come.

Signed-off-by: Serhiy Katsyuba <[email protected]>
@marcinszkudlinski marcinszkudlinski merged commit 413ceaa into thesofproject:mtl-006-drop-stable Sep 14, 2023
25 of 41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants