-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cross core fixes for mtl-006-drop-stable #8194
Cross core fixes for mtl-006-drop-stable #8194
Conversation
* For cross-core connection such problems result in weird bugs: corrupted | ||
* memory due to wrong invalidate size. | ||
* | ||
* Note, this workaround will not work for 8 bit audio! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WA is good for now - soon comp_buffers won't be used in cross core connections at all
/* Just ignore full sinks and copy data to other connected mixouts */ | ||
full_sinks[i] = (frames == 0); | ||
if (frames == 0) { | ||
///if (sink_c->sink && sink_c->sink->state == COMP_STATE_ACTIVE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style, pls fix even in 006
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as it stands we have a few locations, e.g. in pipeline scheduling core, that aren't cross-core safe. You can find multiple "FIXME" and other comments around the code... I wouldn't be surprised if attempts to run cross-core with these changes result in crashes.
@@ -46,24 +46,6 @@ DECLARE_TR_CTX(mixout_tr, SOF_UUID(mixout_uuid), LOG_LEVEL_INFO); | |||
#define MIXIN_MAX_SINKS IPC4_MIXIN_MODULE_MAX_OUTPUT_QUEUES | |||
#define MIXOUT_MAX_SOURCES IPC4_MIXOUT_MODULE_MAX_INPUT_QUEUES | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to the commit description: does this mean, that we now always would have a buffer between a mixin and a mixout? Only to support those probably rather rare cases where we have to connect cross-cores? I think it would be a pity to lose that optimisation. No way to have both? Maybe even a separate component, but hopefully that isn't needed. But I'd make a clear separation between the two cases - maybe add a new file for that.
*/ | ||
avail_frames = input_buffers[0].size; /// audio_stream_get_avail_frames(input_buffers[0].data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment style, and that's probably not a comment, you wanted to have upstream?
full_sinks[i] = (frames == 0); | ||
if (frames == 0) { | ||
///if (sink_c->sink && sink_c->sink->state == COMP_STATE_ACTIVE) | ||
/// return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left-over
@@ -433,13 +429,12 @@ int ipc_comp_connect(struct ipc *ipc, ipc_pipe_comp_connect *_connect) | |||
source->direction_set = true; | |||
} | |||
|
|||
irq_local_enable(flags); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you wrote that it cannot be preempted - how about the scheduler? Cannot a timer interrupt preempt it and then run the LL-scheduler and access these buffers in an inconsistent state? Doesn't seem safe to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree with @lyakh
@lyakh problem is that we cannot proceed IDC when interrupts are locked, but the bigger problem is that we DO need to have connected LL pipelines this week on branch 006, there's no time to do it better in the way we discussed before
@serhiy-katsyuba-intel maybe we can try to locate all modules structures in shared memory and execute bind on primary core? Just a suggestion without deep analysis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you wrote that it cannot be preempted - how about the scheduler? Cannot a timer interrupt preempt it and then run the LL-scheduler and access these buffers in an inconsistent state? Doesn't seem safe to me
I assume driver should send bind IPC only when pipelines (with binding components) are not running. If this assumption is true -- bind operation can only be preempted to run some other pipelines but not the pipelines containing binding components.
Second assumption, is that IPC has no bugs (hope this does not sound too naive :) ). So while bind IPC is processing, IPC register has BUSY flag set indicating to driver FW is not ready to receive next IPC. After bind IPC processing finished, FW sends IPC reply that clears BUSY flag and only after that driver could send another IPC, like start the pipelines.
If these two assumptions are correct, when we a safe.
The problem with irq_local_disable() (actually, the name is bad -- it disables interrupts globally on all cores) is that after interrupts are disabled IDC stops working, so thing like process_on_core(..., BLOCKING) or comp_bind_remote() will dead lock.
@serhiy-katsyuba-intel maybe we can try to locate all modules structures in shared memory and execute bind on primary core? Just a suggestion without deep analysis.
No, I guess component expects that all its ops are called from same core. It might allocate some data in cached memory in .create() and expects it can always access it without any multi-core tricks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@serhiy-katsyuba-intel so you suggest that the fix:
prevent the IPC task getting preempted which could result in buffers being only half connected when a pipeline task gets executed.
may be actually caused by driver or IPC driver problem - races that lead to binding modules on running pipeline?
the issue in question: #6653
I checked that the pipeline state is not verified before bind operation, if the above is true it is worth to reject bind when the pipeline is in incorrect state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked that the pipeline state is not verified before bind operation, if the above is true it is worth to reject bind when the pipeline is in incorrect state
Yes, that might be the good option.
Another option is to block preempting without blocking interrupts. Zephyr might have some API for this, if, not, than it could be done on LL scheduler level -- there are zephyr_ll_lock()/unlock(), however, they are implemented now just by calling irq_local_disable()/enable() :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume driver should send bind IPC only when pipelines (with binding components) are not running.
@serhiy-katsyuba-intel @marcinszkudlinski the problem is with forked streams: when a stream is running and then you bind or unbind another stream to or from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lyakh, added k_sched_lock() in a separate PR #8205 to guard bind/unbind.
However, if bind allowed to be called on running pipeline then we have more problems: .params() and .prepare() will not be called after bind for component on a running pipeline. Buffer stream params (like format, channel number) will be not initialized if capture pipeline is binded to some other running pipeline -- comp_verify_params() only initializes source buffers on capture pipeline.
Skipped .prepare() call may also create problems, e.g. mixin and mixout use .prepare() to select proper gain and mix functions depending of format.
So we need to address those problems if we allow bind to be called for running pipeline.
ee9feab
to
6a9018e
Compare
Heavily reimplemented version of mixin and mixout to support mixin on one core to be connected to mixout(s) of another core. This version adds buffer between mixin and mixout and so is slightly slower then previous version. Signed-off-by: Serhiy Katsyuba <[email protected]>
There are multiple problems with audio_stream_avail_frames_aligned() implementation: (1) default alignment initialization in audio_stream_init() does not work as frame format is set later, (2) if alignment was not explicitly initialized, audio_stream_avail_frames_aligned() returns wrong (higher) number of frames, (3) audio_stream_avail_frames_aligned() returns wrong result for formats with frame size not equal to power of 2, e.g. for streams with 3, 5 and 7 channels. For cross-core connection such problems result in weird bugs: corrupted memory due to wrong invalidate size. Signed-off-by: Serhiy Katsyuba <[email protected]>
There are multiple problems with audio_stream_avail_frames_aligned() implementation: (1) default alignment initialization in audio_stream_init() does not work as frame format is set later, (2) if alignment was not explicitly initialized, audio_stream_avail_frames_aligned() returns wrong (higher) number of frames, (3) audio_stream_avail_frames_aligned() returns wrong result for formats with frame size not equal to power of 2, e.g. for streams with 3, 5 and 7 channels. For cross-core connection such problems result in weird bugs: corrupted memory due to wrong invalidate size. Signed-off-by: Serhiy Katsyuba <[email protected]>
Bind for components located on different cores is processed on core 0, so buffer is created on core 0, not on its source component core. Signed-off-by: Serhiy Katsyuba <[email protected]>
comp_bind() may call comp_bind_remote() which uses IDC and so interrupts should be enabled, otherwise dead lock will happen. Same for comp_unbind(). Those irq_local_disable() and irq_local_enable() are actually global. irq_local_disable()/irq_local_enable() are not really needed as bind/unbind cannot be preempted anyway because of IPC synchronisation: only after bind/unbind IPC reply next IPC (e.g. pipeline start) could come. Signed-off-by: Serhiy Katsyuba <[email protected]>
6a9018e
to
bdfec1d
Compare
413ceaa
into
thesofproject:mtl-006-drop-stable
No description provided.