pipeline 2.0: Introducing Data Processing Queue #8020
Conversation
Overall I think this looks good, but we do need to prioritize MCPS over partitioning of code between private and public API calls.
Looks good to me. The issue with the spinlock in the non-shared case needs to be fixed, but otherwise this seems good to go.
A comment w.r.t. locks inline, but not blocking this PR. My immediate concerns are addressed, so +1 to this.
src/audio/dp_queue.c (outdated):

union lock_key {
	k_spinlock_key_t spinlock;
	int irq_lock;
};
This goes a bit out of the scope of this PR, but given we have very little use of locking in sof/src/audio (mostly in the DAI code, chain-dma and kpb), this is worth a discussion.
Given we might want to run some of the audio modules in unprivileged code, having code that uses spinlocks or irqlocks is not really future proof. So I wonder if we should just move to mutexes in places like this, which would be available both for user-space and kernel threads.
This does incur a perf penalty, and might cause problems since we still have non-Zephyr SOF users who run audio components in IRQ context, so this is probably not something we can do now, but I'd say this should be the long-term solution for audio code.
@marcinszkudlinski I'm seeing a CI build error with Zephyr
rebased after #8002 merge
@marcinszkudlinski Is #8006 still a hard dependency (as in the description), or is this OK to merge once review and testing are done?
All requested changes addressed
PLEASE SKIP 1st COMMIT IN REVIEW
LGTM
All deps resolved, rebased on newest main.
@marcinszkudlinski Can you check the CI results? Thanks!
Looks like there are no results; pushing an empty commit (rebase to newest main) to retrigger.
One extra merge-conflict bit in the commit message, can you clean it up @marcinszkudlinski? Otherwise I re-confirm my +1 for this.
Data provided by the source interface cannot be modified in any way by the module using the source API. Mark the pointers as const.

Signed-off-by: Marcin Szkudlinski <[email protected]>
DP queue is a lockless circular buffer providing safe consumer/producer cached operations across cores.

prerequisites:
1) incoming and outgoing data rates MUST be the same
2) both data consumer and data producer declare the max chunk sizes they want to use (IBS/OBS)

required buffer size:
- 2 * MAX(IBS,OBS) if the larger of IBS/OBS is a multiple of the smaller
- 3 * MAX(IBS,OBS) otherwise

The queue may work in 2 modes:
1) local mode
   In case both receiver and sender are located on the same core, cache coherency does not matter and the dp_queue structure is located in cached memory. In this case the DP queue is a simple ring buffer.
2) shared mode
   In this case we need to write back the cache when new data arrives and invalidate the cache on the secondary core. The dp_queue structure is located in shared memory.

dpQueue is a lockless consumer/producer-safe buffer. This is achieved by having only 2 shared variables:
- write_offset - can be modified by the data producer only
- read_offset - can be modified by the data consumer only
As 32-bit operations are atomic, it is multi-thread and multi-core safe.

Some explanation is needed of how free_space and available_data are calculated. The amount of available data in a circular buffer may be calculated as:
  data_avail = write_offset - read_offset
with a wrap-around check:
  if (data_avail < 0) data_avail = buffer_size + data_avail
The problem is when write_offset == read_offset:
!!! it may mean either that the buffer is empty or that the buffer is completely full !!!

To solve the above issue with only the 2 variables mentioned before:
- allow both offsets to run from 0 to DOUBLE the buffer_size
- when calculating pointers to data, use: data_buffer[offset % buffer_size]
- use double the buffer size in the wrap-around check when calculating available data

And now:
- write_offset == read_offset always means "buffer empty"
- write_offset == read_offset + buffer_size always means "buffer full"
- data_avail = write_offset - read_offset
  if (data_avail < 0) data_avail = 2 * buffer_size + data_avail

Signed-off-by: Marcin Szkudlinski <[email protected]>
All green.
if (max_ibs_obs % min_ibs_obs == 0)
	dp_queue->data_buffer_size = 2 * max_ibs_obs;
else
	dp_queue->data_buffer_size = 3 * max_ibs_obs;
I still don't understand this calculation...
In fact I should reconsider the 3x, as it was estimated in the first implementation, when data needed to be passed in chunks of cacheline size.
Now I went through the same example I was analyzing before, and the 3x is probably not needed anymore.
Will double-check it, compare with the reference FW, and probably remove it in a follow-up commit.
Remaining comments can be addressed in a follow-up.
CI is good. Zephyr mainline CIs fail due to zephyrproject-rtos/zephyr#62464. @marcinszkudlinski @lyakh let's address the size_t/int variables, and if needed clarify the rationale for the 2x/3x buffer sizing, in a follow-up PR. Thanks all for the reviews!
Re-introducing the DP queue. This PR contains the DpQueue code only.

dependencies:

Some sparse warnings are reported now until the RFC changes are implemented.

DP queue is a lockless circular buffer providing safe consumer/producer cached operations across cores.

prerequisites:
1) incoming and outgoing data rates MUST be the same
2) both data consumer and data producer declare the max chunk sizes they want to use (IBS/OBS)

required buffer size:
- 2 * MAX(IBS,OBS) if the larger of IBS/OBS is a multiple of the smaller
- 3 * MAX(IBS,OBS) otherwise

The queue may work in 2 modes:
1) local mode
   In case both receiver and sender are located on the same core, cache coherency does not matter and the dp_queue structure is located in cached memory. In this case the DP queue is a simple ring buffer.
2) shared mode
   In this case we need to write back the cache when new data arrives and invalidate the cache on the secondary core. The dp_queue structure is located in shared memory.

dpQueue is a lockless consumer/producer-safe buffer. This is achieved by having only 2 shared variables:
- write_offset - can be modified by the data producer only
- read_offset - can be modified by the data consumer only
As 32-bit operations are atomic, it is multi-thread and multi-core safe.

Some explanation is needed of how free_space and available_data are calculated. The amount of available data in a circular buffer may be calculated as:
  data_avail = write_offset - read_offset
with a wrap-around check:
  if (data_avail < 0) data_avail = buffer_size + data_avail
The problem is when write_offset == read_offset:
!!! it may mean either that the buffer is empty or that the buffer is completely full !!!

To solve the above issue with only the 2 variables mentioned before:
- allow both offsets to run from 0 to DOUBLE the buffer_size
- when calculating pointers to data, use: data_buffer[offset % buffer_size]
- use double the buffer size in the wrap-around check when calculating available data

And now:
- write_offset == read_offset always means "buffer empty"
- write_offset == read_offset + buffer_size always means "buffer full"
- data_avail = write_offset - read_offset
  if (data_avail < 0) data_avail = 2 * buffer_size + data_avail