Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pipeline 2.0: Introducing Data Processing Queue #8020

Merged
merged 2 commits into from
Sep 12, 2023

Conversation

marcinszkudlinski
Copy link
Contributor

@marcinszkudlinski marcinszkudlinski commented Aug 9, 2023

re-inroducing DP queue. This PR contains DpQueue code only

dependencies:

some sparse warning are reported now till RFC changes are implemented

DP queue is a lockless circular buffer
providing safe consumer/producer cached operations cross cores

prerequisites:

  1. incoming and outgoing data rate MUST be the same
  2. Both data consumer and data producer declare
    max chunk sizes they want to use (IBS/OBS)

required Buffer size:
- 2MAX(IBS,OBS) if the larger of IBS/OBS
is multiplication of smaller
- 3
MAX(IBS,OBS) otherwise

The queue may work in 2 modes

  1. local mode
    in case both receiver and sender are located on
    the same core and cache coherency
    does not matter. dp_queue structure is located in cached memory
    In this case DP Queue is a simple ring buffer

  2. shared mode
    In this case we need to writeback cache when new data
    arrive and invalidate cache on secondary core.
    dp_queue structure is located in shared memory

dpQueue is a lockless consumer/producer safe buffer.
It is achieved by having only 2 shared variables:

write_offset - can be modified by data producer only
read_offset - can be modified by data consumer only

as 32 bit operations are atomic, it is multi-thread and multi-core save

There some explanation needed how free_space and
available_data are calculated

number of avail data in circular buffer may be calculated as:
data_avail = write_offset - read_offset
and check for wrap around
if (data_avail < 0) data_avail = buffer_size + data_avail

The problem is when write_offset == read_offset,
!!! it may mean either that the buffer is empty
or the buffer is completely filled !!!

To solve the above issue having only 2 variables mentioned before:

  • allow both offsets to point from 0 to DOUBLE buffer_size
  • when calculating pointers to data, use:
    data_bufer[offset % buffer_size]
  • use double buffer size in wrap around check when calculating
    available data

And now:

  • write_offset == read_offset
    always means "buffer empty"

  • write_offset == read_offset + buffer_size
    always means "buffer full"

  • data_avail = write_offset - read_offset
    if (data_avail < 0) data_avail = 2 * buffer_size + data_avail

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this looks good, but we do need to prioritze MCPS over partitioning of code between private and public API calls.

src/audio/dp_queue.c Outdated Show resolved Hide resolved
src/audio/dp_queue.c Outdated Show resolved Hide resolved
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. The issue with spinlock with non-shared use needs to fixed, but otherwise seems good to go.

src/audio/dp_queue.c Outdated Show resolved Hide resolved
src/audio/dp_queue.c Outdated Show resolved Hide resolved
src/audio/dp_queue.c Show resolved Hide resolved
@marcinszkudlinski
Copy link
Contributor Author

@lgirdwood
Probably a lock like below will need to be used in the system several times. Maybe it is a good idea to export it to a library?

union lock_key {
	k_spinlock_key_t spinlock;
	int irq_lock;
};

static inline union lock_key dp_queue_lock(struct dp_queue *dp_queue)
{
	/* use cross-core spinlock in case of shared queue
	 * When shared, dp_queue structure is located in not cached memory
	 * as required by spinlock
	 */
	union lock_key key;

	if (dp_queue_is_shared(dp_queue))
		key.spinlock = k_spin_lock(&dp_queue->lock);
	else
		/* use faster irq_lock in case of not shared queue (located in cached mem) */
		key.irq_lock = irq_lock();

	return key;
}

static inline void dp_queue_unlock(struct dp_queue *dp_queue, union lock_key key)
{
	if (dp_queue_is_shared(dp_queue))
		k_spin_unlock(&dp_queue->lock, key.spinlock);
	else
		irq_unlock(key.irq_lock);
}

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment w.r.t. locks inline, but not blocking this PR. My immediate concerns addressed, so +1 to this.

union lock_key {
k_spinlock_key_t spinlock;
int irq_lock;
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes a bit out of the scope of this PR, but given we have very little use of locking in sof/src/audio (mostly in the DAI code, chain-dma and kpb), this is worth a discussion.

Given we might want to run some of the audio modules in unprivileged code, having code that uses spinlocks or irqlocks is not really future proof. So I wonder if we should just move to mutexes in places like this, which would be available both for user-space and kernel threads.

This does incur a perf penalty, and might cause problems as we still have non-Zephyr SOF users who run audio components in IRQ context, so this is probably not something we can do now, but I'd say this should be the longterm solution for audio code.

src/audio/dp_queue.c Outdated Show resolved Hide resolved
@marcinszkudlinski marcinszkudlinski force-pushed the DP2-dp_queue branch 2 times, most recently from b7e91cf to 1239780 Compare August 16, 2023 06:34
src/audio/dp_queue.c Outdated Show resolved Hide resolved
marcinszkudlinski added a commit to marcinszkudlinski/sof that referenced this pull request Aug 17, 2023
@lgirdwood
Copy link
Member

@marcinszkudlinski I'm seeing a CI build error with Zephyr

ccache /home/cwd_user/zephyr-sdk-0.16.1/xtensa-intel_ace15_mtpm_zephyr-elf/bin/xtensa-intel_ace15_mtpm_zephyr-elf-gcc -DCC_OPTIMIZE_FLAGS=\"-O2\" -DKERNEL -DXCC_TOOLS_VERSION=\"zephyr\" -D__ZEPHYR__=1 -DRELATIVE_FILE=\"../sof/src/audio/dp_queue.c\" -I/zep_workspace/zephyr/include -I/zep_workspace/build-mtl/zephyr/include/generated -I/zep_workspace/zephyr/soc/xtensa/intel_adsp/ace -I/zep_workspace/zephyr/soc/xtensa/intel_adsp/ace/include -I/zep_workspace/zephyr/soc/xtensa/intel_adsp/ace/include/intel_ace15_mtpm -I/zep_workspace/zephyr/soc/xtensa/intel_adsp/common/include -I/zep_workspace/zephyr/drivers -I/zep_workspace/sof/zephyr/include -I/zep_workspace/sof/src/platform/intel/ace/include -I/zep_workspace/sof/src/platform/meteorlake/include -I/zep_workspace/sof/src/include/sof/audio/module_adapter/iadk -I/zep_workspace/sof/src/include/sof/audio/module_adapter/library -I/zep_workspace/modules/hal/xtensa/include -I/zep_workspace/modules/hal/xtensa/XTENSA_HAL -I/zep_workspace/modules/hal/xtensa/zephyr/soc/intel_ace15_mtpm -I/zep_workspace/sof/rimage/src/include -I/zep_workspace/sof/src/include -I/zep_workspace/sof/src/arch/xtensa/include -I/zep_workspace/sof/third_party/include -I/zep_workspace/sof/xtos/include -isystem /zep_workspace/zephyr/lib/libc/minimal/include -isystem /opt/toolchains/zephyr-sdk-0.16.1/xtensa-intel_ace15_mtpm_zephyr-elf/bin/../lib/gcc/xtensa-intel_ace15_mtpm_zephyr-elf/12.2.0/include -isystem /opt/toolchains/zephyr-sdk-0.16.1/xtensa-intel_ace15_mtpm_zephyr-elf/bin/../lib/gcc/xtensa-intel_ace15_mtpm_zephyr-elf/12.2.0/include-fixed -fno-strict-aliasing -O2 -imacros /zep_workspace/build-mtl/zephyr/include/generated/autoconf.h -ffreestanding -fno-common -g -gdwarf-4 -fdiagnostics-color=always --sysroot=/home/cwd_user/zephyr-sdk-0.16.1/xtensa-intel_ace15_mtpm_zephyr-elf/xtensa-intel_ace15_mtpm_zephyr-elf -imacros /zep_workspace/zephyr/include/zephyr/toolchain/zephyr_stdint.h -Wall -Wformat -Wformat-security -Wno-format-zero-length -Wno-pointer-sign -Wpointer-arith -Wexpansion-to-defined -Wno-unused-but-set-variable -Werror=implicit-int -fno-pic -fno-pie -Werror -fno-asynchronous-unwind-tables -fstrict-overflow -fno-reorder-functions --param=min-pagesize=0 -fno-defer-pop -fmacro-prefix-map=/zep_workspace/sof/app=CMAKE_SOURCE_DIR -fmacro-prefix-map=/zep_workspace/zephyr=ZEPHYR_BASE -fmacro-prefix-map=/zep_workspace=WEST_TOPDIR -ffunction-sections -fdata-sections -mlongcalls -DXTENSA_TOOLCHAIN_VARIANT=1 -std=c99 -nostdinc -fno-inline-functions -std=gnu99 -MD -MT modules/sof/CMakeFiles/modules_sof.dir/zep_workspace/sof/src/audio/dp_queue.c.obj -MF modules/sof/CMakeFiles/modules_sof.dir/zep_workspace/sof/src/audio/dp_queue.c.obj.d -o modules/sof/CMakeFiles/modules_sof.dir/zep_workspace/sof/src/audio/dp_queue.c.obj -c /zep_workspace/sof/src/audio/dp_queue.c
/zep_workspace/sof/src/audio/dp_queue.c: In function 'dp_queue_create':
/zep_workspace/sof/src/audio/dp_queue.c:344:9: error: implicit declaration of function 'sink_set_obs'; did you mean 'sink_set_rate'? [-Werror=implicit-function-declaration]
  344 |         sink_set_obs(&dp_queue->sink_api, ibs);
      |         ^~~~~~~~~~~~
      |         sink_set_rate
/zep_workspace/sof/src/audio/dp_queue.c:345:9: error: implicit declaration of function 'source_set_ibs'; did you mean 'source_set_rate'? [-Werror=implicit-function-declaration]
  345 |         source_set_ibs(&dp_queue->source_api, obs);
      |         ^~~~~~~~~~~~~~
      |         source_set_rate
cc1: all warnings being treated as errors
[227/378] Building C object modules/sof/CMakeFiles/modules_sof.dir/lib/alloc.c.obj
[228/378] Building C object modules/sof/CMakeFiles/modules_sof.dir/lib/cpu.c.obj

@marcinszkudlinski
Copy link
Contributor Author

github.com/#8002
dependency

@marcinszkudlinski
Copy link
Contributor Author

rebased after #8002 merge

@kv2019i
Copy link
Collaborator

kv2019i commented Aug 18, 2023

@marcinszkudlinski Is #8006 still a hard dependency (like in description), or is this ok to merge once reviews-and-testing is ok?

lyakh
lyakh previously requested changes Aug 21, 2023
src/audio/dp_queue.c Show resolved Hide resolved
src/audio/dp_queue.c Outdated Show resolved Hide resolved
src/audio/dp_queue.c Outdated Show resolved Hide resolved
src/audio/dp_queue.c Show resolved Hide resolved
src/audio/dp_queue.c Show resolved Hide resolved
src/audio/dp_queue.c Outdated Show resolved Hide resolved
src/include/sof/audio/dp_queue.h Outdated Show resolved Hide resolved
src/include/sof/audio/dp_queue.h Outdated Show resolved Hide resolved
src/include/sof/audio/dp_queue.h Outdated Show resolved Hide resolved
src/include/sof/audio/dp_queue.h Outdated Show resolved Hide resolved
@marcinszkudlinski
Copy link
Contributor Author

All requested changes addressed

  1. inline functions for performance
  2. rebased on [DNM] Direct use of cached / not-cached shared memory alias instead of coherent.h in case of buffers #8106 to avoid rebase problems

PLEASE SKIP 1st COMMIT IN REVIEW

Copy link
Collaborator

@RanderWang RanderWang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@marcinszkudlinski marcinszkudlinski changed the title pipeline 2.0: Introducing Data Processing Queue [DNM] pipeline 2.0: Introducing Data Processing Queue Sep 6, 2023
@marcinszkudlinski marcinszkudlinski changed the title [DNM] pipeline 2.0: Introducing Data Processing Queue pipeline 2.0: Introducing Data Processing Queue Sep 11, 2023
@marcinszkudlinski marcinszkudlinski force-pushed the DP2-dp_queue branch 3 times, most recently from e2b7706 to ede4956 Compare September 11, 2023 13:00
@marcinszkudlinski
Copy link
Contributor Author

all deps resolved, rebased on newest main

@lgirdwood
Copy link
Member

all deps resolved, rebased on newest main

@marcinszkudlinski Can you check in the CI results. Thanks !

@marcinszkudlinski
Copy link
Contributor Author

all deps resolved, rebased on newest main

@marcinszkudlinski Can you check in the CI results. Thanks !

looks like there's no results:
https://sof-ci.01.org/sof-pr-viewer/#/build/PR8020/build12764971

pushing an empty commit (rebase to newest main) to retrigger

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One extra merge-conflict bit in the commit message, can you clean up @marcinszkudlinski . Otherwise I can re-confirm my +1 for this.

src/audio/audio_stream.c Show resolved Hide resolved
src/audio/dp_queue.c Outdated Show resolved Hide resolved
Data provided by source interface cannot be modified in any way
by the module using source API.
Mark pointers as const

Signed-off-by: Marcin Szkudlinski <[email protected]>
DP queue is a lockless circular buffer
providing safe consumer/producer cached operations cross cores

prerequisites:
 1) incoming and outgoing data rate MUST be the same
 2) Both data consumer and data producer declare
    max chunk sizes they want to use (IBS/OBS)

required Buffer size:
	- 2*MAX(IBS,OBS) if the larger of IBS/OBS
          is multiplication of smaller
	- 3*MAX(IBS,OBS) otherwise

The queue may work in 2 modes
1) local mode
   in case both receiver and sender are located on
   the same core and cache coherency
   does not matter. dp_queue structure is located in cached memory
   In this case DP Queue is a simple ring buffer

2) shared mode
   In this case we need to writeback cache when new data
   arrive and invalidate cache on secondary core.
   dp_queue structure is located in shared memory

dpQueue is a lockless consumer/producer safe buffer.
It is achieved by having only 2 shared variables:

 write_offset - can be modified by data producer only
 read_offset - can be modified by data consumer only

 as 32 bit operations are atomic, it is multi-thread and multi-core save

There some explanation needed how free_space and
available_data are calculated

number of avail data in circular buffer may be calculated as:
	data_avail = write_offset - read_offset
  and check for wrap around
	if (data_avail < 0) data_avail = buffer_size + data_avail

The problem is when write_offset == read_offset,
!!! it may mean either that the buffer is empty
    or the buffer is completely filled !!!

To solve the above issue having only 2 variables mentioned before:
 - allow both offsets to point from 0 to DOUBLE buffer_size
 - when calculating pointers to data, use:
         data_bufer[offset % buffer_size]
 - use double buffer size in wrap around check when calculating
   available data

And now:
  - write_offset == read_offset
		always means "buffer empty"
  - write_offset == read_offset + buffer_size
		always means "buffer full"

  - data_avail = write_offset - read_offset
	if (data_avail < 0) data_avail = 2 * buffer_size + data_avail

Signed-off-by: Marcin Szkudlinski <[email protected]>
@marcinszkudlinski
Copy link
Contributor Author

all green

src/include/sof/audio/dp_queue.h Show resolved Hide resolved
src/audio/dp_queue.c Show resolved Hide resolved
src/audio/dp_queue.c Show resolved Hide resolved
src/audio/dp_queue.c Show resolved Hide resolved
if (max_ibs_obs % min_ibs_obs == 0)
dp_queue->data_buffer_size = 2 * max_ibs_obs;
else
dp_queue->data_buffer_size = 3 * max_ibs_obs;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand this calculation...

Copy link
Contributor Author

@marcinszkudlinski marcinszkudlinski Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact I should reconsider 3x as it was estimated in first implementation when data needed to be passed in chunks
of cacheline size.
Now I went trough the same example I was analyzing before and probably 3* is not needed anymore

Will double check it and compare with reference FW and probably remove in next commit

@lyakh lyakh dismissed their stale review September 12, 2023 12:02

remainint comments can be addressed in a follow-up

@kv2019i
Copy link
Collaborator

kv2019i commented Sep 12, 2023

CI is good. Zephyr mainline CIs fail due to zephyrproject-rtos/zephyr#62464

@marcinszkudlinski @lyakh let's address the size_t/int variables, and if needed, clarify the rationale for 2/3x buffer sizing, in a follow-up PR. Thanks all for reviews!

@kv2019i kv2019i merged commit bec031a into thesofproject:main Sep 12, 2023
37 of 41 checks passed
@marcinszkudlinski marcinszkudlinski deleted the DP2-dp_queue branch September 26, 2023 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dp_scheduler MTL Applies to Meteor Lake platform pipeline2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants