PUSCH receiver kernels #108

mbertuletti · 2024-04-25T12:50:29Z

Changelog

Added

FFT f16
Complex matmul f16, complex matmul fixed-point 16
Cholesky decomposition f16, Cholesky decomposition fixed-point 16
Linear system solution f16, Linear system solution fixed-point 16
MMSE application f16, MMSE application fixed-point 16
OFDM application f16

Checklist

Automated tests pass
Changelog updated
Code style guideline is observed

Please check our contributing guidelines before opening a Pull Request.

yichao-zh

Hi Marco, this is a very good merge request. I reviewed the files and almost nothing needs further changes; I have only a few suggestions. There is also something we could do together with this merge:

For the MatMul kernels: I saw that the "conflict optimization" kernel (which I wrote during the PUSCH DATE work) is included in matmul_f32, but not in the integer kernels. Could you also copy it into the integer kernel file?
Don't forget to update the changelog file.

That's all! Thanks!

yichao-zh · 2024-12-10T03:01:37Z

software/apps/baremetal/axpy_f16/main.c

I think it is better to add a "#define local_parallel" instead of commenting the others out.

Okay, thanks.

yichao-zh · 2024-12-10T03:03:30Z

software/apps/baremetal/axpy_f32/main.c

Same for this kernel: if we keep all of the kernels in this "main.c", we should add a #define. Otherwise, we should leave only one kernel instead of commenting the others out. What do you think?
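
A minimal sketch of the suggestion above, selecting one kernel variant at compile time instead of commenting the others out. The macro names `SINGLE` and `LOCAL_PARALLEL`, the function `axpy`, and its signature are assumptions made for illustration; they are not taken from the files in this PR.

```c
#include <stdint.h>

/* Hypothetical variant switches. Pass -DSINGLE or -DLOCAL_PARALLEL to the
 * compiler; if neither is given, this sketch defaults to LOCAL_PARALLEL. */
#if !defined(SINGLE) && !defined(LOCAL_PARALLEL)
#define LOCAL_PARALLEL
#endif

#if defined(SINGLE)
/* Core 0 processes the whole vector; the other cores do nothing. */
static void axpy(float a, const float *x, float *y, uint32_t n,
                 uint32_t core_id, uint32_t num_cores) {
  (void)num_cores;
  if (core_id != 0)
    return;
  for (uint32_t i = 0; i < n; i++)
    y[i] += a * x[i];
}
#elif defined(LOCAL_PARALLEL)
/* Each core processes one contiguous slice of the vector. */
static void axpy(float a, const float *x, float *y, uint32_t n,
                 uint32_t core_id, uint32_t num_cores) {
  uint32_t chunk = (n + num_cores - 1) / num_cores; /* ceil(n / num_cores) */
  uint32_t start = core_id * chunk;
  uint32_t end = (start + chunk < n) ? (start + chunk) : n;
  for (uint32_t i = start; i < end; i++)
    y[i] += a * x[i];
}
#endif
```

With this layout every variant stays in the file and keeps compiling when selected, and `main.c` calls `axpy(...)` the same way regardless of which variant is active.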

[Lint] Fix format error

SamuelRiedel · 2024-12-13T15:00:10Z

software/apps/baremetal/Makefile

-ALL_GCC := $(filter-out matmul_f16 matmul_f32, $(ALL))
-ALL_LLVM := $(filter-out synth_i32 chest_q16 cfft_radix2_q16 cfft_radix4_q16, $(ALL))
+FP_APPS := axpy_f16 axpy_f32
+FP_APPS += cfft_radix4_f16 chest_f16 cholesky_f16
+FP_APPS += cmatmul_f16 matmul_f16 matmul_f32
+FP_APPS += dotp_f16 dotp_f32
+FP_APPS += mimo_mmse_f32 mimo_mmse_f16 mimo_mmse_f8 ofdm_f16
+
+I_APPS := synth_i32
+I_APPS += cfft_radix2_q16 cfft_radix4_q16 chest_q16 cholesky_q16 cholesky_q32
+I_APPS += cmatmul_q16 mimo_mmse_q16
+
+ALL_GCC := $(filter-out $(FP_APPS), $(ALL))
+ALL_LLVM := $(filter-out $(I_APPS), $(ALL))


Since we have the convention of adding the i32/f16/... suffix, we could easily find all FP and integer apps automatically with a wildcard, right?

SamuelRiedel · 2024-12-13T15:08:28Z

software/apps/baremetal/mimo_mmse_f16/main.c

+  // Check the result
+  if (core_id == 0) {
+    for (uint32_t i = 0; i < 2 * N_TX * N_ITR; i++) {
+      uint32_t x = (*(uint32_t *)&l1_x[i]) & 0x0000FFFF;


Why do we do this in Banshee?

Thanks for noticing, this comes from the work on end-to-end MIMO decoding. This is how I extracted the results for Monte Carlo simulation with Banshee.
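
For context, the line in the diff above loads 32 bits starting at `l1_x[i]` and keeps the low 16 bits, which on a little-endian target is the raw bit pattern of element `i`. The sketch below shows one way to get the same pattern with a 16-bit copy, so no bytes outside the element are read. It assumes the elements are 16 bits wide; `f16_storage_t` and `f16_bits` are hypothetical names introduced here, not part of the PR.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the 16-bit storage of one f16 value (assumption: the real
 * kernels use a half-precision type of the same width). */
typedef uint16_t f16_storage_t;

/* Return the raw 16-bit pattern of one element, zero-extended to 32 bits.
 * memcpy copies exactly two bytes, so neighbouring elements are untouched. */
static uint32_t f16_bits(const f16_storage_t *v) {
  uint16_t bits;
  memcpy(&bits, v, sizeof(bits));
  return (uint32_t)bits;
}
```

For example, an element holding the half-precision encoding of 1.0 yields `0x3C00`.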

SamuelRiedel · 2024-12-13T15:10:19Z

software/apps/baremetal/mimo_mmse_f16/main.c

+
+#else
+#define N_ROUNDS (1)
+#define DMA_TRANSFER1


In general, there are a lot of defines and parameters like this that are not documented. What happens if we don't have this define?

I will remove the unnecessary code and document the defines.

SamuelRiedel · 2024-12-13T15:15:24Z

software/apps/baremetal/mimo_mmse_f8/main.c

+#ifndef BANSHEE
+  uint32_t num_cores = mempool_get_core_count();
+  mempool_barrier_init(core_id); // Initialize barrier and synchronize
+#endif


Having so much special treatment for Banshee is not ideal. What is missing in Banshee to make this work? I see that the DMA is missing, but we could also 'hide' that in the DMA runtime functions instead of handling it as done here. And why are the barriers treated differently?

Thank you for noticing, this can be removed here. The special treatment of barriers comes from the work on the end-to-end MIMO to run single-core Monte Carlo simulations with Banshee.
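
One possible shape for 'hiding' the missing DMA inside a runtime function, as suggested above, is sketched below. Only the `BANSHEE` macro comes from the diff; `data_copy` and `hw_dma_copy_blocking` are hypothetical names, and the hardware path is stubbed with `memcpy` so the sketch compiles on a host. It does not reflect the actual runtime API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a blocking DMA transfer. On real hardware this
 * would program the DMA engine; here it is stubbed for illustration. */
static inline void hw_dma_copy_blocking(void *dst, const void *src,
                                        size_t len) {
  memcpy(dst, src, len);
}

/* Single entry point used by the kernels. The simulator-specific fallback
 * lives here, so application code needs no #ifdef BANSHEE of its own. */
static inline void data_copy(void *dst, const void *src, size_t len) {
#ifdef BANSHEE
  memcpy(dst, src, len); /* no DMA model available: plain copy */
#else
  hw_dma_copy_blocking(dst, src, len);
#endif
}
```

The design choice is that the conditional appears once, in the runtime layer, rather than around every transfer in each `main.c`.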

SamuelRiedel · 2024-12-15T21:33:53Z

software/runtime/runtime.mk

@@ -26,7 +26,8 @@ DATA_DIR           ?= $(abspath $(ROOT_DIR)/../data)

 COMPILER      ?= gcc
 XPULPIMG      ?= $(xpulpimg)
-ZFINX      		?= $(zfinx)
+ZFINX         ?= $(zfinx)
+XDIVSQRT	  ?= $(xDivSqrt)


Indentation

Thanks for pointing this out.

yichao-zh · 2024-12-17T09:23:18Z

software/kernels/baremetal/mempool_matmul_f32.h

Just found an issue this morning. Since we use LLVM to compile the floating-point kernels, the ASM syntax differs slightly from the integer ASM compiled with GCC. For example, starting from line 460 (sorry, I cannot comment on it inline), the branch targets are defined as text labels, "init_comp/inner_loop/store". However, LLVM doesn't support the use of "text" to define the jump location, so we should change them to numeric labels: "10f->10, 11f->11, 12f->12". Please note: (a) the label "1" is already in use; (b) the branch instruction needs an "f" appended to the label number it jumps to.

Thanks for fixing this in advance!
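
The numeric-label convention described above can be sketched as follows. This is an illustrative loop, not the code from mempool_matmul_f32.h: the function `sum_down` is hypothetical, the inline assembly is only compiled on RISC-V targets, and a plain C loop is used elsewhere so the sketch builds on a host. In GNU-style assembly, `10:` defines a local label, `10b` refers to the nearest `10:` before the branch, and `10f` to the nearest one after it.

```c
#include <stdint.h>

/* Returns n + (n-1) + ... + 1, or 0 when n is 0. */
static inline uint32_t sum_down(uint32_t n) {
  uint32_t acc = 0;
#if defined(__riscv)
  __asm__ volatile(
      "beqz %[n], 11f            \n\t" /* forward jump: skip loop if n == 0 */
      "10:                       \n\t" /* numeric local label: loop head */
      "add  %[acc], %[acc], %[n] \n\t"
      "addi %[n], %[n], -1       \n\t"
      "bnez %[n], 10b            \n\t" /* backward jump to nearest 10: above */
      "11:                       \n\t" /* numeric local label: loop exit */
      : [acc] "+r"(acc), [n] "+r"(n));
#else
  /* Host fallback with the same behavior. */
  while (n != 0) {
    acc += n;
    n--;
  }
#endif
  return acc;
}
```

Numeric local labels may be defined more than once in the same file, so they remain valid if the compiler duplicates the asm block, for instance through inlining.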

mbertuletti requested review from SamuelRiedel, sermazz and yichao-zh as code owners April 25, 2024 12:50

mbertuletti force-pushed the mbertuletti/mimo_receiver branch 6 times, most recently from 3fae5dd to d7c809d Compare April 25, 2024 13:55

mbertuletti force-pushed the mbertuletti/mimo_receiver branch from d7c809d to b9ff155 Compare July 5, 2024 14:58

mbertuletti force-pushed the mbertuletti/mimo_receiver branch 5 times, most recently from 21bbf18 to 4ba934c Compare August 27, 2024 14:51

mbertuletti force-pushed the mbertuletti/mimo_receiver branch 6 times, most recently from 26dacc3 to bf0e68d Compare September 5, 2024 12:22

mbertuletti force-pushed the mbertuletti/mimo_receiver branch 4 times, most recently from d1605f1 to a1bc514 Compare September 13, 2024 06:59

mbertuletti force-pushed the mbertuletti/mimo_receiver branch 2 times, most recently from c7c4ea6 to 8d5ad46 Compare October 16, 2024 09:20

mbertuletti force-pushed the mbertuletti/mimo_receiver branch 3 times, most recently from e11e157 to 0f9df05 Compare October 28, 2024 13:19

mbertuletti added 3 commits December 6, 2024 11:57

[software] Remove load of che inputs from inner loop

0e6b37c

[software] Add shuffle instruction in cfft_radix4_f16

0e3894f

[software] Clean-up complex matmuls

f0270f8

mbertuletti force-pushed the mbertuletti/mimo_receiver branch from 5010e69 to 916a29a Compare December 6, 2024 10:57

yichao-zh reviewed Dec 10, 2024

mbertuletti and others added 8 commits December 10, 2024 10:43

[software] Add f32 and f16 dotp/axpy kernels

5f3c750

[software] Clean-up data transfers in mimo_mmse_f16

33701fa

[software] Add mimo_mmse_f16 with fcdotp extensions

f0570a5

[software] Add mimo_mmse_f8 kernels

5984c35

[software] Clean up folded mimo_mmse_f16 and Ltrisol_f16

0596309

[software] Adapt generation of data to #PR103

bbab0ca

[github] Change Ubuntu version to 22.04

3b5886b

[software] Add matmul kernel with the conflict optimization scheme

3ea70e0

[Lint] Fix format error

mbertuletti force-pushed the mbertuletti/mimo_receiver branch 4 times, most recently from 82c5e0b to c2db140 Compare December 10, 2024 14:01

mbertuletti added 2 commits December 10, 2024 17:00

[software] Move the port-conflict optimized matmul to matmul_i32p

5bee548

Update CHANGELOG.md

0f1de6f

mbertuletti force-pushed the mbertuletti/mimo_receiver branch from c2db140 to 0f1de6f Compare December 10, 2024 16:00

SamuelRiedel reviewed Dec 15, 2024

yichao-zh reviewed Dec 17, 2024

mbertuletti force-pushed the mbertuletti/mimo_receiver branch 5 times, most recently from 191563b to 9594bd6 Compare December 19, 2024 13:44

mbertuletti added 2 commits December 19, 2024 14:52

[software] Add explanation for the use of defines

c53ec74

[software] Cross-out defines for Banshee Monte-Carlo simulation

264879e

mbertuletti force-pushed the mbertuletti/mimo_receiver branch from 9594bd6 to 264879e Compare December 19, 2024 13:52
