sw: Restructuring of BLAS and DNN kernels #137

Merged 53 commits into main on Jul 4, 2024

sw: Restructuring of BLAS and DNN kernels #137

merged 53 commits into from
Jul 4, 2024

Conversation

@viv-eth (Contributor) commented on May 8, 2024

This PR introduces substantial changes to the BLAS and DNN kernels in the SW library.

GEMM

This PR comprises the following changes in the BLAS library:

  1. We moved to a generic `impl` function that fetches the correct kernel based on the `config.json` file. All kernel-related data is wrapped inside a `gemm_args_t` struct, whose arguments are loaded into the cluster's TCDM to avoid costly DRAM accesses when calling the function (see the sketch after this list). The roadmap is to adapt all kernels in the sw library to this format to improve performance.
  2. We added the option to run multiple tests in parallel.
  3. We aligned the tests to work with the new GEMM layout and added a naive FP16 kernel for verification.
  4. We added more assertions to the datagen script to catch unsupported or invalid configurations, each with a relevant error message.
  5. We adjusted other kernels, such as `fused_concat_linear`, to the new layout.

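To make the argument-struct pattern from item 1 concrete, here is a minimal C sketch. The field names and the `gemm` entry point are illustrative assumptions, not the exact definitions in this PR:

```c
#include <stdint.h>

// Hypothetical sketch of the gemm_args_t pattern: one struct holds all
// kernel-related data, plus a function pointer to the kernel variant
// selected from config.json at data-generation time.
typedef struct gemm_args {
    uint32_t m, n, k;            // matrix dimensions
    uint32_t prec;               // element size in bytes (e.g. 2 for FP16)
    uint32_t transa, transb;     // transposition flags for A and B
    double alpha, beta;          // C = alpha * A * B + beta * C
    void *a, *b, *c;             // pointers to the operand matrices
    void (*gemm_fp)(const struct gemm_args *args); // selected kernel
} gemm_args_t;

// Generic entry point: in the library, the struct would first be
// DMA-copied into the cluster's TCDM so the kernel reads its parameters
// from local memory rather than DRAM; here we only show the dispatch.
static inline void gemm(const gemm_args_t *args) {
    args->gemm_fp(args);
}
```

Keeping the parameters and the kernel pointer in a single TCDM-resident struct means the dispatch itself touches no DRAM, which is the performance motivation stated above.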
FlashAttention-2

  1. We added the transposition layer that is used inside the kernel.
  2. We added the FP16 and FP8 MiniFloat implementations.
  3. We added multiple configurations for testing.
  4. We aligned the kernel to work with the new GEMM layout.
  5. We integrated several fixes to the kernel, as well as casts from lower to higher precision and vice versa (see the sketch after this list).

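As a minimal illustration of the precision casts mentioned in item 5, assuming a compiler with `_Float16` support (the kernel's actual types and helpers may differ):

```c
#include <stdio.h>

// Widening cast: every FP16 value is exactly representable in FP32.
static inline float fp16_to_fp32(_Float16 x) {
    return (float)x;
}

// Narrowing cast: rounds to the nearest representable FP16 value.
static inline _Float16 fp32_to_fp16(float x) {
    return (_Float16)x;
}

int main(void) {
    _Float16 h = fp32_to_fp16(0.1f);           // 0.1 is inexact in FP16
    printf("%.8f\n", (double)fp16_to_fp32(h)); // prints the rounded value
    return 0;
}
```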
Treewide

This PR also includes some treewide changes.

  1. Remove all math-library-related sources, which are no longer needed after "hw: Fix memory consistency between int and FP datapaths" (#90).

  2. Remove all uses of the $PYTHON environment variable, and instead consistently rely on shebang directives to pick up the active virtual environment's python executable.
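     For example, a script that starts with `#!/usr/bin/env python3` resolves `python3` through `PATH`, so activating a virtual environment is sufficient for the script to run under that environment's interpreter.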

@viv-eth mentioned this pull request on May 30, 2024
@viv-eth force-pushed the dnn-additions branch 19 times, most recently from 016eb62 to da74393, on June 19, 2024 at 12:37
@viv-eth marked this pull request as ready for review on June 19, 2024 at 15:08
@fischeti changed the title from "[DRAFT] Restructering of BLAS and DNN kernels" to "sw: Restructuring of BLAS and DNN kernels" on Jun 20, 2024
@viv-eth force-pushed the dnn-additions branch 2 times, most recently from 390ef52 to 8ecb9e6, on June 28, 2024 at 14:03
@colluca merged commit ce68d22 into main on Jul 4, 2024
27 checks passed
@colluca deleted the dnn-additions branch on July 4, 2024 at 16:37