
Add simple matmul on gemm accelerator #39

Merged: 16 commits merged into main on Dec 18, 2023

Conversation

@JosseVanDelm (Contributor) commented on Dec 5, 2023:

Update: This PR mainly puts everything in place (data-layout- and data-movement-wise) to execute work on the gemm accelerator.

Right now it mostly bypasses the MLIR flow, because the following is still missing:

  • support for more than 1D malloc
  • support for matmul operation offloading
  • Support

WIP on adding a simple quantized matmul to this repository

func.func public @simple_matmul(%A: memref<64x64xi8>,
                                %B: memref<64x64xi8>,
                                %C: memref<64x64xi32>) -> () {
  %c_0 = arith.constant 0 : i32
  linalg.quantized_matmul ins(%A, %B, %c_0, %c_0: memref<64x64xi8>, memref<64x64xi8>, i32, i32)
  outs(%C: memref<64x64xi32>)
  return
}
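
Since both zero points are the constant 0, this reduces to a plain i8-by-i8 matmul with i32 accumulation. For reference, a C golden model for this kernel could look like the sketch below (the function name, dimension names, and row-major layout are assumptions for illustration; the PR itself generates its golden data in Python with np.matmul):

#include <stdint.h>

// Reference C = A * B for int8 inputs with int32 accumulation,
// matching linalg.quantized_matmul when both zero points are 0.
// Names and the row-major layout are illustrative assumptions.
void golden_matmul(int32_t M, int32_t K, int32_t N,
                   const int8_t *A, const int8_t *B, int32_t *C) {
  for (int32_t i = 0; i < M; i++) {
    for (int32_t j = 0; j < N; j++) {
      int32_t acc = 0;
      for (int32_t k = 0; k < K; k++) {
        acc += (int32_t)A[i * K + k] * (int32_t)B[k * N + j];
      }
      C[i * N + j] = acc;
    }
  }
}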

Needs work:

  • Figure out which parameters of set_batch_gemm can be extracted from the operation/memref itself and which ones we can assume to be hardcoded for now. Most are still hardcoded because we are not working on 4D memrefs yet.
  • Figure out a way so that the N, M, K defines don't clash with the names used by the library (maybe?).
  • Add an MLIR example, although that might require help from @jorendumoulin.
  • Add support for, or remove entirely, the call to set up the CSRs for the accelerator (the accelerator is not considered in this PR).
  • Add a golden model for data generation.
  • Fix automatic data generation instead of using hardcoded values.
  • Add tests to CI.

@JosseVanDelm self-assigned this on Dec 5, 2023
@JosseVanDelm changed the title from "Josse/add simple matmul" to "Add simple matmul on gemm accelerator" on Dec 5, 2023
@JosseVanDelm marked this pull request as ready for review on December 14, 2023

@jorendumoulin (Contributor) left a comment:

Very cool! Some minor comments.
I mostly don't like how we use custom variables and libraries from another repo without clearly referencing their location, but I don't know of a better solution.

Comment on lines 55 to 58
C_golden = np.matmul(A.astype(np.dtype("int32")), B.astype(np.dtype("int32")))
C = np.zeros(C_golden.shape, np.dtype("int32"))

assert A.shape[1] == B.shape[0]

Contributor:

The assert must come before the np.matmul computation. If the shapes do not correspond, the matmul would fail anyway, making the assert useless


# C = A.B
A = np.random.randint(low_bound, high_bound, size=A_size, dtype=np.dtype("int8"))
# A = np.ones(A_size, dtype=np.dtype("int8"))

Contributor:

remove comment

A = np.random.randint(low_bound, high_bound, size=A_size, dtype=np.dtype("int8"))
# A = np.ones(A_size, dtype=np.dtype("int8"))
B = np.random.randint(low_bound, high_bound, size=B_size, dtype=np.dtype("int8"))
# B = np.ones(B_size, dtype=np.dtype("int8"))

Contributor:

remove comment

Contributor:

The MLIR docs show quite a nice way to have only one memref struct specification for all types, in C++

template<typename T, size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  intptr_t offset;
  intptr_t sizes[N];
  intptr_t strides[N];
};

do you know if something similar is possible in C? if not, I guess this is fine for now

@JosseVanDelm (Contributor Author) commented on Dec 18, 2023:

This is possible, but it looks a bit complicated (https://isocpp.org/wiki/faq/mixing-c-and-cpp); let's take this on in a future PR?

Contributor:

Yes, good!
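
For reference, one C-only approximation of the templated descriptor above is a macro that stamps out one struct per element type and rank. This is a minimal sketch with hypothetical names, not code from this PR:

#include <stdint.h>

// C has no templates, but a macro can generate one descriptor struct
// per element type and rank, mirroring the C++ MemRefDescriptor above.
// The macro and type names here are illustrative only.
#define DEFINE_MEMREF_DESCRIPTOR(T, RANK, NAME)                          \
  typedef struct {                                                       \
    T *allocated;           /* start of the allocated buffer */          \
    T *aligned;             /* aligned pointer used for element access */\
    intptr_t offset;        /* element offset from the aligned pointer */\
    intptr_t sizes[RANK];   /* extent of each dimension */               \
    intptr_t strides[RANK]; /* stride of each dimension, in elements */  \
  } NAME;

DEFINE_MEMREF_DESCRIPTOR(int8_t, 2, MemRefI8Rank2)
DEFINE_MEMREF_DESCRIPTOR(int32_t, 2, MemRefI32Rank2)

The trade-off is one macro invocation per type/rank combination instead of the template's single definition, which may or may not be worth it here.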

Comment on lines 3 to 4
#include "snax-gemm-lib.h"
#include "snax-gemm-params.h"

Contributor:

To me it feels kind of weird to include header files from another repository without specifying where they come from 😕. I guess it also doesn't make sense to duplicate them... Is there a way to share these in a better way?

Contributor:

if not, maybe make it clear with a comment where these files come from

Contributor Author:

Agreed! Clear like this?
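
The resolved change itself is not visible in this thread; a sketch of what such an attribution comment could look like (the wording and the upstream location named in it are assumptions) is:

// NOTE: these headers are not part of this repository; they come from
// the external snax-gemm accelerator library (upstream location assumed,
// see that repository for the originals).
#include "snax-gemm-lib.h"
#include "snax-gemm-params.h"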

Comment on lines 81 to 82
memrefA.stride[0] = sizeof(int8_t);
memrefA.stride[1] = sizeof(int8_t);

Contributor:

If we were to use a standard layout, this would also not be correct, but rather:

Suggested change:

- memrefA.stride[0] = sizeof(int8_t);
- memrefA.stride[1] = sizeof(int8_t);
+ memrefA.stride[0] = sizeof(int8_t);
+ memrefA.stride[1] = sizeof(int8_t) * M_size;

Maybe just set them to 0 to make it very clear we are not using them.

memrefA.aligned_data = memrefA.data;
memrefA.shape[0] = M_size;
memrefA.shape[1] = K_size;
// These are not considered correctly right now

Contributor:

Comment not very clear.

Suggested change:

- // These are not considered correctly right now
+ // Strides are not used due to the tiled-block layout.
+ // Instead we use the variables strideInnermostA, ldA and strideA

Three further review threads (two on kernels/simple_matmul/main.c, one on kernels/simple_mult/main.c) are outdated and resolved.

@jorendumoulin (Contributor) left a comment:

Awesome!!

@@ -55,6 +55,7 @@ int main() {

  int nerr = 0;
  for (int i = 0; i < N; i++) {
+   printf("result: %d golden: %d\n", memrefD.aligned_data[i], G[i]);

Contributor:

To delete then?

Contributor Author:

Will make a separate PR for this.

Contributor Author:

wait no

Contributor Author:

it's gone now haha

@JosseVanDelm merged commit f44b93a into main on Dec 18, 2023 (3 checks passed).
@JosseVanDelm deleted the Josse/add-simple-matmul branch on February 12, 2024.