Avoid recompiling the kernel when the workload changes #3

Open
dineiar opened this issue Feb 26, 2024 · 0 comments

Comments

@dineiar
Member

dineiar commented Feb 26, 2024

Currently, the kernel code generation is bound to a specific workload size (dimensions) here. So, whenever the dimensions change, the code is regenerated and recompiled.

@gabriellaraujo1903 reported substantial overhead from this behavior:

When the workload size changes, GSParLib recompiles the GPU kernel.

For instance, suppose we execute a vector sum where the vector's size is 10,000, and then we run another vector sum where the vector's size is 50,000. In this case, GSParLib will recompile the GPU kernel.

This behaviour degrades performance when a GPU kernel is executed many times and the workload size keeps changing.

This occurs in the MG program from NPB, an iterative program in which the GPU kernels are called thousands of times and the workload varies continuously. Recompiling in this case imposes a considerable performance degradation; GPU execution time can be even worse than that of the serial code.

On the other hand, CUDA does not require a GPU kernel recompilation when changing the workload size. If I'm correct, only batching would require recompilation; I do not remember right now.
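To make the contrast concrete, here is a minimal sketch of two ways a kernel source string could be generated for the vector sum. Both generator functions are hypothetical and are not GSParLib's actual code; the embedded kernel text is plain CUDA C. The first bakes the size into the source, so each new size produces a different string and forces a recompile. The second takes the size as a kernel argument `n`, so the same source serves every size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical generator that bakes the workload size into the source text.
// Every distinct size yields a distinct source string, hence a new compilation.
std::string generate_size_bound_source(std::size_t n) {
    assert(n > 0);  // an empty workload would not need a kernel at all
    return "__global__ void vector_sum(const float* a, const float* b, float* c) {\n"
           "    unsigned long i = blockIdx.x * blockDim.x + threadIdx.x;\n"
           "    if (i < " + std::to_string(n) + "ul) c[i] = a[i] + b[i];\n"
           "}\n";
}

// Hypothetical generator that leaves the size out of the source text.
// The size arrives at launch time as the kernel argument `n`.
std::string generate_size_agnostic_source() {
    return "__global__ void vector_sum(const float* a, const float* b, float* c,\n"
           "                           unsigned long n) {\n"
           "    unsigned long i = blockIdx.x * blockDim.x + threadIdx.x;\n"
           "    if (i < n) c[i] = a[i] + b[i];\n"
           "}\n";
}
```

With the second form, the 10,000-element and 50,000-element runs would share one source string, so a compiled kernel could be reused between them.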

This issue aims to avoid recompiling the kernel when the workload changes. We need to investigate if we can reuse the code when the workload changes.

The workload is passed as an argument in the kernel launch, so maybe we just need to remove the extra compilation step. The code probably needs to be recompiled when dimensions (x, y, z) are added or removed, so the main aim of this issue is to avoid recompiling when the workload size changes.
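One possible shape for this, sketched under assumptions: cache compiled kernels by the number of dimensions only, and pass the sizes at launch. `CompiledKernel`, `KernelCache`, and `launch` are hypothetical names used for illustration, not GSParLib's API, and the expensive compile step is simulated by a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a compiled GPU kernel handle.
struct CompiledKernel {
    int num_dims;  // number of dimensions the kernel was generated for
};

// Minimal cache sketch: one compiled kernel per dimension count (1, 2, or 3).
// Workload sizes are deliberately not part of the key.
class KernelCache {
public:
    // Returns the kernel for `num_dims`, compiling only on the first request.
    const CompiledKernel& get(int num_dims) {
        assert(num_dims >= 1 && num_dims <= 3);
        auto it = cache_.find(num_dims);
        if (it == cache_.end()) {
            ++compilations_;  // placeholder for the real (expensive) compile
            it = cache_.emplace(num_dims, CompiledKernel{num_dims}).first;
        }
        return it->second;
    }

    // Number of simulated compilations performed so far.
    int compilations() const { return compilations_; }

private:
    std::map<int, CompiledKernel> cache_;
    int compilations_ = 0;
};

// Simulated launch: only sizes.size() selects the kernel; the size values
// themselves would be forwarded as launch arguments in a real implementation.
int launch(KernelCache& cache, const std::vector<std::size_t>& sizes) {
    const CompiledKernel& kernel = cache.get(static_cast<int>(sizes.size()));
    return kernel.num_dims;
}
```

In this sketch, launching with sizes `{10000}` and then `{50000}` compiles once, while moving to `{256, 256}` triggers a second compilation because a dimension was added.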
