
[Documentation]: Shall we modify the configurations in v2python for the other kernels? #12

Open
xinji1 opened this issue Apr 2, 2024 · 5 comments

Comments


xinji1 commented Apr 2, 2024

Description of errors

In the current README.md, we are told to change the configurations in python/rules.py, but actually we need to make our own folders like v2python/rules/flash/, right?

Attach any links, screenshots, or additional evidence you think will be helpful.

No response

xinyazhang (Collaborator)

Yes, the current steps to add new kernels are

  1. Add the Triton kernel file to tritonsrc
  2. Add rule files that define KernelDescription objects for these new kernels under v2python/rules/<new kernel family>
    • Do not add files directly under the v2python/rules directory; it is unlikely that the new Triton source contains only one kernel.
  3. Update v2python/rules/__init__.py to add the new KernelDescription objects to the v2python.rules.kernels variable (see the sketch after this list).
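
For step 3, a hypothetical sketch of what the registration could look like (the module and variable names for the new family are illustrative, not AOTriton's actual API; mirror whatever pattern the real v2python/rules/__init__.py already uses for the flash family):

```python
# v2python/rules/__init__.py (hypothetical sketch, not the actual file):
# aggregate each family's KernelDescription objects into the kernels
# variable consumed by the build system.
from .flash import kernels as flash_kernels          # existing family
from .my_kernel_family import kernels as my_kernels  # hypothetical new family

kernels = flash_kernels + my_kernels
```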

(Note: Keep this Issue open until the README.md is updated)


xinji1 commented Apr 10, 2024

Thanks for the reply! Another little question here: what should we do about v2python/rules/[my_own_kernel]/tune-kernel-gfx90a.json? I just found that you provide quite a lot of tuning templates for different kernels. Specifically,

  1. Is it related to the final result? I mean, could I still get the final .so even with an empty tune-kernel-gfx90a.json?
  2. If not, how many configurations should I provide?
  3. I found that not all of the input variables are necessary for the corresponding kernel (like N_HEADS and D_HEAD for attn_fwd), so which parameters are necessary in this .json file (I'd appreciate it if you could take flash.attn_fwd as an example)?
  4. In your "attn_fwd" part of this .json file, you only provide one tensor's shape and dtype. Does that mean I need to give the other tensors' metainfo if they are not the same shape/dtype?
    [screenshot of the attn_fwd entry in the JSON tuning file]

xinyazhang (Collaborator)

Is it related to the final result? I mean, could I still get the final .so even with an empty tune-kernel-gfx90a.json?

Theoretically you can, but it's an untested code path right now.

The JSON file is what we called the "tuning database" in internal slides (nothing secret, just unsuitable to release as part of the source code repos). The tuning database did not exist until commit 099141a; before that, you needed to specify the PERF_CHOICES manually in KernelDescription subclasses.

However, since its introduction, the tuning database has somehow become the center of the build process: it guides AOTriton in dispatching user inputs to actual GPU kernels. The original approach is not well tested, since its performance is abysmal (it could be 10x slower).

If not, how many configurations should I provide?

This totally depends on your needs and on the actual kernel you want to compile. The tuning database is an AOT version of @triton.autotune, and the actual configurations used can be found in tritonsrc/attn_torch_function.py. Meanwhile, tritonsrc/tune_flash.py provides a list of (seqlen_q, seqlen_k, D_HEAD, Q_dtype, ...) combinations to probe for the optimal configurations.
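
To make the @triton.autotune analogy concrete, here is a minimal toy kernel (not AOTriton's flash kernel; scale_kernel and its configs are invented for illustration) showing what the JIT-side mechanism does: it benchmarks each triton.Config at runtime and caches the winner per key, whereas the tuning database records such winners ahead of time.

```python
import triton
import triton.language as tl

@triton.autotune(
    configs=[  # candidate configurations, benchmarked on first launch
        triton.Config({'BLOCK': 1024}, num_warps=4),
        triton.Config({'BLOCK': 2048}, num_warps=8),
    ],
    key=['n_elements'],  # re-autotune whenever n_elements changes
)
@triton.jit
def scale_kernel(x_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x * 2.0, mask=mask)
```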

Note that a Triton kernel compiled with certain configurations may run without segfaults or other runtime errors but still not give you correct results. See tritonsrc/attn_torch_function.py for a possible solution.

I found that not all of the input variables are necessary for the corresponding kernel (like N_HEADS and D_HEAD for attn_fwd), so which parameters are necessary in this .json file (I'd appreciate it if you could take flash.attn_fwd as an example)?

The JSON version is a little bit verbose. However, we have since replaced the JSON with an SQLite3 database.

You can check the UNIQUE constraints to see which columns are used to locate tuning database entries, with sqlite3 v2python/rules/tuning_database.sqlite3 '.schema'.
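
If you prefer to inspect it from Python rather than the sqlite3 CLI, a quick equivalent (assuming you run it from the repository root):

```python
import sqlite3

# Print the CREATE TABLE statements, including their UNIQUE constraints.
con = sqlite3.connect('v2python/rules/tuning_database.sqlite3')
for (ddl,) in con.execute("SELECT sql FROM sqlite_master WHERE type='table'"):
    print(ddl)
con.close()
```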

In your "attn_fwd" part of this .json file, you only provide one tensor's shape and dtype. Does that mean I need to give the other tensors' metainfo if they are not the same shape/dtype?

No, you don't need to. The current database already describes all tensors' shapes and dtypes.
You can take a look at v2python/rules/flash/attn_fwd.py for the constraints among tensors.

More specifically, for the attn_fwd kernel, Q, K, V, and O have the same dtype and roughly the same shape, apart from seqlen_q vs. seqlen_k, and seqlen_q and seqlen_k are already provided separately.
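
As a rough illustration of those constraints with PyTorch tensors (the (batch, heads, seqlen, d_head) layout and the sizes here are assumptions for the example, not values taken from the database):

```python
import torch

batch, n_heads, seqlen_q, seqlen_k, d_head = 2, 8, 1024, 2048, 64
q = torch.empty(batch, n_heads, seqlen_q, d_head, dtype=torch.float16)
k = torch.empty(batch, n_heads, seqlen_k, d_head, dtype=torch.float16)  # differs only in seqlen
v = torch.empty(batch, n_heads, seqlen_k, d_head, dtype=torch.float16)
o = torch.empty_like(q)  # output follows Q's shape and dtype

assert q.dtype == k.dtype == v.dtype == o.dtype  # QKVO share one dtype
```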


xinji1 commented Apr 30, 2024

Thanks for your reply!
Another question here: according to this, rocm/triton will be deprecated soon? Will AOTriton switch to supporting openai/triton?

xinyazhang (Collaborator)

Will AOTriton switch to supporting openai/triton?

Certainly we will, but the migration takes time and extensive regression testing.
