Rope2d #75

Original issue: Can you add an example about RoPE 2D, as in Meta's SAM2? https://github.com/facebookresearch/sam2/blob/main/sam2/modeling/sam/transformer.py#L289

Comments
Just to confirm: do you mean an example where RoPE is fused into FlashAttention, as opposed to how it is done in SAM2, where Q and K are rotated beforehand and then run through Flash?
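For context, here is a minimal sketch of that pre-rotation approach (not SAM2's exact code; `axial_2d_rope_freqs`, `apply_rope`, and the grid/shape values are illustrative assumptions): half of the head dimension encodes the y coordinate, the other half the x coordinate, and Q/K are rotated before any attention kernel is called.

```python
import torch

def axial_2d_rope_freqs(head_dim: int, h: int, w: int, theta: float = 10000.0):
    # Each spatial axis gets half of the head dim; rotations act on pairs of
    # channels, so each axis has head_dim // 4 distinct frequencies.
    dim_per_axis = head_dim // 2
    freqs = 1.0 / (theta ** (torch.arange(0, dim_per_axis, 2).float() / dim_per_axis))
    freqs_y = torch.outer(torch.arange(h).float(), freqs)  # (h, head_dim // 4)
    freqs_x = torch.outer(torch.arange(w).float(), freqs)  # (w, head_dim // 4)
    # Broadcast both axes over the full h*w grid and concatenate.
    angles = torch.cat(
        [
            freqs_y[:, None, :].expand(h, w, -1),
            freqs_x[None, :, :].expand(h, w, -1),
        ],
        dim=-1,
    ).reshape(h * w, -1)                                    # (h*w, head_dim // 2)
    return torch.polar(torch.ones_like(angles), angles)    # complex rotations

def apply_rope(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: (batch, heads, h*w, head_dim); treat channel pairs as complex numbers.
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    return torch.view_as_real(x_c * freqs_cis).flatten(-2).type_as(x)

# Rotate Q and K first, then hand them to any attention kernel (e.g. SDPA):
freqs_cis = axial_2d_rope_freqs(head_dim=64, h=8, w=8)
q, k, v = (torch.randn(1, 4, 64, 64) for _ in range(3))
out = torch.nn.functional.scaled_dot_product_attention(
    apply_rope(q, freqs_cis), apply_rope(k, freqs_cis), v
)
```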
Yes. Does this fit in the Flex API or not?
This currently does not fit within the Flex API: RoPE is typically implemented by pre-mutating Q and K, and we don't provide any way to mutate Q and K before the dot-product operation.
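To make the limitation concrete, a minimal sketch using PyTorch's `flex_attention` (available since 2.5; the relative-bias `score_mod` is just an illustration): `score_mod` only sees the scalar score for a `(q_idx, kv_idx)` pair after the Q·K dot product, so a vector-valued rotation of Q and K has no place to hook in.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def relative_bias(score, b, h, q_idx, kv_idx):
    # score_mod can only rewrite the already-computed scalar score for one
    # (q_idx, kv_idx) pair; here, a simple relative-position shift.
    return score + (q_idx - kv_idx)

q = k = v = torch.randn(1, 2, 8, 16)
out = flex_attention(q, k, v, score_mod=relative_bias)

# RoPE instead needs q_rot = R(q_idx) @ q and k_rot = R(kv_idx) @ k *before*
# the dot product, which this interface cannot express.
```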
Is it on the roadmap?
Not currently. From what I know, fusion ends up not being beneficial in training; it can be beneficial for memory-bound cases in decoding. I will leave this open, though. We have a few other high-priority things, like learnable biases, that I am working on, but I will think about how this can be supported.
Do you have some alternative SOTA 2D learnable bias on the roadmap?
This is the 1D version, but it could be interesting.
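Assuming the "1D version" refers to a learnable relative position bias, a minimal sketch of how such a bias already fits `score_mod` (table size and shapes are illustrative; a per-head table is omitted for brevity):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

S = 128  # sequence length (illustrative)
# One learnable bias per relative offset in [-(S - 1), S - 1].
rel_bias = torch.nn.Parameter(torch.zeros(2 * S - 1))

def learnable_relative_bias(score, b, h, q_idx, kv_idx):
    # Shift the offset so it indexes into [0, 2S - 2].
    return score + rel_bias[q_idx - kv_idx + S - 1]

q = k = v = torch.randn(1, 4, S, 64)
out = flex_attention(q, k, v, score_mod=learnable_relative_bias)
```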