Add sdpa for Beit #34941
Conversation
Glad to see you in another sdpa-PR @OmarManzoor! Thanks for working on the implementation 👍 Overall looking great, just a few notes:
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hey, I'm getting the following values for the benchmark:
BeitForImageClassification
| Image batch size | Eager (s/iter) | Eager CI, % | SDPA (s/iter) | SDPA CI, % | SDPA speedup |
|---|---|---|---|---|---|
| 1 | 0.013 | ±0.2% | 0.011 | ±0.5% | 1.122 |
| 4 | 0.013 | ±0.2% | 0.011 | ±0.1% | 1.122 |
| 16 | 0.026 | ±0.1% | 0.021 | ±0.5% | 1.232 |
| 32 | 0.048 | ±0.8% | 0.039 | ±0.3% | 1.252 |
with the following env:
Python version: 3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0]
Transformers version: 4.47.0.dev0
Torch version: 2.5.0+cu118
GPU: NVIDIA A10G
Here is the code, can you run it with your env to check? Also, my code might not be the best way to benchmark, so I'd appreciate it if you shared yours!
Your script might be better than what I am using, but nevertheless here is the script I used for inference.
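(The script itself is not reproduced in this thread. A minimal sketch of the kind of eager-vs-SDPA inference benchmark being discussed might look like the following; the checkpoint name, batch sizes, and iteration counts are illustrative assumptions, not the exact script used here.)

```python
# Hypothetical sketch of an eager vs. SDPA inference benchmark for BEiT.
# The checkpoint, batch sizes, and iteration counts are assumptions.
import time

import torch
from transformers import BeitForImageClassification

CHECKPOINT = "microsoft/beit-base-patch16-224"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"


def benchmark(attn_implementation: str, batch_size: int, n_iters: int = 50) -> float:
    model = BeitForImageClassification.from_pretrained(
        CHECKPOINT, attn_implementation=attn_implementation
    ).to(DEVICE).eval()
    pixel_values = torch.randn(batch_size, 3, 224, 224, device=DEVICE)
    with torch.no_grad():
        for _ in range(5):  # warmup
            model(pixel_values=pixel_values)
        if DEVICE == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_iters):
            model(pixel_values=pixel_values)
        if DEVICE == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters


for batch_size in (1, 4, 16, 32):
    eager = benchmark("eager", batch_size)
    sdpa = benchmark("sdpa", batch_size)
    print(f"batch={batch_size}: eager={eager:.4f}s sdpa={sdpa:.4f}s speedup={eager / sdpa:.3f}x")
```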
Here are the results using your script:

BeitForImageClassification

Environment:
Python version: 3.10.14 (main, Jul 23 2024, 15:53:02) [GCC 9.4.0]
Transformers version: 4.47.0.dev0
Torch version: 2.5.1+cu124
GPU: NVIDIA GeForce RTX 2060 SUPER
Ok, looks good. Can you update the documentation benchmarks then? Maybe it's worth adding memory stats to my script and rerunning it.
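(One hedged way to add the memory stats suggested above, not taken from the actual script, is to reset and read PyTorch's CUDA peak-memory counters around each timed run.)

```python
# Illustrative addition for tracking peak GPU memory per configuration;
# assumes a CUDA device and the timed inference loop from the benchmark above.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the timed inference loop here ...
torch.cuda.synchronize()
peak_mb = torch.cuda.max_memory_allocated() / 1024**2
print(f"peak memory: {peak_mb:.1f} MB")
```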
Okay, I think we can use your inference benchmarks, but for the training part I think we can keep the current benchmarks? Also, did you have a look at the script I shared - is there anything that is not quite correct in it?
@qubvel Do the slow tests still need to be run?
Thanks for the ping! Looking great to me, just one comment
@ArthurZucker please review when you have bandwidth!
@OmarManzoor: thank you for the rebase and for adding the thresholds path for XPU. I verified it on my side against upstream PyTorch XPU - both added tests work for me.
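(For readers unfamiliar with the thresholds mentioned above: the general pattern is to key numeric tolerances by device type so XPU can use its own bounds. The sketch below is purely illustrative; the dictionary name, values, and helper are made up and are not the actual test code in this PR.)

```python
# Hypothetical illustration of per-device test tolerances; values are made up.
import torch

ATOL_BY_DEVICE = {"cuda": 1e-4, "xpu": 1e-3, "cpu": 1e-4}


def assert_close_for_device(actual, expected, device_type: str):
    # Pick a looser tolerance on devices that need it (e.g. XPU), default otherwise.
    atol = ATOL_BY_DEVICE.get(device_type, 1e-4)
    torch.testing.assert_close(actual, expected, atol=atol, rtol=atol)
```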
@dvrogozh Thank you for verifying.
@ArthurZucker Can this be merged?
Hey all! Sorry for the delay, we are in the middle of a huge refactoring in #35235, which is why I wanted to wait a bit, but good work should be rewarded, so let's merge this! Thanks for being patient 🤗
Thank you!
What does this PR do?
Related to #28005
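(For context, once merged the SDPA path should be selectable the same way as for other models that already support it. A minimal usage sketch follows; the checkpoint name is an assumption.)

```python
# Minimal sketch: load BEiT with the SDPA attention implementation added in this PR.
import torch
from transformers import BeitForImageClassification

model = BeitForImageClassification.from_pretrained(
    "microsoft/beit-base-patch16-224", attn_implementation="sdpa"
)
model.eval()
with torch.no_grad():
    logits = model(pixel_values=torch.randn(1, 3, 224, 224)).logits
```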
Who can review?
@qubvel