Add sdpa for Detr #34826
Conversation
Hi @OmarManzoor! Thanks for taking this up, feel free to ping me when it's ready for review!
@qubvel I posted a question in the description. Could you kindly have a look?
@OmarManzoor Yes, for some models we see that the threshold should be different. You can override the model tests.
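For context, such an override typically boils down to comparing eager and SDPA outputs under a looser tolerance. A minimal sketch, assuming this PR's SDPA support for DETR is in place (the checkpoint, input shape, and tolerance values below are illustrative):

```python
import torch
from transformers import DetrModel

# Load the same checkpoint with both attention implementations.
# attn_implementation="sdpa" for DETR assumes this PR's changes.
model_eager = DetrModel.from_pretrained(
    "facebook/detr-resnet-50", attn_implementation="eager"
).eval()
model_sdpa = DetrModel.from_pretrained(
    "facebook/detr-resnet-50", attn_implementation="sdpa"
).eval()

pixel_values = torch.randn(1, 3, 800, 1066)
with torch.no_grad():
    out_eager = model_eager(pixel_values=pixel_values).last_hidden_state
    out_sdpa = model_sdpa(pixel_values=pixel_values).last_hidden_state

# A looser-than-default tolerance, since the two attention paths
# accumulate small floating-point differences.
torch.testing.assert_close(out_sdpa, out_eager, atol=1e-3, rtol=1e-3)
```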
@qubvel I tried to make the required adjustments. I'm not sure about these test failures, though.
Hi @OmarManzoor! Thanks for iterating! Re tests: this one looks unrelated; however, this one might be related to the PR. Can you also push an empty commit with the message?
Added some comments regarding changes in related models such as Conditional DETR and MaskFormer. We should not remove modules; instead, the `# Copied from` mechanism should be used.
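For reference, the `# Copied from` marker ties a duplicated module to its source so that the repo's consistency tooling (`make fix-copies` / `utils/check_copies.py`) keeps the two in sync instead of the copy being re-implemented or deleted by hand. A sketch of the convention (the class and name replacement below are illustrative):

```python
import torch.nn as nn


# Copied from transformers.models.detr.modeling_detr.DetrAttention with Detr->ConditionalDetr
class ConditionalDetrAttention(nn.Module):
    """Attention module kept in sync with the DETR implementation.

    The marker above tells the copy checker to regenerate this body from
    DetrAttention, applying the Detr->ConditionalDetr renaming, rather
    than allowing the module to drift or be removed.
    """
    ...
```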
(Review thread on src/transformers/models/table_transformer/modeling_table_transformer.py; outdated, resolved.)
```diff
@@ -198,6 +198,7 @@ class DetrModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin
     # special case for head models
     def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):
         inputs_dict = super()._prepare_for_class(inputs_dict, model_class, return_labels=return_labels)
+        inputs_dict.pop("decoder_input_ids", None)
```
Can you comment re this change?
(Review thread on src/transformers/models/conditional_detr/modeling_conditional_detr.py; outdated, resolved.)
@qubvel I tried running some benchmarks for training. This is the script I used: train benchmark script.
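The linked script isn't reproduced here. As a rough illustration only, a training micro-benchmark along these lines (the checkpoint, input size, label format, and step count are assumptions, not the actual script):

```python
import time

import torch
from transformers import DetrForObjectDetection

# attn_implementation="sdpa" for DETR assumes this PR's changes.
model = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50", attn_implementation="sdpa"
).to("cuda").train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Small synthetic batch; DETR expects one label dict per image with
# class indices and normalized (cx, cy, w, h) boxes.
pixel_values = torch.randn(2, 3, 128, 128, device="cuda")
labels = [
    {
        "class_labels": torch.tensor([1], device="cuda"),
        "boxes": torch.tensor([[0.5, 0.5, 0.2, 0.2]], device="cuda"),
    }
    for _ in range(2)
]

torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(10):
    loss = model(pixel_values=pixel_values, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
torch.cuda.synchronize()
print(f"avg step: {(time.perf_counter() - start) / 10:.4f}s")
```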
Interesting observation! Did you try with larger images? I observed something similar for the RT-DETR model.
No, not beyond 128. Does the script look okay, though? Using `with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=True)` resulted in "no kernel found" errors. I had to set …
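For reference, this is the PyTorch context manager in question; the sketch below (shapes and dtype are illustrative) shows how restricting the backends triggers that error when no enabled kernel supports the inputs. Newer PyTorch releases expose the equivalent `torch.nn.attention.sdpa_kernel`.

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Restrict SDPA to the flash and memory-efficient kernels. If neither
# supports the inputs (dtype, head dim, hardware), PyTorch raises a
# "no available kernel" RuntimeError, which is why re-enabling the
# math fallback can be necessary.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=True
):
    out = F.scaled_dot_product_attention(q, k, v)
```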
@qubvel I think the tests are generally looking okay now. I also pushed a commit with the run-slow message for Detr. One thing I observed is that this test for Table Transformer currently seems to be failing; it happens on the main branch as well when I tested locally, probably because the values don't match (see transformers/tests/models/table_transformer/test_modeling_table_transformer.py, lines 594 to 601 at 1de7dc7).
Hi @OmarManzoor, thanks for iterating! Can you post some of the benchmarks you have here? I'd recommend testing on real input; for DETR that might be an image of size ~1300x800. Please use the image processor to prepare the inputs, at least for inference mode.
Can you push the run-slow message once again? It should be the last commit, otherwise I can't approve the CI run; please use `[run-slow] detr, table_transformer`. Thanks!
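A minimal sketch of such an inference benchmark, assuming this PR's `attn_implementation="sdpa"` support for DETR (the checkpoint, blank stand-in image, and iteration counts are illustrative):

```python
import time

import torch
from PIL import Image
from transformers import AutoImageProcessor, DetrForObjectDetection

processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50", attn_implementation="sdpa"
).to("cuda").eval()

# Stand-in for a real ~1300x800 image; the image processor handles
# resizing and normalization, so the model sees realistic input sizes.
image = Image.new("RGB", (1333, 800))
inputs = processor(images=image, return_tensors="pt").to("cuda")

with torch.inference_mode():
    for _ in range(3):  # warmup
        model(**inputs)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(10):
        model(**inputs)
    torch.cuda.synchronize()
print(f"avg forward: {(time.perf_counter() - start) / 10:.4f}s")
```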
I don't think I can run larger inputs, since I only have an 8 GB GPU available for the benchmarks.
This is a benchmark I just ran for 1048 x 640 with float16 for training:
Hi @OmarManzoor, thanks for sharing this! To move forward with this PR, we need to ensure it performs at least as fast as the eager implementation or offers better memory efficiency. I plan to test it on my side next week to see whether the performance depends on specific hardware. It may be worth benchmarking the compiled version as well.
If we can't achieve this improvement now, it might not be ideal to merge at this point, as SDPA is the default setting; slower performance and additional maintenance code could introduce challenges. However, this approach might work better with newer GPUs and future torch versions, so even if we don't identify a solution right away, we could revisit and integrate it at a more optimal time. Looking forward to hearing your thoughts!
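On the compiled version, a hedged sketch of what that benchmark setup might look like (requires torch >= 2.0; the checkpoint and input size are illustrative, and `attn_implementation="sdpa"` again assumes this PR):

```python
import torch
from transformers import DetrForObjectDetection

model = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50", attn_implementation="sdpa"
).to("cuda").eval()

# torch.compile can fuse kernels and change the eager-vs-SDPA picture,
# so the compiled forward pass is worth timing separately.
compiled_model = torch.compile(model)

pixel_values = torch.randn(1, 3, 800, 1333, device="cuda")
with torch.inference_mode():
    # Early calls pay the compilation cost, so warmup matters even
    # more here than in the uncompiled benchmark.
    for _ in range(5):
        compiled_model(pixel_values=pixel_values)
```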
I agree with you. However, I would be grateful if you could try some benchmarks on your end using better GPUs and configurations, with tweaks to the benchmarking script if required; that would confirm whether we are getting any performance gains. If the results are similar to what I observed, then I think we can close this PR for now. I haven't really tested inference, because I never observed any improvement with the training script; even with float32 it was almost the same speed, with eager being slightly faster.
Closing, as no specific performance improvements were observed for Detr. Thank you for your support on this, @qubvel; if you observe any performance improvements on your side, we can reopen.
What does this PR do?
Towards #28005
Who can review?
CC: @amyeroberts @ArthurZucker