tracker: `generate` compatibility with `torch.compile` (#28981)

Comments
Hi @gante, I'm planning on looking at these issues from the torch.compile side. Two questions: (1) is there someone from HF committed to working on this in the event I identify some model changes that would help run […]? Let me know when you can; also happy to chat on Slack (I'm on the HF Slack already).
Hi, are there any updates? I am especially interested in the things related to #30647. Thanks!
Hi! Do you plan on adding DONUT too? It would be highly appreciated :) As its decoder is a BART one, do you plan on doing the two altogether, for instance?
@vbonnivardprobayes T5 compatibility is close to being done (#34089), BART and related models will get the changes next :) (cc @zucchini-nlp)
Are there any plans to add CUDA graph support for models that are partitioned over multiple GPUs?
@tsengalb99 if that's possible then yes :D (multi-device is not my speciality, cc @SunMarc)
Hey @tsengalb99, we are integrating TP (tensor parallelism) with transformers, and it is also compatible with torch.compile. Could you confirm that it is compatible with CUDA graphs, @kwen2501? Otherwise, PP (pipeline parallelism) should also work with torch.compile CUDA graphs on the latest PyTorch 2.5.
@SunMarc so this applies to models scattered across multiple GPUs with DeepSpeed via Accelerate too?
`generate` 🤜 🤛 `torch.compile`

Part of the PyTorch 2024 H2 roadmap.

This issue is a tracker of the compatibility between `.generate` and `torch.compile` (intro docs by PyTorch). The goal is to enable `fullgraph=True` compilation on the main `generate` use cases.

`generate` use case not covered by this tracker? Check if it was requested below and upvote it if it was. Otherwise, add a comment. We will consider expanding the selection below based on widely requested use cases 🤗
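For context on what compatibility looks like in practice, below is a minimal sketch of the pattern this tracker builds on: a static KV cache plus a compiled model forward pass. The checkpoint name is only an example and the snippet assumes a transformers version that already ships static-cache support; compiling the whole decoding loop end to end is what the checklist below tracks.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint; any decoder-only model with static-cache support works similarly.
model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# A fixed-shape ("static") KV cache keeps tensor shapes stable across decoding steps,
# which lets the per-token forward pass compile without graph breaks.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

`mode="reduce-overhead"` is the `torch.compile` mode that captures CUDA graphs, which is what the CUDA-graph questions in the comments above relate to.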
Decoding Strategies (end-to-end compilation)

- `greedy_search` / `sample` are compatible (Generate: end-to-end compilation #30788)
- `beam_search` / `beam_sample` are compatible; depends on the step above
- `assisted_decoding` (aka speculative decoding) is compatible; depends on the steps above
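As a quick reference for how these strategies are requested, here is a sketch of the corresponding `generate` calls; it continues from the setup sketch above, and `draft_model` is a hypothetical smaller checkpoint used only to illustrate assisted decoding.

```python
# Continuing from the setup sketch above. draft_model is a hypothetical smaller
# checkpoint from the same tokenizer family, used only for the assisted-decoding call:
# draft_model = AutoModelForCausalLM.from_pretrained("<small-draft-checkpoint>").to("cuda")

# Greedy search vs. multinomial sampling: the default path, toggled by do_sample.
out = model.generate(**inputs, do_sample=True, temperature=0.7, max_new_tokens=20)

# Beam search / beam sample: num_beams > 1 (add do_sample=True for beam sample).
out = model.generate(**inputs, num_beams=4, max_new_tokens=20)

# Assisted decoding (aka speculative decoding): the draft model proposes tokens
# that the main model then verifies.
out = model.generate(**inputs, assistant_model=draft_model, max_new_tokens=20)
```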
Generate Flags and Options

- `LogitsProcessor` classes were checked for compatibility (and the appropriate exceptions are raised when not compatible)
- `StoppingCriteria` classes were checked for compatibility (and the appropriate exceptions are raised when not compatible)
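To illustrate what makes a processor compile-friendly, here is a sketch of a custom `LogitsProcessor` written entirely with tensor operations (no `.item()` calls or Python branching on tensor values), the kind of logic that can stay inside a single traced graph. The class name and clamp range are invented for the example.

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class ClampLogitsProcessor(LogitsProcessor):
    """Hypothetical processor: clamps logits to a fixed range using only tensor ops,
    so tracing it does not introduce graph breaks."""

    def __init__(self, min_value: float = -50.0, max_value: float = 50.0):
        self.min_value = min_value
        self.max_value = max_value

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Pure tensor math; no data-dependent Python control flow.
        return torch.clamp(scores, self.min_value, self.max_value)

# Usage: pass it to generate via the logits_processor argument.
# out = model.generate(**inputs, logits_processor=LogitsProcessorList([ClampLogitsProcessor()]))
```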
Models

Notes:

Decoder-only:

- Core generation (Adds support for static KV cache #27931)
- `gemma` (Adds support for Gemma 💎 #29167)
- … (`torch.compile` implementation #29891)

Encoder-decoder:

Quantization

Others
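For any model entry above, a rough way to check that its forward pass is compatible in the `fullgraph` sense is to compile it with `fullgraph=True` and run a single forward pass: a graph break then surfaces as an error instead of a silent split. This is only a sketch with a placeholder checkpoint, not the project's official test.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # placeholder checkpoint for the check
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# fullgraph=True turns any graph break into an immediate error, so a clean run
# means the forward pass traced into a single graph.
compiled_forward = torch.compile(model.forward, fullgraph=True)

inputs = tokenizer("hello", return_tensors="pt").to("cuda")
with torch.no_grad():
    compiled_forward(**inputs)
```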