torch CUDA graphs with HF generate #27837
Comments
This is kind of planned, as we want to support static caching to compile the models and get faster inference 😉 cc @gante. This might already have been asked in other issues as well.
@tsengalb99, as Arthur wrote, we are working on it :D Expect to see updates soon.
Are there any updates on this? And what is the main reason CUDA graphs don't work right now?
Follow PR #27931 for updates; the dynamic KV cache is an issue.
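To illustrate why the dynamic cache is the blocker, here is a toy sketch (not transformers' actual cache code): CUDA graphs replay a fixed set of kernels on fixed tensor shapes and addresses, but the default KV cache is re-concatenated to a new shape on every decoding step.

```python
import torch

# Dynamic cache (roughly the failure mode): the key/value tensor is
# re-allocated with a new shape each step, so a captured CUDA graph's
# frozen shapes and addresses no longer match.
cache = torch.zeros(1, 8, 0, 64, device="cuda")    # (batch, heads, seq, head_dim)
for step in range(4):
    new_kv = torch.randn(1, 8, 1, 64, device="cuda")
    cache = torch.cat([cache, new_kv], dim=2)      # seq length grows: 1, 2, 3, 4

# Static cache (the approach pursued in #27931): pre-allocate to the
# maximum length and write in place, so every decoding step sees
# identical shapes and buffers and becomes graph-capturable.
max_len = 128
static_cache = torch.zeros(1, 8, max_len, 64, device="cuda")
for step in range(4):
    new_kv = torch.randn(1, 8, 1, 64, device="cuda")
    static_cache[:, :, step : step + 1] = new_kv   # in-place, fixed shape
```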
The PR is still very much active and now supports CUDA graphs.
Great, looking forward to seeing it merged! Do you have an ETA on when that will happen?
It only needs a final review, so this week 😉
Hi Arthur,
I saw the PR got merged in; what is the recommended way to use CUDA graphs during generation? I am wrapping the entire model in a torch CUDA graph wrapper right now and am getting the same graph-breaking errors as before.
Thanks,
Albert
Hey! Here is how I used it: https://gist.github.com/ArthurZucker/af34221def212259b43d55a2811d2dbb.
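For reference, the recommended usage boils down to something like the following minimal sketch (assuming a transformers version with static cache support; the checkpoint name and generation arguments are illustrative, not taken from the gist):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

# Use the pre-allocated static KV cache so tensor shapes stay fixed
# across decoding steps.
model.generation_config.cache_implementation = "static"

# "reduce-overhead" mode has torch.compile record CUDA graphs for the
# decoding step, removing per-token kernel-launch overhead.
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note that with this approach you do not wrap the model in a CUDA graph yourself; torch.compile manages the graph capture and replay internally.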
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
A PR is coming for this! #29374
Feature request
In my experiments, I cannot get torch CUDA graphs to work with HF generate. CUDA graphs work fine when calling a model's forward pass, but stream capture fails when calling .generate(), whether because of the static input/output size requirement or something else. Can support for torch CUDA graphs be added? A sketch of the working case follows.
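For concreteness, this is roughly the pattern that does capture cleanly, reduced to a toy model (a hedged sketch following the torch.cuda.graph documentation; the Linear layer stands in for an LLM forward pass). Wrapping .generate() in the same capture fails because its decoding loop produces tensors of changing shape.

```python
import torch

# Stand-in for an LLM forward pass: one fixed-shape module.
model = torch.nn.Linear(512, 512).cuda().eval()
static_input = torch.randn(8, 512, device="cuda")

# Warm up on a side stream before capture, as the CUDA graphs docs recommend.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s), torch.no_grad():
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture a single forward pass; shapes and addresses are now frozen.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g), torch.no_grad():
    static_output = model(static_input)

# Replay: refill the static input buffer in place, then rerun the graph.
static_input.copy_(torch.randn(8, 512, device="cuda"))
g.replay()
print(static_output.sum().item())
```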
Motivation
LLM inference involves many kernel launches, and CUDA graphs can remove most of the launch overhead. In my experiments with just the forward call, the CUDA-graph version of a model can be twice as fast as the non-graph version.
Your contribution
n/a