tracker: `generate` compatibility with `torch.compile` (#28981)

Comments
Hi @gante, I'm planning on looking at these issues from the torch.compile side. Two questions: (1) is there someone from HF committed to working on this in the event I identify some model changes that would help run […]? Let me know when you can; also happy to chat on Slack (I'm on the HF Slack already).
Hi, are there any updates? I am especially interested in the things related to #30647. Thanks!
Hi! Do you plan on adding DONUT too? It would be highly appreciated :) As its decoder is a BART one, do you plan on doing the two altogether, for instance?
@vbonnivardprobayes T5 compatibility is close to being done (#34089), BART and related models will get the changes next :) (cc @zucchini-nlp)
Are there any plans to add CUDA graph support for models that are partitioned over multiple GPUs?
@tsengalb99 if that's possible then yes :D (multi-device is not my speciality, cc @SunMarc)
Hey @tsengalb99, we are integrating TP (tensor parallelism) with transformers, and it is also compatible with torch.compile. Could you confirm that it is compatible with CUDA graphs, @kwen2501? Otherwise, PP (pipeline parallelism) should also work with torch.compile CUDA graphs on the latest PyTorch 2.5.
@SunMarc so this applies to models scattered across multiple GPUs with DeepSpeed via Accelerate too?
`generate` 🤜 🤛 `torch.compile`

Part of the PyTorch 2024 H2 roadmap.

This issue is a tracker of the compatibility between `.generate` and `torch.compile` (intro docs by PyTorch). The goal is to enable `fullgraph=True` compilation on the main `generate` use cases.

`generate` use case not covered by this tracker? Check if it was requested below and upvote it if it was. Otherwise, add a comment. We will consider expanding the selection below based on widely requested use cases 🤗
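For context on what compatibility looks like in practice, below is a minimal sketch of the pattern this tracker builds on: a static KV cache plus a compiled model forward pass. The checkpoint name is only an example and the snippet assumes a transformers version that already ships static-cache support; compiling the whole decoding loop end to end is what the checklist below tracks.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint; any decoder-only model with static-cache support works similarly.
model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# A fixed-shape ("static") KV cache keeps tensor shapes stable across decoding steps,
# which lets the per-token forward pass compile without graph breaks.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

`mode="reduce-overhead"` is the `torch.compile` mode that captures CUDA graphs, which is what the CUDA-graph questions in the comments above relate to.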
Decoding Strategies (end-to-end compilation)

- `greedy_search` / `sample` are compatible (Generate: end-to-end compilation #30788)
- `beam_search` / `beam_sample` are compatible; depends on the step above
- `assisted_decoding` (aka speculative decoding) is compatible; depends on the steps above
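As a quick reference for how these strategies are requested, here is a sketch of the corresponding `generate` calls; it continues from the setup sketch above, and `draft_model` is a hypothetical smaller checkpoint used only to illustrate assisted decoding.

```python
# Continuing from the setup sketch above. draft_model is a hypothetical smaller
# checkpoint from the same tokenizer family, used only for the assisted-decoding call:
# draft_model = AutoModelForCausalLM.from_pretrained("<small-draft-checkpoint>").to("cuda")

# Greedy search vs. multinomial sampling: the default path, toggled by do_sample.
out = model.generate(**inputs, do_sample=True, temperature=0.7, max_new_tokens=20)

# Beam search / beam sample: num_beams > 1 (add do_sample=True for beam sample).
out = model.generate(**inputs, num_beams=4, max_new_tokens=20)

# Assisted decoding (aka speculative decoding): the draft model proposes tokens
# that the main model then verifies.
out = model.generate(**inputs, assistant_model=draft_model, max_new_tokens=20)
```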
Generate Flags and Options

- `LogitsProcessor` classes were checked for compatibility (and the appropriate exceptions are raised when not compatible)
- `StoppingCriteria` classes were checked for compatibility (and the appropriate exceptions are raised when not compatible)
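To illustrate what makes a processor compile-friendly, here is a sketch of a custom `LogitsProcessor` written entirely with tensor operations (no `.item()` calls or Python branching on tensor values), the kind of logic that can stay inside a single traced graph. The class name and clamp range are invented for the example.

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class ClampLogitsProcessor(LogitsProcessor):
    """Hypothetical processor: clamps logits to a fixed range using only tensor ops,
    so tracing it does not introduce graph breaks."""

    def __init__(self, min_value: float = -50.0, max_value: float = 50.0):
        self.min_value = min_value
        self.max_value = max_value

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Pure tensor math; no data-dependent Python control flow.
        return torch.clamp(scores, self.min_value, self.max_value)

# Usage: pass it to generate via the logits_processor argument.
# out = model.generate(**inputs, logits_processor=LogitsProcessorList([ClampLogitsProcessor()]))
```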
Models

Notes:

Decoder-only:

- Core generation (Adds support for static KV cache #27931)
- `gemma` (Adds support for Gemma 💎 #29167)
- … (`torch.compile` implementation #29891)

Encoder-decoder:

Quantization

Others
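For any model entry above, a rough way to check that its forward pass is compatible in the `fullgraph` sense is to compile it with `fullgraph=True` and run a single forward pass: a graph break then surfaces as an error instead of a silent split. This is only a sketch with a placeholder checkpoint, not the project's official test.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # placeholder checkpoint for the check
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# fullgraph=True turns any graph break into an immediate error, so a clean run
# means the forward pass traced into a single graph.
compiled_forward = torch.compile(model.forward, fullgraph=True)

inputs = tokenizer("hello", return_tensors="pt").to("cuda")
with torch.no_grad():
    compiled_forward(**inputs)
```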