Finish short form / long form generation integration in Whisper #32263

kamilakesbi · 2024-07-27T15:59:46Z

Feature request

Unify the output format of the Whisper generate method for short-form and long-form generation.

Motivation

In PR #30984, short-form and long-form generation in Whisper were unified so that both benefit from generation with fallback.

However, the output format of the generate method still varies depending on whether we're doing short-form or long-form generation, as we can see in this line.

For short-form generation, the output is either a torch tensor containing the sequence of token ids or, if return_dict_in_generate is set to True, an instance of ModelOutput with additional information (attention masks, hidden states, ...). return_segments can now also be used with short-form generation.
For long-form generation, the output is either a torch tensor with the sequence of token ids or, if return_segments is set to True, a dict containing the sequences of token ids and a list of all segments. If both return_dict_in_generate and return_segments are set to True, the additional information (attention masks, hidden states) is contained in segments. However, long-form generation currently cannot return an instance of ModelOutput.
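
To illustrate what callers currently have to do, here is a minimal sketch of a helper that pulls the token ids out of any of the output shapes described above. The name `extract_token_ids` is hypothetical (it is not part of transformers), and the example uses plain Python stand-ins instead of real tensors or a real ModelOutput so that it runs without loading a model.

```python
from collections.abc import Mapping
from types import SimpleNamespace


def extract_token_ids(output):
    """Hypothetical helper: return the token-id sequences from a generate() output."""
    # Dict-like outputs: the long-form dict returned with return_segments=True.
    # ModelOutput is also dict-like, so it would be handled here as well.
    if isinstance(output, Mapping) and "sequences" in output:
        return output["sequences"]
    # Objects that expose the token ids as a `sequences` attribute.
    if hasattr(output, "sequences"):
        return output.sequences
    # Otherwise assume the output already is the bare sequence of token ids.
    return output


# Stand-ins for the three shapes (made-up token ids, not real model output).
bare_ids = [[1, 2, 3]]
long_form_dict = {"sequences": bare_ids, "segments": [[]]}
attribute_style = SimpleNamespace(sequences=bare_ids)

for out in (bare_ids, long_form_dict, attribute_style):
    assert extract_token_ids(out) is bare_ids
```

With a unified output format, this kind of branching on the caller side would no longer be needed.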

Should we work on this?

Ideally, we should also unify the output format of the Whisper generate method so that users don't have to distinguish between short-form and long-form audio. They should only have to specify whether they want to perform sequential generation (non-chunked) or parallel generation (chunked) with the pipeline.

The aim of PR #30984 was to implement all the modifications needed to allow generation with fallback for short-form audio without breaking backward compatibility on main. If we further unify the output format, we would break backward compatibility and have to adapt several tests.

cc @sanchit-gandhi @ArthurZucker Do you think we should complete the unification of Whisper Generation by unifying the output format?

eustlb · 2024-12-18T15:10:37Z

Closing this as it's been done in #34135

kamilakesbi added the "Feature request" label (Request for a new feature) Jul 27, 2024

eustlb closed this as completed Dec 18, 2024
