-
Notifications
You must be signed in to change notification settings - Fork 27.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add language to word timestamps for Whisper #31572
Add language to word timestamps for Whisper #31572
Conversation
_collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information
added missing comma
…b.com/robinderat/transformers into whisper-language-with-word-timestamps
Hi @robinderat, thanks for working on this! Could you please add a test in |
test that the pipeline can return both the language and timestamp
LGTM! thanks for iterating on this :) cc @amyeroberts for final review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @robinderat! 🤗
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
@amyeroberts gentle reminder 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ping - just some questions to better understand the diff
@@ -1197,12 +1197,16 @@ def _find_longest_common_sequence(sequences, token_timestamp_sequences=None): | |||
return total_sequence, [] | |||
|
|||
|
|||
def _collate_word_timestamps(tokenizer, tokens, token_timestamps, language): | |||
def _collate_word_timestamps(tokenizer, tokens, token_timestamps, language, return_language): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is return_language
always a bool, or could it be None
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be a bool or None. However, the behavior for return_language=None is equal to that of return_language=False
) | ||
data = load_dataset("openslr/librispeech_asr", "clean", split="test", streaming=True, trust_remote_code=True) | ||
sample = next(iter(data)) | ||
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="en", task="transcribe") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this modification necessary for people to be able to use return_language
with the pipeline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I based my test on an existing one that had this modification in it. However, it does not seem to affect the tests, so I have now removed them.
Specifying return_language=True in the pipeline is all that is required for this to work
Removed model configurations that do not influence test results
Removed model configurations that do not influence test results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding and iterating on this!
* add language to words _collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information * ran style checks added missing comma * add new language test test that the pipeline can return both the language and timestamp * remove model configuration in test Removed model configurations that do not influence test results * remove model configuration in test Removed model configurations that do not influence test results
* add language to words _collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information * ran style checks added missing comma * add new language test test that the pipeline can return both the language and timestamp * remove model configuration in test Removed model configurations that do not influence test results * remove model configuration in test Removed model configurations that do not influence test results
* add language to words _collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information * ran style checks added missing comma * add new language test test that the pipeline can return both the language and timestamp * remove model configuration in test Removed model configurations that do not influence test results * remove model configuration in test Removed model configurations that do not influence test results
* add language to words _collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information * ran style checks added missing comma * add new language test test that the pipeline can return both the language and timestamp * remove model configuration in test Removed model configurations that do not influence test results * remove model configuration in test Removed model configurations that do not influence test results
Add language to word timestamps for Whisper
Fixes 29520
This fix enables whisper to return word-level timestamps and the predicted language at the same time. Before, enabling word-level timestamps would discard the language prediction.
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@kamilakesbi
@sanchit-gandhi