[Whisper] Fix slow tests #30152
Conversation
text="This part of the speech",
add_special_tokens=False,
return_tensors="pt",
sampling_rate=16_000,
Adding the arg sampling_rate=16_000 whenever we call the processor significantly reduces the number of warnings emitted on the logger.
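To illustrate the warning-suppression point above, here is a minimal self-contained sketch. The `process_audio` function is a hypothetical stand-in for the processor call, not the real `WhisperProcessor` API: omitting `sampling_rate` typically triggers a warning because the processor cannot check that the audio matches the rate the model expects, while passing it explicitly keeps the run quiet.

```python
import warnings

def process_audio(audio, sampling_rate=None):
    # Hypothetical stand-in for a processor call. When the caller omits
    # sampling_rate, we cannot verify the audio matches the model's expected
    # rate, so we warn and fall back to an assumed default.
    if sampling_rate is None:
        warnings.warn(
            "It is strongly recommended to pass the `sampling_rate` argument."
        )
        sampling_rate = 16_000  # assumed model default
    return {"sampling_rate": sampling_rate, "num_samples": len(audio)}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    process_audio([0.0] * 160)                        # no rate: warns
    process_audio([0.0] * 160, sampling_rate=16_000)  # explicit rate: silent

print(len(caught))  # → 1
```

Passing the rate at every call site, as done in this PR, silences the repeated warning across the whole test suite.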
Hi @sanchit-gandhi Thanks a lot! There are still 2 failures, but I guess that's because of the env difference? Which machine did you use? If it is not a T4, I can update the results with T4 results and push to this PR. See the 2 failures below:
Full error log
The first failure is because we haven't passed a token corresponding to a user that has accepted the dataset's terms of use: https://huggingface.co/datasets/mozilla-foundation/common_voice_6_1 If the token for the CI runner also hasn't accepted the terms of use for the gated dataset, I'm happy to update the dataset to one that's un-gated! The second failure does indeed look like an env + machine difference - do you have easy access to the T4? I found it pretty difficult to debug on this machine yesterday, given it's got a unique Docker set-up.
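The gated-vs-un-gated distinction discussed here can be sketched with a small hypothetical helper (the helper name and error message are illustrative, not part of this PR): gated Hub datasets require a read token from a user who has accepted the terms of use, while un-gated datasets can be streamed without authentication.

```python
import os

def dataset_load_kwargs(repo_id, gated):
    # Hypothetical helper: build the keyword arguments for a
    # datasets.load_dataset() call. Gated datasets require a token whose
    # owner has accepted the dataset's terms of use on the Hub.
    kwargs = {"streaming": True}
    if gated:
        token = os.environ.get("HF_HUB_READ_TOKEN")
        if token is None:
            raise RuntimeError(
                f"{repo_id} is gated: set HF_HUB_READ_TOKEN to a token whose "
                "user has accepted the dataset's terms of use."
            )
        kwargs["token"] = token
    return kwargs

# An un-gated dataset needs no token at all.
print(dataset_load_kwargs("facebook/multilingual_librispeech", gated=False))
```

Switching the test to an un-gated dataset removes the token requirement entirely, which is the route the PR ultimately takes.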
For common voice, let's try not to use
I can access it. I will update the second failing test and push.
@sanchit-gandhi I updated the PR so the 2nd failing test ( I will let you handle the first one (the one with the dataset issue) 🙏. Request a review from me once ready, thanks.
@sanchit-gandhi WDYT about using "mozilla-foundation/common_voice_11_0"?
This is also a gated dataset. In 5739f54 I've updated the test to use an exclusively un-gated dataset on the Hub, Multilingual LibriSpeech.
Thanks. It looks like WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard_prev_cond gives different outputs when I run that single test vs the whole WhisperModelIntegrationTests suite. But @sanchit-gandhi is handling everything on his side.
for i in range(num_samples):
-    assert decoded_all[i] == EXPECTED_TEXT[i]
+    if isinstance(EXPECTED_TEXT[i], str):
+        assert decoded_all[i] == EXPECTED_TEXT[i]
+    elif isinstance(EXPECTED_TEXT[i], tuple):
+        assert decoded_all[i] in EXPECTED_TEXT[i]
I was not able to get the same results on a T4 VM and on the AWS K8S T4 runner. The difference is I screamed vs I scream, so I decided to allow both expected values.
Awesome work - thanks for fixing all of these!
set_seed(0)
processor = WhisperProcessor.from_pretrained("openai/whisper-large")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")
model.to(torch_device)

-token = os.getenv("HF_HUB_READ_TOKEN", True)
-ds = load_dataset("mozilla-foundation/common_voice_6_1", "ja", split="test", streaming=True, token=token)
+ds = load_dataset("facebook/multilingual_librispeech", "german", split="test", streaming=True)
Why the change from Japanese?
Sorry, I probably clicked the resolve button. See @sanchit-gandhi's previous comment below.
The dataset used to load a Japanese sample is also gated. We've swapped to an un-gated dataset, as discussed in #30152 (comment).
* fix tests * style * more fixes * move model to device * move logits to cpu * update expected values * use ungated dataset * fix * fix * update --------- Co-authored-by: ydshieh <[email protected]>
What does this PR do?
Fixes failing slow integration tests for the Whisper model. The majority of the failing tests were simply due to the order of the expected transcriptions not matching the order of the ground-truth ones. This PR fixes this wrong ordering, and updates the tests to use the latest .generate API, rather than the deprecated forced decoder ids one.

cc @ydshieh