
[Whisper] Fix slow tests #30152

Merged: 11 commits into huggingface:main from whisper-slow-tests, Apr 19, 2024
Conversation

sanchit-gandhi (Contributor) commented Apr 9, 2024

What does this PR do?

Fixes failing slow integration tests for the Whisper model. The majority of the failures were simply due to the order of the expected transcriptions not matching the order of the ground-truth ones. This PR fixes the ordering and updates the tests to use the latest .generate API, rather than the deprecated forced decoder ids approach.
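
As a rough sketch of that API migration (illustrative only - the model, dataset and decode step below are placeholders, not the exact diff in this PR):

    from datasets import load_dataset
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

    ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
    input_features = processor(
        ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt"
    ).input_features

    # Deprecated pattern: pin language/task through forced decoder ids
    # forced_decoder_ids = processor.get_decoder_prompt_ids(language="english", task="transcribe")
    # generated_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)

    # Current pattern: pass language/task directly to .generate
    generated_ids = model.generate(input_features, language="english", task="transcribe")
    print(processor.batch_decode(generated_ids, skip_special_tokens=True))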

cc @ydshieh

text="This part of the speech",
add_special_tokens=False,
return_tensors="pt",
sampling_rate=16_000,
sanchit-gandhi (Contributor, author) commented on this diff:
Adding the arg sampling_rate=16_000 whenever we call the processor significantly reduces the number of warnings emitted by the logger.
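
A minimal sketch of the difference (the silent dummy array is a stand-in, not audio from the tests):

    import numpy as np
    from transformers import WhisperProcessor

    processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
    audio = np.zeros(16_000, dtype=np.float32)  # 1 second of silence as a placeholder

    # Without sampling_rate, the feature extractor warns on every call that the
    # sampling rate was not provided and the input cannot be verified.
    inputs = processor(audio, return_tensors="pt")

    # Passing sampling_rate explicitly keeps the test logs quiet.
    inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")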


@sanchit-gandhi sanchit-gandhi requested a review from ydshieh April 9, 2024 22:50
ydshieh (Collaborator) commented Apr 10, 2024

Hi @sanchit-gandhi, thanks a lot! There are still 2 failures, but I guess that's because of an environment difference? Which machine did you use? If it is not a T4, I can update the expected results with a T4 run and push to this PR.

See below

2 failures

FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_generation_multilingual - huggingface_hub.utils._headers.LocalTokenNotFoundError: Token is required (`token=True`), but no token found. You need to provide a token or be logged in to Hugging Face with `huggingface-cli login` or `huggingface_hub.login`. See https://huggingface.co/settings/tokens.

FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard_prev_cond - AssertionError: assert ' You know, f...ent. Me wild!' == ' You know, f...ent. Me, why?'

Full error log

======================================================================================================================================================== FAILURES =========================================================================================================================================================
_____________________________________________________________________________________________________________________________ WhisperModelIntegrationTests.test_large_generation_multilingual _____________________________________________________________________________________________________________________________

self = <tests.models.whisper.test_modeling_whisper.WhisperModelIntegrationTests testMethod=test_large_generation_multilingual>

    @slow
    def test_large_generation_multilingual(self):
        torch_device = "cpu"
        set_seed(0)
        processor = WhisperProcessor.from_pretrained("openai/whisper-large")
        model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")
        model.to(torch_device)
    
        token = os.getenv("HF_HUB_READ_TOKEN", True)
>       ds = load_dataset("mozilla-foundation/common_voice_6_1", "ja", split="test", streaming=True, token=token)

tests/models/whisper/test_modeling_whisper.py:1760: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.8/dist-packages/datasets/load.py:2556: in load_dataset
    builder_instance = load_dataset_builder(
/usr/local/lib/python3.8/dist-packages/datasets/load.py:2228: in load_dataset_builder
    dataset_module = dataset_module_factory(
/usr/local/lib/python3.8/dist-packages/datasets/load.py:1879: in dataset_module_factory
    raise e1 from None
/usr/local/lib/python3.8/dist-packages/datasets/load.py:1824: in dataset_module_factory
    raise e
/usr/local/lib/python3.8/dist-packages/datasets/load.py:1797: in dataset_module_factory
    dataset_info = hf_api.dataset_info(
/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py:119: in _inner_fn
    return fn(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/huggingface_hub/hf_api.py:2280: in dataset_info
    headers = self._build_hf_headers(token=token)
/usr/local/lib/python3.8/dist-packages/huggingface_hub/hf_api.py:8411: in _build_hf_headers
    return build_hf_headers(
/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py:119: in _inner_fn
    return fn(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_headers.py:126: in build_hf_headers
    token_to_send = get_token_to_send(token)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

token = True

    def get_token_to_send(token: Optional[Union[bool, str]]) -> Optional[str]:
        """Select the token to send from either `token` or the cache."""
        # Case token is explicitly provided
        if isinstance(token, str):
            return token
    
        # Case token is explicitly forbidden
        if token is False:
            return None
    
        # Token is not provided: we get it from local cache
        cached_token = get_token()
    
        # Case token is explicitly required
        if token is True:
            if cached_token is None:
>               raise LocalTokenNotFoundError(
                    "Token is required (`token=True`), but no token found. You"
                    " need to provide a token or be logged in to Hugging Face with"
                    " `huggingface-cli login` or `huggingface_hub.login`. See"
                    " https://huggingface.co/settings/tokens."
                )
E               huggingface_hub.utils._headers.LocalTokenNotFoundError: Token is required (`token=True`), but no token found. You need to provide a token or be logged in to Hugging Face with `huggingface-cli login` or `huggingface_hub.login`. See https://huggingface.co/settings/tokens.

/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_headers.py:160: LocalTokenNotFoundError
-------------------------------------------------------------------------------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------------------------------------------------------------------------------
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
______________________________________________________________________________________________________________________ WhisperModelIntegrationTests.test_whisper_longform_multi_batch_hard_prev_cond ______________________________________________________________________________________________________________________

self = <tests.models.whisper.test_modeling_whisper.WhisperModelIntegrationTests testMethod=test_whisper_longform_multi_batch_hard_prev_cond>

    @slow
    def test_whisper_longform_multi_batch_hard_prev_cond(self):
        # fmt: off
        EXPECTED_TEXT = [
            " Folks, if you watch the show, you know I spent a lot of time right over there. Patiently and astutely scrutinizing the boxwood and mahogany chest set of the day's biggest stories, developing the central headline pawns, definitely maneuvering an oh-so-topical night to F6, faming of classic Sicilian, named or variation on the news, all the while seeing eight moves deep and patiently marshalling the latest press releases into a Fisher shows in lip-nitsky attack that culminates in the elegant lethal slow-played, all-pass on checkmate that is my nightly monologue, but sometimes sometimes folks I sometimes I start to the wake-up side down in the monkey bars of a condemned playground on a super fun site, get all hepped up on goofballs, rummage that would discard a tag bag of defective toys, yank out a fistball of disembodied doll limbs, toss them on a stain kid's place mad from a defunct denies, set up a table inside a rusty cargo container down by the warf and challenge toothless drifters to the godless bughouse blitz of tournament that is my segment, meanwhile.",
            " Folks, I spent a lot of time right over there night after night, actually. Carefully selecting for you the day's newsiest, most aerodynamic headlines, stress testing on those topical anti-lock breaks and power steering, painstakingly stitching, leather seating, so soft, it would make JD power and her associates blush. To create the luxury sedan that is my nightly monologue, but sometimes I just sometimes focus. I lurched to consciousness in the back of an abandoned school bus and slapped myself awake with a crusty floor mat. Before using a mouse-bitten timing belt to strap some old plywood to a couple of discarded oil drums, then by the light of a heathen-moon render a gas tank out of an empty big gulp, filled with white claw and de-natured alcohol, then light a match and let her rip in the dis-mented one man, soapbox derby of news that is my segment.",
            " Ladies and gentlemen, you know, I spent a lot of time right over there, raising the finest hosting news cattle firmly, yet tenderly milking the latest headlines from their jokes, swollen teats, churning the daily stories into the decadent Provincil style triple cream-breed. It is my nightly monologue, but sometimes sometimes I stagger home hungry after being released by the police and root around in the neighbor's trash can for an old milk carton scrape out the blooming dairy residue into the remains of a wet cheese rod I won from a rat in a pre-drawn street fight. Put it in a discarded paint can to leave it to ferment next to a trash fire than a hunker down in hallucinate while eating the Listeria latent demon custard of news that is my segment.",
            " Folks, you watched this show, you know I spend most of my time right over there, carefully sorting through the days, big stories, and selecting only the most subtle, and unblemished ostrich and crocodile news leather, which I then entrust to artisan graduates of the Ickel Greg Waferandi, who carefully died them in a pallet of bright, zesty shades, and adorn them in the finest most topical inlay work, using hand tools and double magnifying glasses, then assemble them according to now classic and elegant geometry using our signature saddle stitching, and line it with bees, wax, coated linen, and finally attach a mallet hammered strap, purled hardware, and close-shet to create for you the one of a kind hope kutur, Ernme, is burkin bag that is my monologue, but sometimes, sometimes folks, sometimes. Sometimes I wake up in the last car of an abandoned rollercoaster at Coney Island where I'm hiding from the triads, I have some engine lubricants out of a safe way bag and staggered down the shore to tear the sail off a beach skoener, then I ripped the coaxial cable out of an RV and elderly couple from Utah, Hank, and Mabel, lovely folks, and use it to stitch the sail into a loose pouch-like rock sack, and I stow in the back of a garbage truck to the junkyard, where I pick through to the debris for only the broken toys that make me the saddest, until I have loaded for you, the hobo fugitives bug out bindle of news that",
            " You know, folks, I spent a lot of time crafting for you a bespoke playlist of the day's big stories right over there. meticulously selecting the most topical chakra affirming scented candles, using Feng Shui, to perfectly align the joke energy in the exclusive boutique yoga retreat that is my monologue, but sometimes just sometimes, I go to the dumpster behind the waffle house at three in the morning, take off my shirt, cover myself and use fry oil, wrap my hands and some old duct tape I stole from a broken car window, pound a six pack of blueberry hard-seller and a second pill, as I stole from a parked ambulance, then arm wrestle a raccoon in the back alley vision quest of news that is my segment.",
            " You know, folks, I spend most of my time right over there. Mining the days, biggest, most important stories, collecting the finest, most topical iron or hand hammering it into joke panels, then I craft sheets of bronze and blazing with patterns that tell an epic tale of conquest and glory. Then, using the Germanic tradition press, black process, I place thin sheets of foil against the scenes and by hammering or otherwise applying pressure from the back, I project these scenes into a pair of cheat cards and a face plate, and finally using fluted strips of white, alloyed molding, I divide the designs into framed panels and hold it all together using bronze rivets to create the beautiful and intimidating, Anglo-Saxon battle helm that is my nightly monologue. But sometimes, sometimes, folks. Sometimes, just sometimes, I come to my senses fully naked on the deck of a pirate-be-seed, melee, container ship that picked me up floating on the detached door of a porta-potty in the Indian Ocean. Then, after a sunstroke induced realization of the crew of this ship plans to sell me an exchange for a bag of oranges to fight off scurvy, I lead a mutiny using only a PVC pipe and a pool chain that accepting my new role as captain and declaring myself King of the Windark Seas. I grab a dirty mop bucket covered in barnacles and adorn it with the teeth of the vanquished to create these shopping wet pirate crown of news that is my segment. Me, why?",
            " Folks, if you watch this show, you know I spend most of my time right over there carefully blending for you the day's newsiest, most topical flower eggs, milk and butter. And straining into a fine batter to make delicate and informative comedy pancakes, then I glaze them in the juice and zest of the most relevant midnight valencio oranges. And doubts at all, and I find delimane de voyage cognac, before from bang and basting them tables, I deserve you the James Beard Award worthy creeps to ZET. That is my nightly monologue, but sometimes sometimes folks, I wake up in the baggage hole of Greyhound bus, it's being hoisted by the scrapyard claw toward the burn pit. Escape to a nearby abandoned price chopper where I scrounge for old bread scraps, busted up in bags of starfruit candies and expired eggs. Chuck it all on a dirty hubcap and slap it over a tire fire before using the legs of a strained pair of sweatpants and as ovenmets to extract and serve the demented transients pound cake of news that is my segment.",
            " Folks, if you watch the show and I hope you do, I spend a lot of time right over there. Tirelessly studying the lineage of the day's most important thoroughbred stories and whole-stiner headlines, working with the best trainers money can buy to rear their comedy offspring with a hand that is stern yet gentle into the triple crown winning equine specimen that is my nightly monologue. But sometimes sometimes folks I break into an unincorporated veterinary genetics lab. And grab whatever test tubes I can find and then under a grow light I got from a discarded chia pet. I mixed the pill for DNA of a horse and whatever was in a tube labeled Keith Cohen-Extra. Slurring the concoction with caffeine pills and a microwave bread bowl, I screamed sing a prayer to Janice initiator of human life and God of Transformation as a half horse, half man freak ceases to life before me and the hideous collection of loose animal parts and corrupted men tissue that is my segment. Meanwhile!"
        ]
        # fmt: on
    
        processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
        model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
        model = model.to(torch_device)
    
        ds = load_dataset("distil-whisper/meanwhile", "default")["test"]
        ds = ds.cast_column("audio", Audio(sampling_rate=16000))
    
        num_samples = 8
    
        audio = ds[:num_samples]["audio"]
        audios = [x["array"] for x in audio]
    
        inputs = processor(
            audios,
            return_tensors="pt",
            truncation=False,
            padding="longest",
            return_attention_mask=True,
            sampling_rate=16_000,
        )
        inputs = inputs.to(device=torch_device)
    
        gen_kwargs = {
            "return_timestamps": True,
            "no_speech_threshold": 0.6,
            "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
            "compression_ratio_threshold": 1.35,
            "condition_on_prev_tokens": True,
            "logprob_threshold": -1.0,
            "num_beams": 5,
        }
    
        torch.manual_seed(0)
        result = model.generate(**inputs, **gen_kwargs)
        decoded_all = processor.batch_decode(result, skip_special_tokens=True)
    
        for i in range(num_samples):
>           assert decoded_all[i] == EXPECTED_TEXT[i]
E           AssertionError: assert ' You know, f...ent. Me wild!' == ' You know, f...ent. Me, why?'
E             -  You know, folks, I spend most of my time right over there. Mining the days, biggest, most important stories, collecting the finest, most topical iron or hand hammering it into joke panels, then I craft sheets of bronze and blazing with patterns that tell an epic tale of conquest and glory. Then, using the Germanic tradition press, black process, I place thin sheets of foil against the scenes and by hammering or otherwise applying pressure from the back, I project these scenes into a pair of cheat cards and a face plate, and finally using fluted strips of white, alloy...
E             
E             ...Full output truncated (4 lines hidden), use '-vv' to show

tests/models/whisper/test_modeling_whisper.py:2684: AssertionError

sanchit-gandhi (Contributor, author) commented:

The first failure is because we haven't passed a token corresponding to a user that has accepted the dataset's terms of use: https://huggingface.co/datasets/mozilla-foundation/common_voice_6_1

If the token for the CI runner also hasn't accepted the terms of use for the gated dataset, I'm happy to update the dataset to one that's un-gated!
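
For reference, a sketch of the token fallback that trips the test, alongside the un-gated swap (both dataset calls are taken from the diff further down):

    import os
    from datasets import load_dataset

    # Falls back to token=True when HF_HUB_READ_TOKEN is unset; huggingface_hub
    # then requires a cached login and raises LocalTokenNotFoundError without one.
    token = os.getenv("HF_HUB_READ_TOKEN", True)

    # Gated dataset: requires accepted terms of use and a valid token
    # ds = load_dataset("mozilla-foundation/common_voice_6_1", "ja", split="test", streaming=True, token=token)

    # Un-gated alternative: no token needed
    ds = load_dataset("facebook/multilingual_librispeech", "german", split="test", streaming=True)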

The second failure does indeed look like an env + machine difference - do you have easy access to the T4? I found it pretty difficult to debug on this machine yesterday, given its unique Docker set-up.

ydshieh (Collaborator) commented Apr 11, 2024

For common voice, let's try not to use common_voice_6_1. Instead, like #27147, let's use something that doesn't require an extra step, if possible.

ydshieh (Collaborator) commented Apr 11, 2024

> do you have easy access to the T4? I found it pretty difficult to debug on this machine yesterday, given its unique Docker set-up

I can access it. I will update the second failing test and push.

ydshieh (Collaborator) commented Apr 11, 2024

> For common voice, let's try not to use common_voice_6_1. Instead, like #27147, let's use something that doesn't require an extra step, if possible.

@sanchit-gandhi I updated the PR so the 2nd failing test (test_whisper_longform_multi_batch_hard_prev_cond) is passing now.

I will let you handle the first one (the one with the dataset issue) 🙏.

Please request a review from me once it's ready, thanks.

ydshieh (Collaborator) commented Apr 15, 2024

@sanchit-gandhi WDYT about using "mozilla-foundation/common_voice_11_0"?

sanchit-gandhi (Contributor, author) commented Apr 15, 2024

This is also a gated dataset. In 5739f54 I've updated the test to use an exclusively un-gated dataset on the Hub, Multilingual LibriSpeech.

ydshieh (Collaborator) left a comment:

Thanks.

It looks like WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard_prev_cond gives different outputs when I run that single test vs. the whole WhisperModelIntegrationTests suite.

But @sanchit-gandhi is handling everything on his side.

Comment on lines 2676 to +2680
        for i in range(num_samples):
-           assert decoded_all[i] == EXPECTED_TEXT[i]
+           if isinstance(EXPECTED_TEXT[i], str):
+               assert decoded_all[i] == EXPECTED_TEXT[i]
+           elif isinstance(EXPECTED_TEXT[i], tuple):
+               assert decoded_all[i] in EXPECTED_TEXT[i]
ydshieh (Collaborator):

I was not able to get the same results on a T4 VM and on the AWS K8S T4 runner. The difference is "I screamed" vs. "I scream", so I decided to allow both expected values.
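
A self-contained sketch of the resulting pattern (the strings are hypothetical stand-ins for the real expected transcriptions):

    decoded_all = [" I scream sing a prayer", " an exact transcription"]
    EXPECTED_TEXT = [
        (" I screamed sing a prayer", " I scream sing a prayer"),  # either variant passes
        " an exact transcription",                                 # single accepted output
    ]

    for i in range(len(decoded_all)):
        if isinstance(EXPECTED_TEXT[i], str):
            assert decoded_all[i] == EXPECTED_TEXT[i]
        elif isinstance(EXPECTED_TEXT[i], tuple):
            assert decoded_all[i] in EXPECTED_TEXT[i]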

amyeroberts (Collaborator) left a comment:

Awesome work - thanks for fixing all of these!

        set_seed(0)
        processor = WhisperProcessor.from_pretrained("openai/whisper-large")
        model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")
        model.to(torch_device)

-       token = os.getenv("HF_HUB_READ_TOKEN", True)
-       ds = load_dataset("mozilla-foundation/common_voice_6_1", "ja", split="test", streaming=True, token=token)
+       ds = load_dataset("facebook/multilingual_librispeech", "german", split="test", streaming=True)
A collaborator commented:

Why the change from Japanese?

A collaborator replied:

Sorry, I probably clicked the resolve button. See below for @sanchit-gandhi's previous comment:

#30152 (comment)

sanchit-gandhi (Contributor, author) replied:

The dataset used to load a Japanese sample is also gated. We've swapped to an un-gated dataset, as discussed in #30152 (comment).

@ydshieh ydshieh merged commit 4ed0e51 into huggingface:main Apr 19, 2024
19 checks passed
@sanchit-gandhi sanchit-gandhi deleted the whisper-slow-tests branch April 19, 2024 11:26
ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024
* fix tests

* style

* more fixes

* move model to device

* move logits to cpu

* update expected values

* use ungated dataset

* fix

* fix

* update

---------

Co-authored-by: ydshieh <[email protected]>
ydshieh added a commit that referenced this pull request Apr 23, 2024
itazap pushed a commit that referenced this pull request May 14, 2024