[Whisper] TypeError: '<=' not supported between instances of 'NoneType' and 'float' #33552

Open
4 tasks
felipehertzer opened this issue Sep 18, 2024 · 16 comments
Labels
Audio, bug, Core: Tokenization

Comments

@felipehertzer

System Info

  • transformers version: 4.44.2
  • Platform: macOS-15.0-arm64-arm-64bit
  • Python version: 3.12.6
  • Huggingface_hub version: 0.24.7
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.6.0.dev20240916 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No

Who can help?

@kamilakesbi @ArthurZucker @itazap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Hi, I am attempting to transcribe several audio files; however, the process intermittently encounters an exception with some of the files. The transcription works successfully in approximately 90% of the cases, but certain files trigger this exception unexpectedly. I am attaching one of the audio files that generates this exception for your review. Thank you.

  • I was able to replicate it on macOS (CPU) and on Linux (CUDA).

1. Install stable-ts:
pip install stable-ts

2. Run the code:

import stable_whisper

model = stable_whisper.load_hf_whisper('medium')
result = model.transcribe(
    audio='radio_18596_1726554951_1726554981.mp3',
)
print(result.text)

Audio sample: https://filebin.net/hivqswoer298m65m

Then I receive the following exception:

Traceback (most recent call last):
  File "/tests/test.py", line 4, in <module>
    result = model.transcribe(
             ^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_whisper/whisper_word_level/hf_whisper.py", line 236, in transcribe
    return transcribe_any(
           ^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_whisper/non_whisper.py", line 342, in transcribe_any
    result = inference_func(**inference_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_whisper/whisper_word_level/hf_whisper.py", line 116, in _inner_transcribe
    output = self._pipe(audio, **pipe_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 284, in __call__
    return super().__call__(inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1255, in __call__
    return next(
           ^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 587, in postprocess
    text, optional = self.tokenizer._decode_asr(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 835, in _decode_asr
    return _decode_asr(
           ^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 1086, in _decode_asr
    resolved_tokens, resolved_token_timestamps = _find_longest_common_sequence(
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 1193, in _find_longest_common_sequence
    matches = sum(
              ^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 1198, in <genexpr>
    and left_token_timestamp_sequence[left_start + idx]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<=' not supported between instances of 'NoneType' and 'float'

Expected behavior

To be able to transcribe the audio files without this exception.
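
Until the root cause is fixed, one possible stopgap is to catch the error per file so that a batch keeps running and the failing files are collected for a bug report. This is only a sketch: `transcribe_all` and `fake_transcribe` are hypothetical names, and the fake function stands in for the real call so the example runs without a model.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def transcribe_all(
    paths: Iterable[str],
    transcribe: Callable[[str], str],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Run `transcribe` on every path, recording TypeErrors instead of aborting."""
    texts: Dict[str, str] = {}
    failures: List[Tuple[str, str]] = []
    for path in paths:
        try:
            texts[path] = transcribe(path)
        except TypeError as exc:
            # Only the error reported in this issue is caught; anything else propagates.
            failures.append((path, str(exc)))
    return texts, failures


def fake_transcribe(path: str) -> str:
    """Hypothetical stand-in that fails for one file, like the reported audio."""
    if path.endswith("bad.mp3"):
        raise TypeError("'<=' not supported between instances of 'NoneType' and 'float'")
    return f"text of {path}"


texts, failures = transcribe_all(["a.mp3", "bad.mp3"], fake_transcribe)
```

With the reproduction code above, the callable could be something like `lambda p: model.transcribe(audio=p).text`.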

@itazap
Collaborator

itazap commented Sep 18, 2024

Thanks for raising, looks like the below indeed does happen:

if i + 1 < len(token_timestamps):
    end_time = round(token_timestamps[i + 1] + time_offset, 2)
else:
    end_time = None  # should never happen

since this loops over tokens and the last index + 1 will be out of range:

for i, token in enumerate(token_ids):

cc @eustlb @ylacombe wdyt about how the last timestamp should be handled?
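
A small sketch of the failure mode described above. It assumes that each token is paired with a (start, end) tuple; the function name `token_time_spans`, the `start_time` line, and the sample values are illustrative and not copied from the library.

```python
def token_time_spans(token_ids, token_timestamps, time_offset=0.0):
    """Pair each token with (start, end), mirroring the quoted branch."""
    spans = []
    for i, _token in enumerate(token_ids):
        start_time = round(token_timestamps[i] + time_offset, 2)
        if i + 1 < len(token_timestamps):
            end_time = round(token_timestamps[i + 1] + time_offset, 2)
        else:
            # Reached for the last token whenever both lists have the same length.
            end_time = None
        spans.append((start_time, end_time))
    return spans


spans = token_time_spans([101, 102, 103], [0.0, 0.4, 0.9])
last_end = spans[-1][1]  # None for the final token

try:
    last_end <= 1.0  # the same kind of comparison that fails in the traceback
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
```

Here the final token has no following timestamp, so its end time is `None`, and any later `<=` comparison against a float raises the reported `TypeError`.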

@LysandreJik added the Core: Tokenization and Audio labels on Sep 21, 2024
@moodpanda

I'm experiencing this as well on stable whisper.
image

@aklacar1

Any news here? Will it be fixed anytime soon? Or is there a version where this is not a problem?

@itazap
Collaborator

itazap commented Sep 30, 2024

Hey, I'm working on a fix but having trouble reproducing this consistently (sometimes it breaks, sometimes not) 🤔 Are you experiencing the same, @aklacar1 @felipehertzer?

@aklacar1

aklacar1 commented Sep 30, 2024

@itazap Yes, I am. However, it happens on one of my videos but not on others, and I have no idea why one works and another does not. The one that does not work is almost 20 minutes long.

NOTE: To make sure I am giving you correct info, I am rerunning it now. Just waiting for Docker build to finish.

@aklacar1

This time I used:

torch==2.0.0
torchvision==0.15
torchaudio==2.0.1
transformers==4.45.1
stable-ts[hf]==2.17.4

Last time I believe I had a 2.4.x version of Torch; this time it's 2.0.0. However, I get the same result:

  File "/function/stable_whisper/whisper_word_level/hf_whisper.py", line 236, in transcribe
    return transcribe_any(
  File "/function/stable_whisper/non_whisper.py", line 342, in transcribe_any
    result = inference_func(**inference_kwargs)
  File "/function/stable_whisper/whisper_word_level/hf_whisper.py", line 116, in _inner_transcribe
    output = self._pipe(audio, **pipe_kwargs)
  File "/function/transformers/pipelines/automatic_speech_recognition.py", line 284, in __call__
    return super().__call__(inputs, **kwargs)
  File "/function/transformers/pipelines/base.py", line 1260, in __call__
    return next(
  File "/function/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/function/transformers/pipelines/automatic_speech_recognition.py", line 598, in postprocess
    text, optional = self.tokenizer._decode_asr(
  File "/function/transformers/models/whisper/tokenization_whisper.py", line 835, in _decode_asr
    return _decode_asr(
  File "/function/transformers/models/whisper/tokenization_whisper.py", line 1034, in _decode_asr
    resolved_tokens, resolved_token_timestamps = _find_longest_common_sequence(
  File "/function/transformers/models/whisper/tokenization_whisper.py", line 1193, in _find_longest_common_sequence
    matches = sum(
  File "/function/transformers/models/whisper/tokenization_whisper.py", line 1198, in <genexpr>
    and left_token_timestamp_sequence[left_start + idx]
TypeError: '<=' not supported between instances of 'NoneType' and 'float'
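
Since the error shows up under more than one version combination, it may help to attach the exact installed versions to each report. A minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list is simply the one from this comment.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


report = installed_versions(
    ["torch", "torchvision", "torchaudio", "transformers", "stable-ts"]
)
for name, version in report.items():
    print(f"{name}: {version}")
```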

@felipehertzer
Author

Hi @itazap,

Thank you for looking into the issue. I am able to reproduce it using the following audio file and stable-ts code:

Audio: https://file.io/wi9gbaf1GMvt

@ArthurZucker
Collaborator

Does #33625 fix your issue? 🤗

@itazap
Collaborator

itazap commented Oct 5, 2024

@felipehertzer apologies for the delay but I believe the link has expired, can you please reshare the file? 🙏

@felipehertzer
Author

@itazap
Collaborator

itazap commented Oct 9, 2024

@felipehertzer Thanks! Okay, I am not super familiar with the Whisper model, but I think it has to do with stable-ts not adding a special token to end the text. You can try this branch: #33625, which should address the issue for now! 😊
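
For illustration only, and not necessarily what #33625 does: one conceivable guard is to treat a missing timestamp as a non-match when counting overlapping tokens, so a `None` end time never reaches the `<=` operator. The names `timestamp_le` and `count_matches` are hypothetical.

```python
from typing import Optional, Sequence


def timestamp_le(left: Optional[float], right: Optional[float]) -> bool:
    """Return left <= right, treating a missing (None) timestamp as 'no match'."""
    if left is None or right is None:
        return False
    return left <= right


def count_matches(
    left_tokens: Sequence[int],
    left_times: Sequence[Optional[float]],
    right_tokens: Sequence[int],
    right_times: Sequence[Optional[float]],
) -> int:
    """Count positions where tokens agree and the left time does not exceed the right."""
    return sum(
        1
        for lt, ltime, rt, rtime in zip(left_tokens, left_times, right_tokens, right_times)
        if lt == rt and timestamp_le(ltime, rtime)
    )


# The last left timestamp is None, as for a final token without an end time.
matches = count_matches([1, 2, 3], [0.0, 0.4, None], [1, 2, 3], [0.1, 0.5, 1.0])
```

Whether skipping such tokens or assigning them a fallback end time is the better behavior is exactly the open question about how the last timestamp should be handled.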

@felipehertzer
Author

Hi @itazap, sorry for the delay. I just tested it and can confirm that it has fixed the issue. Thank you.


github-actions bot commented Nov 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@eustlb reopened this on Dec 18, 2024
@eustlb
Contributor

eustlb commented Dec 18, 2024

Reopening until a solution is merged into main

@eustlb
Contributor

eustlb commented Dec 18, 2024

@felipehertzer Do you still have the audio snippet? The link has expired.

@felipehertzer
Author

Hey @eustlb, here is the updated link. Thanks.

https://drive.google.com/file/d/1BNUV7K8XMYCRC-YE_6QJ4PpmktgTIuNS/view?usp=sharing

7 participants