I’m experiencing an issue with transcription of non-English streams, particularly Russian: the number of hypothesis words generated is disproportionately large compared to the number of confirmed words. The hypothesis word count can reach up to half of the confirmed word count, which significantly hurts the accuracy and readability of the transcription. The issue does not occur with English streams, where hypothesis and confirmed words stay reasonably balanced.
Environment:
• Model: whisper-large-v3 turbo (958 MB)
• Device: MacBook Pro M3 Max (36GB RAM)
I have replicated this in other languages as well. This requires an algorithmic improvement to the Eager Streaming Mode in order to break out of diverging hypotheses. We are investigating a fix for this!
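To illustrate why diverging hypotheses can leave a long unconfirmed tail, here is a minimal sketch. It assumes that words are confirmed by prefix agreement between consecutive decoding passes; that is an assumption for illustration only, not a description of how Eager Streaming Mode is actually implemented. `PrefixAgreement` and `common_prefix_len` are hypothetical names, and the word lists are made-up placeholders.

```python
def common_prefix_len(a, b):
    """Number of leading words on which two word lists agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixAgreement:
    """Hypothetical confirmation rule: a word is confirmed once two
    consecutive decoding passes agree on it as part of a common prefix."""

    def __init__(self):
        self.confirmed = []  # words that will no longer change
        self.previous = []   # unconfirmed words from the last pass

    def update(self, words):
        """`words` is the decode of all audio after the confirmed text.
        Returns the words that remain hypothesis after this pass."""
        agreed = common_prefix_len(self.previous, words)
        self.confirmed.extend(words[:agreed])
        hypothesis = words[agreed:]
        self.previous = hypothesis
        return hypothesis


# Stable stream: each pass repeats the previous words and appends new ones,
# so the confirmed list keeps growing and the hypothesis tail stays short.
stable = PrefixAgreement()
for words in (["the", "cat"], ["the", "cat", "sat"], ["sat", "down"]):
    stable_tail = stable.update(words)

# Diverging stream: the first word flips between passes, so the common
# prefix is always empty and nothing is ever confirmed.
diverging = PrefixAgreement()
for words in (["a", "b"], ["x", "b", "c"], ["a", "b", "c", "d"]):
    diverging_tail = diverging.update(words)

print(len(stable.confirmed), len(stable_tail))
print(len(diverging.confirmed), len(diverging_tail))
```

Under this assumed rule, a single unstable word near the start of the unconfirmed region blocks confirmation of everything after it, which would match the growing hypothesis tail reported above. Breaking out of that state needs some extra policy, for example forcing confirmation after a number of passes, but which policy fits here is an open question.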
Video of the issue in WhisperKit: https://youtu.be/JWEHgKwogG8