[whisper] transcription is different from hf & openai #32900

jsoto-gladia · 2024-08-20T14:01:30Z

System Info

transformers version: 4.40.2
Platform: Linux-5.15.0-118-generic-x86_64-with-glibc2.35
Python version: 3.10.14
Huggingface_hub version: 0.21.4
Safetensors version: 0.4.2
Accelerate version: 0.24.1
Accelerate config: not found
PyTorch version (GPU?): 2.1.0+cu121 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?:
Using distributed or parallel set-up in script?:

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

silence-middle.wav.zip
The attached wav file produces the following transcriptions.

With the hf whisper repo (tiny model):
Split infinity, and a time when less is more. Where too much is never
With the openai whisper repo (tiny model):
Split infinity, and a time when less is more.

Expected behavior

I would expect the result to be
Split infinity, and a time when less is more. Where too much is never


jsoto-gladia · 2024-08-20T14:02:52Z

You are missing the re-encoding mechanism that happens when EOS is reached within a 30 s segment.
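To make the point concrete, here is a toy sketch of the idea, not code from either repository. It assumes the decoder can emit EOS early after a long silence, and that the transcription loop can either advance by a full 30 s window or jump to the end of the last decoded segment and re-encode from there. `toy_decode`, `transcribe`, `MAX_GAP_S`, and the utterance timings are all hypothetical; only the two sentences come from the report above.

```python
from typing import List, Tuple

# (start_s, end_s, text); the timings are made up for illustration.
Utterance = Tuple[float, float, str]

WINDOW_S = 30.0  # length of one encoder window
MAX_GAP_S = 5.0  # hypothetical silence length that triggers an early EOS


def toy_decode(utterances: List[Utterance], seek: float) -> List[Utterance]:
    """Hypothetical stand-in for the model on the window [seek, seek + 30 s)."""
    decoded: List[Utterance] = []
    cursor = seek
    for start, end, text in utterances:
        if start < seek:
            continue  # already behind the current window
        if end > seek + WINDOW_S:
            break  # does not fit in the current window
        if decoded and start - cursor > MAX_GAP_S:
            break  # simulated early EOS after a long silence
        decoded.append((start, end, text))
        cursor = end
    return decoded


def transcribe(utterances: List[Utterance], total_s: float, reseek: bool) -> str:
    """Run the toy decoder over the audio, with or without re-encoding."""
    seek = 0.0
    texts: List[str] = []
    while seek < total_s:
        segments = toy_decode(utterances, seek)
        texts.extend(text for _, _, text in segments)
        if reseek and segments:
            # Jump to the end of the last decoded segment and re-encode from there.
            seek = segments[-1][1]
        else:
            # Skip a full window, dropping anything after an early EOS.
            seek += WINDOW_S
    return " ".join(texts)


UTTERANCES: List[Utterance] = [
    (0.0, 4.0, "Split infinity, and a time when less is more."),
    (15.0, 18.0, "Where too much is never"),
]

print(transcribe(UTTERANCES, total_s=20.0, reseek=False))
print(transcribe(UTTERANCES, total_s=20.0, reseek=True))
```

In this toy model the fixed-stride loop loses the sentence after the silence, while the re-seeking loop recovers it because the second window starts right after the first sentence. Whether the real implementations differ in exactly this way would need to be checked against their source.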

amyeroberts · 2024-08-20T14:03:38Z

cc @ylacombe @sanchit-gandhi

ylacombe · 2024-09-02T16:32:51Z

Hey @jsoto-gladia, many thanks for opening this issue!

This looks like an interesting finding. Do you think you could provide code snippets (both in transformers and in the whisper repo) so that we can reproduce it?

github-actions · 2024-09-27T08:05:52Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ylacombe · 2024-10-07T15:47:59Z

Hey @jsoto-gladia, I believe this PR addresses your problem: #33917

github-actions · 2024-11-01T08:06:35Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

jsoto-gladia added the bug label Aug 20, 2024

amyeroberts added the Audio label Aug 20, 2024

github-actions bot closed this as completed Oct 6, 2024

ylacombe reopened this Oct 7, 2024

github-actions bot closed this as completed Nov 9, 2024
