Update audio.py #1112
fix black error
try appease black 2
appease flake8
No idea why Black isn't being satisfied...sigh... |
I've also tested the following, but it appears that FFMPEG just kicks ass so...hence this pull request: Here's what I tested just FYI:
You need to use It seems that the community would take simplicity over speedups that depend on external packages. In #1106 we decided to ditch CuPy feature extraction regardless of the speedup because of the increased code complexity, so I'm hesitant to move forward with this PR, to be honest, since faster-whisper already allows this functionality: you can easily import |
I also benchmarked on my side using your script after modifying it to run each method 10 times and average the time taken 34 min file:
3 min file:
the performance is barely better for large files and actually slower for smaller files |
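For readers following along, a repeat-and-average harness of the kind described above could look roughly like the sketch below. It is not the script either participant actually ran; `average_runtime` and its parameters are hypothetical names chosen for illustration, and the decoder under test is passed in as an ordinary callable.

```python
import statistics
import time


def average_runtime(func, *args, repeats=10):
    """Call func(*args) `repeats` times and return (mean, stdev) in seconds.

    Hypothetical helper for illustration only; not taken from the PR.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func(*args)
        timings.append(time.perf_counter() - start)

    mean_seconds = statistics.mean(timings)
    # stdev needs at least two samples; report 0.0 for a single run.
    stdev_seconds = statistics.stdev(timings) if repeats > 1 else 0.0
    return mean_seconds, stdev_seconds
```

Reporting the spread next to the mean makes it easier to tell whether a small difference between two backends is larger than the run-to-run noise, which matters most for short files.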
That's very strange...Can you show the script please? I carefully created the benchmarking script to compare apples-to-apples, but humans do make mistakes sometimes so... [EDIT] And do you happen to have the audio file tested? Can you try it on the one that my initial benchmark results were tested on...Here's a link to the file. https://huggingface.co/datasets/reach-vb/random-audios/blob/main/sam_altman_lex_podcast_367.flac |
Script
Results on your file:
I used your script verbatim except used windows-style path for audio file:
On a very long audio file, at least, it seems that straight-ffmpeg is faster for both of us. However, mine was Thoughts on our somewhat similar (but not entirely) results...
linux v windows
You're using linux, right? Could be a difference...

different versions of libraries
I'm assuming you pip installed av and did not build it? Regarding the straight-ffmpeg test, you used ffmpeg version As a side note, I'm not aware how to download a specific "version" like 4.4.2. I'm only aware how to get ffmpeg for windows by going here and downloading a build that someone created for Windows. If you're aware of how to download a specific "version" for Windows I can definitely try that...

your discrepancy for a 3 minute file
Since I wasn't given your specific 3-min file, I tested on a short 6 min .mp3 file I had. The results I got were:
In summary, straight-ffmpeg was My only thought would be to re-test with a newer version of straight-ffmpeg and compare it to av 3.1.0... Also, with a "relatively" short (compared to the altman podcast anyways) audio file, there might have been some background activity on your computer during that test. True...that could happen with any test, including mine...but with a longer audio file it's less likely to have a meaningful impact, in theory of course.

Also, it defies common sense... Let's assume hypothetically that we test straight-ffmpeg against an av that uses the same exact version of ffmpeg under the hood (obviating the versioning discrepancy above), and that we use the exact same parameters and/or methods (e.g. batch processing or what have you): there is no possible way that av is faster. One library that bootstraps another can't do anything but add overhead. AGAIN, assuming that we run ffmpeg the same exact way.

A final thought: neither of us is a professional benchmarker like "Gamer's Nexus" on YouTube, with separate benchmarks and insane benchmarking protocols, so take everything I say with a grain of salt.

Regarding detecting FFMPEG on windows
You previously expressed concerns about detecting ffmpeg on windows... You also suggested bootstrapping ffmpeg from openai's vanilla whisper library, making this pull request unnecessary... In light of your comments regarding code simplicity, I fail to see how that is any simpler than allowing faster-whisper to use ffmpeg if it's in the system PATH, and if not...do what faster-whisper already does. It seems the contrary...

I suggest that you keep this pull request open, subject to modification, unless/until there's a strong consensus that ffmpeg operates somehow differently on short files... |
I upgraded
Anyways, the main discussion here is not whether one is faster than the other because |
Thanks. Again, nothing changes from the user's perspective except that it'll use something that's faster if it's available. No code changes for a person creating a script that uses faster-whisper whatsoever. You touch upon a core issue though... I'd encourage faster-whisper to be more accommodating with pull requests; otherwise, it'll just become stagnant. The testing I do, and even responding in-depth to comments, takes a significant amount of time that could be spent on other repositories, but I am trying to contribute to faster-whisper because I have a soft spot for it. lol. This pull request changes NOTHING as far as how faster-whisper operates when a person doesn't have FFMPEG in their system's PATH and ONLY offers a speedup for those that do...yet it's met with scrutiny that, in the terms of my legal profession, can only be described as "beyond a reasonable doubt." Food for thought... |
Before continuing with the discussion, I should thank you for dedicating time and effort to making this project better. I, as a maintainer, have a different POV: the default is not to accept a PR unless there are solid reasons to do so, and if you check the old closed PRs you'll find that this was mostly the case. I'd advise against accepting this one; if it is to be accepted, I'd accept it only if it can replace the old functionality, because I'm against maintaining two pieces of code that have the same functionality even if one of them has an edge over the other. I recommend reading This and This to gain insights on my POV and other open source maintainers' as well. |
You're welcome, and I've seen enough, ranging from your willingness to participate in a private bloke like me's whisper benchmarking repository to interactions on here, to know your thoughts are genuine and you're doing what you think is best. Just my two cents is all, which you're free to adopt, modify or what have you. Cheers! |
This re-introduces FFMPEG binary, which is much faster than AV (which bundles FFMPEG), to decode audio. It will only use FFMPEG directly if it's available; otherwise, it'll use AV just like before. Thus, there should be no impact whatsoever for users who traditionally don't install FFMPEG separately to use faster-whisper...while people who have FFMPEG somewhere on their system (as many do) will enjoy a nice speedup.

BENCHMARK comparison
Pydub Backend:
convert_pydub took 17.846182 seconds
AV Backend:
convert_av took 9.008320 seconds
FFmpeg Backend:
convert_ffmpeg took 2.072908 seconds
This benchmark was done using the Sam Altman podcast, which is over two hours long. Used an RTX 4090 + 13900k. Obviously, the "actual" seconds saved will be less for smaller files...But with that being said, if batch processing is used and/or large files are in fact processed, the "relative" speedup by using FFMPEG is stark.
Again, faster-whisper's reliance on AV would not change, but there would simply be an option now to use FFMPEG directly via subprocess. Pydub, another backend, is shown for comparison only. Here's the benchmark script...just add a different file name to test something different:

BENCH SCRIPT HERE