Explanation of using VAD, VAC #117
Hi,
yes, it works similarly. VACOnlineASRProcessor works on very short chunks of vac-chunk-size. It sends the voiced audio to OnlineASRProcessor and counts how much audio has arrived; once that is more than OnlineASRProc's min-chunk-size, it triggers processing. If the end of voice is detected, it immediately triggers OnlineASRProc.finish().
--vad may be obsolete now, but a test should confirm it. It is applied to OnlineASRProc's audio buffer whenever Whisper processes it, so it is repeated every time, which shouldn't happen. But if it prevents hallucination or improves quality, it could be good to keep the option. It probably also handles the context better than finish().
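A minimal sketch of the control flow described above, assuming 16 kHz mono audio given as lists of floats. StubASRProcessor and VACController are hypothetical stand-ins, not the classes in whisper_online.py; only the method names insert_audio_chunk, process_iter and finish mirror OnlineASRProcessor. The is_speech callback is a placeholder for the real voice activity detector, and "end of voice" is simplified to the first non-voiced chunk after speech rather than a minimum silence duration.

```python
from typing import Callable, List, Sequence

SAMPLING_RATE = 16000  # assumption: 16 kHz mono audio


class StubASRProcessor:
    """Hypothetical stand-in for OnlineASRProcessor that only logs calls."""

    def __init__(self) -> None:
        self.buffer: List[float] = []
        self.events: List[str] = []

    def insert_audio_chunk(self, chunk: Sequence[float]) -> None:
        self.buffer.extend(chunk)

    def process_iter(self) -> None:
        self.events.append(f"process {len(self.buffer)}")

    def finish(self) -> None:
        self.events.append(f"finish {len(self.buffer)}")
        self.buffer = []


class VACController:
    """Hypothetical simplification of the VAC logic: forward voiced audio,
    process once min_chunk_size seconds have accumulated, flush on end of voice."""

    def __init__(self, asr: StubASRProcessor,
                 is_speech: Callable[[Sequence[float]], bool],
                 min_chunk_size: float = 1.0) -> None:
        self.asr = asr
        self.is_speech = is_speech
        self.min_chunk_samples = round(min_chunk_size * SAMPLING_RATE)
        self.pending = 0       # voiced samples forwarded since the last processing
        self.in_voice = False  # True while inside a voiced segment

    def insert_audio_chunk(self, chunk: Sequence[float]) -> None:
        if self.is_speech(chunk):
            self.in_voice = True
            self.asr.insert_audio_chunk(chunk)  # only voiced audio reaches the ASR
            self.pending += len(chunk)
            if self.pending >= self.min_chunk_samples:
                self.asr.process_iter()
                self.pending = 0
        elif self.in_voice:
            # End of voice detected: flush immediately instead of waiting.
            self.asr.finish()
            self.in_voice = False
            self.pending = 0


def run_demo() -> List[str]:
    asr = StubASRProcessor()
    vac = VACController(asr, is_speech=lambda c: any(abs(x) > 0.1 for x in c))
    chunk_len = round(0.04 * SAMPLING_RATE)  # 640 samples per 0.04 s VAC chunk
    for _ in range(30):  # 1.2 s of "speech"
        vac.insert_audio_chunk([0.5] * chunk_len)
    for _ in range(3):   # followed by silence
        vac.insert_audio_chunk([0.0] * chunk_len)
    return asr.events


print(run_demo())  # ['process 16000', 'finish 19200']
```

With 0.04 s chunks and a 1.0 s min-chunk-size, processing fires after 25 voiced chunks, and the first silent chunk flushes everything received so far through finish() without waiting for another full min-chunk-size of audio.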
Yes, 0.5 sec. It could be made a configurable parameter. PRs welcome.
Yes, just change the hardcoded parameter.
I'm sorry, but what is 'OnlineASRProc.finalize'? Is it a function? I can't find it in this project.
Sorry, I meant .finish(): whisper_streaming/whisper_online.py, line 492 (commit 225f038).
Comment edited. Previously I wrote it off the top of my head, without checking the code.
Could someone explain how to use VAD and VAC, and how they work?
Please provide a more detailed description of these parameters:
--vac Use VAC = voice activity controller. Recommended. Requires torch.
--vac-chunk-size VAC_CHUNK_SIZE, VAC sample size in seconds.
--vad Use VAD = voice activity detection, with the default parameters.
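For reference, the options above could be declared with argparse as in this sketch, which also converts the second-based sizes into samples. The help strings for --vac, --vac-chunk-size and --vad are copied from the list above; the --min-chunk-size help text is paraphrased, and the defaults (0.04 s for --vac-chunk-size, 1.0 s for --min-chunk-size) and the 16 kHz sampling rate are assumptions, so check python whisper_online.py --help for the actual values.

```python
import argparse

SAMPLING_RATE = 16000  # assumption: 16 kHz input audio


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--vac", action="store_true", default=False,
                        help="Use VAC = voice activity controller. Recommended. Requires torch.")
    parser.add_argument("--vac-chunk-size", type=float, default=0.04,  # assumed default
                        help="VAC sample size in seconds.")
    parser.add_argument("--vad", action="store_true", default=False,
                        help="Use VAD = voice activity detection, with the default parameters.")
    parser.add_argument("--min-chunk-size", type=float, default=1.0,  # assumed default
                        help="Minimum audio chunk size in seconds (paraphrased).")
    return parser


def summarize(args: argparse.Namespace) -> dict:
    vac_samples = round(args.vac_chunk_size * SAMPLING_RATE)
    min_samples = round(args.min_chunk_size * SAMPLING_RATE)
    return {
        "vac": args.vac,
        "vad": args.vad,
        "vac_chunk_samples": vac_samples,
        # how many VAC chunks of voiced audio are needed to fill one min-chunk-size
        "vac_chunks_per_min_chunk": -(-min_samples // vac_samples),  # ceiling division
    }


args = build_parser().parse_args(["--vac", "--vac-chunk-size", "0.04"])
print(summarize(args))
# {'vac': True, 'vad': False, 'vac_chunk_samples': 640, 'vac_chunks_per_min_chunk': 25}
```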
In my own simple VAD implementation, mentioned in issue 105, I skip silence in the incoming audio stream and add/process audio chunks in OnlineASRProcessor with a length given by the min-chunk-size parameter (e.g. 1 sec). Does VACOnlineASRProcessor work similarly? Does it skip non-voice audio and then add only voiced audio to OnlineASRProcessor (with a fixed chunk size)?
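The approach described above (skip silence, then feed fixed-size voiced chunks) might look like this minimal sketch. The RMS energy threshold, the frame format (lists of float samples at an assumed 16 kHz), and the names rms and voiced_chunks are illustrative assumptions, not the implementation from issue 105. Each yielded chunk would then be passed to OnlineASRProcessor's insert_audio_chunk() followed by process_iter().

```python
import math
from typing import Iterable, Iterator, List, Sequence

SAMPLING_RATE = 16000  # assumption: 16 kHz mono audio


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def voiced_chunks(frames: Iterable[Sequence[float]],
                  min_chunk_size: float = 1.0,
                  threshold: float = 0.01) -> Iterator[List[float]]:
    """Hypothetical energy-based VAD: drop silent frames and yield the
    remaining voiced audio in pieces of exactly min_chunk_size seconds."""
    target = round(min_chunk_size * SAMPLING_RATE)
    buf: List[float] = []
    for frame in frames:
        if rms(frame) < threshold:
            continue  # skip silence entirely
        buf.extend(frame)
        while len(buf) >= target:
            yield buf[:target]
            buf = buf[target:]
    if buf:
        yield buf  # leftover voiced audio when the stream ends


frame_len = round(0.1 * SAMPLING_RATE)  # 0.1 s frames
stream = ([[0.2] * frame_len] * 5 + [[0.0] * frame_len] * 5
          + [[0.2] * frame_len] * 8)
print([len(c) for c in voiced_chunks(stream)])  # [16000, 4800]
```

Unlike the VAC behaviour described in the reply, this sketch never flushes at the end of an utterance: the last voiced audio waits until another full min-chunk-size of speech arrives or the stream ends.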
Additional questions: