Explanation of using VAD, VAC #117

marcinmatys · 2024-09-02T11:26:46Z

Could someone explain how to use VAD and VAC, and how they work?

Please provide a more detailed description of these parameters:
--vac Use VAC = voice activity controller. Recommended. Requires torch.
--vac-chunk-size VAC_CHUNK_SIZE, VAC sample size in seconds.
--vad Use VAD = voice activity detection, with the default parameters.

In my own simple VAD implementation, mentioned in issue 105,
I skip silence in the incoming audio stream and pass audio chunks to OnlineASRProcessor for processing, with the chunk length set by the min-chunk-size parameter (e.g. 1 sec). Does VACOnlineASRProcessor work similarly? Does it skip non-voice audio and pass only voiced audio to OnlineASRProcessor (with a fixed chunk size)?

Additional questions:

Is the --vad parameter deprecated? Is it enough to use --vac alone, or are there cases where --vad and --vac should each be used?
Why is the default value of --vac-chunk-size 0.04 sec (only 40 ms)? Why would we want to pass such small chunks to VACOnlineASRProcessor?
How long are the chunks that VACOnlineASRProcessor.insert_audio_chunk adds to OnlineASRProcessor? Shouldn't they have a fixed length based on the min-chunk-size parameter, e.g. 1 sec?
How long a pause does silero-vad detect? 0.5 sec? Can this threshold be set by a parameter?
Is it possible to send an event after a pause of e.g. 1-2 seconds? I want to trigger an action when the user pauses for e.g. 2 sec while speaking.

Gldkslfmsd · 2024-09-03T14:55:36Z

Hi,

In my own simple VAD implementation, mentioned in #105,
I skip silence in the incoming audio stream and pass audio chunks to OnlineASRProcessor for processing, with the chunk length set by the min-chunk-size parameter (e.g. 1 sec). Does VACOnlineASRProcessor work similarly? Does it skip non-voice audio and pass only voiced audio to OnlineASRProcessor (with a fixed chunk size)?

Yes, it works similarly. VACOnlineASRProcessor operates on very short chunks, with vac-chunk-size as the minimum. It sends the voiced audio to OnlineASRProcessor and keeps track of how much audio has arrived; once that is more than OnlineASRProcessor's min-chunk-size, it triggers processing. If the end of voice is detected, it immediately triggers OnlineASRProcessor.finish().
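The control flow described above can be sketched roughly as follows. This is a toy illustration, not the code from whisper_online.py: `StubOnlineProcessor`, `ToyVACWrapper` and `energy_vad` are hypothetical names, the energy threshold only stands in for Silero VAD, and `process_iter` is an assumed name for the "trigger processing" step. The sketch also treats the first unvoiced chunk as the end of voice, whereas the real detector waits for a longer pause.

```python
from typing import Callable, List

SAMPLE_RATE = 16000  # assumed sampling rate in Hz


class StubOnlineProcessor:
    """Hypothetical stand-in for OnlineASRProcessor; it only records calls."""

    def __init__(self) -> None:
        self.buffered = 0  # samples received since the last processing call
        self.calls: List[str] = []

    def insert_audio_chunk(self, chunk: List[float]) -> None:
        self.buffered += len(chunk)

    def process_iter(self) -> None:  # assumed name for "trigger processing"
        self.calls.append(f"process_iter({self.buffered})")
        self.buffered = 0

    def finish(self) -> None:
        self.calls.append(f"finish({self.buffered})")
        self.buffered = 0


def energy_vad(chunk: List[float]) -> bool:
    """Toy voice detector: any sample above a small amplitude counts as voice."""
    return max(abs(s) for s in chunk) > 0.01


class ToyVACWrapper:
    """Forwards only voiced chunks and decides when the inner processor runs."""

    def __init__(self, online: StubOnlineProcessor,
                 is_voiced: Callable[[List[float]], bool],
                 min_chunk_size: float = 1.0) -> None:
        self.online = online
        self.is_voiced = is_voiced
        self.min_samples = int(min_chunk_size * SAMPLE_RATE)
        self.pending = 0  # voiced samples forwarded since the last processing
        self.in_speech = False

    def insert_audio_chunk(self, chunk: List[float]) -> None:
        if self.is_voiced(chunk):
            self.in_speech = True
            self.online.insert_audio_chunk(chunk)
            self.pending += len(chunk)
            # Enough voiced audio has accumulated: trigger processing.
            if self.pending > self.min_samples:
                self.online.process_iter()
                self.pending = 0
        elif self.in_speech:
            # End of voice (simplified): flush whatever is left right away.
            self.in_speech = False
            self.online.finish()
            self.pending = 0
        # Silence outside of speech is dropped.


online = StubOnlineProcessor()
vac = ToyVACWrapper(online, is_voiced=energy_vad, min_chunk_size=0.1)
voiced, silent = [0.5] * 640, [0.0] * 640  # 640 samples = 40 ms at 16 kHz
for chunk in [silent, voiced, voiced, voiced, voiced, silent, silent]:
    vac.insert_audio_chunk(chunk)
print(online.calls)
```

With 40 ms chunks and a 0.1 sec minimum, the stub is asked to process once after the third voiced chunk and to finish when the first silent chunk follows speech; leading and trailing silence never reaches it.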

--vad may be obsolete now, but a test should confirm that. It is applied to OnlineASRProcessor's audio buffer whenever Whisper processes it, so it is repeated every time, which it shouldn't be. But if it prevents hallucinations or improves quality, it could be worth keeping the option. It probably also handles context better than finish().

How long a pause does silero-vad detect? 0.5 sec? Can this threshold be set by a parameter?

Yes, 0.5 sec. It could be made a configurable parameter. PR welcome.

Is it possible to send an event after a pause of e.g. 1-2 seconds? I want to trigger an action when the user pauses for e.g. 2 sec while speaking.

Yes, just change the hardcoded parameter.
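For readers who prefer not to edit the hardcoded value, an alternative is to count silence on the caller's side and fire a callback once it lasts long enough. The sketch below is only an illustration under assumptions: `PauseNotifier` is a hypothetical helper that is not part of whisper_streaming, and it expects the caller to supply a per-chunk voiced/unvoiced decision from whatever VAD is in use.

```python
from typing import Callable, List


class PauseNotifier:
    """Hypothetical helper: calls on_pause once per pause of at least pause_ms."""

    def __init__(self, pause_ms: int, chunk_ms: int,
                 on_pause: Callable[[], None]) -> None:
        self.pause_ms = pause_ms
        self.chunk_ms = chunk_ms
        self.on_pause = on_pause
        self.silence_ms = 0        # length of the current run of silence
        self.heard_speech = False  # ignore silence before any speech
        self.fired = False         # avoid firing repeatedly within one pause

    def update(self, voiced: bool) -> None:
        """Feed one VAD decision per audio chunk."""
        if voiced:
            self.heard_speech = True
            self.silence_ms = 0
            self.fired = False
            return
        if not self.heard_speech or self.fired:
            return
        self.silence_ms += self.chunk_ms
        if self.silence_ms >= self.pause_ms:
            self.fired = True
            self.on_pause()


events: List[str] = []
notifier = PauseNotifier(pause_ms=2000, chunk_ms=40,
                         on_pause=lambda: events.append("pause"))
notifier.update(True)       # the user speaks
for _ in range(50):         # 50 chunks of 40 ms = 2 sec of silence
    notifier.update(False)
print(events)
```

Working in integer milliseconds avoids floating-point rounding when comparing accumulated 40 ms chunks against the pause threshold.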

lq0104 · 2024-09-25T03:07:45Z

Yes, it works similarly. VACOnlineASRProcessor operates on very short chunks, with vac-chunk-size as the minimum. It sends the voiced audio to OnlineASRProcessor and keeps track of how much audio has arrived; once that is more than OnlineASRProc's min-chunk-size, it triggers processing. If the end of voice is detected, it immediately triggers OnlineASRProc.finalize .

I'm sorry, but what is 'OnlineASRProc.finalize'? Is it a function? I can't find it in this project.

Gldkslfmsd · 2024-09-25T08:20:50Z

Sorry, I meant .finish():

whisper_streaming/whisper_online.py

Line 492 in 225f038

def finish(self):

Comment edited. I originally wrote it off the top of my head without checking the code.

Gldkslfmsd closed this as completed Sep 24, 2024
