Explanation of using VAD, VAC #117

Closed
marcinmatys opened this issue Sep 2, 2024 · 3 comments

marcinmatys commented Sep 2, 2024

Could someone explain how to use VAD and VAC, and how they work?

Please provide a more detailed description of these parameters:
--vac Use VAC = voice activity controller. Recommended. Requires torch.
--vac-chunk-size VAC_CHUNK_SIZE, VAC sample size in seconds.
--vad Use VAD = voice activity detection, with the default parameters.

In my own simple VAD implementation, mentioned in issue 105,
I skip silence in the incoming streamed audio and pass audio chunks to OnlineASRProcessor with a length set by the min-chunk-size parameter (e.g. 1 sec). Does VACOnlineASRProcessor work similarly? Does VACOnlineASRProcessor skip non-voice audio and then add only voice to OnlineASRProcessor (with a fixed chunk size)?

Additional questions :

  • Is the --vad parameter deprecated? Is it enough to use --vac? When should --vad and --vac each be used?
  • Why is the default value for --vac-chunk-size 0.04 sec (only 40 ms)? Why would we want to pass such small chunks to VACOnlineASRProcessor?
  • How long are the chunks added to OnlineASRProcessor in VACOnlineASRProcessor.insert_audio_chunk? Shouldn't they have a fixed length based on the min-chunk-size parameter, e.g. 1 sec?
  • How long a pause does silero-vad detect? 0.5 sec? Is it possible to set this threshold with a parameter?
  • Is it possible to send an event after, e.g., a 1-2 second pause? I want to trigger an action when the user pauses for, e.g., 2 sec while speaking.
Gldkslfmsd (Collaborator) commented Sep 3, 2024

Hi,

In my own simple VAD implementation, mentioned in #105,
I skip silence in the incoming streamed audio and pass audio chunks to OnlineASRProcessor with a length set by the min-chunk-size parameter (e.g. 1 sec). Does VACOnlineASRProcessor work similarly? Does VACOnlineASRProcessor skip non-voice audio and then add only voice to OnlineASRProcessor (with a fixed chunk size)?

Yes, it works similarly. VACOnlineASRProcessor works on very short chunks, as short as vac-chunk-size. It sends the voiced audio to OnlineASRProcessor. It counts how much audio has arrived, and if that is more than OnlineASRProc's min-chunk-size, it triggers processing. If the end of voice is detected, it immediately triggers OnlineASRProc.finish().
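A minimal sketch of that flow, not the actual whisper_streaming code. `StubOnlineASR`, `VACSketch` and the `is_voiced` callable are hypothetical names; `process_iter` is assumed to be the method that triggers processing; a simple amplitude threshold stands in for Silero VAD; and a single unvoiced chunk is treated as the end of voice, whereas the real detector waits for a longer pause.

```python
SAMPLING_RATE = 16000  # assumed sample rate in Hz


class StubOnlineASR:
    """Hypothetical stand-in for OnlineASRProcessor; it only records calls."""

    def __init__(self):
        self.buffered = 0   # samples currently held in the audio buffer
        self.events = []    # log of (event name, buffered samples)

    def insert_audio_chunk(self, audio):
        self.buffered += len(audio)

    def process_iter(self):
        self.events.append(("process", self.buffered))

    def finish(self):
        self.events.append(("finish", self.buffered))
        self.buffered = 0


class VACSketch:
    """Forwards voiced chunks and decides when the wrapped ASR should run."""

    def __init__(self, online, is_voiced, min_chunk_size=1.0):
        self.online = online
        self.is_voiced = is_voiced  # hypothetical callable: chunk -> bool
        self.min_samples = int(min_chunk_size * SAMPLING_RATE)
        self.pending = 0            # voiced samples since the last processing
        self.in_voice = False

    def insert_audio_chunk(self, chunk):
        if self.is_voiced(chunk):
            self.in_voice = True
            self.online.insert_audio_chunk(chunk)
            self.pending += len(chunk)
            # Enough new audio has accumulated: trigger processing.
            if self.pending > self.min_samples:
                self.online.process_iter()
                self.pending = 0
        elif self.in_voice:
            # End of voice: flush immediately instead of waiting for more audio.
            self.in_voice = False
            self.pending = 0
            self.online.finish()
        # Silence outside of speech is dropped and never reaches the ASR.


voiced = [0.5] * 640   # one 40 ms chunk at 16 kHz
silent = [0.0] * 640
asr = StubOnlineASR()
vac = VACSketch(asr, is_voiced=lambda c: max(abs(s) for s in c) > 0.1,
                min_chunk_size=0.1)
for chunk in [silent, voiced, voiced, voiced, voiced, silent, silent]:
    vac.insert_audio_chunk(chunk)
print(asr.events)
```

In this run the third voiced chunk pushes the pending audio past 0.1 sec and triggers processing, and the first silent chunk after speech triggers finish().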

--vad may be obsolete now, but a test should confirm that. It is applied to OnlineASRProc's audio buffer whenever Whisper processes it, so it is repeated every time, which it shouldn't be. But if it prevents hallucinations or improves quality, it could be good to keep the option. It probably also handles the context better than finish().

How long a pause does silero-vad detect? 0.5 sec? Is it possible to set this threshold with a parameter?

Yes, 0.5 sec. It is possible to make it a configurable parameter. PR welcome.

Is it possible to send an event after, e.g., a 1-2 second pause? I want to trigger an action when the user pauses for, e.g., 2 sec while speaking.

Yes, just change the hardcoded parameter.
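If you would rather leave the hardcoded value alone, another option is to count silence in your own code and fire a callback. This is only a sketch under assumptions: `PauseNotifier` and its arguments are hypothetical and not part of this project, and it expects you to supply a voiced/unvoiced decision for each incoming chunk.

```python
class PauseNotifier:
    """Hypothetical helper: calls on_pause once per pause that lasts long enough."""

    def __init__(self, on_pause, min_pause_ms=2000, chunk_ms=40):
        self.on_pause = on_pause
        self.min_pause_ms = min_pause_ms
        self.chunk_ms = chunk_ms      # duration of each incoming chunk
        self.silent_ms = 0            # length of the current silence run
        self.heard_speech = False     # ignore silence before any speech
        self.fired = False            # fire at most once per pause

    def update(self, voiced):
        if voiced:
            # Speech resets the silence counter and re-arms the notifier.
            self.heard_speech = True
            self.silent_ms = 0
            self.fired = False
            return
        if not self.heard_speech:
            return
        self.silent_ms += self.chunk_ms
        if not self.fired and self.silent_ms >= self.min_pause_ms:
            self.fired = True
            self.on_pause(self.silent_ms)


pauses = []
notifier = PauseNotifier(on_pause=pauses.append, min_pause_ms=2000, chunk_ms=40)
# 10 voiced chunks, 60 silent ones, 5 voiced, 60 silent (40 ms each).
flags = [True] * 10 + [False] * 60 + [True] * 5 + [False] * 60
for flag in flags:
    notifier.update(flag)
print(pauses)
```

Integer milliseconds are used instead of float seconds so the threshold comparison is exact.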


lq0104 commented Sep 25, 2024

Yes, it works similarly. VACOnlineASRProcessor works on very short chunks, as short as vac-chunk-size. It sends the voiced audio to OnlineASRProcessor. It counts how much audio has arrived, and if that is more than OnlineASRProc's min-chunk-size, it triggers processing. If the end of voice is detected, it immediately triggers OnlineASRProc.finalize.

I'm sorry, but what is 'OnlineASRProc.finalize'? Is it a function? I can't find it in this project.

Gldkslfmsd (Collaborator) commented

Sorry, I meant .finish():

def finish(self):

Comment edited. Previously I wrote it off the top of my head without checking the code.
