multi-speaker multi-track audio #131

Open
Tharun1718 opened this issue Oct 16, 2024 · 4 comments
Comments

@Tharun1718

Hi,

I am trying to transcribe live stereo audio. Is there a recommended way to do this? I have tried converting the stereo to mono before transcribing, but the result is very inaccurate.

Thanks in advance for the help

@Gldkslfmsd
Collaborator

What is in your stereo?
Yes, converting to mono sounds best.
Or one Whisper-Streaming instance per track?
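The two options above (a mono downmix, or one stream per track) can be sketched as follows. This is only an illustration and not part of whisper_streaming: it assumes the live audio arrives as interleaved 16-bit PCM in native byte order, and the helper names `split_stereo` and `downmix` are hypothetical.

```python
from array import array


def split_stereo(pcm_bytes):
    """Split interleaved 16-bit stereo PCM (L R L R ...) into two mono tracks."""
    samples = array("h")
    samples.frombytes(pcm_bytes)  # raises ValueError on a partial sample
    left = samples[0::2]   # even positions: first channel
    right = samples[1::2]  # odd positions: second channel
    return left, right


def downmix(left, right):
    """Average the two channels into a single mono track."""
    return array("h", ((l + r) // 2 for l, r in zip(left, right)))
```

With the split version, each track could be fed to its own streaming instance. The downmix sums both voices into one signal, so overlapping speech from the two channels ends up mixed together.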

@Tharun1718
Author

Hey, thanks. Actually, I am trying to stream a conversation between two people (an agent and a customer). Can you suggest some guides where I can study this further? Any suggestion would be great.

@Gldkslfmsd
Collaborator

If you have the voices in separate tracks, that's good: you don't need diarization (though it is a good topic to know about).

Then you probably need a voice activity controller that sends the track with active speech into WhisperStreaming. You can modify this class:

class VACOnlineASRProcessor(OnlineASRProcessor):

You can use multiple Silero VAD Iterator objects (https://github.com/ufal/whisper_streaming/blob/main/silero_vad.py), one for each track, to control sending the voice to the OnlineASRProcessor. When it returns the output, wrap it with information about who spoke.

In any case, you should make sure that the context of the previous turns is not cleared. Use finalize() but do not clear the HypothesisBuffer.
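A rough sketch of the per-track routing idea, under loud assumptions: `EnergyGate` is a hypothetical amplitude threshold standing in for a Silero VAD iterator, `TrackRouter` is a hypothetical name, and `transcribe` is a placeholder callback where a real setup would hand the chunk to an OnlineASRProcessor. It only shows one gate per track and tagging the output with the speaker; it does not cover keeping the context of previous turns.

```python
class EnergyGate:
    """Stand-in for a per-track VAD: a chunk counts as speech if its
    mean absolute amplitude reaches the threshold (assumption, not Silero)."""

    def __init__(self, threshold=500):
        self.threshold = threshold

    def is_speech(self, chunk):
        if len(chunk) == 0:
            return False
        return sum(abs(s) for s in chunk) / len(chunk) >= self.threshold


class TrackRouter:
    """Keeps one gate per speaker track and labels whatever the ASR returns."""

    def __init__(self, speakers, transcribe, threshold=500):
        self.gates = {name: EnergyGate(threshold) for name in speakers}
        self.transcribe = transcribe  # placeholder for the real ASR call

    def feed(self, chunks):
        """chunks maps speaker name -> mono samples for the same time window.
        Returns a list of (speaker, text) pairs for tracks with speech."""
        results = []
        for name, chunk in chunks.items():
            if self.gates[name].is_speech(chunk):
                text = self.transcribe(chunk)
                if text:
                    results.append((name, text))
        return results
```

For the agent and customer case, the router would be built with `["agent", "customer"]` and fed one chunk per track on every step, so each returned pair already says who spoke.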

@Tharun1718
Author

Thank you for your input, I will work on it.

@Gldkslfmsd Gldkslfmsd changed the title Does whisper_streaming support streaming stereo audio? multi-speaker stereo Oct 29, 2024
@Gldkslfmsd Gldkslfmsd changed the title multi-speaker stereo multi-speaker multi-track audio Oct 29, 2024