
RealTimeVAD implementation for NodeJS #125

Open
ThEditor opened this issue Jul 7, 2024 · 14 comments
Comments

@ThEditor

ThEditor commented Jul 7, 2024

So I tried using NonRealTimeVAD, but my use case required a real-time version of it.

I've created a fork that adds this functionality, but I've never really worked with Playwright tests, so I wasn't able to open a pull request.

I've added a RealTimeVAD class which builds on top of NonRealTimeVAD. Let me know if this change is something that can be pulled in (also, I need help with the Playwright tests 😭).

I've manually tested it using node-record-lpcm16.

@MhandsomeM

MhandsomeM commented Jul 8, 2024

@ThEditor I'm developing an automatic speech recognition feature that needs Node to run VAD so it can split speech into separate segments for translation. Can you tell me how to use it? Thank you very much.

@ThEditor

ThEditor commented Jul 8, 2024

@MhandsomeM The README.md of the fork shows how to use it, though the fork is not available as an npm package. Let me know if I should publish one; until then you can copy the RealTimeVAD class into your source.

@MhandsomeM

MhandsomeM commented Jul 8, 2024

@ThEditor Thank you very much for your reply. I saw the usage in the README and tested it, but there's something wrong with the printed output here: it's composed almost entirely of 0s and 255s. Can you help me take a look?

const options = {
  sampleRate: 16000, // sample rate of the input audio
  minBufferDuration: 1, // minimum audio buffer to store (seconds)
  maxBufferDuration: 5, // maximum audio buffer to store (seconds)
  overlapDuration: 0.1, // how much of the previous buffer carries over into the new one (seconds)
  silenceThreshold: 0.5, // threshold for ignoring pauses in speech
  frameSamples: 512, // samples per frame
  positiveSpeechThreshold: 0.7,
  // negativeSpeechThreshold: 0.7,
  redemptionFrames: 10,
  preSpeechPadFrames: 5,
  minSpeechFrames: 30,
  submitUserSpeechOnPause: true,
};
const rtvad = new vad.RealTimeVAD(options);

rtvad.on("data", (data) => {
  console.log("data", Buffer.from(data.audio).toJSON());
});
data {
  type: 'Buffer',
  data: [
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255,
    255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255,
    ... 27548 more items
  ]
}
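Buffer.from(typedArray) coerces each element to a single byte (value truncated modulo 256), so if data.audio is a Float32Array of normalized samples, every sample in (-1, 1) collapses to 0 and samples at or below -1 become 255, which matches the dump above. Assuming that shape for data.audio, a minimal sketch of a proper conversion to 16-bit PCM:

```javascript
// Sketch: convert a Float32Array of samples in [-1, 1] (the assumed shape
// of data.audio) into a Buffer of 16-bit little-endian PCM.
// Buffer.from(float32Array) would instead truncate each sample to one byte.
function floatTo16BitPCM(float32Array) {
  const pcm = Buffer.alloc(float32Array.length * 2);
  for (let i = 0; i < float32Array.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Array[i]));
    pcm.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2);
  }
  return pcm;
}
```

Inside rtvad.on("data", ...) you could then log or save floatTo16BitPCM(data.audio) instead of Buffer.from(data.audio).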

@ThEditor

ThEditor commented Jul 8, 2024

Can you show me exactly how you're passing data to RealTimeVAD (the part that calls the processAudio function)?

@MhandsomeM

MhandsomeM commented Jul 8, 2024

@ThEditor The chunk is the audio data coming from the microphone; each chunk is 256 bytes, and I do some processing on the data before passing it along.

const BUFFER_SIZE = 1536;
let bufferArr = Buffer.alloc(0);
// Each chunk is 256 bytes; accumulate until at least 1536 bytes before processing.
async function receiveAudioChunk(chunk) {
    bufferArr = Buffer.concat([bufferArr, chunk]);

    if (bufferArr.length >= BUFFER_SIZE) {
        await rtvad.processAudio(bufferArr)
        bufferArr = Buffer.alloc(0); // clear buffer
    }
}

Meanwhile, I want to get at the original audio in the data event:

rtvad.on("data", (data) => {
  console.log("data", Buffer.from(data.audio).toJSON())
});
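One thing to double-check in the snippet above: processAudio is handed a raw Node Buffer. If the fork's processAudio expects normalized Float32 samples (the underlying Silero VAD model operates on those), the bytes need decoding first. A sketch, assuming the microphone delivers 16-bit little-endian PCM:

```javascript
// Sketch: decode a Buffer of 16-bit little-endian PCM into the normalized
// Float32Array a Silero-based VAD expects. Whether processAudio does this
// conversion internally depends on the fork; if it does not, pass the
// decoded array instead of the raw Buffer.
function pcm16ToFloat32(buffer) {
  const float32 = new Float32Array(buffer.length / 2);
  for (let i = 0; i < float32.length; i++) {
    float32[i] = buffer.readInt16LE(i * 2) / 0x8000;
  }
  return float32;
}
```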

@ThEditor

ThEditor commented Jul 8, 2024

I think it's better if we discuss this either in an issue on my fork or on Discord (id: theditor).

@Sheldenshi

How do I install your fork as a package?

@ThEditor

ThEditor commented Aug 7, 2024

How do I install your fork as a package?

There isn't a publicly available package as of right now; I was waiting on a response from @ricky0123. Since I haven't heard from him, I guess I can create one. I'll update you if I do.

@skibart

skibart commented Aug 14, 2024

Hi, great library from both @ricky0123 and @ThEditor

Unfortunately, I'm having trouble sending the correct data to rtvad. Could you show me how you send audio?

Here's my code, which captures audio from the microphone and sends it via WebSocket to Node.js.

const audioSettings = {
          audio: {
            echoCancellation: false,
            noiseSuppression: false,
            autoGainControl: false,
          },
        };

        try {
          microphoneStream = await navigator.mediaDevices.getUserMedia(
            audioSettings
          );

          socket = new WebSocket("wss://localhost:8081");

          recorder = new MediaRecorder(microphoneStream, {
            mimeType: 'audio/webm; codecs="opus"',
          });

          recorder.addEventListener("dataavailable", ({ data }) => {
            socket.send(data);
          });

          recorder.start(256);
        } catch (err) {
          console.error(err);
        }

Node.js:

wss.on('connection', (ws) => {
  console.log('Client connected');
  ws.on('message', (audio) => {
    receiveAudioChunk(audio);
  });
});

const BUFFER_SIZE = 1536;
let bufferArr = Buffer.alloc(0);

async function receiveAudioChunk(chunk) {
  bufferArr = Buffer.concat([bufferArr, chunk]);

  if (bufferArr.length >= BUFFER_SIZE) {
    await rtvad.processAudio(bufferArr);
    bufferArr = Buffer.alloc(0);
  }
}

const options = {
  sampleRate: 16000,
  minBufferDuration: 1,
  maxBufferDuration: 5,
  overlapDuration: 0.1,
  silenceThreshold: 0.5,
};

const rtvad = new vad.RealTimeVAD(options);

rtvad.init();

rtvad.on('start', ({ start }) => {});

rtvad.on('data', ({ audio, start, end }) => {
  console.log('for now only Time ', start, end);
  //next i'll do something with audio
});

rtvad.on('end', ({ end }) => {});

server.listen(8081, () => {
  console.log('Server is listening on port 8081');
});


@ThEditor

@skibart
You're sending Opus-encoded audio; try sending raw PCM instead. Also make sure the sample rate matches the input audio.

@skibart

skibart commented Aug 14, 2024

@ThEditor Now it sends audio correctly, thanks!

But now I'm not sure what's happening here; can you help me with that?

Is it something with the audio? It saves, but there's an error when opening the file.

wss.on('connection', (ws) => {
  console.log('Client connected');
  ws.on('message', (audio) => {
    receiveAudioChunk(audio);
  });
});

const BUFFER_SIZE = 100;
let bufferArr = Buffer.alloc(0);

async function receiveAudioChunk(chunk) {
  bufferArr = Buffer.concat([bufferArr, chunk]);

  if (bufferArr.length >= BUFFER_SIZE) {
    await rtvad.processAudio(bufferArr);
    bufferArr = Buffer.alloc(0);
  }
}

const options = {
  sampleRate: 16000, // Sample rate of input audio
  minBufferDuration: 1, // minimum audio buffer to store
  maxBufferDuration: 5, // maximum audio buffer to store
  overlapDuration: 0.1, // how much of the previous buffer exists in the new buffer
  silenceThreshold: 0.5, // threshold for ignoring pauses in speech
};

const rtvad = new vad.RealTimeVAD(options);

rtvad.init();

rtvad.on('start', ({ start }) => {});

rtvad.on('data', ({ audio, start, end }) => {
  console.log('Audio start', start);
  console.log('Audio end', end);
  save(audio);
});

function save(audio) {
  const outputFilePath = path.join(__dirname, `../audio/audio-${Date.now()}.wav`);
  const fileStream = fs.createWriteStream(outputFilePath, { flags: 'a' });
  fileStream.write(audio);
  fileStream.end();
}

In the frontend:

   async function sendAudioStream() {
        const audioSettings = {
          audio: {
            echoCancellation: false,
            noiseSuppression: false,
            autoGainControl: false,
            sampleRate: 16000,
          },
        };

        try {
          microphoneStream = await navigator.mediaDevices.getUserMedia(
            audioSettings
          );

          const audioContext = new AudioContext();
          const source = audioContext.createMediaStreamSource(microphoneStream);
          const processor = audioContext.createScriptProcessor(4096, 1, 1);

          socket = new WebSocket("wss://localhost:8081");

          processor.onaudioprocess = function (e) {
            const inputData = e.inputBuffer.getChannelData(0);

            const pcmData = new Int16Array(inputData.length);
            for (let i = 0; i < inputData.length; i++) {
              pcmData[i] = Math.max(-1, Math.min(1, inputData[i])) * 0x7fff;
            }

            if (socket.readyState === WebSocket.OPEN) {
              socket.send(pcmData.buffer);
            }
          };

          source.connect(processor);
          processor.connect(audioContext.destination);
        } catch (err) {
          console.error(err);
        }
      }

@ThEditor

ThEditor commented Aug 14, 2024

@skibart I'm not sure I understand; what do you mean?

@skibart

skibart commented Aug 14, 2024

@ThEditor I mean that the output file saved through the save(audio) function doesn't actually work. Do you know if this is caused by incorrect input, or should the output audio be converted to another format? The saved WAV file is corrupted and I can't play it in anything. How did you manage to solve this? Could you share the code where you capture sound from the microphone and save it?

@ThEditor

@skibart
Try using the Writer class from the wav package inside rtvad.on('data').
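The file saved earlier is corrupted because save() writes raw PCM samples with no WAV container header. The wav package's Writer prepends that header automatically; for illustration, here is the 44-byte RIFF/WAVE header it would produce for 16 kHz mono 16-bit PCM, built with Node built-ins only (a sketch, not the package's actual code):

```javascript
// Sketch: build the 44-byte RIFF/WAVE header that must precede raw
// 16-bit PCM data for the file to be playable as a .wav.
function wavHeader(dataLength, { sampleRate = 16000, channels = 1, bitDepth = 16 } = {}) {
  const blockAlign = (channels * bitDepth) / 8;
  const header = Buffer.alloc(44);
  header.write('RIFF', 0);
  header.writeUInt32LE(36 + dataLength, 4);          // file size minus 8
  header.write('WAVE', 8);
  header.write('fmt ', 12);
  header.writeUInt32LE(16, 16);                      // fmt chunk size
  header.writeUInt16LE(1, 20);                       // audio format: PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitDepth, 34);
  header.write('data', 36);
  header.writeUInt32LE(dataLength, 40);
  return header;
}
```

With the package itself, the equivalent is roughly a `new wav.Writer({ sampleRate: 16000, channels: 1, bitDepth: 16 })` piped into a file stream, with each audio buffer passed to the writer's write() method.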
