
RealTimeVAD implementation for NodeJS #125

Open
ThEditor opened this issue Jul 7, 2024 · 14 comments
Comments

@ThEditor

ThEditor commented Jul 7, 2024

So I tried using NonRealTimeVAD, but my use case required a real-time version of it.

I've created a fork that adds this functionality, but I've never really worked with Playwright tests, so I wasn't able to open a pull request.

I've added a RealTimeVAD class which builds on top of NonRealTimeVAD. Let me know if this change is something that can be pulled in (also, I need help with the Playwright tests 😭).

I've manually tested it using node-record-lpcm16.

@MhandsomeM

MhandsomeM commented Jul 8, 2024

@ThEditor I'm developing an automatic speech recognition feature that needs Node to run VAD so it can split speech into separate segments for translation. Can you tell me how to use it? Thank you very much.

@ThEditor

ThEditor commented Jul 8, 2024

@MhandsomeM The README.md of the fork shows how to use it, though the fork is not available as an npm package. Let me know if I should publish one; until then you can copy the RealTimeVAD class into your source.

@MhandsomeM

MhandsomeM commented Jul 8, 2024

@ThEditor Thank you very much for your reply. I saw the usage in the README and tested it, but there's something wrong with the printed output here: it's composed almost entirely of 0s and 255s. Can you help me take a look?

const options = {
  sampleRate: 16000, // sample rate of the input audio
  minBufferDuration: 1, // minimum audio buffer to store (seconds)
  maxBufferDuration: 5, // maximum audio buffer to store (seconds)
  overlapDuration: 0.1, // how much of the previous buffer carries over into the new one (seconds)
  silenceThreshold: 0.5, // threshold for ignoring pauses in speech
  frameSamples: 512, // samples per frame
  positiveSpeechThreshold: 0.7,
  // negativeSpeechThreshold: 0.7,
  redemptionFrames: 10,
  preSpeechPadFrames: 5,
  minSpeechFrames: 30,
  submitUserSpeechOnPause: true,
};
const rtvad = new vad.RealTimeVAD(options);

rtvad.on("data", (data) => {
  console.log("data", Buffer.from(data.audio).toJSON());
});
data {
  type: 'Buffer',
  data: [
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255,
    255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255,
    ... 27548 more items
  ]
}
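Buffer.from(typedArray) coerces each element to a single byte (value truncated modulo 256), so if data.audio is a Float32Array of normalized samples, every sample in (-1, 1) collapses to 0 and samples at or below -1 become 255, which matches the dump above. Assuming that shape for data.audio, a minimal sketch of a proper conversion to 16-bit PCM:

```javascript
// Sketch: convert a Float32Array of samples in [-1, 1] (the assumed shape
// of data.audio) into a Buffer of 16-bit little-endian PCM.
// Buffer.from(float32Array) would instead truncate each sample to one byte.
function floatTo16BitPCM(float32Array) {
  const pcm = Buffer.alloc(float32Array.length * 2);
  for (let i = 0; i < float32Array.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Array[i]));
    pcm.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2);
  }
  return pcm;
}
```

Inside rtvad.on("data", ...) you could then log or save floatTo16BitPCM(data.audio) instead of Buffer.from(data.audio).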

@ThEditor

ThEditor commented Jul 8, 2024

Can you show me exactly how you're passing data to RealTimeVAD (the part that calls the processAudio function)?

@MhandsomeM

MhandsomeM commented Jul 8, 2024

@ThEditor The chunk is the audio data coming from the microphone; each chunk is 256 bytes, and I do some processing on the data before passing it along.

const BUFFER_SIZE = 1536;
let bufferArr = Buffer.alloc(0);
// Each chunk is 256 bytes; accumulate until at least 1536 bytes before processing.
async function receiveAudioChunk(chunk) {
    bufferArr = Buffer.concat([bufferArr, chunk]);

    if (bufferArr.length >= BUFFER_SIZE) {
        await rtvad.processAudio(bufferArr)
        bufferArr = Buffer.alloc(0); // clear buffer
    }
}

Meanwhile, I want to get at the original audio in the data event:

rtvad.on("data", (data) => {
  console.log("data", Buffer.from(data.audio).toJSON())
});
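One thing to double-check in the snippet above: processAudio is handed a raw Node Buffer. If the fork's processAudio expects normalized Float32 samples (the underlying Silero VAD model operates on those), the bytes need decoding first. A sketch, assuming the microphone delivers 16-bit little-endian PCM:

```javascript
// Sketch: decode a Buffer of 16-bit little-endian PCM into the normalized
// Float32Array a Silero-based VAD expects. Whether processAudio does this
// conversion internally depends on the fork; if it does not, pass the
// decoded array instead of the raw Buffer.
function pcm16ToFloat32(buffer) {
  const float32 = new Float32Array(buffer.length / 2);
  for (let i = 0; i < float32.length; i++) {
    float32[i] = buffer.readInt16LE(i * 2) / 0x8000;
  }
  return float32;
}
```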

@ThEditor

ThEditor commented Jul 8, 2024

I think it's better if we discuss this either in an issue on my fork or on Discord (id: theditor).

@Sheldenshi

How do I install your fork as a package?

@ThEditor

ThEditor commented Aug 7, 2024

How do I install your fork as a package?

There isn't a publicly available package as of right now; I was waiting on a response from @ricky0123. Since I haven't heard from him, I guess I can create one. I'll update you if I do.

@skibart

skibart commented Aug 14, 2024

Hi, great library from both @ricky0123 and @ThEditor

Unfortunately, I'm having trouble sending the correct data to rtvad. Could you show me how you send audio?

Here's my code, which captures audio from the microphone and sends it via WebSocket to Node.js.

const audioSettings = {
          audio: {
            echoCancellation: false,
            noiseSuppression: false,
            autoGainControl: false,
          },
        };

        try {
          microphoneStream = await navigator.mediaDevices.getUserMedia(
            audioSettings
          );

          socket = new WebSocket("wss://localhost:8081");

          recorder = new MediaRecorder(microphoneStream, {
            mimeType: 'audio/webm; codecs="opus"',
          });

          recorder.addEventListener("dataavailable", ({ data }) => {
            socket.send(data);
          });

          recorder.start(256);
        } catch (err) {
          console.error(err);
        }

Node.js:

wss.on('connection', (ws) => {
  console.log('Client connected');
  ws.on('message', (audio) => {
    receiveAudioChunk(audio);
  });
});

const BUFFER_SIZE = 1536;
let bufferArr = Buffer.alloc(0);

async function receiveAudioChunk(chunk) {
  bufferArr = Buffer.concat([bufferArr, chunk]);

  if (bufferArr.length >= BUFFER_SIZE) {
    await rtvad.processAudio(bufferArr);
    bufferArr = Buffer.alloc(0);
  }
}

const options = {
  sampleRate: 16000,
  minBufferDuration: 1,
  maxBufferDuration: 5,
  overlapDuration: 0.1,
  silenceThreshold: 0.5,
};

const rtvad = new vad.RealTimeVAD(options);

rtvad.init();

rtvad.on('start', ({ start }) => {});

rtvad.on('data', ({ audio, start, end }) => {
  console.log('for now only Time ', start, end);
  //next i'll do something with audio
});

rtvad.on('end', ({ end }) => {});

server.listen(8081, () => {
  console.log('Server is listening on port 8081');
});


@ThEditor

@skibart
You're sending Opus-encoded audio; try sending raw PCM instead. Also make sure the sample rate matches the input audio.

@skibart

skibart commented Aug 14, 2024

@ThEditor Now it sends audio correctly, thanks!

But now I'm not sure what's happening here; can you help me with that?

Is it something with the audio? It saves, but there's an error when opening the file.

wss.on('connection', (ws) => {
  console.log('Client connected');
  ws.on('message', (audio) => {
    receiveAudioChunk(audio);
  });
});

const BUFFER_SIZE = 100;
let bufferArr = Buffer.alloc(0);

async function receiveAudioChunk(chunk) {
  bufferArr = Buffer.concat([bufferArr, chunk]);

  if (bufferArr.length >= BUFFER_SIZE) {
    await rtvad.processAudio(bufferArr);
    bufferArr = Buffer.alloc(0);
  }
}

const options = {
  sampleRate: 16000, // Sample rate of input audio
  minBufferDuration: 1, // minimum audio buffer to store
  maxBufferDuration: 5, // maximum audio buffer to store
  overlapDuration: 0.1, // how much of the previous buffer exists in the new buffer
  silenceThreshold: 0.5, // threshold for ignoring pauses in speech
};

const rtvad = new vad.RealTimeVAD(options);

rtvad.init();

rtvad.on('start', ({ start }) => {});

rtvad.on('data', ({ audio, start, end }) => {
  console.log('Audio start', start);
  console.log('Audio end', end);
  save(audio);
});

function save(audio) {
  const outputFilePath = path.join(__dirname, `../audio/audio-${Date.now()}.wav`);
  const fileStream = fs.createWriteStream(outputFilePath, { flags: 'a' });
  fileStream.write(audio);
  fileStream.end();
}

In the frontend:

   async function sendAudioStream() {
        const audioSettings = {
          audio: {
            echoCancellation: false,
            noiseSuppression: false,
            autoGainControl: false,
            sampleRate: 16000,
          },
        };

        try {
          microphoneStream = await navigator.mediaDevices.getUserMedia(
            audioSettings
          );

          const audioContext = new AudioContext();
          const source = audioContext.createMediaStreamSource(microphoneStream);
          const processor = audioContext.createScriptProcessor(4096, 1, 1);

          socket = new WebSocket("wss://localhost:8081");

          processor.onaudioprocess = function (e) {
            const inputData = e.inputBuffer.getChannelData(0);

            const pcmData = new Int16Array(inputData.length);
            for (let i = 0; i < inputData.length; i++) {
              pcmData[i] = Math.max(-1, Math.min(1, inputData[i])) * 0x7fff;
            }

            if (socket.readyState === WebSocket.OPEN) {
              socket.send(pcmData.buffer);
            }
          };

          source.connect(processor);
          processor.connect(audioContext.destination);
        } catch (err) {
          console.error(err);
        }
      }

@ThEditor

ThEditor commented Aug 14, 2024

@skibart I'm not sure I understand; what do you mean?

@skibart

skibart commented Aug 14, 2024

@ThEditor I mean that the output file saved through the save(audio) function doesn't actually work. Do you know if this is caused by incorrect input, or should the output audio be converted to another format? The saved WAV file is corrupted and I can't play it in anything. How did you manage to solve this? Could you share the code where you capture sound from the microphone and save it?

@ThEditor

@skibart
Try using the Writer class from the wav package inside rtvad.on('data').
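The file saved earlier is corrupted because save() writes raw PCM samples with no WAV container header. The wav package's Writer prepends that header automatically; for illustration, here is the 44-byte RIFF/WAVE header it would produce for 16 kHz mono 16-bit PCM, built with Node built-ins only (a sketch, not the package's actual code):

```javascript
// Sketch: build the 44-byte RIFF/WAVE header that must precede raw
// 16-bit PCM data for the file to be playable as a .wav.
function wavHeader(dataLength, { sampleRate = 16000, channels = 1, bitDepth = 16 } = {}) {
  const blockAlign = (channels * bitDepth) / 8;
  const header = Buffer.alloc(44);
  header.write('RIFF', 0);
  header.writeUInt32LE(36 + dataLength, 4);          // file size minus 8
  header.write('WAVE', 8);
  header.write('fmt ', 12);
  header.writeUInt32LE(16, 16);                      // fmt chunk size
  header.writeUInt16LE(1, 20);                       // audio format: PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitDepth, 34);
  header.write('data', 36);
  header.writeUInt32LE(dataLength, 40);
  return header;
}
```

With the package itself, the equivalent is roughly a `new wav.Writer({ sampleRate: 16000, channels: 1, bitDepth: 16 })` piped into a file stream, with each audio buffer passed to the writer's write() method.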
