RealTimeVAD implementation for NodeJS #125
@ThEditor I am developing an automatic speech recognition feature that needs Node to run voice activity detection so it can split speech into separate segments for translation. Can you tell me how to use it? Thank you very much.
@MhandsomeM The usage is documented in the fork.
@ThEditor Thank you very much for your reply. I found the usage in the documentation and tested it, but the printed output looks wrong: it shouldn't consist only of 0 and 255. Can you help me take a look?

```js
const options = {
  sampleRate: 16000, // sample rate of input audio
  minBufferDuration: 1, // minimum audio buffer to store
  maxBufferDuration: 5, // maximum audio buffer to store
  overlapDuration: 0.1, // how much of the previous buffer exists in the new buffer
  silenceThreshold: 0.5, // threshold for ignoring pauses in speech
  frameSamples: 512, // samples per frame
  positiveSpeechThreshold: 0.7,
  // negativeSpeechThreshold: 0.7,
  redemptionFrames: 10,
  preSpeechPadFrames: 5,
  minSpeechFrames: 30,
  submitUserSpeechOnPause: true,
};
const rtvad = new vad.RealTimeVAD(/** options */ options);
rtvad.on("data", (data) => {
  console.log("data", Buffer.from(data.audio).toJSON());
});
```

This prints:

```
data {
  type: 'Buffer',
  data: [
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255,
    255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255,
    ... 27548 more items
  ]
}
```
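A likely cause of the all-0/255 dump, though the fork's `data.audio` type isn't confirmed anywhere in this thread: if `data.audio` is a `Float32Array` (as in the upstream library), `Buffer.from(typedArray)` coerces each element to a single byte, so in-range samples truncate to 0 and clipped negative samples wrap to 255. A minimal sketch of inspecting the samples without that coercion:

```js
// Sketch, assuming data.audio is a Float32Array of samples in [-1, 1]
// (an assumption; the thread does not confirm the fork's output type).
rtvad.on("data", (data) => {
  const samples = data.audio;
  // Log the float values themselves rather than byte-coercing them:
  console.log("first samples:", Array.from(samples.slice(0, 8)));
  // To dump the raw underlying bytes instead, wrap the ArrayBuffer:
  console.log(Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength));
});
```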
Can you show me how exactly you are passing data to `processAudio`?
@ThEditor The chunks are the data coming from the microphone; each chunk is 256 bytes long, and I do some processing on them before passing the audio on:

```js
const BUFFER_SIZE = 1536;
let bufferArr = Buffer.alloc(0);

// Each incoming chunk is 256 bytes; chunks are concatenated here until
// at least 1536 bytes have accumulated.
async function receiveAudioChunk(chunk) {
  bufferArr = Buffer.concat([bufferArr, chunk]);
  if (bufferArr.length >= BUFFER_SIZE) {
    await rtvad.processAudio(bufferArr);
    bufferArr = Buffer.alloc(0); // clear buffer
  }
}
```

I want to access my original audio source inside the data handler:

```js
rtvad.on("data", (data) => {
  console.log("data", Buffer.from(data.audio).toJSON());
});
```
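If the fork's `processAudio` expects normalized Float32 samples, as the upstream `NonRealTimeVAD` does (an assumption; the thread doesn't state the fork's input type), then handing it a raw byte `Buffer` of 16-bit PCM would produce exactly this kind of garbage. A hedged sketch of the conversion, assuming 16-bit little-endian mono input; `int16BufferToFloat32` is a hypothetical helper, not part of the fork:

```js
// Convert a Buffer of 16-bit little-endian PCM into normalized floats,
// under the assumption that processAudio wants Float32Array in [-1, 1].
function int16BufferToFloat32(buf) {
  const floats = new Float32Array(Math.floor(buf.length / 2));
  for (let i = 0; i < floats.length; i++) {
    floats[i] = buf.readInt16LE(i * 2) / 32768; // normalize to [-1, 1]
  }
  return floats;
}

// Then, inside receiveAudioChunk:
// await rtvad.processAudio(int16BufferToFloat32(bufferArr));
```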
I think it's better if we discuss this either on an issue in my fork or on Discord (id: …).
How do I install your fork as a package?
There isn't a publicly available package as of right now; I was waiting on a response from @ricky0123.
Hi, great library from both @ricky0123 and @ThEditor! Unfortunately, I'm having trouble sending the correct data to rtvad. Could you show me how you send audio? Here's my code, which captures audio from the microphone and sends it over a WebSocket to Node.js.

```js
const audioSettings = {
  audio: {
    echoCancellation: false,
    noiseSuppression: false,
    autoGainControl: false,
  },
};
try {
  microphoneStream = await navigator.mediaDevices.getUserMedia(audioSettings);
  socket = new WebSocket("wss://localhost:8081");
  recorder = new MediaRecorder(microphoneStream, {
    mimeType: 'audio/webm; codecs="opus"',
  });
  recorder.addEventListener("dataavailable", ({ data }) => {
    socket.send(data);
  });
  recorder.start(256);
} catch (err) {
  console.error(err);
}
```

Node.js:

```js
wss.on('connection', (ws) => {
  console.log('Client connected');
  ws.on('message', (audio) => {
    receiveAudioChunk(audio);
  });
});

const BUFFER_SIZE = 1536;
let bufferArr = Buffer.alloc(0);
async function receiveAudioChunk(chunk) {
  bufferArr = Buffer.concat([bufferArr, chunk]);
  if (bufferArr.length >= BUFFER_SIZE) {
    await rtvad.processAudio(bufferArr);
    bufferArr = Buffer.alloc(0);
  }
}

const options = {
  sampleRate: 16000,
  minBufferDuration: 1,
  maxBufferDuration: 5,
  overlapDuration: 0.1,
  silenceThreshold: 0.5,
};
const rtvad = new vad.RealTimeVAD(options);
rtvad.init();
rtvad.on('start', ({ start }) => {});
rtvad.on('data', ({ audio, start, end }) => {
  console.log('for now only time ', start, end);
  // next I'll do something with audio
});
rtvad.on('end', ({ end }) => {});

server.listen(8081, () => {
  console.log('Server is listening on port 8081');
});
```
@skibart You're sending Opus-encoded WebM chunks from MediaRecorder, but the VAD needs raw audio samples, not a compressed container. Capture the samples with the Web Audio API and send PCM instead.
@ThEditor Now the audio is sent correctly, thanks! But I'm not sure what happens after that; can you help me with it? Is it a problem with the audio itself? The file gets saved, but there is an error when opening it.

```js
wss.on('connection', (ws) => {
  console.log('Client connected');
  ws.on('message', (audio) => {
    receiveAudioChunk(audio);
  });
});

const BUFFER_SIZE = 100;
let bufferArr = Buffer.alloc(0);
async function receiveAudioChunk(chunk) {
  bufferArr = Buffer.concat([bufferArr, chunk]);
  if (bufferArr.length >= BUFFER_SIZE) {
    await rtvad.processAudio(bufferArr);
    bufferArr = Buffer.alloc(0);
  }
}

const options = {
  sampleRate: 16000, // sample rate of input audio
  minBufferDuration: 1, // minimum audio buffer to store
  maxBufferDuration: 5, // maximum audio buffer to store
  overlapDuration: 0.1, // how much of the previous buffer exists in the new buffer
  silenceThreshold: 0.5, // threshold for ignoring pauses in speech
};
const rtvad = new vad.RealTimeVAD(options);
rtvad.init();
rtvad.on('start', ({ start }) => {});
rtvad.on('data', ({ audio, start, end }) => {
  console.log('Audio start', start);
  console.log('Audio end', end);
  save(audio);
});

function save(audio) {
  const outputFilePath = path.join(__dirname, `../audio/audio-${Date.now()}.wav`);
  const fileStream = fs.createWriteStream(outputFilePath, { flags: 'a' });
  fileStream.write(audio);
  fileStream.end();
}
```

In the frontend:

```js
async function sendAudioStream() {
  const audioSettings = {
    audio: {
      echoCancellation: false,
      noiseSuppression: false,
      autoGainControl: false,
      sampleRate: 16000,
    },
  };
  try {
    microphoneStream = await navigator.mediaDevices.getUserMedia(audioSettings);
    const audioContext = new AudioContext();
    const source = audioContext.createMediaStreamSource(microphoneStream);
    const processor = audioContext.createScriptProcessor(4096, 1, 1);
    socket = new WebSocket("wss://localhost:8081");
    processor.onaudioprocess = function (e) {
      const inputData = e.inputBuffer.getChannelData(0);
      // convert Float32 samples to 16-bit PCM
      const pcmData = new Int16Array(inputData.length);
      for (let i = 0; i < inputData.length; i++) {
        pcmData[i] = Math.max(-1, Math.min(1, inputData[i])) * 0x7fff;
      }
      if (socket.readyState === WebSocket.OPEN) {
        socket.send(pcmData.buffer);
      }
    };
    source.connect(processor);
    processor.connect(audioContext.destination);
  } catch (err) {
    console.error(err);
  }
}
```
@skibart I'm not sure I understand what you mean.
@ThEditor I mean that the output file, which is saved through the `save(audio)` function, doesn't actually work. Do you know if this is caused by incorrect input? Or should the output audio be converted to another format? The saved WAV file is corrupted and I can't play it in anything. How did you manage to solve this? Could you share the code in which you capture sound from the microphone and save it?
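One plausible reason the saved file won't open, though the thread doesn't confirm it: `save()` writes raw samples into a file with a `.wav` extension but never writes a RIFF/WAVE header, so players can't parse it. A minimal sketch of adding the header, assuming the audio is (or has been converted to) 16-bit little-endian mono PCM at 16 kHz; `writeWav` is a hypothetical helper, not part of the library:

```js
const fs = require("fs");

// Hypothetical helper: prepend a 44-byte RIFF/WAVE header to raw PCM so the
// file is a playable WAV. Assumes 16-bit little-endian mono samples.
function writeWav(pcm, filePath, sampleRate = 16000) {
  const header = Buffer.alloc(44);
  header.write("RIFF", 0);
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus the first 8 bytes
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(16, 16); // fmt chunk size
  header.writeUInt16LE(1, 20); // audio format: PCM
  header.writeUInt16LE(1, 22); // channels: mono
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * 2, 28); // byte rate = rate * channels * 2 bytes
  header.writeUInt16LE(2, 32); // block align = channels * 2 bytes
  header.writeUInt16LE(16, 34); // bits per sample
  header.write("data", 36);
  header.writeUInt32LE(pcm.length, 40); // data chunk size
  fs.writeFileSync(filePath, Buffer.concat([header, pcm]));
}
```

If `data.audio` turns out to be a `Float32Array`, it would first need the same float-to-int16 conversion used on the frontend before being passed in as a `Buffer`.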
So I tried using `NonRealTimeVAD`, but my use case required a real-time version of it. I've created a fork that adds this functionality, but I've never really worked with `playwright` tests, so I wasn't able to open a pull request. I've added a `RealTimeVAD` class which builds on top of `NonRealTimeVAD`. Let me know if this change is something that can be pulled in (also, I need help with the playwright tests 😭). I've manually tested it using node-record-lpcm16.
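For reference, a minimal sketch of how such a manual test might look, assuming a local checkout of the fork and the `RealTimeVAD` API shown in this thread; the require path, the option set, and feeding `processAudio` raw PCM chunks directly are all assumptions:

```js
const recorder = require("node-record-lpcm16");
const vad = require("./vad-fork"); // hypothetical local path; the fork isn't published

const rtvad = new vad.RealTimeVAD({ sampleRate: 16000 });
rtvad.init();
rtvad.on("start", ({ start }) => console.log("speech started at", start));
rtvad.on("data", ({ audio, start, end }) => console.log("segment", start, "-", end));
rtvad.on("end", ({ end }) => console.log("speech ended at", end));

// node-record-lpcm16 emits 16-bit little-endian PCM chunks from the microphone
recorder
  .record({ sampleRate: 16000, channels: 1 })
  .stream()
  .on("data", (chunk) => rtvad.processAudio(chunk));
```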