I implemented a WebSocket-based version of the whisper_online_server to handle audio streams from clients over WebSocket connections. The implementation works as expected when a single client is streaming; however, when two clients stream simultaneously, significant issues arise:
High Latency: Latency increases drastically, sometimes reaching up to a minute for both clients.
Non-concurrent Handling: Instead of processing both streams concurrently, it appears as if the server handles them in turns, causing delays and bottlenecks.
Troubleshooting Attempts
I’ve tried both of the following approaches, but neither resolved the issue:
Shared ASR Model: Using a single ASR model instance shared across both streams, guarded by a threading lock (this is the variant shown in the code below).
Separate ASR Model Instances: Creating a separate ASR model instance for each client stream (see the sketch after this list).
Both approaches resulted in the same high-latency, turn-taking behavior.
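Since only the shared-model variant is shown in full below, here is an assumed reconstruction of the second approach (the reporter did not share this variant): the model is constructed inside the connection handler, so no lock is needed.

# Assumed reconstruction of approach 2: one model per connection, no lock.
async def process_audio(websocket, path):
    model = FasterWhisperASR(lan="ar", modelsize="large-v2",
                             compute_type="float16", device="cuda")
    model.use_vad()
    online_asr_processor = VACOnlineASRProcessor(
        online_chunk_size=1,
        asr=model,
        tokenizer=None,
        buffer_trimming=("segment", 15),
    )
    # ... same receive/decode/transcribe loop as in the full code below,
    # but calling online_asr_processor.process_iter() without model_lock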
Code
import sys
import asyncio
import io
import os
import threading
import librosa
import soundfile as sf
import numpy as np
import websockets
import uuid
from whisper_online import FasterWhisperASR, VACOnlineASRProcessor
# Initialize the model once and share it among all clients
model = FasterWhisperASR(lan="ar", modelsize="large-v2", compute_type="float16", device="cuda")
model.use_vad()
# Lock to ensure thread-safe access to the model
model_lock = threading.Lock()
def process_audio_sync(online_asr_processor, audio):
    """Synchronous function to process audio using the ASR model."""
    online_asr_processor.insert_audio_chunk(audio)
    # Lock the model during the critical section
    with model_lock:
        output = online_asr_processor.process_iter()
    return output

async def process_audio(websocket: websockets.WebSocketServerProtocol, path):
    # Create a per-client processor using the shared model
    online_asr_processor = VACOnlineASRProcessor(
        online_chunk_size=1,
        asr=model,
        tokenizer=None,
        buffer_trimming=("segment", 15),
        logfile=sys.stderr,
    )
    loop = asyncio.get_running_loop()
    async for message in websocket:
        if not isinstance(message, bytes):
            print(message)
            continue
        # Process the audio data
        sound_file = sf.SoundFile(
            io.BytesIO(message),
            channels=1,
            endian="LITTLE",
            samplerate=16000,
            subtype="PCM_16",
            format="RAW",
        )
        audio, _ = librosa.load(sound_file, sr=16000, dtype=np.float32, mono=True)
        # Offload the blocking ASR processing to a thread pool executor
        output = await loop.run_in_executor(
            None, process_audio_sync, online_asr_processor, audio
        )
        if output[0] is None:
            continue
        output_formatted = online_asr_processor.to_flush([output])
        await websocket.send(output_formatted[2].encode())

async def main():
    # Disable the server's keepalive pings to prevent timeouts
    async with websockets.serve(
        process_audio, "localhost", 8765, ping_interval=None, ping_timeout=None
    ):
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
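For completeness, a minimal test client sketch; sample_16k.wav is a hypothetical local 16 kHz mono WAV file. It sends raw little-endian PCM_16 bytes in one-second chunks, which is the format the handler above decodes:

import asyncio
import soundfile as sf
import websockets

async def stream_file(path="sample_16k.wav"):  # hypothetical test file
    audio, sr = sf.read(path, dtype="int16")  # int16 samples match subtype PCM_16
    assert sr == 16000, "the server decodes the payload as 16 kHz audio"
    chunk = 16000  # one second of mono samples
    async with websockets.connect("ws://localhost:8765") as ws:
        for i in range(0, len(audio), chunk):
            # numpy int16 .tobytes() is little-endian on common platforms
            await ws.send(audio[i:i + chunk].tobytes())
            await asyncio.sleep(1.0)  # pace the upload like a live stream
            try:
                # print any transcript the server has sent back so far
                print((await asyncio.wait_for(ws.recv(), timeout=0.05)).decode())
            except asyncio.TimeoutError:
                pass

asyncio.run(stream_file())

Running two of these clients at once reproduces the turn-taking behavior described above.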
Hi,
it's a known limitation. GPUs are fastest with a single process, and sequential processing is faster than concurrent processing. The current Whisper-Streaming is intended for one client at a time. Batching (#42) could help, but there would still be some slowdown.
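To make the sequential handling explicit rather than a side effect of the lock, one option is a single worker coroutine that drains a job queue, so exactly one GPU call runs at a time; a minimal sketch with hypothetical names (gpu_jobs, gpu_worker, transcribe_chunk), not part of Whisper-Streaming:

import asyncio

# Hypothetical: one queue, one consumer; all GPU calls run in gpu_worker.
# (On Python 3.10+ an asyncio.Queue may be created before the loop starts.)
gpu_jobs: asyncio.Queue = asyncio.Queue()

async def gpu_worker():
    while True:
        processor, audio, fut = await gpu_jobs.get()
        processor.insert_audio_chunk(audio)
        # Run the blocking ASR call in a thread so the event loop stays responsive
        fut.set_result(await asyncio.to_thread(processor.process_iter))
        gpu_jobs.task_done()

async def transcribe_chunk(processor, audio):
    # Called from each client handler instead of run_in_executor + model_lock
    fut = asyncio.get_running_loop().create_future()
    await gpu_jobs.put((processor, audio, fut))
    return await fut

# Start the worker once at startup, e.g. in main(): asyncio.create_task(gpu_worker())

This does not remove the slowdown the maintainer describes: each client still waits its turn, so per-client latency grows with the number of clients. Real multi-client throughput would need batched inference, as discussed in #42.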
Hi @Gldkslfmsd,
thanks for your reply. So the current Whisper-Streaming cannot be used in production applications with many users? Is it just a proof of concept for streaming?