
Cannot handle multiple streams concurrently #138

Closed
MohammedShokr opened this issue Nov 13, 2024 · 3 comments

@MohammedShokr

Issue

I implemented a WebSocket-based version of the whisper_online_server to handle audio streams from clients over WebSocket connections. The implementation works as expected when a single client is streaming; however, when two clients stream simultaneously, significant issues arise:

  • High Latency: Latency increases drastically, sometimes reaching up to a minute for both clients.
  • Non-concurrent Handling: Instead of processing both streams concurrently, it appears as if the server handles them in turns, causing delays and bottlenecks.

Troubleshooting Attempts

I’ve tried both of the following approaches, but neither resolved the issue:

  • Shared ASR model: a single ASR model instance shared across both streams, guarded by a threading lock (this is the variant in the code below).
  • Separate ASR model instances: a separate ASR model instance created for each client stream (sketch after this list).

Both approaches resulted in the same high-latency, turn-taking behavior.
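For reference, the separate-instances variant only moves the model construction into the connection handler; a minimal sketch with the same constructor arguments as the full code below (the rest of the handler is unchanged):

import sys
from whisper_online import FasterWhisperASR, VACOnlineASRProcessor

async def process_audio(websocket, path):
    # Per-client model: nothing is shared, so no lock is needed, but each
    # connection loads a full copy of the model onto the GPU.
    model = FasterWhisperASR(lan="ar", modelsize="large-v2",
                             compute_type="float16", device="cuda")
    model.use_vad()
    online_asr_processor = VACOnlineASRProcessor(
        online_chunk_size=1,
        asr=model,
        tokenizer=None,
        buffer_trimming=("segment", 15),
        logfile=sys.stderr,
    )
    # ... receive loop identical to the shared-model version below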

Code

import sys
import asyncio
import io
import threading
import librosa
import soundfile as sf
import numpy as np
import websockets
from whisper_online import FasterWhisperASR, VACOnlineASRProcessor

# Initialize the model once and share it among all clients
model = FasterWhisperASR(lan="ar", modelsize="large-v2", compute_type="float16", device="cuda")
model.use_vad()

# Lock to ensure thread-safe access to the model
model_lock = threading.Lock()

def process_audio_sync(online_asr_processor, audio):
    """Synchronous function to process audio using the ASR model."""
    online_asr_processor.insert_audio_chunk(audio)
    # Lock the model during the critical section
    with model_lock:
        output = online_asr_processor.process_iter()
    return output

async def process_audio(websocket: websockets.WebSocketServerProtocol, path):
    # Create a per-client processor using the shared model
    online_asr_processor = VACOnlineASRProcessor(
        online_chunk_size=1,
        asr=model,
        tokenizer=None,
        buffer_trimming=("segment", 15),
        logfile=sys.stderr
    )

    loop = asyncio.get_running_loop()

    async for message in websocket:
        if not isinstance(message, bytes):
            print(message)
            continue

        # Process the audio data
        sound_file = sf.SoundFile(
            io.BytesIO(message),
            channels=1,
            endian="LITTLE",
            samplerate=16000,
            subtype="PCM_16",
            format="RAW"
        )
        audio, _ = librosa.load(sound_file, sr=16000, dtype=np.float32, mono=True)

        # Offload the blocking ASR processing to a thread pool executor
        output = await loop.run_in_executor(None, process_audio_sync, online_asr_processor, audio)

        if output[0] is None:
            continue

        output_formatted = online_asr_processor.to_flush([output])
        await websocket.send(output_formatted[2].encode())

async def main():
    # Disable the server's keepalive pings to prevent timeouts
    async with websockets.serve(process_audio, "localhost", 8765, ping_interval=None, ping_timeout=None):
        await asyncio.Future()

if __name__ == "__main__":
    asyncio.run(main())

@Gldkslfmsd (Collaborator)

Hi,
it's a known limitation. GPUs are fastest with one process only; sequential processing is faster than concurrent. The current Whisper-Streaming is intended for one client at a time. Batching could help, but there would still be a slowdown; see #42.
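If you keep multiple clients connected anyway, one way to make the serialization explicit, instead of relying on the lock plus the default thread pool, is a single GPU worker fed by an asyncio queue. Below is a minimal sketch of that pattern, reusing process_audio_sync from the code above; gpu_queue, gpu_worker, and submit_chunk are illustrative names, not part of Whisper-Streaming, and the pattern does not make the GPU any faster: it only makes the turn-taking visible and gives you a place to measure or shed backlog.

import asyncio

# All clients enqueue (processor, audio, future) work items; a single
# worker performs the GPU-bound calls one at a time, which also makes
# the model_lock inside process_audio_sync redundant.
gpu_queue = asyncio.Queue()  # Python 3.10+ allows creating this outside a running loop

async def gpu_worker():
    loop = asyncio.get_running_loop()
    while True:
        processor, audio, future = await gpu_queue.get()
        try:
            # Same blocking call as before, still offloaded to a thread
            output = await loop.run_in_executor(
                None, process_audio_sync, processor, audio)
            future.set_result(output)
        except Exception as exc:
            future.set_exception(exc)
        finally:
            gpu_queue.task_done()

async def submit_chunk(processor, audio):
    # Drop-in replacement for the run_in_executor call in process_audio
    future = asyncio.get_running_loop().create_future()
    await gpu_queue.put((processor, audio, future))
    return await future

Start the worker once in main(), e.g. asyncio.create_task(gpu_worker()) before websockets.serve(...), and replace the run_in_executor call in the handler with output = await submit_chunk(online_asr_processor, audio).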

@MohammedShokr (Author)

Hi @Gldkslfmsd,
thanks for your reply. So the current Whisper-Streaming cannot be used in production applications with many users? Is it just a POC for streaming?

@Gldkslfmsd (Collaborator)

Yes, it's a demo, a POC. It is not meant for many users concurrently.
