
Cannot handle multiple streams concurrently #138

Closed
MohammedShokr opened this issue Nov 13, 2024 · 3 comments

@MohammedShokr

Issue

I implemented a WebSocket-based version of the whisper_online_server to handle audio streams from clients over WebSocket connections. The implementation works as expected when a single client is streaming; however, when two clients stream simultaneously, significant issues arise:

  • High Latency: Latency increases drastically, sometimes reaching up to a minute for both clients.
  • Non-concurrent Handling: Instead of processing both streams concurrently, it appears as if the server handles them in turns, causing delays and bottlenecks.

Troubleshooting Attempts

I’ve tried both of the following approaches, but neither resolved the issue:

  • Shared ASR model: a single ASR model instance shared across both streams, guarded by a threading lock (this is the variant in the code below).
  • Separate ASR model instances: a separate ASR model instance created for each client stream (sketch after this list).

Both approaches resulted in the same high-latency, turn-taking behavior.
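For reference, the separate-instances variant only moves the model construction into the connection handler; a minimal sketch with the same constructor arguments as the full code below (the rest of the handler is unchanged):

import sys
from whisper_online import FasterWhisperASR, VACOnlineASRProcessor

async def process_audio(websocket, path):
    # Per-client model: nothing is shared, so no lock is needed, but each
    # connection loads a full copy of the model onto the GPU.
    model = FasterWhisperASR(lan="ar", modelsize="large-v2",
                             compute_type="float16", device="cuda")
    model.use_vad()
    online_asr_processor = VACOnlineASRProcessor(
        online_chunk_size=1,
        asr=model,
        tokenizer=None,
        buffer_trimming=("segment", 15),
        logfile=sys.stderr,
    )
    # ... receive loop identical to the shared-model version below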

Code

import sys
import asyncio
import io
import threading
import librosa
import soundfile as sf
import numpy as np
import websockets
from whisper_online import FasterWhisperASR, VACOnlineASRProcessor

# Initialize the model once and share it among all clients
model = FasterWhisperASR(lan="ar", modelsize="large-v2", compute_type="float16", device="cuda")
model.use_vad()

# Lock to ensure thread-safe access to the model
model_lock = threading.Lock()

def process_audio_sync(online_asr_processor, audio):
    """Synchronous function to process audio using the ASR model."""
    online_asr_processor.insert_audio_chunk(audio)
    # Lock the model during the critical section
    with model_lock:
        output = online_asr_processor.process_iter()
    return output

async def process_audio(websocket: websockets.WebSocketServerProtocol, path):
    # Create a per-client processor using the shared model
    online_asr_processor = VACOnlineASRProcessor(
        online_chunk_size=1,
        asr=model,
        tokenizer=None,
        buffer_trimming=("segment", 15),
        logfile=sys.stderr
    )

    loop = asyncio.get_running_loop()

    async for message in websocket:
        if not isinstance(message, bytes):
            print(message)
            continue

        # Process the audio data
        sound_file = sf.SoundFile(
            io.BytesIO(message),
            channels=1,
            endian="LITTLE",
            samplerate=16000,
            subtype="PCM_16",
            format="RAW"
        )
        audio, _ = librosa.load(sound_file, sr=16000, dtype=np.float32, mono=True)

        # Offload the blocking ASR processing to a thread pool executor
        output = await loop.run_in_executor(None, process_audio_sync, online_asr_processor, audio)

        if output[0] is None:
            continue

        output_formatted = online_asr_processor.to_flush([output])
        await websocket.send(output_formatted[2].encode())

async def main():
    # Disable the server's keepalive pings to prevent timeouts
    async with websockets.serve(process_audio, "localhost", 8765, ping_interval=None, ping_timeout=None):
        await asyncio.Future()

if __name__ == "__main__":
    asyncio.run(main())

@Gldkslfmsd (Collaborator)

Hi,
it's a known limitation. GPUs are fastest with one process only; sequential processing is faster than concurrent. The current Whisper-Streaming is intended for one client at a time. Batching could help, but there would still be a slowdown; see #42.
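If you keep multiple clients connected anyway, one way to make the serialization explicit, instead of relying on the lock plus the default thread pool, is a single GPU worker fed by an asyncio queue. Below is a minimal sketch of that pattern, reusing process_audio_sync from the code above; gpu_queue, gpu_worker, and submit_chunk are illustrative names, not part of Whisper-Streaming, and the pattern does not make the GPU any faster: it only makes the turn-taking visible and gives you a place to measure or shed backlog.

import asyncio

# All clients enqueue (processor, audio, future) work items; a single
# worker performs the GPU-bound calls one at a time, which also makes
# the model_lock inside process_audio_sync redundant.
gpu_queue = asyncio.Queue()  # Python 3.10+ allows creating this outside a running loop

async def gpu_worker():
    loop = asyncio.get_running_loop()
    while True:
        processor, audio, future = await gpu_queue.get()
        try:
            # Same blocking call as before, still offloaded to a thread
            output = await loop.run_in_executor(
                None, process_audio_sync, processor, audio)
            future.set_result(output)
        except Exception as exc:
            future.set_exception(exc)
        finally:
            gpu_queue.task_done()

async def submit_chunk(processor, audio):
    # Drop-in replacement for the run_in_executor call in process_audio
    future = asyncio.get_running_loop().create_future()
    await gpu_queue.put((processor, audio, future))
    return await future

Start the worker once in main(), e.g. asyncio.create_task(gpu_worker()) before websockets.serve(...), and replace the run_in_executor call in the handler with output = await submit_chunk(online_asr_processor, audio).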

@MohammedShokr (Author)

Hi @Gldkslfmsd,
thanks for your reply. So the current Whisper-Streaming cannot be used in production applications with many users? Is it just a POC for streaming?

@Gldkslfmsd (Collaborator)

Yes, it's a demo, a POC. It is not meant for many users concurrently.
