KoljaB/README.md

I build voice applications that run in real time, handle heavy workloads, and keep latency low and stable, even at global scale.

Many developers already rely on my open-source projects (such as RealtimeSTT, RealtimeTTS, and Linguflex) for accurate, responsive speech-to-text and text-to-speech.

My services include:

  • Setting up end-to-end real-time speech recognition and TTS pipelines for multi-user, low-latency environments.
  • Implementing GPU optimization, load balancing, and runtime configuration for scaling under heavy traffic.
  • Integrating various TTS engines (e.g. Coqui, StyleTTS2, Azure, ElevenLabs, and others) and fine-tuning them for quality and speed.
  • Applying audio chunking, compression, and sentence-segmentation logic to improve transcription accuracy and responsiveness.
  • Enabling real-time streaming via browser-based clients with stable, continuous, and natural-sounding output.
  • Integrating voice activity detection (VAD), wake-word recognition, and other advanced speech features.
  • Incorporating large language models to support streaming tool-calling, contextual responses, and dynamic memory or RAG workflows.
  • Advising on reliable backend infrastructure, including containerization and orchestration (e.g. Modal, Runpod, Kubernetes), for global-scale operations.
  • Providing guidance on tuning parameters to ensure optimal audio quality, stable performance, and minimal latency.
  • Advising on training data and model selection to achieve fast, robust, and context-aware results in speech and language tasks.

If you’re looking for someone to implement or enhance voice features quickly, reliably, and at scale, you’re more than welcome to contact me at [email protected].

Pinned

  1. RealtimeSTT

    A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake-word activation, and instant transcription.

    Python, 2.2k stars, 212 forks

  2. RealtimeTTS

    Converts text to speech in real time.

    Python, 2.2k stars, 215 forks

  3. Linguflex

    Command Your World with Voice

    Python, 466 stars, 48 forks

  4. LocalAIVoiceChat

    Local AI voice chat with a custom voice, based on the Zephyr 7B model. Uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis.

    Python, 538 stars, 59 forks

  5. stream2sentence

    Real-time processing and delivery of sentences from a continuous stream of characters or text chunks.

    Python, 35 stars, 10 forks

  6. WhoSpeaks

    An efficient approach to speaker diarization based on extracting voice characteristics.

    Python, 71 stars, 7 forks
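The idea behind stream2sentence (delivering complete sentences from a continuous stream of text chunks, for example LLM output that should start being spoken before generation finishes) can be illustrated with a deliberately naive sketch. This is not stream2sentence's API or algorithm: the function name `iter_sentences` is hypothetical, and the boundary rule (`.`, `!` or `?` followed by whitespace) is a simplifying assumption that will mis-split abbreviations such as "e.g.".

```python
import re
from typing import Iterable, Iterator

# Assumed, naive boundary rule for this sketch only: one or more of . ! ?
# followed by whitespace. Real segmenters handle abbreviations, numbers, etc.
_BOUNDARY = re.compile(r"[.!?]+\s+")


def iter_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks.

    Hypothetical helper for illustration; not part of any library.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every sentence whose closing punctuation AND trailing whitespace
        # have arrived; a period at the very end of the buffer might still turn
        # out to be part of "3.14" or "...", so it waits for more input.
        while True:
            match = _BOUNDARY.search(buffer)
            if match is None:
                break
            sentence = buffer[: match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # The stream has ended: flush whatever text is left as the final sentence.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    # Simulated token deltas, as a streaming LLM might produce them.
    for s in iter_sentences(["Hel", "lo there. How a", "re you? Fine"]):
        print(s)
```

In a voice pipeline, each yielded sentence could be handed to a TTS engine immediately, so synthesis of the first sentence overlaps with generation of the rest.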