A high-performance ASR (Automatic Speech Recognition) server implementation using Whisper, supporting both gRPC and REST APIs.
This project provides a server implementation for speech-to-text transcription using OpenAI's Whisper model, optimized for different platforms and hardware acceleration options.
- gRPC Server
  - Stream Transcription
- Web API
  - Task Management
    - Task Status
    - Create Task by URL
    - Create Task by Local File
  - API Key Management (Authentication)
- Scheduled Tasks
  - Download Audio File
  - Transcription
  - HTTP Callback
- Authentication
- Multiple Platform Support
  - macOS (Metal)
  - Linux (CUDA)
  - Windows (CUDA)
- Rust toolchain (1.70 or later)
- For CUDA support: CUDA Toolkit 11.x or later
- For Metal support (macOS): Xcode and the Metal SDK
- etcd server running locally or otherwise accessible (optional; only needed for go-micro microservice integration)
- Clone the repository:
git clone https://github.com/bean-du/SpeakSense
cd SpeakSense
- Download the Whisper model:
./script/download-ggml-model.sh
- Build the project:
# Standard build
cargo build --release
# With CUDA support
cargo build --release --features cuda
# With Metal support (macOS)
cargo build --release --features metal
- ASR_SQLITE_PATH: SQLite path (default: sqlite://./asr_data/database/storage.db?mode=rwc)
- ASR_AUDIO_PATH: Audio path (default: ./asr_data/audio/)
- ETCD_DEFAULT_ENDPOINT: etcd endpoint (default: http://localhost:2379)
- ASR_MODEL_PATH: Whisper model path (default: ./models/ggml-large-v3.bin)
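These can be overridden in the shell before launching the server; for example (the values shown simply restate the defaults listed above):

# Override storage, etcd, and model locations before starting the server.
export ASR_SQLITE_PATH="sqlite://./asr_data/database/storage.db?mode=rwc"
export ASR_AUDIO_PATH="./asr_data/audio/"
export ETCD_DEFAULT_ENDPOINT="http://localhost:2379"
export ASR_MODEL_PATH="./models/ggml-large-v3.bin"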
# Standard
cargo run --release

# With CUDA support
cargo run --release --features cuda

# With Metal support (macOS): first set the Metal resources path
export GGML_METAL_PATH_RESOURCES="./resources"
cargo run --release --features metal
Docker (currently only Linux with CUDA on x86_64 is supported). The easiest way to get started is with Docker Compose:
- Create required directories:
mkdir -p models asr_data/audio asr_data/database
- Download the Whisper model:
./script/download-ggml-model.sh
- Start the server:
# Standard version
docker-compose up -d
# With CUDA support
ASR_FEATURES=cuda docker-compose up -d
# With Metal support (macOS)
ASR_FEATURES=metal docker-compose up -d
- Check the logs:
docker-compose logs -f
- Stop the server:
docker-compose down
The server will be available at:
- REST API: http://localhost:7200
- gRPC: localhost:7300
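To verify that both listeners came up, a generic TCP reachability check works (this is not a project-specific health endpoint):

# Check that the REST and gRPC ports are accepting connections.
nc -z localhost 7200 && echo "REST port open"
nc -z localhost 7300 && echo "gRPC port open"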
The default configuration includes:
- Automatic volume mapping for models and data persistence
- GPU support (when using CUDA feature)
- Optional etcd service
- Environment variable configuration
You can customize the configuration by:
- Modifying environment variables in docker-compose.yml
- Adding or removing services as needed
- Adjusting resource limits and port mappings
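For instance, Docker Compose automatically reads a .env file placed next to docker-compose.yml, which keeps overrides out of the compose file itself. A minimal sketch, assuming the compose file references these variables:

# .env -- read automatically by docker-compose; assumes docker-compose.yml references these variables
ASR_FEATURES=cuda
ASR_MODEL_PATH=./models/ggml-large-v3.bin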
# Use local wav file
cargo run --example asr_client -- -i 2.wav
# Specify server address
cargo run --example asr_client -- -i test/2.wav -s http://127.0.0.1:7300
# Specify device id
cargo run --example asr_client -- -i input.wav -d test-device
curl -X POST http://localhost:7200/api/v1/asr/tasks \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{"audio_url": "https://example.com/audio.wav"}'
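If the create call returns the new task's id in a JSON field named task_id (an assumption; check the actual response shape), it can be captured with jq and reused in the status query below:

# Create a task and capture its id with jq (the task_id field name is an assumption).
TASK_ID=$(curl -s -X POST http://localhost:7200/api/v1/asr/tasks \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"audio_url": "https://example.com/audio.wav"}' | jq -r '.task_id')
echo "created task: $TASK_ID"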
curl http://localhost:7200/api/v1/asr/tasks/{task_id} \
-H "Authorization: Bearer your-api-key"
The server supports various Whisper model sizes. You can download different models from Hugging Face: https://huggingface.co/ggerganov/whisper.cpp/tree/main
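For example, to fetch the smaller base model from that repository and point the server at it:

# Download the ggml "base" model from Hugging Face and select it via ASR_MODEL_PATH.
curl -L -o models/ggml-base.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
export ASR_MODEL_PATH=./models/ggml-base.bin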
- For CUDA: adjust batch size and worker threads to fit your GPU memory
- For Metal: ensure the Metal resources path is configured correctly (see GGML_METAL_PATH_RESOURCES above)
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
- OpenAI Whisper
- whisper.cpp
- whisper-rs