Trying this project, please use code on
release
branch and ask owner to get released bmodel.
The codebase is expected to be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably OpenAI's tiktoken for their fast tokenizer implementation. You can setup the environment with the following command:
pip install requirements.txt
It also requires the command-line tool ffmpeg
to be installed on your system, which is available from most package managers:
# if you use a conda environment
conda install ffmpeg
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
You can install bmwhisper as follow:
python setup.py install
The following command will transcribe speech in audio files using cpu, using the 'base' model:
bmwhisper demo.wav --model base
To use TPU, firstly, you need to generate the onnx model:
./gen_onnx.sh --model base
# if you want to use kvcache
./gen_onnx.sh --model base --use_kvcache
Then, transform onnx model to bmodel:
./gen_bmodel.sh --model base
# if you want to use kvcache
./gen_bmodel.sh --model base --use_kvcache
# if you want to compare the data when transforming and deploying
./gen_bmodel.sh --model base --compare
Use --inference
button to allow TPU inference mode:
bmwhisper demo.wav --model base --inference