A crude way of using OpenAI Whisper for alternative dictation in KaldiAG #73
base: master
Glad you found my code helpful and added it here. The whisper model takes either a wav file or an array (I'm not sure of the format). However, I could not get the array input working in a timely manner, so I decided to just write the audio to a file on disk. By using io.BytesIO it should be possible to handle it all in memory.
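A minimal sketch of the in-memory idea mentioned above, using only the standard library. It assumes the captured audio is 16-bit mono PCM at 16 kHz; the function names are hypothetical and not taken from this PR.

```python
import io
import struct
import wave

SAMPLE_RATE = 16000  # assumption: 16 kHz, mono, 16-bit PCM


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container without touching the disk."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def wav_bytes_to_floats(wav_bytes: bytes) -> list:
    """Decode an in-memory 16-bit mono WAV into floats in [-1.0, 1.0)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        frames = wav.readframes(wav.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    return [s / 32768.0 for s in samples]
```

If the array route is used, the float list would still need converting to whatever array type the model accepts (for example a NumPy float32 array); that step is an assumption and is left out here.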
…e default dictation
… RPC. Also sending data as binary instead of WAV file.
Grabbing the parent directory and then pointing to whisper_dictation.py with an absolute path:
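import os
import subprocess
import sys
# Directory that contains this file
pardir = os.path.abspath(os.path.join(__file__, os.pardir))
whisper_server = os.path.abspath(os.path.join(pardir, "whisper_server.py"))
# Launch the server with the same interpreter that is running this script
subprocess.Popen([sys.executable, whisper_server])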
sys.executable is needed to locate the Python executable on Windows, since the system path cannot be relied on, especially with multiple Python installations. I'm curious to know if this works just as well on Ubuntu.
OS-agnostic temp path for the audio file:
import tempfile
# Keep the TemporaryDirectory object itself so cleanup() can be called on it later
temp_dir = tempfile.TemporaryDirectory()
audio_filename = os.path.join(temp_dir.name, "whisper.wav")
temp_dir.cleanup() # place near the end of `die()` function
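One way to tie the temp directory's lifetime to the `die()` call is sketched below. `tempfile.TemporaryDirectory()` returns an object whose `.name` is the path string, and `cleanup()` is a method of that object rather than of the path. The `AudioScratch` class is a hypothetical helper for illustration, not part of this PR.

```python
import os
import tempfile


class AudioScratch:
    """Hypothetical helper: owns a temp directory that holds one WAV file."""

    def __init__(self, filename="whisper.wav"):
        # Hold on to the TemporaryDirectory object; .name is only the path string.
        self._temp_dir = tempfile.TemporaryDirectory()
        self.audio_filename = os.path.join(self._temp_dir.name, filename)

    def die(self):
        # Removes the directory and anything that was written inside it.
        self._temp_dir.cleanup()
```

Keeping a reference to the object also avoids the directory being removed early when an unreferenced `TemporaryDirectory` is garbage-collected.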
…Windows, thanks to @LexiconCode.
Thanks @LexiconCode for these 2 portability improvements, I've uploaded them now :-)
@shervinemami I actually don't think these changes are necessary to support using Whisper in KaldiAG. Since the … I am adding a somewhat-related note here from gitter: You will likely find alternative dictation to work better for dictation utterances that don't include any "command parts". The problem is that, for the example you posted, KaldiAG tries its best to "cut out" the part where you preface the utterance by speaking "whisper", and only pass the rest of the audio to whisper, but doing that is quite difficult and inexact. You might want to try something like having a command that enables a pure dictation rule (…
This is a fairly crude implementation, including various hard-coded settings for Whisper's "base.en" English model, and probably only currently works on Linux & OS X since it has a hardcoded tmpfile path. But it's good enough to begin playing with.
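As a rough sketch of how the hard-coded model name and tmpfile path could be made configurable: the environment variable names `WHISPER_MODEL` and `WHISPER_TMPDIR`, and the `WhisperSettings` / `load_settings` names, are invented for this example and do not exist in the PR.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class WhisperSettings:
    model_name: str
    audio_path: str


def load_settings(env=os.environ) -> WhisperSettings:
    # WHISPER_MODEL and WHISPER_TMPDIR are made-up names for this sketch.
    model_name = env.get("WHISPER_MODEL", "base.en")
    # tempfile.gettempdir() picks a suitable temp location on each OS.
    tmp_dir = env.get("WHISPER_TMPDIR", tempfile.gettempdir())
    return WhisperSettings(model_name, os.path.join(tmp_dir, "whisper.wav"))
```

With no variables set, this falls back to the "base.en" model and the platform's default temp directory.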