Supporting whisper.cpp? #376
Comments
Hello @tachyonicbytes,

Support for OpenAI Whisper has come up before, I think in the Gitter chat room. There are no current plans to support it in Dragonfly, at least not on its own.

Shervin Emami (@shervinemami) managed to get it working together with Dragonfly's Kaldi engine last year. He was able to use Whisper, instead of Kaldi, for the dictation parts of grammar rules. If I remember correctly, this improved the recognition accuracy of those parts. See daanzu/kaldi-active-grammar#73 for more on that.

In order to use Whisper for the command parts too, it would be necessary to write a dedicated Dragonfly-Whisper engine implementation. However, impressive as Whisper is, its natural language ASR models are quite unsuitable for the typical Dragonfly command phrases defined in speech grammars. Unless I am mistaken, there is no way to trim Whisper's recognition search tree in real time, that is, to have the software strictly consider only those hypotheses which fit active Dragonfly grammars. If it becomes possible to do that, and if commands are recognisable with a high degree of accuracy and speed, then an engine implementation for Whisper might be worth considering. But those are two big ifs! I don't think the folks at OpenAI are capable of such sorcery. :-)
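The grammar-constrained pruning described above can be illustrated with a toy sketch. This is purely hypothetical code, not a real Whisper (or Kaldi) API: it shows the idea of discarding, at each decoding step, any hypothesis that is no longer a prefix of an allowed command phrase.

```python
# Toy illustration (hypothetical, not a real Whisper API): pruning
# recognition hypotheses so that only those matching an active grammar
# survive. A Kaldi-style engine does this inside the decoder; Whisper's
# public interface exposes no equivalent hook.

def active_prefixes(grammar):
    """Return all word-sequence prefixes of the grammar's phrases."""
    prefixes = set()
    for phrase in grammar:
        words = tuple(phrase.split())
        for i in range(1, len(words) + 1):
            prefixes.add(words[:i])
    return prefixes

def prune(hypotheses, grammar):
    """Keep only hypotheses that could still complete a command."""
    allowed = active_prefixes(grammar)
    return [h for h in hypotheses if tuple(h) in allowed]

grammar = ["open file", "close file", "save all"]
beams = [["open"], ["open", "window"], ["close", "file"]]
print(prune(beams, grammar))  # [['open'], ['close', 'file']]
```

A real decoder would apply this kind of constraint to its internal search lattice rather than to word lists, which is exactly the hook Whisper does not expose.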
I went ahead and made an inquiry (opening discussion). Thanks for the verbiage, Danesprite. There's an early implementation: the "guided mode" example.
Thank you for investigating further, Aaron. I was unaware of guided mode. It is a start, but would not be adequate without significant changes.

Since this mode takes a flat list of commands, a Dragonfly-Whisper implementation would have to output every possible command phrase to a text file. It would be simple enough to do this for a simple spec string. As you say in the linked discussion, Dragonfly also needs the ability to activate and deactivate command phrases. Without this, contexts wouldn't work properly. Another issue is that it would not be possible to recognise the dictation parts of commands in the same utterance.

This all seems unnecessary to me, really. Dragonfly already has several engine implementations that do these things well. Whisper, in my opinion, is just not the right tool for this type of work.
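The flat-list expansion mentioned above can be sketched as follows. This is a hypothetical illustration, not Dragonfly's actual spec compiler: it handles only alternatives `(a | b)` and optionals `[c]`, whereas real Dragonfly specs also contain `<extras>` such as dictation elements, which have no finite expansion.

```python
# Hypothetical sketch: expanding a Dragonfly-style spec string into the
# flat list of phrases a guided-mode command file would need.
# Handles alternatives "(a | b)" and optionals "[c]" only.
import itertools
import re

def expand(spec):
    # Split the spec into groups "(...)", optionals "[...]", and words.
    tokens = re.findall(r"\(.*?\)|\[.*?\]|\S+", spec)
    choices = []
    for tok in tokens:
        if tok.startswith("("):
            choices.append([alt.strip() for alt in tok[1:-1].split("|")])
        elif tok.startswith("["):
            choices.append([tok[1:-1].strip(), ""])  # present or absent
        else:
            choices.append([tok])
    # Cartesian product of all choices gives every command phrase.
    return [" ".join(w for w in combo if w)
            for combo in itertools.product(*choices)]

print(expand("(open | close) [the] file"))
# ['open the file', 'open file', 'close the file', 'close file']
```

Even this toy version shows the combinatorial growth: every alternative or optional multiplies the number of phrases that must be written out.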
I've used Whisper in Dragonfly for dictation while using KaldiAG for commands, and I definitely agree with Danesprite that Whisper isn't suited to command mode, even if you're willing to put a lot of effort into customising it. Whisper works great on full sentences and is an excellent choice for long dictation, but it struggles with dictating anything less than a few words: even something as short as "hi how are you?" is very unreliable in Whisper. This leads me to expect it would really struggle if used specifically for single-word commands.
Thanks, Shervin. Your point about accuracy for short phrases is important. Whisper's models were not trained for this purpose.

@tachyonicbytes, if you haven't already, I would suggest trying out Dragonfly's KaldiAG engine. It is open source and fairly accurate, with low latency. The documentation for it is here. I think you'll find it is good enough.
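For anyone following this suggestion, a minimal KaldiAG setup looks roughly like this. It is a sketch assuming the Kaldi engine dependencies (the `kaldi_active_grammar` package and a downloaded model) are already installed per the engine documentation; the "save file" command and its key binding are just illustrative.

```python
# Minimal Dragonfly grammar using the Kaldi (KaldiAG) backend.
# Assumes kaldi_active_grammar and a speech model are installed.
from dragonfly import Grammar, Key, MappingRule, get_engine

engine = get_engine("kaldi")  # select the KaldiAG backend
engine.connect()

class ExampleCommands(MappingRule):
    mapping = {
        "save file": Key("c-s"),  # illustrative command and action
    }

grammar = Grammar("example")
grammar.add_rule(ExampleCommands())
grammar.load()

engine.do_recognition()  # block and process speech until interrupted
```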
Are there any plans to support the OpenAI Whisper automatic speech recognition engine? How hard would it be to do that? (I am unfamiliar with the codebase.)
From a performance standpoint, it currently seems to be one of the best engines, although I wouldn't necessarily trust OpenAI's marketing.
From a licensing standpoint, it is FOSS, so that should not be a problem.