Implementing New Engine Support with Whisper Models #383
I think your request can be split into two different points.

@drmfinlay could possibly speak to the documentation, but the code for each engine supported thus far can be found at https://github.com/dictation-toolbox/dragonfly/tree/master/dragonfly/engines. Most engines rely on middleware outside of dragonfly that handles compiling dragonfly grammars down to an engine-specific implementation and spec. Examples of this middleware are Natlink and Kaldi Active Grammar.
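To make the middleware point concrete, here is a minimal sketch of the kind of dragonfly grammar that middleware such as Natlink or Kaldi Active Grammar must compile into an engine's native format. It uses dragonfly's documented `Grammar` and `MappingRule` classes; the specific phrases and key bindings are illustrative only:

```python
from dragonfly import Grammar, MappingRule, Key, Text

# A small voice-command rule. Each spoken phrase on the left maps to an
# action on the right; the engine middleware is responsible for turning
# this high-level spec into the engine's native grammar format.
class ExampleRule(MappingRule):
    mapping = {
        "save file":       Key("c-s"),             # press Ctrl+S
        "insert greeting": Text("Hello, world!"),  # type literal text
    }

grammar = Grammar("example")     # a named container for rules
grammar.add_rule(ExampleRule())
grammar.load()                   # hands the grammar to the active engine
```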
This has previously been discussed in issue #376. That is not to say it can't be done; however, there doesn't seem to be a clear path that is performant within the Whisper API, and there is possibly a limitation within the model itself.
@NigelHiggs30 Hello Nigel,

Thank you for opening this issue. I apologise for my late reply; this issue fell off my radar.

As @LexiconCode has mentioned above, support for Whisper has been discussed previously. Whisper is impressive, but not useful for everything. It simply is not an appropriate tool for this particular job. I went into the details in #376 and elsewhere (I think).

As for the documentation, it is in need of updating.

I am not considering the addition of new engines within Dragonfly any more. The engines we have at the moment are quite sufficient, in my opinion. A new engine could be implemented and used externally, however. One should only need to register an engine instance using the […]
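As a rough illustration of what external use might look like: the sketch below assumes a custom engine subclasses dragonfly's `EngineBase`. `WhisperEngine` is hypothetical, and since the comment above is truncated, the exact registration hook is left as an assumption (check `dragonfly/engines` in the source for the real mechanism):

```python
# Sketch only: WhisperEngine is hypothetical and deliberately skeletal.
# A real engine must implement the full EngineBase interface (grammar
# loading, the recognition loop, mimic, speak, and so on).
from dragonfly.engines.base import EngineBase

class WhisperEngine(EngineBase):
    _name = "whisper"  # name reported by engine.name

    def connect(self):
        # Set up audio capture and the recognizer here.
        pass

    def disconnect(self):
        # Release audio and recognizer resources here.
        pass

# How this instance gets registered so that dragonfly's get_engine()
# returns it is the part elided in the truncated comment above; consult
# the dragonfly source for the registration function to call.
engine = WhisperEngine()
```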
The original issue text from @NigelHiggs30:

I've been following this project for several years and previously interacted with it using the built-in Windows speech recognition engine. The core project is impressive, but the limitations were primarily with the speech recognition engines available at that time. I believe now is the time to upgrade this project. Refactoring might be necessary for broader applicability, but the potential of the final product is significant.

The primary barrier to wider adoption was the capability of the engines used previously. With the advancements in open-source AI and speech-to-text technology, especially developments like the Whisper models, this project has the potential to reach new heights of performance and usability. Is there any updated documentation or support for integrating new engines, particularly Whisper models? I am considering initiating a pull request to integrate these advancements into the project.