Implementing New Engine Support with Whisper Models #383

Closed
NigelHiggs30 opened this issue Nov 19, 2023 · 3 comments
Labels: Documentation (Issues related to documentation), Question (Issues that are general questions)


NigelHiggs30 commented Nov 19, 2023

I've been following this project for several years and previously used it with the built-in Windows speech recognition engine. The core project is impressive, but its limitations came primarily from the speech recognition engines available at the time. I believe now is the time to upgrade this project. Some refactoring might be necessary for broader applicability, but the potential of the final product is significant. The primary barrier to wider adoption was the capability of the engines used previously. With the advancements in open-source AI and speech-to-text technologies, especially developments like the Whisper models, this project has the potential to reach new heights of performance and usability. Is there any updated documentation or support for integrating new engines, particularly Whisper models? I am considering opening a pull request to integrate these advancements into the project.

LexiconCode (Member) commented Nov 20, 2023

I think your request can be split into two points.

  1. Is there documentation on how to integrate new speech recognition engines into Dragonfly?

@drmfinlay could possibly speak to the documentation, but the code for each engine supported so far can be found at https://github.com/dictation-toolbox/dragonfly/tree/master/dragonfly/engines

Most engines have middleware outside of Dragonfly that handles compiling grammars from Dragonfly down to engine-specific implementations and specs. Examples of this middleware are Natlink and Kaldi Active Grammar.

  2. How to implement a backend specifically for Whisper models.

This has previously been discussed in issue #376.

That doesn't mean it can't be done; however, there doesn't seem to be a clear, performant path within the Whisper API, and there may also be a limitation within the model itself.
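To make the middleware role from point 1 more concrete, here is a minimal sketch of what "compiling a grammar down to an engine-specific spec" can mean. It is a toy, not Natlink's or Kaldi Active Grammar's actual code: the Literal, Sequence and Alternative classes are loosely named after Dragonfly's grammar elements but are defined here from scratch, and the output is an assumed JSGF-like text format chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, simplified grammar elements (NOT Dragonfly's own classes).
@dataclass
class Literal:
    words: str


@dataclass
class Sequence:
    children: list


@dataclass
class Alternative:
    children: list


Element = Union[Literal, Sequence, Alternative]


def compile_element(element: Element) -> str:
    """Recursively compile an element tree into an assumed JSGF-like string."""
    if isinstance(element, Literal):
        return element.words
    if isinstance(element, Sequence):
        # Sequence: children are spoken one after another.
        return " ".join(compile_element(c) for c in element.children)
    if isinstance(element, Alternative):
        # Alternative: exactly one child is spoken.
        return "(" + " | ".join(compile_element(c) for c in element.children) + ")"
    raise TypeError(f"unsupported element: {element!r}")


def compile_rule(name: str, element: Element) -> str:
    """Wrap a compiled element as a named rule in the hypothetical format."""
    return f"<{name}> = {compile_element(element)};"


rule = Sequence([Literal("open"), Alternative([Literal("firefox"), Literal("terminal")])])
print(compile_rule("open_app", rule))  # prints: <open_app> = open (firefox | terminal);
```

A real middleware layer does much more (word lists, dictation slots, recognition callbacks), but the core idea is this kind of tree walk from Dragonfly's element objects into whatever the engine consumes.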

LexiconCode added the Question label Nov 20, 2023
LexiconCode (Member) commented Dec 1, 2023

@NigelHiggs30
This looks interesting: https://github.com/facebookresearch/seamless_communication

drmfinlay (Member) commented

Hello Nigel,

Thank you for opening this issue. I apologise for my late reply. This issue fell off my radar.

As @LexiconCode has mentioned above, support for Whisper has been discussed previously. Whisper is impressive, but not useful for everything. It simply is not an appropriate tool for this particular job. I went into the details in #376 and elsewhere (I think).

As for the documentation, it is in need of updating. I am not considering the addition of new engines within Dragonfly any more. The engines we have at the moment are quite sufficient, in my opinion. A new engine could be implemented and used externally, however. One should only need to register an engine instance using the register_engine_init() function for things to work properly.
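A rough sketch of the "implement externally, then register" pattern described above, under stated assumptions. Everything here is hypothetical stand-in code: HypotheticalEngineBase, register_engine() and get_engine() model the idea in a few lines and are not Dragonfly's API. With Dragonfly itself, the registration step would be the register_engine_init() function mentioned above, and the engine class would need to implement the full interface that the existing engines implement (see the engines source linked earlier in this thread). The "first registered engine becomes the default" rule is an assumption of this toy.

```python
from typing import Optional


class HypotheticalEngineBase:
    """Illustrative stand-in for an engine base class (not Dragonfly's)."""

    _name = "base"

    @property
    def name(self) -> str:
        return self._name

    def connect(self) -> None:
        raise NotImplementedError


# Toy module-level registry holding the default engine instance.
_default_engine: Optional[HypotheticalEngineBase] = None


def register_engine(engine: HypotheticalEngineBase) -> HypotheticalEngineBase:
    """Hypothetical: the first engine registered becomes the default."""
    global _default_engine
    if _default_engine is None:
        _default_engine = engine
    return engine


def get_engine() -> HypotheticalEngineBase:
    """Return the registered default engine, or fail if none was registered."""
    if _default_engine is None:
        raise RuntimeError("no engine registered")
    return _default_engine


class MyExternalEngine(HypotheticalEngineBase):
    """An engine implemented outside the framework's own package."""

    _name = "my-external"

    def __init__(self) -> None:
        self.connected = False

    def connect(self) -> None:
        self.connected = True


engine = register_engine(MyExternalEngine())
get_engine().connect()  # framework code finds the externally defined engine
print(get_engine().name, engine.connected)  # prints: my-external True
```

The point of the pattern is that framework code only ever asks the registry for "the engine", so an engine living in a separate package works once its instance has been registered.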

drmfinlay added the Documentation label Apr 22, 2024
drmfinlay added a commit that referenced this issue May 2, 2024
Re: #139, #376, #383.

Add a Q and A on implementing a custom Dragonfly engine externally
and a Q and A on whether Dragonfly will add support for new speech
recognition engines.
drmfinlay added a commit that referenced this issue May 4, 2024
Re: #139, #376, #383.

I've added a section on new engines back into the CONTRIBUTING.rst
file and given criteria for new engine implementations.

3 participants