Implementing New Engine Support with Whisper Models #383
I think your request can be split into two different points.

@drmfinlay could possibly speak to the documentation, but the code for each engine supported thus far can be found at https://github.com/dictation-toolbox/dragonfly/tree/master/dragonfly/engines. Most engines rely on middleware outside of dragonfly that handles compiling dragonfly grammars down to an engine-specific implementation and spec. Examples of this middleware are Natlink and Kaldi Active Grammar.
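To make the middleware point concrete, here is a minimal sketch of the kind of dragonfly grammar that middleware such as Natlink or Kaldi Active Grammar must compile into an engine's native format. It uses dragonfly's documented `Grammar` and `MappingRule` classes; the specific phrases and key bindings are illustrative only:

```python
from dragonfly import Grammar, MappingRule, Key, Text

# A small voice-command rule. Each spoken phrase on the left maps to an
# action on the right; the engine middleware is responsible for turning
# this high-level spec into the engine's native grammar format.
class ExampleRule(MappingRule):
    mapping = {
        "save file":       Key("c-s"),             # press Ctrl+S
        "insert greeting": Text("Hello, world!"),  # type literal text
    }

grammar = Grammar("example")     # a named container for rules
grammar.add_rule(ExampleRule())
grammar.load()                   # hands the grammar to the active engine
```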
This has previously been discussed in issue #376. That is not to say it can't be done; however, there doesn't seem to be a clear path that is performant within the Whisper API, and there is possibly a limitation within the model itself.
@NigelHiggs30 Hello Nigel,

Thank you for opening this issue. I apologise for my late reply; this issue fell off my radar.

As @LexiconCode has mentioned above, support for Whisper has been discussed previously. Whisper is impressive, but not useful for everything. It simply is not an appropriate tool for this particular job. I went into the details in #376 and elsewhere (I think).

As for the documentation, it is in need of updating.

I am not considering the addition of new engines within Dragonfly any more. The engines we have at the moment are quite sufficient, in my opinion. A new engine could be implemented and used externally, however. One should only need to register an engine instance using the […]
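As a rough illustration of what external use might look like: the sketch below assumes a custom engine subclasses dragonfly's `EngineBase`. `WhisperEngine` is hypothetical, and since the comment above is truncated, the exact registration hook is left as an assumption (check `dragonfly/engines` in the source for the real mechanism):

```python
# Sketch only: WhisperEngine is hypothetical and deliberately skeletal.
# A real engine must implement the full EngineBase interface (grammar
# loading, the recognition loop, mimic, speak, and so on).
from dragonfly.engines.base import EngineBase

class WhisperEngine(EngineBase):
    _name = "whisper"  # name reported by engine.name

    def connect(self):
        # Set up audio capture and the recognizer here.
        pass

    def disconnect(self):
        # Release audio and recognizer resources here.
        pass

# How this instance gets registered so that dragonfly's get_engine()
# returns it is the part elided in the truncated comment above; consult
# the dragonfly source for the registration function to call.
engine = WhisperEngine()
```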
The original issue text from @NigelHiggs30:

I've been following this project for several years and previously interacted with it using the built-in Windows speech recognition engine. The core project is impressive, but the limitations were primarily with the speech recognition engines available at that time. I believe now is the time to upgrade this project. Refactoring might be necessary for broader applicability, but the potential of the final product is significant.

The primary barrier to wider adoption was the capability of the engines used previously. With the advancements in open-source AI and speech-to-text technology, especially developments like the Whisper models, this project has the potential to reach new heights of performance and usability. Is there any updated documentation or support for integrating new engines, particularly Whisper models? I am considering initiating a pull request to integrate these advancements into the project.