Will switching to SeamlessM4Tv2 be better #4

jkfnc · 2023-12-01T16:15:02Z

SeamlessM4Tv2, released today, seems to have all of this plus translation with streaming support. Will it be better than Whisper and Coqui?

KoljaB · 2023-12-01T16:40:36Z

Probably yes. Advancements in this area happen crazy fast; I feel like stuff like this expires in about 4 weeks.

jkfnc · 2023-12-04T21:14:19Z

I think it may have to be a flag to switch between Whisper and Meta's model, since SeamlessM4T v2 is still under a CC-BY-NC license, which is incompatible with your MIT License.
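A rough sketch of what such a flag could look like, with the non-commercial backend as an explicit opt-in and Whisper as the default. The names `STTBackend`, `select_backend` and `allow_non_commercial` are hypothetical and not part of any existing project; the license labels restate this thread (Whisper's code and weights are MIT-licensed, SeamlessM4T v2 is CC-BY-NC).

```python
from enum import Enum


class STTBackend(Enum):
    """Hypothetical speech-to-text backend switch."""
    WHISPER = "whisper"            # OpenAI Whisper: MIT-licensed code and weights
    SEAMLESS_M4T_V2 = "seamless"   # Meta SeamlessM4T v2: CC-BY-NC (non-commercial)


# Backends whose license forbids commercial use.
NON_COMMERCIAL = {STTBackend.SEAMLESS_M4T_V2}


def select_backend(name: str, allow_non_commercial: bool = False) -> STTBackend:
    """Resolve a backend name, refusing non-commercial models unless opted in."""
    backend = STTBackend(name)  # raises ValueError for unknown names
    if backend in NON_COMMERCIAL and not allow_non_commercial:
        raise ValueError(
            f"{backend.value} is licensed for non-commercial use only; "
            "pass allow_non_commercial=True to enable it"
        )
    return backend


# Whisper stays the default; the CC-BY-NC model must be requested explicitly.
default = select_backend("whisper")
opted_in = select_backend("seamless", allow_non_commercial=True)
print(default, opted_in)
```

Keeping the permissive backend as the default means the MIT-licensed code path never pulls in a non-commercial model unless the user asks for it.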

KoljaB · 2023-12-04T21:58:29Z

Damn. You are right, and this also applies to Coqui. I need to revoke the MIT license here too, ASAP.

jayakumark · 2023-12-05T02:44:34Z

Probably this one, https://huggingface.co/spaces/styletts2/styletts2, could replace Coqui, and it's MIT-licensed.

KoljaB · 2023-12-05T02:52:05Z

Not sure about that. StyleTTS2 is only good in English and can't do zero-shot voice cloning.

jkfnc · 2023-12-22T03:46:35Z

This one claims to capture tone and emotion, and may work for your TurnVoice project: https://research.myshell.ai/open-voice/zero-shot-cross-lingual-voice-cloning

stevenbaert · 2024-02-10T11:36:23Z

FYI, here is some extra information I found (Mac):

Use Apple's Metal for GPU Acceleration:
Apple provides the Metal framework for GPU acceleration on macOS. Some machine learning libraries, such as TensorFlow and PyTorch, can use Metal for acceleration through plugins or dedicated backends.

For PyTorch, Metal acceleration is available through the built-in MPS (Metal Performance Shaders) backend, included since PyTorch 1.12 as the "mps" device.
For TensorFlow, you might explore Apple's tensorflow-metal plugin, which enables Metal-accelerated machine learning operations.
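For PyTorch, a minimal sketch of picking the Metal device when it is available, assuming PyTorch 1.12 or newer, where Metal support ships as the built-in `mps` backend. The helper name `pick_device` is illustrative; the guard keeps it working on older PyTorch builds that lack `torch.backends.mps`. This only affects code that runs on PyTorch tensors.

```python
import torch


def pick_device() -> torch.device:
    """Prefer Apple's Metal (MPS) backend, then CUDA, then fall back to CPU."""
    # torch.backends.mps only exists in PyTorch 1.12+, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device()
# Tensors created on the chosen device run their operations there.
x = torch.ones(3, device=device)
total = (x * 2).sum().item()
print(device, total)
```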

Use PlaidML:
PlaidML is an open-source tensor compiler that can enable deep learning on different types of GPUs, including AMD GPUs found in many Macs. It works with Keras as a backend and can be a way to leverage your Mac's GPU for acceleration.

ajeema · 2024-04-04T22:11:58Z

Maybe MeloTTS/OpenVoice would be a good replacement, and also distil-whisper.

KoljaB · 2024-04-05T01:07:21Z

> Maybe MeloTTS/OpenVoice would be a good replacement, and also distil-whisper.

You can already use Distil-Whisper models. Update faster-whisper to the latest version (pip install -U faster-whisper), then change the model to one of the supported distil models (distil-large-v2, distil-medium.en, distil-small.en) in this line:

recorder = AudioToTextRecorder(model="tiny.en", language="en", spinner=False)
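A small sketch of the same change, assuming `AudioToTextRecorder` accepts the `model`, `language` and `spinner` keywords shown in the line above. The helper `distil_recorder_kwargs` is hypothetical: it only checks the name against the three distil models listed here and builds the arguments. The recorder itself appears in commented-out lines because constructing it needs the RealtimeSTT package and a microphone.

```python
# Distil-Whisper model names from the comment above; they need a recent
# faster-whisper (pip install -U faster-whisper).
DISTIL_MODELS = {"distil-large-v2", "distil-medium.en", "distil-small.en"}


def distil_recorder_kwargs(model: str, language: str = "en") -> dict:
    """Build keyword arguments for AudioToTextRecorder (hypothetical helper)."""
    if model not in DISTIL_MODELS:
        raise ValueError(f"unknown distil model: {model!r}")
    # Whisper ".en" checkpoints are English-only, so pair them with language="en".
    if model.endswith(".en") and language != "en":
        raise ValueError(f"{model} only supports English")
    return {"model": model, "language": language, "spinner": False}


kwargs = distil_recorder_kwargs("distil-medium.en")
print(kwargs)

# With RealtimeSTT installed and a microphone available:
#   from RealtimeSTT import AudioToTextRecorder
#   recorder = AudioToTextRecorder(**kwargs)
```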

I found Melo to have rather poor quality (very little emotion), and OpenVoice is a research project that doesn't get updates. So I won't integrate either of them into RealtimeTTS (a TTS engine has to clear a high bar before I consider making it work in realtime).
