
Error running on Mac M2 #15

Open
vitorcalvi opened this issue Jun 5, 2024 · 6 comments

Comments

@vitorcalvi

First of all, awesome repo. I've tried all possible installation combinations; all of them failed. Any suggestions? @KoljaB
Machine: Mac M2

Terminal output:

Using model: xtts
Initializing STT AudioToTextRecorder ...
[2024-06-05 15:39:29.914] [ctranslate2] [thread 1054526] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.

Select voice (1-5): 1
This is how voice number 1 sounds like
/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py:138: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
General synthesis error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:

  • (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
  • (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
  • (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)
occured trying to synthesize text This is how voice number 1 sounds like
Traceback: Traceback (most recent call last):
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 279, in _synthesize_worker
    for i, chunk in enumerate(chunks):
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 643, in inference_stream
    gpt_generator = self.gpt.get_generator(
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt.py", line 603, in get_generator
    return self.gpt_inference.generate_stream(
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py", line 186, in generate
    model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/transformers/generation/utils.py", line 473, in _prepare_attention_mask_for_generation
    torch.isin(elements=inputs, test_elements=pad_token_id).any()
TypeError: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
  • (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
  • (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
  • (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)

Error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:

  • (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
  • (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
  • (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)

Exception in thread Thread-4 (synthesize_worker):
Traceback (most recent call last):
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/RealtimeTTS/text_to_stream.py", line 201, in synthesize_worker
    self.engine.synthesize(sentence)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 411, in synthesize
    status, result = self.parent_synthesize_pipe.recv()
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Accept voice (y/n):

GPT4o output:

It appears that there are several warnings and errors related to the process of initializing the STT (Speech-to-Text) AudioToTextRecorder and selecting the voice. Here are the issues and their potential resolutions:
Compute Type Warning:

[warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
Resolution: This is a warning indicating that the model initially designed to use float16 precision has been converted to float32 because the device or backend doesn't support float16 efficiently. This is usually not a critical issue, but if you want to optimize performance, consider using hardware that supports float16 or adjust the model configuration to use float32 from the start.
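For illustration only, here is a minimal sketch of requesting float32 from the start with faster-whisper (the library that typically sits in front of the ctranslate2 backend emitting this warning); the model name and audio file below are placeholders, not values from this project:

```python
# Hedged sketch: request float32 explicitly so ctranslate2 does not have to
# convert float16 weights on a device without efficient float16 support.
from faster_whisper import WhisperModel

model = WhisperModel("tiny.en", device="cpu", compute_type="float32")  # "tiny.en" is a placeholder
segments, info = model.transcribe("sample.wav")  # "sample.wav" is a placeholder
for segment in segments:
    print(segment.text)
```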
Pretrained Model Configuration Warning:

UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
Resolution: Update your code to use a generation configuration file as suggested in the warning. This will ensure compatibility with future versions of the library.
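As a hedged sketch of what the warning asks for (the stand-in model, prompt, and parameter values are placeholders, not this project's code):

```python
# Hedged sketch: pass an explicit GenerationConfig to generate() instead of
# mutating the pretrained model's config, as the deprecation warning suggests.
# "gpt2" is only a stand-in model, not the XTTS GPT used by Coqui.
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

gen_config = GenerationConfig(max_new_tokens=20, do_sample=True, temperature=0.7)
input_ids = tokenizer("This is how voice number 1", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, generation_config=gen_config)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```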
General Synthesis Error:

General synthesis error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of: * (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out) * (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out) * (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out) occured trying to synthesize text This is how voice number 1 sounds like
Resolution: This error indicates a type mismatch in the function call to isin(). Make sure that the arguments passed to isin() are of the correct type as specified in the error message. The elements should either be both Tensors or one should be a Tensor and the other a Number.
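To make the mismatch concrete, a small standalone reproduction (this only illustrates the call transformers makes internally; the actual remedy in this thread is the transformers downgrade suggested further down):

```python
# Hedged illustration of the mismatch: newer transformers passes a plain int
# as test_elements, which some torch versions reject with the TypeError above.
import torch

inputs = torch.tensor([[5, 7, 0, 0]])  # placeholder token ids
pad_token_id = 0                        # plain Python int

# torch.isin(elements=inputs, test_elements=pad_token_id)   # can raise the TypeError shown above
mask = torch.isin(elements=inputs, test_elements=torch.tensor(pad_token_id))  # Tensor/Tensor overload
print(mask.any())
```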

To proceed, you may need to:
Verify and update the model and its configuration to ensure compatibility with the current hardware and software environment.
Make sure that all function calls, particularly those involving Tensors, are using the correct types as expected by the functions.

If you need further assistance or specific code examples to resolve these issues, please provide more details about your setup and the code you're running.

@KoljaB
Owner

KoljaB commented Jun 5, 2024

This is due to the new transformers library introducing an incompatibility with Coqui TTS (see here).
Please downgrade to an older transformers version: pip install transformers==4.38.2 or upgrade RealtimeTTS to the latest version: pip install realtimetts==0.4.1
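If you want to confirm which combination actually ended up installed after running either command, a quick check (package names taken from the pip commands above; distribution-name lookup may be case-sensitive on some setups):

```python
# Quick sanity check of the installed versions after applying either fix.
from importlib.metadata import version

print("transformers:", version("transformers"))
print("RealtimeTTS:", version("realtimetts"))
```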

@vitorcalvi
Author

Thanks for the answer :)
Tested both solutions; only the older transformers version works.

Two more issues:
-- > Using model: xtts
Initializing STT AudioToTextRecorder ...
[2024-06-05 16:20:24.534] [ctranslate2] [thread 27773] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.

-- Select voice (1-5): 1
This is how voice number 1 sounds like
/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.9/site-packages/TTS/tts/layers/xtts/stream_generator.py:138: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(


@KoljaB
Owner

KoljaB commented Jun 5, 2024

Thank you for the feedback. Both warnings are absolutely normal and should not lead to any issues.

@vitorcalvi
Author

@KoljaB thank you. I forgot another issue: speech cuts out every 1.5 to 2 seconds. Any suggestions?

@KoljaB
Owner

KoljaB commented Jun 5, 2024

By the way, on a Mac M2 you may want to create CoquiEngine with full_sentences=True in the constructor, because most Macs aren't fast enough for realtime synthesis with Coqui TTS (no GPU use possible).

coqui_engine = CoquiEngine(cloning_reference_wav="female.wav", language="en", speed=1.0, full_sentences=True)
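For context, a minimal sketch of how that engine would be wired into a stream (the sample text is just a placeholder; the exact reference wav and parameters depend on your setup):

```python
# Hedged sketch: feed the full-sentence CoquiEngine into a TextToAudioStream,
# so whole sentences are synthesized before playback instead of small chunks.
from RealtimeTTS import TextToAudioStream, CoquiEngine

coqui_engine = CoquiEngine(
    cloning_reference_wav="female.wav",
    language="en",
    speed=1.0,
    full_sentences=True,
)

stream = TextToAudioStream(coqui_engine)
stream.feed("This is how voice number 1 sounds like")  # placeholder text
stream.play()
```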

@vitorcalvi
Author

Works like a charm, but as you said, Macs aren't fast enough for realtime synthesis with Coqui TTS, and the machine comes under heavy load.
Mac has the MLX framework, and there's another TTS library, MeloTTS, mentioned in the repo below:
https://github.com/huwprosser/jarvis-mlx

Thanks bro, see you
