Error running on Mac M2 #15
Comments
This is caused by a newer version of the transformers library introducing an incompatibility with Coqui TTS (see here).
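To check quickly whether an environment might be affected, the minimal sketch below reads the installed transformers version with the standard-library `importlib.metadata` and compares it against a cutoff. The cutoff `SUSPECT_TRANSFORMERS = (4, 41)` is a hypothetical placeholder, not a version confirmed in this thread. Substitute the release that the Coqui TTS or RealtimeTTS maintainers identify. The helper names are also hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# HYPOTHETICAL cutoff: replace with the first transformers release that is
# confirmed to break Coqui TTS XTTS streaming. Not taken from this thread.
SUSPECT_TRANSFORMERS = (4, 41)


def parse_release(text: str) -> tuple:
    """Turn '4.41.2' or '4.40.0.dev0' into a tuple of leading integers, e.g. (4, 41, 2)."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at pre-release/dev suffixes such as 'dev0' or '0rc1'
        parts.append(int(piece))
    return tuple(parts)


def transformers_may_break_coqui(threshold: tuple = SUSPECT_TRANSFORMERS) -> bool:
    """Return True if the installed transformers is at or above `threshold`."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False  # not installed, so it cannot cause this incompatibility
    # Compare only as many components as the threshold has, e.g. (4, 41).
    return parse_release(installed)[: len(threshold)] >= threshold


if __name__ == "__main__":
    print("transformers may be incompatible:", transformers_may_break_coqui())
```

If this returns True, pinning transformers below the confirmed cutoff in the conda environment is one way to test the explanation.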
Thanks for the answer :) Two more issues: -- Select voice (1-5): 1
Thank you for the feedback. Both warnings are completely normal and should not cause any issues.
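Since the warnings are harmless, the Python `UserWarning` from `stream_generator.py` (shown in the terminal output of the original report) can be hidden with the standard `warnings.filterwarnings`. This is a sketch under assumptions. The message pattern is copied from that output. The float16/float32 compute-type notice appears to be a log message rather than a Python warning, so this filter will not affect it. CoquiEngine also appears to synthesize in a separate worker process, because the traceback passes through a multiprocessing pipe, so a filter set only in the main script may not reach that process. The helper name `silence_generation_config_warning` is hypothetical.

```python
import warnings

# Start of the UserWarning text from the terminal output; filterwarnings
# matches this regular expression against the beginning of the message.
DEPRECATED_CONFIG_MSG = r"You have modified the pretrained model configuration"


def silence_generation_config_warning() -> None:
    """Hide the harmless generation-config UserWarning (hypothetical helper)."""
    # Inserted at the front of the filter list, so it wins over earlier filters.
    warnings.filterwarnings("ignore", message=DEPRECATED_CONFIG_MSG, category=UserWarning)
```

Call it near the top of the script, before the engine is created. Other warnings are still shown.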
@KoljaB thank you. I forgot another issue: speech cuts out every 1.5 to 2 seconds. Any suggestions?
By the way, on a Mac M2 you may want to create the CoquiEngine with `full_sentences=True` in the constructor, because most Macs aren't fast enough for real-time synthesis with Coqui TTS (no GPU use possible):
```python
from RealtimeTTS import CoquiEngine
coqui_engine = CoquiEngine(cloning_reference_wav="female.wav", language="en", speed=1.0, full_sentences=True)
```
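One plausible reading of dropouts every 1.5 to 2 seconds is a buffer underrun. When synthesis runs slower than real time, each streamed chunk finishes playing before the next one is ready. The simplified model below is not RealtimeTTS's actual scheduler. It assumes a fixed chunk length and a real-time factor `rtf` (compute seconds per second of audio) as inputs. It compares play-as-soon-as-ready streaming with waiting for the whole sentence, which is roughly the trade `full_sentences=True` makes.

```python
def streaming_playback(chunk_seconds: float, n_chunks: int, rtf: float):
    """Play each chunk as soon as it is synthesized (simplified model, n_chunks >= 1).

    Returns (delay before first sound, total silence inserted between chunks).
    """
    synth = chunk_seconds * rtf       # compute time per chunk
    ready = synth                     # first chunk finishes synthesizing
    play_end = ready + chunk_seconds  # ...and finishes playing one chunk later
    gaps = 0.0
    for _ in range(n_chunks - 1):
        ready += synth                # chunks are synthesized back to back
        start = max(ready, play_end)  # cannot play a chunk before it exists
        gaps += start - play_end      # audible dropout while waiting
        play_end = start + chunk_seconds
    return synth, gaps


def full_sentence_playback(chunk_seconds: float, n_chunks: int, rtf: float):
    """Wait until the whole sentence is synthesized, then play it uninterrupted."""
    return n_chunks * chunk_seconds * rtf, 0.0


if __name__ == "__main__":
    print(streaming_playback(1.0, 3, 2.0))      # slower than real time: gaps
    print(full_sentence_playback(1.0, 3, 2.0))  # longer wait, no gaps
```

With 1-second chunks and `rtf=2.0`, streaming starts after 2 s but inserts 2 s of silence across the two chunk boundaries. The full-sentence variant waits 6 s and then plays without gaps. The model covers a single sentence only, so pauses between sentences are outside its scope.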
Works like a charm, but as you said, Macs aren't fast enough for real-time synthesis with Coqui TTS, and the machine gets bogged down. Thanks, bro, see you!
First of all, awesome repo. I've tried every possible installation combination, but all of them failed. Any suggestions? @KoljaB
Machine: Mac M2
Terminal output:
```
Select voice (1-5): 1
This is how voice number 1 sounds like
/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py:138: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
General synthesis error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
occured trying to synthesize text This is how voice number 1 sounds like
Traceback: Traceback (most recent call last):
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 279, in _synthesize_worker
    for i, chunk in enumerate(chunks):
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 643, in inference_stream
    gpt_generator = self.gpt.get_generator(
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt.py", line 603, in get_generator
    return self.gpt_inference.generate_stream(
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py", line 186, in generate
    model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/transformers/generation/utils.py", line 473, in _prepare_attention_mask_for_generation
    torch.isin(elements=inputs, test_elements=pad_token_id).any()
TypeError: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
Error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
Exception in thread Thread-4 (synthesize_worker):
Traceback (most recent call last):
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/RealtimeTTS/text_to_stream.py", line 201, in synthesize_worker
    self.engine.synthesize(sentence)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 411, in synthesize
    status, result = self.parent_synthesize_pipe.recv()
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/opt/homebrew/anaconda3/envs/localAIVoiceCHat/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Accept voice (y/n):
```
GPT-4o output:
It appears that there are several warnings and errors related to the process of initializing the STT (Speech-to-Text) AudioToTextRecorder and selecting the voice. Here are the issues and their potential resolutions:
Compute Type Warning:
[warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
Resolution: This is a warning indicating that the model initially designed to use float16 precision has been converted to float32 because the device or backend doesn't support float16 efficiently. This is usually not a critical issue, but if you want to optimize performance, consider using hardware that supports float16 or adjust the model configuration to use float32 from the start.
Pretrained Model Configuration Warning:
UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
Resolution: Update your code to use a generation configuration file as suggested in the warning. This will ensure compatibility with future versions of the library.
General Synthesis Error:
```
General synthesis error: isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
 * (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)
occured trying to synthesize text This is how voice number 1 sounds like
```
Resolution: This error indicates a type mismatch in the call to isin(). Make sure that the arguments passed to isin() have the types listed in the error message: either both arguments are Tensors, or one is a Tensor and the other a Number.
To proceed, you may need to:
Verify and update the model and its configuration to ensure compatibility with the current hardware and software environment.
Make sure that all function calls, particularly those involving Tensors, are using the correct types as expected by the functions.
If you need further assistance or specific code examples to resolve these issues, please provide more details about your setup and the code you're running.
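The `isin()` failure in the traceback can be isolated outside transformers. According to the accepted signatures in the error message, the overload that takes a plain number as its second argument names that parameter `test_element` (singular). An int passed as `test_elements=` therefore matches none of them on the torch build shown. The sketch below uses hypothetical token ids and shows that wrapping the pad token id in a tensor selects the Tensor/Tensor overload. It only illustrates the type mismatch and is not a patch for transformers. The thread attributes the actual problem to the installed transformers version.

```python
import torch

# Hypothetical token ids standing in for `inputs` in transformers' generation code.
inputs = torch.tensor([[5, 17, 0, 0]])
pad_token_id = 0  # a plain Python int, as in the traceback (test_elements=int)

# Wrapping the int in a (0-d) tensor matches the first accepted signature,
# (Tensor elements, Tensor test_elements, ...), instead of relying on a Number overload.
pad_tensor = torch.tensor(pad_token_id, device=inputs.device)
has_padding = torch.isin(elements=inputs, test_elements=pad_tensor).any().item()
print(has_padding)
```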