Coqui Engine takes breaks mid-sentence to load. #7
Comments
Your GPU should be fast enough for realtime. Is PyTorch installed with CUDA?
Yes, I followed every step of the readme. I may have a problem with CUDA, because my GPU isn't used by the LLM model either, but I don't know how to solve it. I use Windows.
I guess pytorch has no CUDA support. Please check with:
print(torch.cuda.is_available())
If CUDA is not available, please try to install the latest torch with CUDA support:
pip install torch==2.2.0+cu118 torchaudio==2.2.0+cu118 --index-url https://download.pytorch.org/whl/cu118
(you may need to adjust 118 to your CUDA version; this one is for CUDA 11.8)
To use the GPU with the LLM under Windows you need to compile llama-cpp-python for cuBLAS:
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
After that, install and compile llama-cpp with:
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
Then you can set n_gpu_layers in the creation parameters of llama.cpp to define how many layers of the LLM neural network should be offloaded to the GPU.
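For reference, a minimal sketch of both checks; the model path and layer count below are placeholders, and the second part assumes the standard llama-cpp-python API:

```python
import torch

# Must print True, otherwise torch is running on CPU only
print(torch.cuda.is_available())

# Placeholder example of offloading LLM layers to the GPU with llama-cpp-python;
# adjust model_path and n_gpu_layers to your model and VRAM.
from llama_cpp import Llama

llm = Llama(model_path="path/to/model.gguf", n_gpu_layers=35)
```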
I did it and it still does that, and I am also unable to install llama_cpp with those settings (set CMAKE_ARGS=-DLLAMA_CUBLAS=on).
What's the result of print(torch.cuda.is_available())? Both torch and llama.cpp have to run with CUDA (GPU supported) to achieve realtime speed. The above installation method for llama.cpp works on my Windows 10 system; if it fails on yours, I'm not sure how I can offer further support. llama.cpp is not my library and this can be a complex issue.
Hello Tom, could you please try (on Python 3.10 - I used 3.10.9):
pip install torch==2.1.2+cu121 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip install https://github.com/daswer123/deepspeed-windows-wheels/releases/download/11.2/deepspeed-0.11.2+cuda121-cp310-cp310-win_amd64.whl
pip install https://github.com/oobabooga/flash-attention/releases/download/v2.5.6/flash_attn-2.5.6+cu122torch2.1.2cxx11abiFALSE-cp310-cp310-win_amd64.whl
pip install transformers==4.38.2
And then try these wheels for llama.cpp (pick the one matching your platform and Python version):
# llama-cpp-python (CPU only, AVX2)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.89+cpuavx2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.89+cpuavx2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.89+cpuavx2-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.89+cpuavx2-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
# llama-cpp-python (CUDA, no tensor cores)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.89+cu121-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.89+cu121-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.89+cu121-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.89+cu121-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
# llama-cpp-python (CUDA, tensor cores)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.89+cu121-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.89+cu121-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.89+cu121-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.89+cu121-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
I think this should get you a running llama.cpp version with RealtimeTTS support, with CUDA, DeepSpeed and flash attention. If you then change this line:
coqui_engine = CoquiEngine(cloning_reference_wav="female.wav", language="en", speed=1.0)
into this one:
coqui_engine = CoquiEngine(cloning_reference_wav="female.wav", language="en", speed=1.0, use_deepspeed=True)
you should have a very fast, realtime-capable Coqui engine. It would be great if you could give me feedback on whether that worked.
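For completeness, a minimal sketch of how the DeepSpeed-enabled engine would be used in a RealtimeTTS stream; this assumes the TextToAudioStream API from the project readme, and the reference wav path is a placeholder that must exist on disk:

```python
from RealtimeTTS import TextToAudioStream, CoquiEngine

# use_deepspeed requires the deepspeed wheel installed above;
# cloning_reference_wav is a placeholder voice sample
coqui_engine = CoquiEngine(
    cloning_reference_wav="female.wav",
    language="en",
    speed=1.0,
    use_deepspeed=True,
)

stream = TextToAudioStream(coqui_engine)
stream.feed("This sentence should now play back without pauses in the middle.")
stream.play()
```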
Coqui Engine takes breaks mid-sentence to load. It sometimes pauses between words or even in the middle of saying a word. I tried to adjust the settings but nothing works. I use an i7 10th gen and RTX 3060 computer.