
Coqui Engine takes breaks mid-sentence to load. #7

Open
tomwarias opened this issue Feb 25, 2024 · 6 comments

Comments

@tomwarias

Coqui Engine takes breaks mid-sentence to load. It sometimes pauses between words, or even in the middle of saying a word. I tried adjusting the settings, but nothing works. My machine has an i7 10th gen CPU and an RTX 3060.

@KoljaB
Owner

KoljaB commented Feb 25, 2024

Your GPU should be fast enough for realtime. Is PyTorch installed with CUDA?

@tomwarias
Author

Yes, I followed every step of the readme. I may have a problem with CUDA, because my GPU isn't used by the LLM model either, but I don't know how to solve it. I use Windows.

@KoljaB
Owner

KoljaB commented Feb 26, 2024

I guess PyTorch has no CUDA support. Please check with:

import torch
print(torch.cuda.is_available())

If not available, please try to install the latest torch build with CUDA support:

pip install torch==2.2.0+cu118 torchaudio==2.2.0+cu118 --index-url https://download.pytorch.org/whl/cu118

(you may need to adjust 118 to your CUDA version; this is for CUDA 11.8)
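
To verify the new install (a quick sanity check; the device name will differ on other machines):

import torch
print(torch.__version__)              # should end in "+cu118"
print(torch.cuda.is_available())      # should now print True
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 3060"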

To use the GPU for the LLM under Windows, you need to compile llama-cpp-python for CUBLAS:

  • Set environment variables:
    set CMAKE_ARGS=-DLLAMA_CUBLAS=on
    set FORCE_CMAKE=1
  • It may also be necessary to copy all four MSBuildExtensions files for your CUDA version (11.8 or 12.3) from:
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\extras\visual_studio_integration\MSBuildExtensions

    to
    C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Microsoft\VC\v170\BuildCustomizations

After that, install and compile llama-cpp-python with:

pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

After that you can set n_gpu_layers in the creation parameters of llama.cpp to define how many layers of the LLM's neural network should be offloaded to the GPU.
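
For example (a minimal sketch; the model path is a placeholder you'd point at your own GGUF file):

from llama_cpp import Llama

llm = Llama(
    model_path="models/your-model.gguf",  # placeholder path
    n_gpu_layers=20,   # layers offloaded to the GPU; -1 offloads all of them
    verbose=True,      # the startup log shows whether layers actually land on CUDA
)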

@tomwarias
Author

I did it and it still does that. I am also unable to download llama_cpp with set CMAKE_ARGS=-DLLAMA_CUBLAS=on.

@KoljaB
Owner

KoljaB commented Feb 27, 2024

What's the result of print(torch.cuda.is_available())? Both torch and llama.cpp have to run with CUDA (GPU support) to achieve realtime speed.

The above installation procedure for llama.cpp works on my Windows 10 system; if it fails on yours, I'm not sure how I can offer further support. llama.cpp is not my library, and this can be a complex issue.

@KoljaB
Owner

KoljaB commented Sep 24, 2024

Hello Tom,

could you please try (on Python 3.10 - I used 3.10.9):

pip install torch==2.1.2+cu121 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip install https://github.com/daswer123/deepspeed-windows-wheels/releases/download/11.2/deepspeed-0.11.2+cuda121-cp310-cp310-win_amd64.whl
pip install https://github.com/oobabooga/flash-attention/releases/download/v2.5.6/flash_attn-2.5.6+cu122torch2.1.2cxx11abiFALSE-cp310-cp310-win_amd64.whl
pip install transformers==4.38.2
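
To confirm the stack imports cleanly (a hedged sanity check; the exact version strings may vary slightly by build):

import torch, deepspeed, flash_attn, transformers
print(torch.__version__)         # expect 2.1.2+cu121
print(deepspeed.__version__)     # expect 0.11.2
print(flash_attn.__version__)    # expect 2.5.6
print(transformers.__version__)  # expect 4.38.2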

And then try these wheels for llama.cpp:

# llama-cpp-python (CPU only, AVX2)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.89+cpuavx2-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.89+cpuavx2-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.89+cpuavx2-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/cpu/llama_cpp_python-0.2.89+cpuavx2-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"

# llama-cpp-python (CUDA, no tensor cores)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.89+cu121-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.89+cu121-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.89+cu121-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.89+cu121-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"

# llama-cpp-python (CUDA, tensor cores)
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.89+cu121-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.89+cu121-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.89+cu121-cp311-cp311-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.11"
https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.89+cu121-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64" and python_version == "3.10"
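
These lines use pip environment markers, so the whole block can be dropped into a requirements.txt and pip will pick the wheel matching your OS and Python version. To install a single wheel directly instead, for example on Windows with Python 3.10 and an RTX 3060 (which has tensor cores):

pip install https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.89+cu121-cp310-cp310-win_amd64.whl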

I think this should get you a running llama.cpp version with RealtimeTTS support, including CUDA, deepspeed and flash attention.

If you then change this line:

coqui_engine = CoquiEngine(cloning_reference_wav="female.wav", language="en", speed=1.0)

into this one:

coqui_engine = CoquiEngine(cloning_reference_wav="female.wav", language="en", speed=1.0, use_deepspeed=True)

you should have a very fast realtime-capable coqui engine.
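
For context, here's a minimal end-to-end sketch of how that engine plugs into a stream (assuming the TextToAudioStream API from this repo and the female.wav reference used above):

from RealtimeTTS import TextToAudioStream, CoquiEngine

coqui_engine = CoquiEngine(
    cloning_reference_wav="female.wav",
    language="en",
    speed=1.0,
    use_deepspeed=True,  # requires the deepspeed wheel installed above
)

stream = TextToAudioStream(coqui_engine)
stream.feed("This sentence should now play without pauses mid-word.")
stream.play()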

It would be great if you could give me feedback on whether that worked.
