Not utilizing gpu during image generation #428
Comments
Did you compile with the appropriate settings for your GPU?
Is cmake .. -DSD_CUBLAS=ON necessary to enable GPU utilization @grauho?
Yes, if you're trying to use it with a CUDA-enabled graphics card you do want to build it with SD_CUBLAS enabled.
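For reference, a CUDA-enabled build typically looks like the following. This is a sketch assuming the standard CMake out-of-source workflow and the SD_CUBLAS option mentioned above; it requires the CUDA toolkit (nvcc) to be installed and on PATH.

```shell
# From the stable-diffusion.cpp checkout.
mkdir -p build && cd build

# Configure with the cuBLAS backend enabled.
cmake .. -DSD_CUBLAS=ON

# Compile; this can take a while the first time because the
# CUDA kernels are built for your GPU architecture.
cmake --build . --config Release
```

If the configure step cannot find CUDA, check that nvcc is visible in the environment before re-running cmake.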
@grauho thanks for the help. By "CUDA tool chain" do you mean the CUDA toolkit?
No problem. Yep, the CUDA toolkit, and provided it builds without error, give it a shot and see if it recognizes the GPU properly.
@grauho I have tried
That looks like a different issue, does it build normally without SD_CUBLAS enabled?
@grauho I actually tried building with SD_CUBLAS and it took a lot of time to compile, and in the end it failed.
Yeah, it built perfectly without SD_CUBLAS enabled, but with SD_CUBLAS it fails to build, so I am not able to run the model on my GPU.
Hi, thank you for providing this code.
I am currently running the Flux Schnell q2 model in a Kaggle notebook, but when it starts generating the image it always shows "Using CPU backend" and does not utilize the GPU at all. Please help.
Input CLI:
!/kaggle/working/stable-diffusion.cpp/build/bin/sd \
  --diffusion-model /kaggle/working/flux1-schnell-q2_k.gguf \
  --clip_l /kaggle/working/clip_l.safetensors \
  --t5xxl /kaggle/working/t5xxl_fp8_e4m3fn.safetensors \
  --vae /kaggle/working/ae.safetensors \
  -p "A male model standing confidently against a clean white background, wearing a fitted blue t-shirt and stylish black jeans. The model has a friendly smile, with short dark hair, and is posing casually with one hand in his pocket. The lighting is bright and even, highlighting the clothing details and creating a professional e-commerce look." \
  --cfg-scale 1.0 \
  --sampling-method euler \
  --rng cuda \
  --steps 2 \
  -v
Verbose output:
[DEBUG] stable-diffusion.cpp:180 - Using CPU backend
[INFO ] stable-diffusion.cpp:202 - loading clip_l from '/kaggle/working/clip_l.safetensors'
[INFO ] model.cpp:793 - load /kaggle/working/clip_l.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from '/kaggle/working/clip_l.safetensors'
[INFO ] stable-diffusion.cpp:209 - loading t5xxl from '/kaggle/working/t5xxl_fp8_e4m3fn.safetensors'
[INFO ] model.cpp:793 - load /kaggle/working/t5xxl_fp8_e4m3fn.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from '/kaggle/working/t5xxl_fp8_e4m3fn.safetensors'
[INFO ] stable-diffusion.cpp:216 - loading diffusion model from '/kaggle/working/flux1-schnell-q2_k.gguf'
[INFO ] model.cpp:790 - load /kaggle/working/flux1-schnell-q2_k.gguf using gguf format
[DEBUG] model.cpp:807 - init from '/kaggle/working/flux1-schnell-q2_k.gguf'
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
[INFO ] stable-diffusion.cpp:223 - loading vae from '/kaggle/working/ae.safetensors'
[INFO ] model.cpp:793 - load /kaggle/working/ae.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from '/kaggle/working/ae.safetensors'
[INFO ] stable-diffusion.cpp:235 - Version: Flux Schnell
[INFO ] stable-diffusion.cpp:266 - Weight type: f16
[INFO ] stable-diffusion.cpp:267 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:268 - Diffusion model weight type: q2_K
[INFO ] stable-diffusion.cpp:269 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:271 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1046 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1046 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1046 - flux params backend buffer size = 3732.51 MB(RAM) (776 tensors)
[DEBUG] ggml_extend.hpp:1046 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:398 - loading weights
[DEBUG] model.cpp:1530 - loading tensors from /kaggle/working/clip_l.safetensors
[DEBUG] model.cpp:1530 - loading tensors from /kaggle/working/t5xxl_fp8_e4m3fn.safetensors
[INFO ] model.cpp:1685 - unknown tensor 'text_encoders.t5xxl.encoder.embed_tokens.weight | f8_e4m3 | 2 [4096, 32128, 1, 1, 1]' in model file
[DEBUG] model.cpp:1530 - loading tensors from /kaggle/working/flux1-schnell-q2_k.gguf
[DEBUG] model.cpp:1530 - loading tensors from /kaggle/working/ae.safetensors
[INFO ] stable-diffusion.cpp:482 - total params memory size = 13145.92MB (VRAM 0.00MB, RAM 13145.92MB): clip 9318.83MB(RAM), unet 3732.51MB(RAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:501 - loading model from '' completed, taking 53.47s
[INFO ] stable-diffusion.cpp:518 - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:572 - finished loaded file
[DEBUG] stable-diffusion.cpp:1378 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1127 - prompt after extract and remove lora: "a lovely cat holding a sign says 'flux.cpp'"
[INFO ] stable-diffusion.cpp:655 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1132 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:1036 - parse 'a lovely cat holding a sign says 'flux.cpp'' to [['a lovely cat holding a sign says 'flux.cpp'', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 256
[DEBUG] ggml_extend.hpp:998 - t5 compute buffer size: 68.25 MB(RAM)
[DEBUG] conditioner.hpp:1155 - computing condition graph completed, taking 39122 ms
[INFO ] stable-diffusion.cpp:1256 - get_learned_condition completed, taking 39127 ms
[INFO ] stable-diffusion.cpp:1279 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1283 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:998 - flux compute buffer size: 397.27 MB(RAM)
|==================================================| 2/2 - 244.56s/it
[INFO ] stable-diffusion.cpp:1315 - sampling completed, taking 490.85s
[INFO ] stable-diffusion.cpp:1323 - generating 1 latent images completed, taking 491.69s
[INFO ] stable-diffusion.cpp:1326 - decoding 1 latents
[DEBUG] ggml_extend.hpp:998 - vae compute buffer size: 1664.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:987 - computing vae [mode: DECODE] graph completed, taking 49.77s
[INFO ] stable-diffusion.cpp:1336 - latent 1 decoded, taking 49.77s
[INFO ] stable-diffusion.cpp:1340 - decode_first_stage completed, taking 49.77s
[INFO ] stable-diffusion.cpp:1449 - txt2img completed in 580.60s
save result image to 'output.png'
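The log above gives away the problem in two places: the "Using CPU backend" banner, and the params-memory summary reporting VRAM 0.00MB with everything in RAM. A small hypothetical helper to check a captured log for both signals (the function name and logic are illustrative, not part of sd itself):

```python
import re

def gpu_in_use(log_text: str) -> bool:
    """Return True if an sd verbose log indicates the GPU is being used.

    Checks two signals: the backend banner emitted at startup, and the
    VRAM figure in the 'total params memory size' summary line.
    """
    if "Using CPU backend" in log_text:
        return False
    # Matches e.g. "(VRAM 0.00MB, RAM 13145.92MB)" in the summary line.
    m = re.search(r"VRAM (\d+(?:\.\d+)?)MB", log_text)
    return bool(m) and float(m.group(1)) > 0.0

# The log from this issue: CPU banner present, VRAM 0.00MB.
cpu_log = ("[DEBUG] stable-diffusion.cpp:180 - Using CPU backend\n"
           "total params memory size = 13145.92MB "
           "(VRAM 0.00MB, RAM 13145.92MB)")
print(gpu_in_use(cpu_log))  # -> False
```

On a working SD_CUBLAS build the banner changes and the summary reports a nonzero VRAM figure, so the same check returns True.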