Not utilizing gpu during image generation #428
Comments
Did you compile with the appropriate settings for your GPU?
Is cmake .. -DSD_CUBLAS=ON necessary to enable GPU utilization @grauho?
Yes, if you're trying to use it with a CUDA-enabled graphics card you do want to build it with SD_CUBLAS enabled.
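For reference, a CUDA-enabled build typically looks like the following. This is a sketch assuming the standard CMake out-of-source workflow and the SD_CUBLAS option mentioned above; it requires the CUDA toolkit (nvcc) to be installed and on PATH.

```shell
# From the stable-diffusion.cpp checkout.
mkdir -p build && cd build

# Configure with the cuBLAS backend enabled.
cmake .. -DSD_CUBLAS=ON

# Compile; this can take a while the first time because the
# CUDA kernels are built for your GPU architecture.
cmake --build . --config Release
```

If the configure step cannot find CUDA, check that nvcc is visible in the environment before re-running cmake.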
@grauho thanks for the help. By "CUDA tool chain" do you mean the CUDA toolkit?
No problem. Yep, the CUDA toolkit, and provided it builds without error, give it a shot and see if it recognizes the GPU properly.
@grauho I have tried
That looks like a different issue, does it build normally without SD_CUBLAS enabled?
@grauho I actually tried building with SD_CUBLAS and it took a lot of time to compile, and in the end it failed.
Yeah, it built perfectly without SD_CUBLAS enabled, but with SD_CUBLAS it fails to build, so I am not able to run the model on my GPU.
Hi, thank you for providing this code.
I am currently running the Flux Schnell q2 model in a Kaggle notebook, but when it starts generating the image it always shows "Using CPU backend" and does not utilize the GPU at all. Please help.
Input CLI:
!/kaggle/working/stable-diffusion.cpp/build/bin/sd \
  --diffusion-model /kaggle/working/flux1-schnell-q2_k.gguf \
  --clip_l /kaggle/working/clip_l.safetensors \
  --t5xxl /kaggle/working/t5xxl_fp8_e4m3fn.safetensors \
  --vae /kaggle/working/ae.safetensors \
  -p "A male model standing confidently against a clean white background, wearing a fitted blue t-shirt and stylish black jeans. The model has a friendly smile, with short dark hair, and is posing casually with one hand in his pocket. The lighting is bright and even, highlighting the clothing details and creating a professional e-commerce look." \
  --cfg-scale 1.0 \
  --sampling-method euler \
  --rng cuda \
  --steps 2 \
  -v
Verbose output:
[DEBUG] stable-diffusion.cpp:180 - Using CPU backend
[INFO ] stable-diffusion.cpp:202 - loading clip_l from '/kaggle/working/clip_l.safetensors'
[INFO ] model.cpp:793 - load /kaggle/working/clip_l.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from '/kaggle/working/clip_l.safetensors'
[INFO ] stable-diffusion.cpp:209 - loading t5xxl from '/kaggle/working/t5xxl_fp8_e4m3fn.safetensors'
[INFO ] model.cpp:793 - load /kaggle/working/t5xxl_fp8_e4m3fn.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from '/kaggle/working/t5xxl_fp8_e4m3fn.safetensors'
[INFO ] stable-diffusion.cpp:216 - loading diffusion model from '/kaggle/working/flux1-schnell-q2_k.gguf'
[INFO ] model.cpp:790 - load /kaggle/working/flux1-schnell-q2_k.gguf using gguf format
[DEBUG] model.cpp:807 - init from '/kaggle/working/flux1-schnell-q2_k.gguf'
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
[INFO ] stable-diffusion.cpp:223 - loading vae from '/kaggle/working/ae.safetensors'
[INFO ] model.cpp:793 - load /kaggle/working/ae.safetensors using safetensors format
[DEBUG] model.cpp:861 - init from '/kaggle/working/ae.safetensors'
[INFO ] stable-diffusion.cpp:235 - Version: Flux Schnell
[INFO ] stable-diffusion.cpp:266 - Weight type: f16
[INFO ] stable-diffusion.cpp:267 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:268 - Diffusion model weight type: q2_K
[INFO ] stable-diffusion.cpp:269 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:271 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1046 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1046 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1046 - flux params backend buffer size = 3732.51 MB(RAM) (776 tensors)
[DEBUG] ggml_extend.hpp:1046 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:398 - loading weights
[DEBUG] model.cpp:1530 - loading tensors from /kaggle/working/clip_l.safetensors
[DEBUG] model.cpp:1530 - loading tensors from /kaggle/working/t5xxl_fp8_e4m3fn.safetensors
[INFO ] model.cpp:1685 - unknown tensor 'text_encoders.t5xxl.encoder.embed_tokens.weight | f8_e4m3 | 2 [4096, 32128, 1, 1, 1]' in model file
[DEBUG] model.cpp:1530 - loading tensors from /kaggle/working/flux1-schnell-q2_k.gguf
[DEBUG] model.cpp:1530 - loading tensors from /kaggle/working/ae.safetensors
[INFO ] stable-diffusion.cpp:482 - total params memory size = 13145.92MB (VRAM 0.00MB, RAM 13145.92MB): clip 9318.83MB(RAM), unet 3732.51MB(RAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:501 - loading model from '' completed, taking 53.47s
[INFO ] stable-diffusion.cpp:518 - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:572 - finished loaded file
[DEBUG] stable-diffusion.cpp:1378 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1127 - prompt after extract and remove lora: "a lovely cat holding a sign says 'flux.cpp'"
[INFO ] stable-diffusion.cpp:655 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1132 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:1036 - parse 'a lovely cat holding a sign says 'flux.cpp'' to [['a lovely cat holding a sign says 'flux.cpp'', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:397 - token length: 256
[DEBUG] ggml_extend.hpp:998 - t5 compute buffer size: 68.25 MB(RAM)
[DEBUG] conditioner.hpp:1155 - computing condition graph completed, taking 39122 ms
[INFO ] stable-diffusion.cpp:1256 - get_learned_condition completed, taking 39127 ms
[INFO ] stable-diffusion.cpp:1279 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1283 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:998 - flux compute buffer size: 397.27 MB(RAM)
|==================================================| 2/2 - 244.56s/it
[INFO ] stable-diffusion.cpp:1315 - sampling completed, taking 490.85s
[INFO ] stable-diffusion.cpp:1323 - generating 1 latent images completed, taking 491.69s
[INFO ] stable-diffusion.cpp:1326 - decoding 1 latents
[DEBUG] ggml_extend.hpp:998 - vae compute buffer size: 1664.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:987 - computing vae [mode: DECODE] graph completed, taking 49.77s
[INFO ] stable-diffusion.cpp:1336 - latent 1 decoded, taking 49.77s
[INFO ] stable-diffusion.cpp:1340 - decode_first_stage completed, taking 49.77s
[INFO ] stable-diffusion.cpp:1449 - txt2img completed in 580.60s
save result image to 'output.png'
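The log above gives away the problem in two places: the "Using CPU backend" banner, and the params-memory summary reporting VRAM 0.00MB with everything in RAM. A small hypothetical helper to check a captured log for both signals (the function name and logic are illustrative, not part of sd itself):

```python
import re

def gpu_in_use(log_text: str) -> bool:
    """Return True if an sd verbose log indicates the GPU is being used.

    Checks two signals: the backend banner emitted at startup, and the
    VRAM figure in the 'total params memory size' summary line.
    """
    if "Using CPU backend" in log_text:
        return False
    # Matches e.g. "(VRAM 0.00MB, RAM 13145.92MB)" in the summary line.
    m = re.search(r"VRAM (\d+(?:\.\d+)?)MB", log_text)
    return bool(m) and float(m.group(1)) > 0.0

# The log from this issue: CPU banner present, VRAM 0.00MB.
cpu_log = ("[DEBUG] stable-diffusion.cpp:180 - Using CPU backend\n"
           "total params memory size = 13145.92MB "
           "(VRAM 0.00MB, RAM 13145.92MB)")
print(gpu_in_use(cpu_log))  # -> False
```

On a working SD_CUBLAS build the banner changes and the summary reports a nonzero VRAM figure, so the same check returns True.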