Replies: 2 comments 1 reply
- GPU info. I have tried to set
- Oh, I got it. The CPU backend will first quantize
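  If "quantize" here refers to the ggml CPU path converting the F32 activations to an 8-bit block format before a quantized mat-mul, that alone explains a small mismatch: the CPU accumulates an integer dot product of two quantized blocks, while a backend that dequantizes the weights and multiplies by the raw F32 activations accumulates in float. A minimal, self-contained sketch of that difference (not ggml code; the block size and Q8_0-style scaling are assumptions):

  ```cpp
  // Sketch: quantized-activation dot product vs. dequantize-and-float dot product.
  #include <algorithm>
  #include <cmath>
  #include <cstdint>
  #include <cstdio>
  #include <random>
  #include <vector>

  constexpr int QK = 32; // block size, Q8_0-style assumption

  // Quantize one block of 32 floats to int8 with a single per-block scale.
  static float quantize_block(const float *x, int8_t *q) {
      float amax = 0.0f;
      for (int i = 0; i < QK; ++i) amax = std::max(amax, std::fabs(x[i]));
      const float d  = amax / 127.0f;
      const float id = d != 0.0f ? 1.0f / d : 0.0f;
      for (int i = 0; i < QK; ++i) q[i] = (int8_t) std::lround(x[i] * id);
      return d;
  }

  int main() {
      std::mt19937 rng(42);
      std::normal_distribution<float> dist(0.0f, 1.0f);

      std::vector<float> w(QK), x(QK);
      for (int i = 0; i < QK; ++i) { w[i] = dist(rng); x[i] = dist(rng); }

      // Weights are stored quantized in both paths.
      std::vector<int8_t> wq(QK);
      const float dw = quantize_block(w.data(), wq.data());

      // Path A (CPU-style): also quantize the activation, then do an
      // integer dot product scaled by the two block scales.
      std::vector<int8_t> xq(QK);
      const float dx = quantize_block(x.data(), xq.data());
      int32_t isum = 0;
      for (int i = 0; i < QK; ++i) isum += (int32_t) wq[i] * xq[i];
      const float dot_int = dw * dx * (float) isum;

      // Path B (GPU-style): dequantize the weights and multiply by the
      // raw F32 activation, accumulating in float.
      float dot_f32 = 0.0f;
      for (int i = 0; i < QK; ++i) dot_f32 += (dw * wq[i]) * x[i];

      printf("int-dot = %.6f  f32-dot = %.6f  diff = %.2e\n",
             dot_int, dot_f32, dot_int - dot_f32);
      return 0;
  }
  ```

  Neither result is wrong; the two paths simply round in different places, so a small point-wise difference between backends is expected.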
- I am trying to use the Vulkan backend in my project chatllm.cpp, and I am having trouble with the `mat_mult` operator, where `w` is `Q8_0` and `input` & `output` are `F32`. The result differs slightly from the CPU backend (`w` and `input` are exactly the same).

  Dumped data (here, `input` is just a vector):

  Plot of point-wise error:

  I think this might be caused by a flag or a missing function call in my code. @0cc4m, would you provide some hints?
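  Since the dumped data is not reproduced above, here is a hedged sketch of how the point-wise difference could be quantified to check whether it is within quantization noise, using a normalized mean squared error over the two dumped output buffers. The function and the threshold below are illustrative, not chatllm.cpp or ggml API:

  ```cpp
  // Sketch: comparing CPU and Vulkan outputs element-wise.
  #include <cmath>
  #include <cstdio>
  #include <vector>

  // Normalized mean squared error between a reference buffer and a test buffer.
  static double nmse(const std::vector<float> &ref, const std::vector<float> &out) {
      double err = 0.0, norm = 0.0;
      for (size_t i = 0; i < ref.size(); ++i) {
          const double d = (double) out[i] - (double) ref[i];
          err  += d * d;
          norm += (double) ref[i] * (double) ref[i];
      }
      return norm > 0.0 ? err / norm : err;
  }

  int main() {
      // cpu_out / vulkan_out would be the buffers dumped from the two backends;
      // the values here are placeholders.
      std::vector<float> cpu_out    = {1.001f, -0.498f, 0.250f};
      std::vector<float> vulkan_out = {1.000f, -0.500f, 0.251f};

      const double e = nmse(cpu_out, vulkan_out);
      // The tolerance is illustrative: a small NMSE usually indicates
      // quantization/accumulation-order noise rather than a kernel bug.
      printf("NMSE = %.3e -> %s\n", e, e < 1e-3 ? "within tolerance" : "suspicious");
      return 0;
  }
  ```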