Replies: 2 comments 1 reply
- GPU info. I have tried to set
- Oh, I got it. The CPU backend will first quantize
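  If "quantize" here refers to the ggml CPU path converting the F32 activations to an 8-bit block format before a quantized mat-mul, that alone explains a small mismatch: the CPU accumulates an integer dot product of two quantized blocks, while a backend that dequantizes the weights and multiplies by the raw F32 activations accumulates in float. A minimal, self-contained sketch of that difference (not ggml code; the block size and Q8_0-style scaling are assumptions):

  ```cpp
  // Sketch: quantized-activation dot product vs. dequantize-and-float dot product.
  #include <algorithm>
  #include <cmath>
  #include <cstdint>
  #include <cstdio>
  #include <random>
  #include <vector>

  constexpr int QK = 32; // block size, Q8_0-style assumption

  // Quantize one block of 32 floats to int8 with a single per-block scale.
  static float quantize_block(const float *x, int8_t *q) {
      float amax = 0.0f;
      for (int i = 0; i < QK; ++i) amax = std::max(amax, std::fabs(x[i]));
      const float d  = amax / 127.0f;
      const float id = d != 0.0f ? 1.0f / d : 0.0f;
      for (int i = 0; i < QK; ++i) q[i] = (int8_t) std::lround(x[i] * id);
      return d;
  }

  int main() {
      std::mt19937 rng(42);
      std::normal_distribution<float> dist(0.0f, 1.0f);

      std::vector<float> w(QK), x(QK);
      for (int i = 0; i < QK; ++i) { w[i] = dist(rng); x[i] = dist(rng); }

      // Weights are stored quantized in both paths.
      std::vector<int8_t> wq(QK);
      const float dw = quantize_block(w.data(), wq.data());

      // Path A (CPU-style): also quantize the activation, then do an
      // integer dot product scaled by the two block scales.
      std::vector<int8_t> xq(QK);
      const float dx = quantize_block(x.data(), xq.data());
      int32_t isum = 0;
      for (int i = 0; i < QK; ++i) isum += (int32_t) wq[i] * xq[i];
      const float dot_int = dw * dx * (float) isum;

      // Path B (GPU-style): dequantize the weights and multiply by the
      // raw F32 activation, accumulating in float.
      float dot_f32 = 0.0f;
      for (int i = 0; i < QK; ++i) dot_f32 += (dw * wq[i]) * x[i];

      printf("int-dot = %.6f  f32-dot = %.6f  diff = %.2e\n",
             dot_int, dot_f32, dot_int - dot_f32);
      return 0;
  }
  ```

  Neither result is wrong; the two paths simply round in different places, so a small point-wise difference between backends is expected.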
- I am trying to use the Vulkan backend in my project chatllm.cpp, and I am having trouble with the `mat_mult` operator, where `w` is `Q8_0` and `input` & `output` are `F32`. The result differs slightly from the CPU backend (`w` and `input` are exactly the same).

  Dumped data (here, `input` is just a vector):

  Plot of point-wise error:

  I think this might be caused by a flag or a missing function call in my code. @0cc4m, would you provide some hints?
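  Since the dumped data is not reproduced above, here is a hedged sketch of how the point-wise difference could be quantified to check whether it is within quantization noise, using a normalized mean squared error over the two dumped output buffers. The function and the threshold below are illustrative, not chatllm.cpp or ggml API:

  ```cpp
  // Sketch: comparing CPU and Vulkan outputs element-wise.
  #include <cmath>
  #include <cstdio>
  #include <vector>

  // Normalized mean squared error between a reference buffer and a test buffer.
  static double nmse(const std::vector<float> &ref, const std::vector<float> &out) {
      double err = 0.0, norm = 0.0;
      for (size_t i = 0; i < ref.size(); ++i) {
          const double d = (double) out[i] - (double) ref[i];
          err  += d * d;
          norm += (double) ref[i] * (double) ref[i];
      }
      return norm > 0.0 ? err / norm : err;
  }

  int main() {
      // cpu_out / vulkan_out would be the buffers dumped from the two backends;
      // the values here are placeholders.
      std::vector<float> cpu_out    = {1.001f, -0.498f, 0.250f};
      std::vector<float> vulkan_out = {1.000f, -0.500f, 0.251f};

      const double e = nmse(cpu_out, vulkan_out);
      // The tolerance is illustrative: a small NMSE usually indicates
      // quantization/accumulation-order noise rather than a kernel bug.
      printf("NMSE = %.3e -> %s\n", e, e < 1e-3 ? "within tolerance" : "suspicious");
      return 0;
  }
  ```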